Aug-02-2021, 08:55 PM
The idea is to automate downloads from o2tvseries, a website for film shows. Part of the process is automating the submission of a Captcha form. The mechanize library fills in and submits the form. To get the Captcha's content, Selenium's WebDriver takes a screenshot of the page, from which the Captcha picture is cropped and saved locally. Pytesseract then reads the text from the picture for submission (about 75% accuracy).
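As a side note on the OCR step, here is a minimal sketch of cleaning up the recognized text before it goes into the form. It assumes the Captcha is five alphanumeric characters (the length comes from the `[0:5]` slice in the script; the alphanumeric part is an assumption), and `clean_ocr_text` is a hypothetical helper name, not part of Pytesseract.

```python
import re

CAPTCHA_LENGTH = 5  # matches the [0:5] slice used in the script


def clean_ocr_text(raw, length=CAPTCHA_LENGTH):
    """Remove everything except letters and digits from OCR output.

    Returns the first `length` characters, or None when too few
    characters survive (a sign the OCR pass failed and should be retried).
    Assumption: the Captcha contains only ASCII letters and digits.
    """
    candidate = re.sub(r"[^A-Za-z0-9]", "", raw)
    if len(candidate) < length:
        return None
    return candidate[:length]
```

Slicing the raw string directly can pick up spaces or newlines that OCR inserts between characters, so filtering first and slicing second may avoid submitting a value that only looks right.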
The issue now is that after submission, I get a page with the response
Error: Captcha Does Not Match!
even when the Captcha matches (it often does). A video file is supposed to start playing on the page after submission, which would mean everything is OK. I don't know if it's a cookie/session issue or a WebDriver issue. I'm having a hard time getting the right Captcha input submitted successfully. Here's the code:
from urllib.request import Request, urlopen
from bs4 import BeautifulSoup
from PIL import Image, ImageFilter, ImageEnhance
import requests
import pytesseract
import http.cookiejar as cookielib
from io import BytesIO
from selenium import webdriver
from http import cookiejar
import mechanize

req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
web_byte = urlopen(req)
soupy = BeautifulSoup(web_byte, "html.parser")
links = soupy.find_all('a')
newDLLink = links[14].get('href')
secondReq = Request(newDLLink, headers={'User-Agent': 'Mozilla/5.0'})
secondWeb_byte = urlopen(secondReq)
captchaSoupy = BeautifulSoup(secondWeb_byte, "html.parser")
captchaSeries = captchaSoupy.find_all('img')
newCaptchaSeries = captchaSeries[0].get('src')

#Webdriver
driver = webdriver.Chrome(executable_path=r"C:\Program Files\ChromeDriver for Selenium\chromedriver.exe")
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
driver.get(newDLLink)
element = driver.find_element_by_xpath("//img[@alt='CAPTCHA Code']")
location = element.location
size = element.size
png = driver.get_screenshot_as_png()
driver.quit()

#Pillow
im = Image.open(BytesIO(png))
left = location['x']
top = location['y']
right = location['x'] + size['width']
bottom = location['y'] + size['height']
im = im.crop((left, top, right, bottom))
im = ImageEnhance.Sharpness(im)
im = im.enhance(0.0)
im = im.filter(ImageFilter.MinFilter(3))
im.save('study_img.png')

#Pytesseract
image_to_string = pytesseract.image_to_string(im)
image_to_string = image_to_string[0:5]

#Mechanize
br = mechanize.Browser()
cj = cookielib.LWPCookieJar()
br.set_cookiejar(cj)
br.set_handle_robots(False)
br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.77 Safari/537.36')]
br.open(newDLLink)
br.select_form(nr=0)
br['captchainput'] = image_to_string
response = br.submit()  #This returns `Error: Captcha Does Not Match!`
read_response = response.read()
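One way to test the cookie/session guess: the script loads the Captcha page in Selenium and then opens it again in mechanize with an empty cookie jar, so the two requests may belong to different server-side sessions. Below is a minimal sketch, using only the standard library, of turning the cookie dicts returned by Selenium's `driver.get_cookies()` into an `http.cookiejar.CookieJar`. The helper name `selenium_cookies_to_jar` and the sample cookie in the usage note are hypothetical, and whether carrying the cookies over actually fixes the mismatch on this site is untested.

```python
from http.cookiejar import Cookie, CookieJar


def selenium_cookies_to_jar(selenium_cookies):
    """Build a CookieJar from dicts shaped like driver.get_cookies() output.

    Each dict is expected to have "name" and "value", and may have
    "domain", "path", "secure" and "expiry" (seconds since the epoch).
    """
    jar = CookieJar()
    for c in selenium_cookies:
        domain = c.get("domain", "")
        expiry = c.get("expiry")
        jar.set_cookie(Cookie(
            version=0,
            name=c["name"],
            value=c["value"],
            port=None,
            port_specified=False,
            domain=domain,
            domain_specified=bool(domain),
            domain_initial_dot=domain.startswith("."),
            path=c.get("path", "/"),
            path_specified=True,
            secure=bool(c.get("secure", False)),
            expires=expiry,
            discard=expiry is None,  # no expiry means a session cookie
            comment=None,
            comment_url=None,
            rest={},
        ))
    return jar
```

In the script this would mean calling `driver.get_cookies()` before `driver.quit()` and handing the resulting jar to `br.set_cookiejar(...)` in place of the empty `LWPCookieJar`, the same way the script already passes a standard-library jar. If the site generates a fresh Captcha on every page load, the later `br.open(newDLLink)` could still replace the code that was screenshotted, so sharing cookies alone might not be enough.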