Python Forum

Full Version: Automating Captcha form submission with Mechanize
The idea is to automate downloads from o2tvseries, a website for film shows. Part of the process is submitting a Captcha form automatically. The mechanize library fills in and submits the form. To get the Captcha's content, Selenium's Webdriver takes a screenshot of the page, and the Captcha picture is cropped out and saved locally. Pytesseract then reads the string from the picture for submission (about 75% accuracy).

The issue is that after submission I get a page with the response Error: Captcha Does Not Match!, even when the Captcha string matches (it often does). If everything were OK, the page after submission would be playing a video file. I don't know whether it's a cookie/session problem or a Webdriver problem. I'm having a hard time getting a correct Captcha input accepted.
Here's the code:

from urllib.request import Request, urlopen
from bs4 import BeautifulSoup
from PIL import Image, ImageFilter, ImageEnhance
import pytesseract
import http.cookiejar as cookielib
from io import BytesIO
from selenium import webdriver
import mechanize

url = "https://o2tvseries.com/Kevin-Can-Fuck-Himself/Season-01/Episode-06/index.html"
req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
web_byte = urlopen(req)
soupy = BeautifulSoup(web_byte, "html.parser")
links = soupy.find_all('a')
# The download link is taken by position (15th anchor); this breaks if the page layout changes
newDLLink = links[14].get('href')

secondReq = Request(newDLLink, headers={'User-Agent': 'Mozilla/5.0'})
secondWeb_byte = urlopen(secondReq)
captchaSoupy = BeautifulSoup(secondWeb_byte, "html.parser")
captchaSeries = captchaSoupy.find_all('img')
newCaptchaSeries = captchaSeries[0].get('src')

#Webdriver
driver = webdriver.Chrome(executable_path=r"C:\Program Files\ChromeDriver for Selenium\chromedriver.exe")
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
driver.get(newDLLink)
element = driver.find_element_by_xpath("//img[@alt='CAPTCHA Code']")
location = element.location
size = element.size
png = driver.get_screenshot_as_png()
driver.quit()

#Pillow
im = Image.open(BytesIO(png))
left = location['x']
top = location['y']
right = location['x'] + size['width']
bottom = location['y'] + size['height']
im = im.crop((left, top, right, bottom))
im = ImageEnhance.Sharpness(im)
im = im.enhance(0.0)
im = im.filter(ImageFilter.MinFilter(3))
im.save('study_img.png')

#Pytesseract
image_to_string = pytesseract.image_to_string(im)
image_to_string = image_to_string[0:5]

#Mechanize
# Note: this is a new HTTP session with an empty cookie jar, separate from the Selenium session above
br = mechanize.Browser()
cj = cookielib.LWPCookieJar()
br.set_cookiejar(cj)
br.set_handle_robots(False)
br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.77 Safari/537.36')]
br.open(newDLLink)
br.select_form(nr=0)
br['captchainput'] = image_to_string
response = br.submit()
# The response body contains `Error: Captcha Does Not Match!`
read_response = response.read()
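
One thing worth ruling out before blaming the session: `image_to_string[0:5]` takes the first five characters exactly as Tesseract returned them, so a stray space, newline or punctuation mark would end up in the submitted value. Below is a minimal sketch of a stricter cleanup step. It assumes the Captcha is five letters or digits, which is only inferred from the `[0:5]` slice and not confirmed anywhere in this thread; `clean_captcha_text` is a hypothetical helper name.

```python
import re


def clean_captcha_text(raw, length=5):
    """Return the OCR text reduced to letters and digits, or None if it looks wrong.

    Assumption: the Captcha is `length` alphanumeric characters.
    """
    # Drop whitespace, form feeds, punctuation and anything else non-alphanumeric
    candidate = re.sub(r"[^A-Za-z0-9]", "", raw)
    # A different length means the OCR result is unreliable, so signal a retry
    return candidate if len(candidate) == length else None


# Example with a made-up OCR result containing stray whitespace
print(clean_captcha_text(" a1 B2c\n"))
```

When the helper returns None, it is probably better to fetch a new Captcha than to submit a guess.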
Quote:CAPTCHA technology authenticates that a real person is accessing the web content to block spammers and bots that try to automatically harvest email addresses or try to automatically sign up for access to websites, blogs or forums. CAPTCHA blocks automated systems, which can't read the distorted letters in the graphic.
Sounds like it is successfully doing its job.
(Aug-02-2021, 09:21 PM) Yoriz Wrote:
Quote:CAPTCHA technology authenticates that a real person is accessing the web content to block spammers and bots that try to automatically harvest email addresses or try to automatically sign up for access to websites, blogs or forums. CAPTCHA blocks automated systems, which can't read the distorted letters in the graphic.
Sounds like it is successfully doing its job.

Looks like I'm close with the code. Whatever is preventing correct Captcha inputs from being accepted isn't the Captcha itself; it's more likely the Mechanize library or Selenium.
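
Since the suspicion is a cookie/session problem, one thing to try is carrying the Selenium session's cookies over to Mechanize, so the form is submitted in the same session that was shown the Captcha. The sketch below is not a confirmed fix: it assumes the site ties the expected Captcha answer to a session cookie, which nothing in this thread establishes. `selenium_cookies_to_jar` is a hypothetical helper name, and `PHPSESSID` in the example is a placeholder cookie name. The input is a list of dicts in the shape returned by Selenium's `driver.get_cookies()`.

```python
from http.cookiejar import Cookie, CookieJar


def selenium_cookies_to_jar(selenium_cookies, jar=None):
    """Copy cookie dicts (as returned by driver.get_cookies()) into a CookieJar."""
    jar = jar if jar is not None else CookieJar()
    for c in selenium_cookies:
        domain = c.get("domain", "")
        expiry = c.get("expiry")  # absent for session cookies
        jar.set_cookie(Cookie(
            version=0,
            name=c["name"],
            value=c["value"],
            port=None,
            port_specified=False,
            domain=domain,
            domain_specified=bool(domain),
            domain_initial_dot=domain.startswith("."),
            path=c.get("path", "/"),
            path_specified=True,
            secure=bool(c.get("secure", False)),
            expires=expiry,
            discard=expiry is None,  # session cookies are discarded on exit
            comment=None,
            comment_url=None,
            rest={"HttpOnly": None} if c.get("httpOnly") else {},
        ))
    return jar


# Example with a made-up cookie; the name and value are placeholders
example = [{"name": "PHPSESSID", "value": "abc123",
            "domain": "o2tvseries.com", "path": "/", "secure": True}]
print([cookie.name for cookie in selenium_cookies_to_jar(example)])
```

In the script above, this would mean calling `driver.get_cookies()` before `driver.quit()` and passing the resulting jar to `br.set_cookiejar(...)` in place of the empty `LWPCookieJar`. If the site issues a new Captcha every time the page is opened, `br.open(newDLLink)` would still invalidate the earlier screenshot, and the image fetch and the form submission would have to happen in a single client instead.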