Python Forum
Automating Captcha form submission with Mechanize
#1
The goal is to automate downloads from o2tvseries, a website that hosts TV shows. Part of the process is submitting a Captcha form automatically. The mechanize library fills in and submits the form. To get the Captcha text, Selenium's WebDriver takes a screenshot of the page, from which the Captcha image is cropped and saved locally, and Pytesseract then reads the string from that image (with about 75% accuracy).

The issue is that after submission I get a page with the response Error: Captcha Does Not Match!, even when the OCR result matches the Captcha (which it often does). If everything worked, the page returned after submission would play a video file. I don't know whether this is a cookie/session problem or something to do with WebDriver. I'm having a hard time getting a correct Captcha input submitted successfully.
Here's the code:

from io import BytesIO
from urllib.request import Request, urlopen
import http.cookiejar as cookielib

from bs4 import BeautifulSoup
from PIL import Image, ImageFilter, ImageEnhance
from selenium import webdriver
import mechanize
import pytesseract

url = "https://o2tvseries.com/Kevin-Can-Fuck-Himself/Season-01/Episode-06/index.html"
req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
web_byte = urlopen(req)
soupy = BeautifulSoup(web_byte, "html.parser")
links = soupy.find_all('a')
# Index 14 is assumed to be the download link; this depends on the page layout.
newDLLink = links[14].get('href')

secondReq = Request(newDLLink, headers={'User-Agent': 'Mozilla/5.0'})
secondWeb_byte = urlopen(secondReq)
captchaSoupy = BeautifulSoup(secondWeb_byte, "html.parser")
captchaSeries = captchaSoupy.find_all('img')
newCaptchaSeries = captchaSeries[0].get('src')

#Webdriver
driver = webdriver.Chrome(executable_path=r"C:\Program Files\ChromeDriver for Selenium\chromedriver.exe")
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
driver.get(newDLLink)
element = driver.find_element_by_xpath("//img[@alt='CAPTCHA Code']")
location = element.location
size = element.size
png = driver.get_screenshot_as_png()
driver.quit()

#Pillow
im = Image.open(BytesIO(png))
left = location['x']
top = location['y']
right = location['x'] + size['width']
bottom = location['y'] + size['height']
im = im.crop((left, top, right, bottom))
im = ImageEnhance.Sharpness(im)
im = im.enhance(0.0)
im = im.filter(ImageFilter.MinFilter(3))
im.save('study_img.png')

#Pytesseract
image_to_string = pytesseract.image_to_string(im)
image_to_string = image_to_string[0:5]

#Mechanize
br = mechanize.Browser()
cj = cookielib.LWPCookieJar()
br.set_cookiejar(cj)
br.set_handle_robots(False)
br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.77 Safari/537.36')]
br.open(newDLLink)
br.select_form(nr=0)
br['captchainput'] = image_to_string
response = br.submit()
#This returns `Error: Captcha Does Not Match!`
read_response = response.read()
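
One thing that may be worth ruling out first: the raw Tesseract output can carry spaces, a trailing newline or stray punctuation, so image_to_string[0:5] is not guaranteed to be the five Captcha characters. A minimal sketch of a cleanup step, assuming the Captcha is five ASCII letters or digits (the length is taken from the slice above; clean_captcha_text is a hypothetical helper name, not part of pytesseract):

```python
import re


def clean_captcha_text(raw, length=5):
    """Drop everything except ASCII letters and digits, then keep the first `length` characters."""
    alnum_only = re.sub(r"[^A-Za-z0-9]", "", raw)
    return alnum_only[:length]


# Example with made-up OCR output containing spaces, a newline and a form feed.
print(clean_captcha_text(" aB3 x9\n\x0c"))
```

In the script above this would be applied to the value returned by pytesseract.image_to_string(im) instead of slicing it directly.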
#2
Quote:CAPTCHA technology authenticates that a real person is accessing the web content to block spammers and bots that try to automatically harvest email addresses or try to automatically sign up for access to websites, blogs or forums. CAPTCHA blocks automated systems, which can't read the distorted letters in the graphic.
Sounds like it is successfully doing its job.
#3
(Aug-02-2021, 09:21 PM)Yoriz Wrote:
Quote:CAPTCHA technology authenticates that a real person is accessing the web content to block spammers and bots that try to automatically harvest email addresses or try to automatically sign up for access to websites, blogs or forums. CAPTCHA blocks automated systems, which can't read the distorted letters in the graphic.
Sounds like it is successfully doing its job.

It looks like I'm close with the code. Whatever is keeping correct Captcha inputs from being accepted isn't the Captcha itself; it's probably the Mechanize library or Selenium.
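
If it is a session issue, one possible cause is that Selenium and Mechanize each load the page separately, so the server may give each of them a different session and a different Captcha. The text read from Selenium's screenshot would then never match what Mechanize's session expects. A minimal sketch of one way to test that theory, assuming the site ties the Captcha to a session cookie: copy Selenium's cookies (the list of dicts returned by driver.get_cookies()) into a standard http.cookiejar.CookieJar and hand that jar to br.set_cookiejar(...). selenium_cookies_to_jar is a hypothetical helper name, and the sample cookie is made up for illustration.

```python
from http.cookiejar import Cookie, CookieJar


def selenium_cookies_to_jar(selenium_cookies):
    """Build a CookieJar from a list of Selenium-style cookie dicts."""
    jar = CookieJar()
    for c in selenium_cookies:
        domain = c.get("domain", "")
        expiry = c.get("expiry")  # absent for session cookies
        jar.set_cookie(Cookie(
            version=0,
            name=c["name"],
            value=c["value"],
            port=None,
            port_specified=False,
            domain=domain,
            domain_specified=bool(domain),
            domain_initial_dot=domain.startswith("."),
            path=c.get("path", "/"),
            path_specified=True,
            secure=c.get("secure", False),
            expires=expiry,
            discard=expiry is None,
            comment=None,
            comment_url=None,
            rest={},
        ))
    return jar


# Made-up cookie in the shape Selenium returns; the real ones come from driver.get_cookies().
sample = [{"name": "PHPSESSID", "value": "abc123", "domain": "example.com", "path": "/"}]
jar = selenium_cookies_to_jar(sample)
print([(c.name, c.value) for c in jar])
```

The cookies would have to be read before driver.quit() is called. Opening the page again with Mechanize may still make the server generate a new Captcha, so this only helps confirm or rule out the session theory; it is not a guaranteed fix.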