Automating Captcha form submission with Mechanize

Dexty · Aug-02-2021, 08:55 PM

The idea is to automate downloads from o2tvseries, a website for TV shows. Part of the process is submitting a CAPTCHA form automatically. The mechanize library fills in the form. For the field where the CAPTCHA text must be entered, Selenium's WebDriver takes a screenshot and saves the CAPTCHA image locally. Pytesseract then reads the text from that image for submission (with about 75% accuracy).

The problem is that after submission I get a page saying "Error: Captcha Does Not Match!", even when the OCR result matches the CAPTCHA image (which it often does). If everything worked, the page after submission would be playing a video file. I don't know whether this is a cookie/session issue or a WebDriver issue, and I'm having a hard time getting a correct CAPTCHA input accepted.
Here's the code:


from urllib.request import Request, urlopen
from bs4 import BeautifulSoup
from PIL import Image, ImageFilter, ImageEnhance
import requests
import pytesseract
import http.cookiejar as cookielib
from io import BytesIO
from selenium import webdriver
from http import cookiejar
import mechanize
 
url = "https://o2tvseries.com/Kevin-Can-Fuck-Himself/Season-01/Episode-06/index.html"
req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
web_byte = urlopen(req)
soupy= BeautifulSoup(web_byte, "html.parser")
links = soupy.find_all('a')
newDLLink = links[14].get('href')
 
secondReq = Request(newDLLink, headers={'User-Agent': 'Mozilla/5.0'})
secondWeb_byte = urlopen(secondReq)
captchaSoupy = BeautifulSoup(secondWeb_byte, "html.parser")
captchaSeries = captchaSoupy.find_all('img')
newCaptchaSeries = captchaSeries[0].get('src')
 
#Webdriver
driver = webdriver.Chrome(executable_path=r"C:\Program Files\ChromeDriver for Selenium\chromedriver.exe")
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
driver.get(newDLLink)
element = driver.find_element_by_xpath("//img[@alt='CAPTCHA Code']")
location = element.location
size = element.size
png = driver.get_screenshot_as_png()
driver.quit()
 
#Pillow
im = Image.open(BytesIO(png))
left = location['x']
top = location['y']
right = location['x'] + size['width']
bottom = location['y'] + size['height']
im = im.crop((left, top, right, bottom))
im = ImageEnhance.Sharpness(im)
im = im.enhance(0.0)
im = im.filter(ImageFilter.MinFilter(3))
im.save('study_img.png')
 
#Pytesseract
image_to_string = pytesseract.image_to_string(im)
image_to_string = image_to_string[0:5]
 
#Mechanize
br = mechanize.Browser()
cj = cookielib.LWPCookieJar()
br.set_cookiejar(cj)
br.set_handle_robots(False)
br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.77 Safari/537.36')]
br.open(newDLLink)
br.select_form(nr=0)
br['captchainput'] = image_to_string
response = br.submit()
#This returns `Error: Captcha Does Not Match!`
read_response = response.read()

Yoriz · Aug-02-2021, 09:21 PM

Quote: CAPTCHA technology authenticates that a real person is accessing the web content to block spammers and bots that try to automatically harvest email addresses or try to automatically sign up for access to websites, blogs or forums. CAPTCHA blocks automated systems, which can't read the distorted letters in the graphic.

Sounds like it is successfully doing its job.

Dexty · Aug-03-2021, 01:02 PM

(Aug-02-2021, 09:21 PM) Yoriz wrote:
Sounds like it is successfully doing its job.

It looks like the code is close to working. Whatever is preventing the correct CAPTCHA input from being accepted isn't the CAPTCHA itself. It's probably the Mechanize library or Selenium.
