Nov-27-2019, 01:49 PM
Hi,
I found that two approaches to web scraping return completely different output.
What I mean is that when I use the code below:

import requests
from bs4 import BeautifulSoup

# Browser-like request headers; url is defined elsewhere
headers = {
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'accept-encoding': 'gzip, deflate, sdch, br',
    'accept-language': 'en-GB,en;q=0.8,en-US;q=0.6,ml;q=0.4',
    'cache-control': 'max-age=0',
    'upgrade-insecure-requests': '1',
    'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36'
}
response = requests.get(url, headers=headers)
parser = response.content
soup = BeautifulSoup(parser, "html.parser")
print(soup)

I get the full HTML of the website back, BUT:
If I instead use a pattern like:

r = requests.get(page)
content = r.text
soup = BeautifulSoup(content, 'html.parser')
print(soup)

the request is redirected from the provided URL to a captcha page, and I get back the HTML of the captcha page instead.
Someone wrote the first approach some time ago. I am wondering how to follow exactly that path, and why my approach returns the captcha page while the first approach avoids it.
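
To see what actually differs between the two requests, a rough sketch like the one below might help. The only difference between my two snippets is the headers argument, and as far as I know requests sends a User-Agent of the form python-requests/<version> when no headers are passed. The helper names (redirect_chain, looks_like_captcha, compare) are made up for illustration, and the check assumes the captcha page has "captcha" somewhere in its URL, which I have not verified for this site.

```python
def redirect_chain(response):
    """Return every URL visited for this response, in order, ending with the final URL."""
    return [hop.url for hop in response.history] + [response.url]


def looks_like_captcha(response, marker="captcha"):
    """Crude check: was the request redirected to a URL containing `marker`?

    The marker string is an assumption; the real captcha page may be named differently.
    """
    chain = redirect_chain(response)
    return len(chain) > 1 and marker in chain[-1].lower()


def compare(page, headers):
    """Fetch `page` with and without custom headers and print what differs."""
    import requests  # imported here so the helpers above work without it

    for label, kwargs in (("default headers", {}), ("browser headers", {"headers": headers})):
        r = requests.get(page, **kwargs)
        # r.request.headers holds the headers that were actually sent
        print(label, "-> User-Agent sent:", r.request.headers.get("User-Agent"))
        print(label, "-> redirect chain:", redirect_chain(r))
        print(label, "-> captcha redirect?", looks_like_captcha(r))
```

Calling compare(page, headers) with the headers dictionary from the first snippet should show whether only the request without browser headers gets redirected.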