Python Forum
prevent getting blocked
#1
Hi,

I am scraping a website in Python using Selenium and Beautiful Soup. The scraping itself works, but often while the code is running the website shows a blank screen with a message along the lines of "unusual activity".
I have tried a sleep time of around 10 seconds, but it does not help. I have heard about IP rotation but have not been able to use it in my code.
Looking forward to your suggestions.
Reply
#2
But why do you want to go with the IP rotation approach? As far as I know, IP rotation is used to distribute the same set of IP addresses to different devices. Can you mention the site you are actually trying to scrape? Does it go blank only when you access the site through the script?
Reply
#3
Basically, I want to get data from the Bloomberg site, and that is where the blank screen happens. I read on Stack Overflow that if we send requests from different IPs, there is no chance of getting blocked.
Reply
#4
Some very basic things you could do:
1. Add a random sleep interval between requests.
2. Send detailed headers that mimic a real browser and don't reveal that you are a bot/script.
3. Use IP/proxy rotation. If you are using different IPs/proxies, it may be wise to sync the IP rotation with the headers/User-Agent/OS info you provide. If the site is serious about blocking scrapers, it may find it suspicious/unacceptable for the same IP to present different User-Agent/OS info within a short period of time.
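
A rough sketch of how the three points above could fit together, using only the standard library. Everything here is an assumption for illustration: the proxy addresses are placeholders from a documentation-only range, the User-Agent strings are just examples, and the names `Identity`, `random_delay`, `make_opener`, `fetch` and `scrape` are made up for this post. The idea is that each proxy is tied to one fixed User-Agent, so the same IP never shows up with different browser/OS info.

```python
import random
import time
import urllib.request
from dataclasses import dataclass


@dataclass(frozen=True)
class Identity:
    """One proxy paired with one fixed, browser-like set of headers."""
    proxy: str        # hypothetical proxy URL
    user_agent: str   # example User-Agent string

    def headers(self):
        # Headers a normal browser would typically send along.
        return {
            "User-Agent": self.user_agent,
            "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
            "Accept-Language": "en-US,en;q=0.9",
        }


# Placeholder identities: 203.0.113.0/24 is reserved for documentation,
# so replace these with proxies you actually have access to.
IDENTITIES = [
    Identity(
        "http://203.0.113.10:8080",
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    ),
    Identity(
        "http://203.0.113.11:8080",
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
        "(KHTML, like Gecko) Version/17.0 Safari/605.1.15",
    ),
]


def random_delay(low=5.0, high=15.0):
    """Return a random number of seconds to wait (point 1)."""
    return random.uniform(low, high)


def make_opener(identity):
    """Build an opener that routes through the identity's proxy (point 3)
    and always sends that identity's headers (point 2)."""
    proxy_handler = urllib.request.ProxyHandler(
        {"http": identity.proxy, "https": identity.proxy}
    )
    opener = urllib.request.build_opener(proxy_handler)
    opener.addheaders = list(identity.headers().items())
    return opener


def fetch(url, identity, timeout=30):
    """Download one page using the given identity."""
    with make_opener(identity).open(url, timeout=timeout) as response:
        return response.read()


def scrape(urls):
    """Fetch each URL with a randomly chosen identity, pausing in between."""
    pages = {}
    for url in urls:
        identity = random.choice(IDENTITIES)
        pages[url] = fetch(url, identity)
        time.sleep(random_delay())
    return pages
```

With real proxies filled in, `scrape(["https://example.com/page1", "https://example.com/page2"])` would return a dict of raw page bytes that you can pass to Beautiful Soup. Since you are using Selenium, the same pairing idea should carry over, but the proxy and User-Agent would be set through the browser options instead of `urllib`.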

Reply

