Python Forum

Full Version: prevent getting blocked
Hi,

I am scraping a website in Python using Selenium and Beautiful Soup. The scraping itself works, but often while the code is running the website shows a blank screen along with a message about something like unusual activity.
I have tried a sleep time of around 10 seconds, but it does not help. I have heard about IP rotation but have not been able to use it in my code.
Any suggestions would be appreciated.
But why do you want to go with the IP rotation approach? As far as I know, IP rotation is used to distribute the same set of IP addresses to different devices. Can you mention the site you are actually trying to scrape? Does it go blank only when you access the site through the script?
Basically I want to get data from the Bloomberg site, and that is where the blank screen happens. I read on Stack Overflow that if we send requests from different IPs, there is no chance of getting blocked.
Some very basic things you could do:
1. Add a random sleep interval.
2. Add detailed headers that mimic a real browser and don't reveal that you are a bot/script.
3. IP/proxy rotation. If you are using different IPs/proxies, it may be wise to sync the IP rotation with the headers/User-Agent/OS info you provide. If the site is serious about blocking scrapers, it may find it suspicious/unacceptable for the same IP to provide different User-Agent/OS info in a short period of time.
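
The three suggestions above could be combined roughly like this. This is only a minimal sketch, not a tested recipe for any particular site: it uses the standard library's urllib instead of Selenium to stay self-contained, and the proxy addresses, the header values, the 5 to 15 second delay range, and the helper names (PROFILES, random_delay, build_opener_and_request, fetch) are all placeholders made up for illustration. The idea is that each proxy is paired with one fixed set of headers, so the same IP never shows up with a different User-Agent.

```python
import random
import time
import urllib.request

# Hypothetical pool: every proxy is tied to ONE fixed browser identity,
# so a given IP always sends the same User-Agent/OS info (suggestion 3).
# The addresses are placeholders from a documentation-only IP range.
PROFILES = [
    {
        "proxy": "http://203.0.113.10:8080",
        "headers": {
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                          "AppleWebKit/537.36 (KHTML, like Gecko) "
                          "Chrome/120.0.0.0 Safari/537.36",
            "Accept": "text/html,application/xhtml+xml,*/*;q=0.8",
            "Accept-Language": "en-US,en;q=0.9",
        },
    },
    {
        "proxy": "http://203.0.113.20:8080",
        "headers": {
            "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) "
                          "Gecko/20100101 Firefox/121.0",
            "Accept": "text/html,application/xhtml+xml,*/*;q=0.8",
            "Accept-Language": "en-GB,en;q=0.8",
        },
    },
]


def random_delay(min_s=5.0, max_s=15.0):
    """Return a random wait in seconds instead of a fixed sleep (suggestion 1)."""
    return random.uniform(min_s, max_s)


def build_opener_and_request(url, profile):
    """Build an opener that routes through the profile's proxy and a request
    that carries the profile's browser-like headers (suggestions 2 and 3)."""
    handler = urllib.request.ProxyHandler(
        {"http": profile["proxy"], "https": profile["proxy"]}
    )
    opener = urllib.request.build_opener(handler)
    request = urllib.request.Request(url, headers=profile["headers"])
    return opener, request


def fetch(url):
    """Pick a proxy/header profile, wait a random interval, then fetch the page."""
    profile = random.choice(PROFILES)
    time.sleep(random_delay())
    opener, request = build_opener_and_request(url, profile)
    with opener.open(request, timeout=30) as response:
        return response.read()
```

To try it, replace the placeholder proxies with ones you actually have access to and call fetch() with the page URL. With Selenium the same pairing idea applies, but the proxy and User-Agent are set through the browser options when the driver is created rather than per request.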