Using Selenium for web scraping

DoubleLucker · (This post was last modified: Jun-24-2017, 02:12 PM by DoubleLucker.)

Hello everyone! I'm using Python for web scraping and I think I've hit a dead end. I am now seeking help here: I previously asked some questions on Stack Overflow (about other errors) but couldn't get much help, so I decided to ask somewhere else. I hope you can help me.

I'm scraping kn?ngs.ru/view/34146983/ (I'm not allowed to post clickable links yet, so replace ? with .). It is an ad from a certain group in my region; there are 350-370 of them in total, so my project is not really big, but I have to finish it soon and I'm almost done, because everything works apart from one thing.

As you can see on the site, there is a blue button that reveals the phone numbers (the page is in Russian, but phone numbers are readable in any language). I am using headless Firefox with Selenium to press that button and collect the numbers it reveals. The main problem, which causes the others, is getting banned after several requests, so I decided to use a different user agent. I've noticed two things, and I don't really understand why they happen:
1) If I use a crawler bot's user-agent string (somebody said that I shouldn't, but that was before I knew it could do harm; also, is it really that bad to use one for such a small task?), I can't press the button: nothing happens and the site tells me that an error occurred while pressing it. The advantage of the bot user agents is that they get me through the whole process without a ban (probably because I never press the button? I can't check that, so I don't know).
2) If I use anything but these bots, I can press the button and get the information, but I get banned pretty soon. So I found fake-useragent and its random user-agent feature, and now I switch agents whenever I get banned. The random pick can also be a bot user agent, in which case I have to switch again, and I also have to check whether I've picked an already blacklisted user agent (if a user agent can even be blacklisted). The actual problem now is that I get an exception without any message at a random step of the process. I might eventually get a run through without exceptions, but that's not a real solution. The code and the exception are below:

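from bs4 import BeautifulSoup
from fake_useragent import UserAgent
from pyvirtualdisplay import Display
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# `link` (the URL of the ad page) is defined elsewhere in my script.

def switch(old_browser):
    # Close the old browser and open a new one with another random user agent.
    old_browser.quit()
    ua = UserAgent()
    profile = webdriver.FirefoxProfile()
    profile.set_preference("general.useragent.override", ua.random)
    browser = webdriver.Firefox(profile)
    browser.get(link)
    soup = BeautifulSoup(browser.page_source.encode('utf-8'), "html.parser")
    return soup, browser

# Run Firefox inside a virtual display so that no window is shown.
display = Display(visible=0, size=(800, 600))
display.start()

ua = UserAgent()

profile = webdriver.FirefoxProfile()
profile.set_preference("general.useragent.override", ua.random)
browser = webdriver.Firefox(profile)

browser.get(link)
soup = BeautifulSoup(browser.page_source.encode('utf-8'), "html.parser")
# 'Проверка пользователя' ("User verification") is the title of the page shown when I get banned.
while soup.find('title').get_text() == 'Проверка пользователя':
    soup, browser = switch(browser)
button = WebDriverWait(browser, 5).until(EC.presence_of_element_located((By.ID, "show_phone")))
button.click()
phones = browser.find_elements_by_class_name('card__phones-list-item')
cont = ', '.join([phone.text for phone in phones])
# No phone numbers were collected: try again with a new user agent.
while cont == '':
    soup, browser = switch(browser)
    while soup.find('title').get_text() == 'Проверка пользователя':
        soup, browser = switch(browser)
    button = WebDriverWait(browser, 5).until(EC.presence_of_element_located((By.ID, "show_phone")))
    button.click()
    phones = browser.find_elements_by_class_name('card__phones-list-item')
    cont = ', '.join([phone.text for phone in phones])
browser.quit()
display.stop()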
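
The check for an already blacklisted user agent mentioned in point 2 is not part of the script above. A minimal sketch of one way it could be done is below, assuming the candidate strings are collected into a list first. The names `pick_user_agent`, `pool` and `banned` are hypothetical and the pool entries are placeholders, not real user-agent strings.

```python
import random


def pick_user_agent(pool, banned):
    """Return a random user agent from `pool` that is not in `banned`."""
    candidates = [agent for agent in pool if agent not in banned]
    if not candidates:
        # Nothing left to try; the caller has to refill the pool or stop.
        raise RuntimeError("every user agent in the pool has been banned")
    return random.choice(candidates)


# Placeholder pool; in a real run these would be full user-agent strings.
pool = ["agent-a", "agent-b", "agent-c"]
banned = set()

current = pick_user_agent(pool, banned)
# When the verification page shows up, remember the agent and pick another.
banned.add(current)
replacement = pick_user_agent(pool, banned)
```

In the script, the result of `pick_user_agent` would take the place of `ua.random` when the profile preference is set, and the `banned.add(...)` call would sit next to the title check.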

When I first got this exception, ElementNotInteractableException, I read that it can be caused by pages loading at different speeds, so I added a wait (I tried different durations, both implicit and explicit). I still hit this exception sometimes.
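
Since the exception only shows up some of the time, one thing that might help is retrying the click a few times with a short pause instead of relying on a single wait. The sketch below is generic Python, not Selenium-specific, and is not guaranteed to fix the problem on this site. `click_with_retry`, `find_button` and `retry_on` are hypothetical names introduced for illustration.

```python
import time


def click_with_retry(find_button, retry_on, attempts=3, delay=1.0):
    """Look the button up and click it, retrying on the given exception types.

    find_button: callable that returns an object with a .click() method.
    retry_on:    tuple of exception classes that should trigger another try.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            # Look the element up again on every attempt, because the page
            # may have re-rendered it since the previous try.
            find_button().click()
            return True
        except retry_on as error:
            last_error = error
            time.sleep(delay)
    # Every attempt failed: re-raise the last error so it is not hidden.
    raise last_error
```

With Selenium, `find_button` could be `lambda: WebDriverWait(browser, 5).until(EC.element_to_be_clickable((By.ID, "show_phone")))` and `retry_on` could be `(ElementNotInteractableException, TimeoutException)`, both imported from `selenium.common.exceptions`. Unlike `presence_of_element_located`, `element_to_be_clickable` also waits until the element is displayed and enabled.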

So I'd like to ask you for help here. Thank you.

Also, I probably have another option: using jQuery to somehow get the information behind that button. I can post the code I found in the page's HTML if somebody can help me translate it into Python and tell me exactly how I'm supposed to use it, because I don't understand it at all.

It would also be great if someone could contact me by PM; I think this problem should be easy to resolve.
