Python Forum
Using Selenium for web scraping
Hello everyone! I'm using Python for web scraping and have, I think, hit a dead end. I am now seeking help here. I previously asked some questions on Stack Overflow (those were about other errors) but couldn't get much help, so I decided to ask somewhere else. I hope you can help me.

I'm scraping this page: kn?ngs.ru/view/34146983/ (I'm not allowed to post clickable links yet, so replace ? with .). It is an ad from a certain group near my region. There are 350-370 of them in total, so my project is not really big, but I have to finish it soon and I'm nearly done: everything works apart from one thing.

As you can see on the site, there is a blue button that reveals phone numbers (it's in Russian, but phone numbers are understandable in any language). I am using headless Firefox in Selenium to press that button and collect the information it reveals. The main problem, which causes the others, is getting banned after several requests, so I decided to use a different user agent. I've noticed two things, and I don't really understand why they happen:
1) If I use crawler bots as the user agent (somebody said that I shouldn't, but that was before I knew it could do harm; is it really that bad to use them for such a small task?), I can't press this button: nothing happens, and the site tells me there was an error while pressing it. The advantage of these bots, though, is that they get me through the whole process without being banned (perhaps I don't get banned because I never actually press the button? I can't check that, so I don't know).
2) If I use anything but these bots, I can press the button and get the information, but I get banned pretty soon. So I found fake-useragent and its random user agent, and now I switch agents whenever I get banned. The random choice can also be a bot user agent, in which case I have to switch again, and I also have to check that I haven't picked an already blacklisted user agent (if it's even possible to land on a blacklisted one). The problem itself is that I get an exception without any message at a random step of the process. I could probably get through a run without exceptions someday, but relying on that is not an option. The code and the exception are below:

from bs4 import BeautifulSoup
from fake_useragent import UserAgent
from pyvirtualdisplay import Display
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# `link` (the URL of the ad page) is defined elsewhere.
BAN_TITLE = 'Проверка пользователя'  # "User verification": title of the ban page


def open_page():
    # Start Firefox with a random user agent and load the ad page.
    profile = webdriver.FirefoxProfile()
    profile.set_preference("general.useragent.override", UserAgent().random)
    browser = webdriver.Firefox(profile)
    browser.get(link)
    soup = BeautifulSoup(browser.page_source.encode('utf-8'), "html.parser")
    return soup, browser


def switch(old_browser):
    # Quit the old browser first so Firefox processes do not pile up.
    old_browser.quit()
    return open_page()


display = Display(visible=0, size=(800, 600))
display.start()

soup, browser = open_page()
cont = ''
while cont == '':
    # Keep switching user agents while the ban page is shown.
    while soup.find('title').get_text() == BAN_TITLE:
        soup, browser = switch(browser)
    button = WebDriverWait(browser, 5).until(EC.presence_of_element_located((By.ID, "show_phone")))
    button.click()
    phones = browser.find_elements(By.CLASS_NAME, 'card__phones-list-item')
    cont = ', '.join(phone.text for phone in phones)
    if cont == '':
        # No phone numbers were revealed: retry with a new user agent.
        soup, browser = switch(browser)

browser.quit()
display.stop()
When I first got this exception, ElementNotInteractableException, I found out that it can be related to differences in page load time, so I set up a wait (I tried different durations, both implicit and explicit). I still hit this exception sometimes.
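
One possible cause, offered as an assumption rather than a diagnosis: presence_of_element_located only confirms that the button exists in the DOM, not that it is visible and enabled, so a click can still raise ElementNotInteractableException. Below is a minimal sketch of retrying a flaky action a limited number of times. The names retry, flaky_click and NotInteractable are hypothetical and used only for illustration, and the stand-in click replaces the real Selenium call so the snippet runs without a browser.

```python
import time


def retry(action, exceptions, attempts=3, delay=0.0):
    """Call action() until it succeeds or the attempts run out.

    Only the listed exception types trigger a retry; the last
    failure is re-raised to the caller.
    """
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except exceptions:
            if attempt == attempts:
                raise
            time.sleep(delay)


class NotInteractable(Exception):
    """Hypothetical stand-in for Selenium's ElementNotInteractableException."""


calls = {"count": 0}


def flaky_click():
    # Stand-in for a click that fails twice before the button is usable.
    calls["count"] += 1
    if calls["count"] < 3:
        raise NotInteractable("button not ready")
    return "clicked"


result = retry(flaky_click, (NotInteractable,), attempts=5)
```

With Selenium, action could be a function that waits for EC.element_to_be_clickable((By.ID, "show_phone")) and then clicks, with exceptions set to (ElementNotInteractableException, TimeoutException) from selenium.common.exceptions.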

So I'd like to ask you for help here. Thank you.
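
For reference, the blacklist check mentioned above (picking a random user agent while skipping ones that were already banned) could be sketched roughly as follows. This is only an illustration under assumptions: pick_user_agent, the pool and the agent strings are hypothetical, and real strings could come from fake-useragent's ua.random.

```python
import random


def pick_user_agent(pool, blacklist, rng=random):
    """Return a random agent from pool that is not in blacklist.

    Raises RuntimeError when every agent in the pool is blacklisted.
    """
    candidates = [ua for ua in pool if ua not in blacklist]
    if not candidates:
        raise RuntimeError("all user agents are blacklisted")
    return rng.choice(candidates)


# Hypothetical pool of user-agent strings.
pool = ["agent-a", "agent-b", "agent-c"]
blacklist = {"agent-a"}

chosen = pick_user_agent(pool, blacklist)
blacklist.add(chosen)  # mark it as banned once the verification page appears
next_choice = pick_user_agent(pool, blacklist)
```

Keeping the blacklist as a set makes the membership check cheap, and raising once the pool is exhausted avoids looping forever when every agent has been banned.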

Also, I probably have another option: using jQuery to somehow get the information behind that button. I can post the code I found in the page's HTML if somebody can help me translate it into Python and tell me how exactly I'm supposed to use it, because I don't understand it at all.

It would also be great if someone could contact me by PM; I think this problem should be easy to resolve.