Python Forum

Full Version: Web scraping cookie in URL blocks selenium
Hey guys,

I want to scrape data from a one-page website (just one URL). I am pretty new to programming, sorry about that ;)
As far as I can tell, the website generates all of its data via Java, and I want to scrape some of that data. So just navigating to the page I want to scrape is already a problem for me.

I worked with Selenium, and now I'm stuck: the website adds a cookie to my URL and blocks further searches.
It adds jsessionid=89F10E908FEB575216C17BE0432E19B9 to the URL.
How can I stop the website from doing that? Or how can I remove it without changing the website?
If I refresh, it is still in the URL; if I load a new page (trigger the search again), it's gone!
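(For what it's worth, that jsessionid is a Java servlet session ID that the server embeds in the URL, typically when it can't set a normal cookie. If you just need a clean URL on your side, you can strip it with the standard library. A rough sketch; the example URL is made up:)

```python
import re
from urllib.parse import urlsplit, urlunsplit

def strip_jsessionid(url):
    """Remove a Java servlet jsessionid from a URL, whether it appears as a
    ;jsessionid=... path parameter or a jsessionid=... query parameter."""
    scheme, netloc, path, query, fragment = urlsplit(url)
    # path-parameter form: /page;jsessionid=ABC123
    path = re.sub(r';jsessionid=[^;/?#]*', '', path, flags=re.IGNORECASE)
    # query-parameter form: ?jsessionid=ABC123&page=1
    query = '&'.join(p for p in query.split('&')
                     if p and not p.lower().startswith('jsessionid='))
    return urlunsplit((scheme, netloc, path, query, fragment))

print(strip_jsessionid(
    'https://example.com/search;jsessionid=89F10E908FEB575216C17BE0432E19B9?page=1'))
# -> https://example.com/search?page=1
```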
Or else, how do I have to address the button to still be able to click it with Selenium?
Currently it's: ( driver.find_element(By.CLASS_NAME, 'search').click() )

I am using Selenium because I have to interact with a Java button in the page (there is text in a table cell to click; then you see the page with the data):
<button class="linkBeg" name="showBeg" type="submit" value="0d7fb682-e537-4732-8441-7fc50d9a3a6c" title="Detailansicht">
Bauer, Oliver </button>
but I can't manage to "click" or trigger that button with MechanicalSoup. If there is a way to do it with Scrapy or BeautifulSoup/MechanicalSoup, I would also be happy to know.
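(For reference, since the button is an ordinary HTML submit button, MechanicalSoup can usually fire it by submitting the enclosing form. Untested against this particular site; the URL and form selector below are placeholders:)

```python
def click_detail_button(url, button_value):
    """Sketch: trigger the <button name="showBeg"> with MechanicalSoup by
    submitting its form. URL and the 'form' selector are placeholders --
    adjust them to the real page."""
    # Imported inside the function so the sketch is readable without the package.
    import mechanicalsoup

    browser = mechanicalsoup.StatefulBrowser()
    browser.open(url)
    form = browser.select_form('form')  # select the form containing the button
    # There may be many showBeg buttons (one per row); pick the one whose
    # value attribute matches the row you want, then mark it as the submitter.
    button = browser.page.find('button',
                               attrs={'name': 'showBeg', 'value': button_value})
    form.choose_submit(button)
    return browser.submit_selected()    # response for the detail page
```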

Many thanks in advance!
Joerg
Don't use By.CLASS_NAME; use the By.XPATH or By.CSS_SELECTOR options instead. You can also clear cookies or run Selenium in "incognito" mode. It's difficult to give good advice without example code, though.
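(To illustrate the XPath suggestion: an expression like //button[@name='showBeg'] would target the button markup posted above. In Selenium that would be driver.find_element(By.XPATH, "//button[@name='showBeg']"); the same predicate can be sanity-checked against the snippet with nothing but the standard library:)

```python
import xml.etree.ElementTree as ET

# The button markup from the question, wrapped in a form to make valid XML.
html = '''<form>
  <button class="linkBeg" name="showBeg" type="submit"
          value="0d7fb682-e537-4732-8441-7fc50d9a3a6c" title="Detailansicht">
    Bauer, Oliver </button>
</form>'''

root = ET.fromstring(html)

# Same idea as driver.find_element(By.XPATH, "//button[@name='showBeg']").
# ElementTree only supports a subset of XPath, but attribute predicates work.
button = root.find(".//button[@name='showBeg']")
print(button.get('value'))   # -> 0d7fb682-e537-4732-8441-7fc50d9a3a6c
print(button.text.strip())   # -> Bauer, Oliver
```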
Thanks for your answer,

I looked around, and the issue was that I hadn't paused my code, so it was sending too many requests too quickly.
I set it to rest one second after each request, and now it's all working.