page impossible to scrap? :O - Printable Version

page impossible to scrap? :O - zarize - Oct-03-2019

Hi guys,

I have found a page where I can't scrape the shortest flight:

https://www.skyscanner.net/transport/flights/mpm/tyoa/191008/191015/?adults=1&children=0&adultsv2=1&childrenv2=&infants=0&cabinclass=economy&rtn=1&preferdirects=false&outboundaltsenabled=false&inboundaltsenabled=false&ref=home#/

How come? Is it possible to block a site from being scraped?

import requests
from bs4 import BeautifulSoup

page = 'https://www.skyscanner.net/transport/flights/mpm/tyoa/191008/191015/?adults=1&children=0&adultsv2=1&childrenv2=&infants=0&cabinclass=economy&rtn=1&preferdirects=false&outboundaltsenabled=false&inboundaltsenabled=false&ref=home#/'
# requests fetches only the initial HTML; JavaScript on the page is not executed
r = requests.get(page)
content = r.text
soup = BeautifulSoup(content, 'html.parser')
# A space-separated class_ string matches only that exact class attribute value
test = soup.find_all(class_='BpkTicket_bpk-ticket__paper__2gPSe BpkTicket_bpk-ticket__main__J31fH BpkTicket_bpk-ticket__main--padded__WIbjx BpkTicket_bpk-ticket__main--horizontal__2MgwA BpkTicket_bpk-ticket__paper--with-notches__19yQc')
print(test)

I guess flight search sites load their results with some kind of data refresh, so the results are not visible in the requests response? Am I right? In that case I would need some sleep/wait function, right?
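
One way to test this guess is to check whether the target class appears anywhere in the raw HTML that requests receives. The sketch below uses only the standard library and two made-up HTML samples (STATIC_HTML and RENDERED_HTML are invented for illustration, not real Skyscanner responses). ClassCounter and count_class are hypothetical helper names. If the count on the real r.text is 0, no sleep on the requests side will help, because the elements are created later by JavaScript in a browser.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Counts start tags whose class attribute contains a given fragment."""

    def __init__(self, fragment):
        super().__init__()
        self.fragment = fragment
        self.count = 0

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value and self.fragment in value:
                self.count += 1


def count_class(html, fragment):
    """Return how many tags in html have a class containing fragment."""
    parser = ClassCounter(fragment)
    parser.feed(html)
    parser.close()
    return parser.count


# Invented sample: what a JavaScript-driven page might send before any script runs.
STATIC_HTML = """<html><body><div id="app-root"></div>
<script>/* results are injected here by JavaScript */</script></body></html>"""

# Invented sample: what the browser DOM might look like after the script has run.
RENDERED_HTML = """<html><body><div id="app-root">
<div class="BpkTicket_bpk-ticket__paper__2gPSe">flight</div>
</div></body></html>"""

print(count_class(STATIC_HTML, "bpk-ticket__paper"))    # 0
print(count_class(RENDERED_HTML, "bpk-ticket__paper"))  # 1
```

Running count_class(r.text, "bpk-ticket__paper") on the real response would show which of the two cases applies.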

RE: page impossible to scrap? :O - buran - Oct-03-2019

If they use JavaScript, you may need to use Selenium.
Check our tutorial: https://python-forum.io/Thread-Web-scraping-part-2
Look for the section "God dammit JavaScript, why do i not get all content" and the part that follows it.
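
A minimal sketch of the Selenium approach, not taken from the tutorial: a real browser loads the page, the code polls until some marker text shows up in the rendered HTML, and the result can then be passed to BeautifulSoup as before. The function name fetch_rendered_html, the marker "bpk-ticket" (a fragment of the class name in the question) and the 10 second timeout are assumptions. It also assumes Chrome and a recent Selenium are installed, and it does nothing about captchas or other bot checks.

```python
import time


def fetch_rendered_html(url, driver=None, wait_seconds=10, marker="bpk-ticket"):
    """Load url in a browser and return the HTML after JavaScript has run.

    Polls until marker appears in the page source or wait_seconds elapse.
    A driver can be passed in; otherwise a Chrome driver is created here.
    """
    own_driver = driver is None
    if own_driver:
        # Third-party package: pip install selenium
        from selenium import webdriver
        driver = webdriver.Chrome()
    try:
        driver.get(url)
        deadline = time.monotonic() + wait_seconds
        # This is where waiting helps: the browser keeps running the scripts.
        while marker not in driver.page_source and time.monotonic() < deadline:
            time.sleep(0.5)
        return driver.page_source
    finally:
        # Only close a browser that this function opened itself.
        if own_driver:
            driver.quit()


# Usage (opens a real browser window):
# html = fetch_rendered_html(page)
# soup = BeautifulSoup(html, 'html.parser')
```

If the marker never appears, the function still returns whatever HTML the browser has at the timeout, so it is worth checking the result before parsing it.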

RE: page impossible to scrap? :O - zarize - Oct-03-2019

Thanks buran,

As helpful as always! :)

Now it's time to learn captcha solving! :D