Python Forum

Hi guys,

I have found a page where I can't scrape the shortest flight:

https://www.skyscanner.net/transport/fli...ref=home#/

How come? Is it possible for a site to block scraping?

import requests
from bs4 import BeautifulSoup

page = 'https://www.skyscanner.net/transport/flights/mpm/tyoa/191008/191015/?adults=1&children=0&adultsv2=1&childrenv2=&infants=0&cabinclass=economy&rtn=1&preferdirects=false&outboundaltsenabled=false&inboundaltsenabled=false&ref=home#/'
r = requests.get(page)
content = r.text
soup = BeautifulSoup(content, 'html.parser')
# A space-separated class_ string only matches elements whose class attribute is exactly this value
test = soup.find_all(class_='BpkTicket_bpk-ticket__paper__2gPSe BpkTicket_bpk-ticket__main__J31fH BpkTicket_bpk-ticket__main--padded__WIbjx BpkTicket_bpk-ticket__main--horizontal__2MgwA BpkTicket_bpk-ticket__paper--with-notches__19yQc')
print(test)

I guess flight search sites load their data through some kind of refresh, so it's not visible to requests. Am I right? In that case I would need some kind of sleep/wait function, right?
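
One way to test that guess is to check whether the ticket elements are in the raw HTML at all. The sketch below uses only the standard library and counts elements whose class list contains a token starting with a given prefix. The helper name count_elements_with_class_prefix and the sample HTML strings are made up for illustration; the prefix comes from the class names in the post above.

```python
from html.parser import HTMLParser


class ClassPrefixCounter(HTMLParser):
    """Counts start tags that carry at least one class beginning with `prefix`."""

    def __init__(self, prefix):
        super().__init__()
        self.prefix = prefix
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        classes = (dict(attrs).get("class") or "").split()
        if any(c.startswith(self.prefix) for c in classes):
            self.count += 1


def count_elements_with_class_prefix(html, prefix):
    """Hypothetical helper: how many elements in `html` use a class starting with `prefix`."""
    parser = ClassPrefixCounter(prefix)
    parser.feed(html)
    return parser.count


# Illustrative inputs, not real Skyscanner responses
rendered = '<div class="BpkTicket_bpk-ticket__main__J31fH x">flight</div><div class="foo"></div>'
empty_shell = '<div id="app-root"></div><script src="bundle.js"></script>'

print(count_elements_with_class_prefix(rendered, "BpkTicket_bpk-ticket"))
print(count_elements_with_class_prefix(empty_shell, "BpkTicket_bpk-ticket"))
```

If the count for r.text is zero while the browser's inspector shows the tickets, the results are most likely being added by JavaScript after the initial page load, and sleeping before or after requests.get() would not change what the response contains.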

If they use JavaScript, you may need to use Selenium.
Check our tutorial: https://python-forum.io/Thread-Web-scraping-part-2
Look for the section "God dammit JavaScript, why do I not get all content" and what follows it.
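
A minimal sketch of the Selenium approach, assuming Selenium 4 with Chrome installed and that the ticket elements keep a class containing "BpkTicket_bpk-ticket__main" (taken from the class names in the question). The function names css_for_class_prefix, fetch_rendered_html and scrape_tickets are hypothetical, and this is not the code from the tutorial. It uses an explicit wait instead of a fixed sleep.

```python
def css_for_class_prefix(prefix):
    """Build a CSS selector matching elements whose class attribute contains `prefix`."""
    return f'[class*="{prefix}"]'


def fetch_rendered_html(url, wait_css, timeout=20):
    """Load `url` in Chrome, wait until `wait_css` matches, return the rendered HTML."""
    # Imported here so the selector helper above works without Selenium installed
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # Raises TimeoutException if nothing matches within `timeout` seconds,
        # for example when the site shows a captcha instead of results
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, wait_css))
        )
        return driver.page_source
    finally:
        driver.quit()


def scrape_tickets(url):
    """Return the ticket elements found in the JavaScript-rendered page."""
    from bs4 import BeautifulSoup

    selector = css_for_class_prefix("BpkTicket_bpk-ticket__main")
    html = fetch_rendered_html(url, selector)
    soup = BeautifulSoup(html, "html.parser")
    # select() takes a CSS selector, so a partial class match is enough
    return soup.select(selector)
```

Calling scrape_tickets(page) with the page URL from the first post would then replace the requests.get() call. Matching on part of the class name is a design choice: the hashed suffixes such as __J31fH look auto-generated and may change between site builds.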

Thanks buran,

As helpful as always! :)

Now it's time to learn captcha solving! :D

zarize

buran

zarize