[Help] Scrape website - Printable Version

+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/forum-7.html)
+--- Forum: Web Scraping & Web Development (https://python-forum.io/forum-13.html)
+--- Thread: [Help] Scrape website (/thread-11686.html)

[Help] Scrape website - mr_byte31 - Jul-21-2018

Hi all,

I have a website that I need to collect some information from.
I tried simple code like this:

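import urllib.request

# Send a browser-like User-Agent, since some sites reject Python's default one
headers = {"User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:48.0) Gecko/20100101 Firefox/48.0"}
url = 'https://www.biopharmcatalyst.com/calendars/historical-catalyst-calendar'
req = urllib.request.Request(url, headers=headers)
# timeout=10: give up if a single connect/read step gets no answer for 10 seconds
html = urllib.request.urlopen(req, timeout=10).read()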

It didn't work. The Python program hangs!
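One possible reason for the hang (a guess; the thread never confirms the cause) is that the server accepts the connection but is slow to answer. The `timeout=10` in the first script bounds each blocking socket operation (connecting, reading), not the whole download, and when it fires it raises an exception rather than returning. A minimal sketch that turns every kind of failure into one readable error; `fetch_html` is a hypothetical helper name, not part of urllib, and the short default User-Agent is an assumption:

```python
import socket
import urllib.error
import urllib.request


def fetch_html(url, timeout=10.0, user_agent="Mozilla/5.0"):
    """Return the page body as text, or raise RuntimeError saying why it failed.

    Hypothetical helper for this sketch; the default User-Agent is an assumption.
    """
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        # timeout applies to each blocking socket operation, not the whole download
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            # Use the charset from the Content-Type header, falling back to UTF-8
            charset = resp.headers.get_content_charset() or "utf-8"
            return resp.read().decode(charset, errors="replace")
    except urllib.error.HTTPError as exc:
        # HTTPError must be caught first: it is a subclass of URLError
        raise RuntimeError(f"server answered with HTTP {exc.code}") from exc
    except urllib.error.URLError as exc:
        # DNS failure, refused connection, or a timeout while connecting
        raise RuntimeError(f"could not reach server: {exc.reason}") from exc
    except socket.timeout as exc:
        # Connected, but the server sent nothing within `timeout` seconds
        raise RuntimeError(f"no response within {timeout} s") from exc
```

Calling `fetch_html(url)` with the URL from the question then either returns the HTML or fails with a one-line reason such as an HTTP status code or "no response within 10.0 s", which makes it easier to tell a blocked request from a slow server.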

I tried this as well:

import requests
url = 'https://www.biopharmcatalyst.com/calendars/historical-catalyst-calendar'
url_get = requests.get(url)

It also didn't work!

Any idea what the problem is?

RE: [Help] Scrape website - gontajones - Jul-21-2018

You can use the Requests module.
For more advanced scraping, I suggest using the BeautifulSoup module.
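Downloading the page and parsing it are separate steps. The sketch below swaps in the standard library's `html.parser` instead of BeautifulSoup, only so it runs without extra packages; with BeautifulSoup the same idea is roughly `[[c.get_text(strip=True) for c in tr.find_all(['td', 'th'])] for tr in soup.find_all('tr')]`. It assumes, without having checked the live page, that the calendar data sits in ordinary `<table>` rows; `TableRowParser` and `table_rows` are hypothetical names. If a page fills its table with JavaScript after loading, those rows are not in the downloaded HTML at all, and no HTML parser will find them.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row.

    Hypothetical helper for this sketch. Assumption: plain <table> markup in
    which every cell and row has an explicit closing tag.
    """

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None outside a row
        self._cell = None  # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the text pieces and collapse runs of whitespace
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Only keep text that appears inside a table cell
        if self._cell is not None:
            self._cell.append(data)


def table_rows(html):
    """Return every table row in html as a list of cell strings."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows
```

With the second script from the question this would be `table_rows(url_get.text)`. On made-up sample markup, `table_rows('<table><tr><td>ABC</td><td>2018-07-20</td></tr></table>')` returns `[['ABC', '2018-07-20']]`.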

To see the content of the response in your second script, use .text:

import requests
url = 'https://www.biopharmcatalyst.com/calendars/historical-catalyst-calendar'
url_get = requests.get(url)
print(url_get.text)
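One caveat on the snippet above: Requests does not time out unless a timeout is set explicitly, so if the server never answers, `requests.get(url)` can wait indefinitely, which may be what "didn't work" looked like. A minimal sketch that sets explicit timeouts and turns HTTP error codes into exceptions; the helper name `get_page`, the (connect, read) timeouts of 5 and 15 seconds, and the User-Agent string are assumptions for illustration, not values the site is known to require:

```python
import requests


def get_page(url, connect_timeout=5.0, read_timeout=15.0):
    """Download url with explicit timeouts and raise on HTTP error codes.

    Hypothetical helper for this sketch; the timeouts and User-Agent are assumptions.
    """
    headers = {"User-Agent": "Mozilla/5.0"}  # assumed browser-like User-Agent
    # Without `timeout`, Requests can wait forever; the tuple bounds the
    # connection attempt and each wait for data separately
    resp = requests.get(url, headers=headers, timeout=(connect_timeout, read_timeout))
    # Raise requests.HTTPError for 4xx/5xx instead of returning an error page
    resp.raise_for_status()
    return resp.text
```

Wrapping the call in `try`/`except requests.RequestException` then reports timeouts (`requests.Timeout`) and bad status codes (`requests.HTTPError`) in one place, since both are subclasses of `RequestException`.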

RE: [Help] Scrape website - Larz60+ - Jul-21-2018

For further reading:
Web Scraping part1
Web Scraping part2