Python Forum
URLLIB.REQUEST Not Working

URLLIB.REQUEST Not Working - hallofriends - Sep-18-2017

import urllib.request

url = 'https://www.whoscored.com/Players/5583/Show/Cristiano-Ronaldo'

# Send a browser-like User-Agent (Chrome on Linux) instead of urllib's default.
headers = {
    'User-Agent': "Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.27 Safari/537.17"
}

try:
    req = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(req) as resp:
        respData = resp.read()  # raw bytes, not str
    print(respData)

    # Write the bytes unchanged; str(respData) would save the b'...' repr instead.
    with open('withHeaders.txt', 'wb') as saveFile:
        saveFile.write(respData)
except Exception as e:
    print(str(e))
The code above does not return the page source. Instead, it returns the following:

Error:
b'<html style="height:100%"><head><META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW"><meta name="format-detection" content="telephone=no"><meta name="viewport" content="initial-scale=1.0"><meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1"></head><body style="margin:0px;height:100%"><iframe src="/_Incapsula_Resource?CWUDNSAI=9&xinfo=1-67208763-0%200NNN%20RT%281505746830921%2010%29%20q%280%20-1%20-1%20-1%29%20r%280%20-1%29%20B12%284%2c316%2c0%29%20U2&incident_id=500000570207585028-516670650047529873&edet=12&cinfo=04000000" frameborder=0 width="100%" height="100%" marginheight="0px" marginwidth="0px">Request unsuccessful. Incapsula incident ID: 500000570207585028-516670650047529873</iframe></body></html>'
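
The response above is not the player page. It is a block page served by Incapsula, as the `_Incapsula_Resource` iframe and the "Request unsuccessful" message show. Below is a minimal sketch of checking for that before saving the file. It assumes these two markers are enough to recognise the block page; `looks_blocked` and `BLOCK_MARKERS` are hypothetical names, and `sample` is an abbreviated copy of the response shown above.

```python
# Markers copied from the response shown above. Treating them as sufficient
# to identify a block page is an assumption, not documented behaviour.
BLOCK_MARKERS = (b"_Incapsula_Resource", b"Incapsula incident ID")


def looks_blocked(body: bytes) -> bool:
    """Return True if the raw response body contains a block-page marker."""
    return any(marker in body for marker in BLOCK_MARKERS)


# Abbreviated version of the response body from the post.
sample = (
    b'<html><body><iframe src="/_Incapsula_Resource?CWUDNSAI=9">'
    b"Request unsuccessful. Incapsula incident ID: "
    b"500000570207585028-516670650047529873</iframe></body></html>"
)

print(looks_blocked(sample))                            # True
print(looks_blocked(b"<html><body>ok</body></html>"))   # False
```

In the original script this check would go right after `resp.read()`, so that a block page is reported instead of being written to withHeaders.txt.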



RE: URLLIB.REQUEST Not Working - Larz60+ - Sep-18-2017

Use the requests library instead of urllib.request. See https://python-forum.io/Thread-Web-Scraping-part-1
and/or https://python-forum.io/Thread-Web-scraping-part-2
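
For reference, here is a minimal sketch of the same request made with requests. It assumes the third-party requests package is installed (`pip install requests`); `fetch_page` is a hypothetical helper name, and the URL and User-Agent are the ones from the first post. Since the site is behind Incapsula, switching libraries alone may still return the block page, so treat this as a starting point and not a guaranteed fix.

```python
import requests

URL = "https://www.whoscored.com/Players/5583/Show/Cristiano-Ronaldo"

# Same browser-like User-Agent as in the first post.
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.17 "
        "(KHTML, like Gecko) Chrome/24.0.1312.27 Safari/537.17"
    )
}


def fetch_page(url: str, timeout: float = 10.0) -> str:
    """Fetch url with the headers above and return the decoded body."""
    resp = requests.get(url, headers=HEADERS, timeout=timeout)
    resp.raise_for_status()  # raises requests.HTTPError on a 4xx/5xx status
    return resp.text         # str, decoded with the encoding requests detects
```

Calling `fetch_page(URL)` performs the request and returns the page as a str, so it can be written directly to a file opened in text mode. Network and HTTP errors surface as `requests.RequestException`, which can be caught in place of the bare `Exception`.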