Python Forum
Download entire web pages and save them as html file with urllib.request - Printable Version




Download entire web pages and save them as html file with urllib.request - fyec - Jul-13-2018

I can save multiple web pages using this code; however, the saved HTML files do not display properly. For example, the text in tables is misaligned and the images are missing. I need to download entire pages, just as "Save as" does in a web browser, so that the saved copy renders correctly.
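import urllib.request

url = 'https://asd.com/asdID='
for i in range(1, 5):
    print('     --> ID:', i)
    newurl = url + str(i)
    # Keep the raw bytes: str() on bytes would write "b'...'" with escaped newlines
    with urllib.request.urlopen(newurl) as page:
        pagedata = page.read()
    # Binary mode preserves whatever encoding the server sent
    with open(str(i) + '.html', 'wb') as f:
        f.write(pagedata)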



RE: Download entire web pages and save them as html file with urllib.request - gontajones - Jul-13-2018

You have to analyze the HTML in your browser (view the page source):

1 - Check where the image files are being loaded from.
2 - Check all the CSS and JS files that are loaded to render the page properly.
2.1 - Check whether the table content comes from JS (maybe JS is generating the content dynamically).
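
The checks above can also be done from Python instead of by eye. The following is only a minimal sketch using the standard library's `html.parser`: the `AssetCollector` class, the `list_assets` helper, and the sample HTML are made up for illustration, and it only looks at `img`, `script`, and `link rel="stylesheet"` tags.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AssetCollector(HTMLParser):
    """Collects the URLs of images, scripts and stylesheets a page refers to."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.assets = []  # list of (kind, absolute_url) tuples

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        rel = (attrs.get('rel') or '').lower().split()
        if tag == 'img' and attrs.get('src'):
            kind, ref = 'image', attrs['src']
        elif tag == 'script' and attrs.get('src'):
            kind, ref = 'script', attrs['src']
        elif tag == 'link' and attrs.get('href') and 'stylesheet' in rel:
            kind, ref = 'stylesheet', attrs['href']
        else:
            return
        # Resolve relative references against the URL the page was fetched from
        self.assets.append((kind, urljoin(self.base_url, ref)))


def list_assets(html_text, base_url):
    """Hypothetical helper: return every asset reference found in html_text."""
    collector = AssetCollector(base_url)
    collector.feed(html_text)
    return collector.assets


# Made-up sample page; in practice html_text would be the decoded download
sample = '''<html><head>
<link rel="stylesheet" href="/css/site.css">
<script src="js/table.js"></script>
</head><body><img src="../img/logo.png"><table id="data"></table></body></html>'''

for kind, asset_url in list_assets(sample, 'https://asd.com/pages/asdID=1'):
    print(kind, asset_url)
```

If the downloaded HTML contains an empty table while scripts are listed, the table is probably filled in by JS, which urllib.request does not execute.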


RE: Download entire web pages and save them as html file with urllib.request - Larz60+ - Jul-13-2018

That's because images, CSS, and the other resources the page needs are not part of the HTML document itself; they are separate files that the browser fetches on its own.
You will need to scrape these as well.
Read the scraping tutorials here:
Web scraping Part1
Web Scraping Part2
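
As a rough idea of what scraping those extra files could look like, here is a sketch that downloads each referenced file and points the saved HTML at the local copy. It is not taken from the tutorials: `save_page_with_assets`, `fetch_bytes`, the `assets` folder, and the `page.html` file name are all assumptions, and the regular expression is a crude heuristic (it treats every `link href` as a file to download, and it ignores images referenced from inside CSS).

```python
import os
import re
import urllib.request
from urllib.parse import urljoin, urlparse

# Heuristic: the first src/href attribute on each img, script or link tag
ASSET_ATTR = re.compile(
    r'(<(?:img|script|link)\b[^>]*?\s(?:src|href)=["\'])([^"\']+)(["\'])',
    re.IGNORECASE)


def fetch_bytes(url):
    """Download one URL and return its raw bytes."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def save_page_with_assets(html_text, page_url, out_dir, fetch=fetch_bytes):
    """Hypothetical helper: save html_text plus the files it references."""
    asset_dir = os.path.join(out_dir, 'assets')
    os.makedirs(asset_dir, exist_ok=True)
    saved = {}  # absolute URL -> path relative to out_dir

    def localize(match):
        absolute = urljoin(page_url, match.group(2))
        if absolute not in saved:
            name = os.path.basename(urlparse(absolute).path) or 'asset'
            # Prefix a counter so files with the same name do not collide
            local_name = '%d_%s' % (len(saved), name)
            try:
                data = fetch(absolute)
            except (OSError, ValueError):
                return match.group(0)  # keep the original link on failure
            with open(os.path.join(asset_dir, local_name), 'wb') as f:
                f.write(data)
            saved[absolute] = 'assets/' + local_name
        return match.group(1) + saved[absolute] + match.group(3)

    rewritten = ASSET_ATTR.sub(localize, html_text)
    with open(os.path.join(out_dir, 'page.html'), 'w', encoding='utf-8') as f:
        f.write(rewritten)
    return rewritten, saved
```

Inside the original loop this could be called with the decoded page text, the page URL, and a per-page folder such as `str(i)`. The `fetch` parameter is there only so the download step can be swapped out, for example when trying the function without network access.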