Python Forum

Full Version: Download entire web pages and save them as html file with urllib.request
I can save multiple web pages using the code below; however, the pages don't display properly after I save them as HTML. For example, the text in tables is misaligned and the images don't appear. I need to download entire pages, the way "Save as" does in a web browser, so that they display properly.
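import urllib.request

# Base URL from the question; the numeric ID is appended to it.
url = 'https://asd.com/asdID='
for i in range(1, 5):
    print('     --> ID:', i)
    newurl = url + str(i)
    # read() returns bytes; calling str() on bytes would write the b'...' repr
    # into the file, so save the raw bytes in binary mode instead.
    with urllib.request.urlopen(newurl) as page:
        pagebytes = page.read()
    with open(str(i) + '.html', 'wb') as f:
        f.write(pagebytes)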
You have to analyze the HTML in your browser (view source code):

1 - Check where it is getting the image files from.
2 - Check all the CSS and JS files that are loaded to render the page properly.
2.1 - Check whether the table content is coming from JS (maybe JS is generating the content dynamically).
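The same checks can be roughed out in code. The sketch below is only an illustration, not a complete solution: it uses the standard library's html.parser to list the images, stylesheets and scripts that a page references. The names AssetCollector and find_assets and the example.com URLs are made up for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AssetCollector(HTMLParser):
    """Collects URLs of images, stylesheets and scripts referenced by a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url  # used to resolve relative references
        self.assets = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ('img', 'script') and attrs.get('src'):
            ref = attrs['src']
        elif (tag == 'link' and attrs.get('href')
              and 'stylesheet' in (attrs.get('rel') or '').lower()):
            ref = attrs['href']
        else:
            return
        # Turn relative paths into absolute URLs that can be downloaded.
        self.assets.append(urljoin(self.base_url, ref))


def find_assets(html, base_url):
    """Return the absolute URLs of the resources referenced in html."""
    collector = AssetCollector(base_url)
    collector.feed(html)
    return collector.assets


if __name__ == '__main__':
    sample = '<link rel="stylesheet" href="/css/site.css"><img src="logo.png">'
    for asset in find_assets(sample, 'https://example.com/pages/1'):
        print(asset)
```

This only sees what is written in the HTML source. If the table content is generated by JS at load time, it will not show up here, which is exactly what point 2.1 is meant to detect.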
That's because images, CSS, and the other resources the page needs are not part of the HTML document itself; they are separate files that the HTML only references.
You will need to scrape these as well.
Read the scraping tutorial here:
Web scraping Part1
Web Scraping Part2
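As a rough illustration of scraping those extra files, here is a minimal sketch that saves a list of resource URLs into a local folder with urllib.request. The function names (local_name, fetch_bytes, save_assets) and the folder layout are assumptions made for this example, and the list of URLs is assumed to have been collected from the page source already.

```python
import os
import urllib.request
from urllib.parse import urlparse


def local_name(asset_url):
    """Derive a local file name from the last path segment of the URL."""
    name = os.path.basename(urlparse(asset_url).path)
    return name or 'index'


def fetch_bytes(asset_url):
    """Download one resource and return its raw bytes."""
    with urllib.request.urlopen(asset_url) as response:
        return response.read()


def save_assets(asset_urls, folder, fetch=fetch_bytes):
    """Save each asset into folder; returns a {url: local path} mapping."""
    os.makedirs(folder, exist_ok=True)
    saved = {}
    for asset_url in asset_urls:
        path = os.path.join(folder, local_name(asset_url))
        # Images are binary, so always write in binary mode.
        with open(path, 'wb') as f:
            f.write(fetch(asset_url))
        saved[asset_url] = path
    return saved
```

Two things this sketch does not do: it does not handle two resources that share the same file name, and it does not rewrite the src/href attributes in the saved HTML to point at the local copies. Without that last step the saved page will still try to load its images and CSS from the original site.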