Python Forum
Download entire web pages and save them as html file with urllib.request
#1
I can save multiple web pages using the code below; however, the saved HTML files don't display properly. For example, the text in tables is misaligned and images don't show up. I need to download entire pages the way a browser's "Save As" does, so that they display correctly.
import urllib.request

url = 'https://asd.com/asdID='
for i in range(1, 5):
    print('     --> ID:', i)
    newurl = url + str(i)
    # Save the response bytes unchanged (binary mode) so the HTML and its
    # original encoding are preserved; 'with' closes both handles.
    with urllib.request.urlopen(newurl) as page, open(str(i) + '.html', 'wb') as f:
        f.write(page.read())
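One reason the saved files look wrong even before images or CSS come into play: `page.read()` returns bytes, and calling `str()` on a bytes object produces its repr (`b'...'`, with newlines written as a literal `\n` and non-ASCII characters as `\xNN` escapes), not the decoded text. A minimal sketch of the difference, plus a helper that decodes using the charset from the `Content-Type` header. Falling back to UTF-8 when no charset is declared is an assumption, and `fetch_html` and `demo_str_vs_decode` are hypothetical names.

```python
import urllib.request


def fetch_html(url):
    """Return the page body as text, decoded with the server-declared charset."""
    with urllib.request.urlopen(url) as page:
        raw = page.read()  # bytes, exactly as sent by the server
        # Assumption: treat the page as UTF-8 if the header names no charset.
        charset = page.headers.get_content_charset() or 'utf-8'
    return raw.decode(charset, errors='replace')


def demo_str_vs_decode():
    raw = 'caf\u00e9\n<p>ok</p>'.encode('utf-8')
    wrong = str(raw)             # repr of the bytes object: b'caf\xc3\xa9\n...'
    right = raw.decode('utf-8')  # the actual text, with a real newline
    return wrong, right
```

If the goal is only to keep an exact copy on disk, writing `page.read()` to a file opened in `'wb'` mode avoids decoding altogether.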
#2
You have to analyze the HTML in your browser (View Source):

1 - Check where the page loads its image files from.
2 - Check all the CSS and JS files that are loaded to render the page properly.
2.1 - Check whether the table content comes from JS (the JS may be generating the content dynamically).
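Checks 1 and 2 can be partly automated. A minimal sketch using only the standard library's `html.parser`: it lists the images, stylesheets and scripts a page references, resolved to absolute URLs with `urljoin`. It only looks at `<img src>`, `<script src>` and `<link rel="stylesheet" href>`, so assets pulled in from CSS `url(...)` rules or created by JavaScript (check 2.1) will not show up. `AssetCollector`, `find_assets` and the example URLs are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AssetCollector(HTMLParser):
    """Collect absolute URLs of images, stylesheets and scripts in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.assets = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # valueless attributes map to None
        if tag in ('img', 'script'):
            ref = attrs.get('src')
        elif tag == 'link' and 'stylesheet' in (attrs.get('rel') or '').lower().split():
            ref = attrs.get('href')
        else:
            return
        if ref:
            # Relative paths like "img/logo.png" are resolved against the page URL.
            self.assets.append(urljoin(self.base_url, ref))


def find_assets(html, base_url):
    collector = AssetCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.assets
```

Comparing this list with what the browser's developer tools show under the network tab is a quick way to spot content that only JavaScript loads.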
#3
That's because images, CSS, and the other files needed to render the page are not part of the HTML document itself; the page only references them.
You will need to scrape these as well.
Read the scraping tutorials here:
Web scraping Part1
Web Scraping Part2
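A rough sketch of the "scrape these as well" step, assuming the page HTML has already been downloaded and decoded to text. It saves each referenced image, script and stylesheet into an `assets/` folder and rewrites the references so the local copy opens offline. This is a simplified stand-in for a browser's "Save As", not an equivalent: the rewrite is a naive quoted-string replace, assets referenced inside CSS files are not followed, and anything generated by JavaScript is not captured. `ResourceFinder`, `save_page_with_assets` and the `fetch` parameter are hypothetical names; `fetch` is injectable so the logic can be tried without network access.

```python
import os
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class ResourceFinder(HTMLParser):
    """Collect raw src/href values of images, scripts and stylesheets."""

    def __init__(self):
        super().__init__()
        self.refs = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ('img', 'script'):
            value = attrs.get('src')
        elif tag == 'link' and 'stylesheet' in (attrs.get('rel') or '').lower().split():
            value = attrs.get('href')
        else:
            return
        if value and not value.startswith('data:'):  # inline data needs no download
            self.refs.add(value)


def default_fetch(url):
    with urllib.request.urlopen(url) as resp:
        return resp.read()


def save_page_with_assets(html, base_url, folder, fetch=default_fetch):
    """Write folder/index.html plus its assets under folder/assets/.

    Returns the rewritten HTML. Assets that fail to download are skipped
    and keep their original (remote) reference.
    """
    asset_dir = os.path.join(folder, 'assets')
    os.makedirs(asset_dir, exist_ok=True)
    finder = ResourceFinder()
    finder.feed(html)
    for n, ref in enumerate(sorted(finder.refs)):
        url = urljoin(base_url, ref)
        ext = os.path.splitext(urlparse(url).path)[1]  # keep .css/.js/.png etc.
        local_name = f'asset{n}{ext}'
        try:
            data = fetch(url)
        except OSError as exc:  # urllib's URLError/HTTPError derive from OSError
            print('skipped', url, exc)
            continue
        with open(os.path.join(asset_dir, local_name), 'wb') as out:
            out.write(data)
        # Naive rewrite: only matches values in double or single quotes.
        for q in ('"', "'"):
            html = html.replace(f'{q}{ref}{q}', f'{q}assets/{local_name}{q}')
    with open(os.path.join(folder, 'index.html'), 'w', encoding='utf-8') as out:
        out.write(html)
    return html
```

For the loop in the first post, one would call it once per ID, e.g. `save_page_with_assets(page_html, newurl, 'page' + str(i))`, where `page_html` is the decoded page text. Tables whose content is filled in by JavaScript will still come out empty; that case needs a real browser.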