Python Forum
Download entire web pages and save them as html file with urllib.request
#1
I can save multiple web pages with the code below; however, the saved HTML files do not render properly. For example, the text in tables is misaligned and the images do not appear. I need to download entire pages, just as "Save As" does in a web browser, so that they display correctly.
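import urllib.request

# Placeholder base URL; the original post does not show the real value.
url = 'https://example.com/page/'

for i in range(1, 5):
    print('     --> ID:', i)
    newurl = url + str(i)
    # Read the raw bytes and write them in binary mode; str() on bytes would
    # store the b'...' repr with escaped newlines instead of the actual HTML.
    with urllib.request.urlopen(newurl) as page:
        pagebytes = page.read()
    with open(str(i) + '.html', 'wb') as f:
        f.write(pagebytes)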
#2
You have to analyze the HTML in your browser (view the page source):

1 - Check where the image files are being loaded from.
2 - Check all the CSS and JS files that are loaded to render the page properly.
2.1 - Check whether the table content comes from JS (the JS may be generating the content dynamically).
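The checks above can also be done from Python rather than by reading the source by hand. What follows is only a minimal sketch using the standard library: `ResourceFinder` and `find_resources` are hypothetical names made up for this illustration, and it only looks at `<img>`, `<script>`, and `<link rel="stylesheet">` tags, so it will miss things such as images referenced from inside CSS.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ResourceFinder(HTMLParser):
    """Collect absolute URLs of images, scripts, and stylesheets in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ('img', 'script') and attrs.get('src'):
            # Images and external scripts point at their file with src.
            self.resources.append(urljoin(self.base_url, attrs['src']))
        elif tag == 'link' and attrs.get('href'):
            # Only <link> tags whose rel mentions "stylesheet" are CSS files.
            if 'stylesheet' in (attrs.get('rel') or '').lower():
                self.resources.append(urljoin(self.base_url, attrs['href']))


def find_resources(html, base_url):
    """Return the resource URLs referenced by html, resolved against base_url."""
    finder = ResourceFinder(base_url)
    finder.feed(html)
    return finder.resources
```

Feeding it the decoded text of a downloaded page, together with that page's URL, gives the list of files the browser would fetch next. If the table text you see in the browser is not present in the downloaded HTML at all, that suggests it is being generated by JS (point 2.1), and downloading the HTML plus its resources will not be enough on its own.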
#3
That's because images, CSS, and other resources the page needs are not part of the HTML file itself.
You will need to scrape these as well.
Read the scraping tutorials here:
Web scraping Part1
Web Scraping Part2
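As a rough idea of what scraping these as well could look like with only the standard library, here is a sketch under several assumptions: `AssetFinder`, `fetch`, and `save_page` are hypothetical names, the page is assumed to be UTF-8, and links are rewritten with a plain string replace on double-quoted attribute values, which is crude. The tutorials cover more robust tools.

```python
import os
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class AssetFinder(HTMLParser):
    """Record the src/href values of images, scripts, and stylesheets."""

    def __init__(self):
        super().__init__()
        self.refs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ('img', 'script') and attrs.get('src'):
            self.refs.append(attrs['src'])
        elif tag == 'link' and attrs.get('href'):
            if 'stylesheet' in (attrs.get('rel') or '').lower():
                self.refs.append(attrs['href'])


def fetch(url):
    """Return the raw bytes served at url."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def save_page(page_url, name, out_dir='.', fetcher=fetch):
    """Save page_url as <name>.html with its assets in <name>_files/."""
    # Assumption: the page is UTF-8; undecodable bytes are replaced.
    html = fetcher(page_url).decode('utf-8', errors='replace')
    finder = AssetFinder()
    finder.feed(html)

    assets_dir = name + '_files'
    os.makedirs(os.path.join(out_dir, assets_dir), exist_ok=True)

    # dict.fromkeys drops duplicate references while keeping their order.
    for index, ref in enumerate(dict.fromkeys(finder.refs)):
        # The index prefix keeps assets with the same file name apart.
        basename = os.path.basename(urlparse(ref).path) or 'asset'
        local = assets_dir + '/' + str(index) + '_' + basename
        try:
            data = fetcher(urljoin(page_url, ref))
        except OSError:
            continue  # download failed: leave the original link untouched
        with open(os.path.join(out_dir, local), 'wb') as f:
            f.write(data)
        # Crude rewrite: only matches double-quoted values that contain
        # no HTML entities (the parser hands back unescaped values).
        html = html.replace('"' + ref + '"', '"' + local + '"')

    path = os.path.join(out_dir, name + '.html')
    with open(path, 'w', encoding='utf-8') as f:
        f.write(html)
    return path
```

In the loop from the first post, something like `save_page(newurl, str(i))` would take the place of the open/read/write lines. The `fetcher` parameter is there only so the function can be tried without network access by passing in a stand-in.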