 Download entire web pages and save them as html file with urllib.request
#1
I can save multiple web pages with the code below; however, the saved HTML files do not display properly when I open them. For example, the text in tables is misaligned and the images are missing. I need to download entire pages, the same way "Save as" works in a web browser, so that the saved copy renders properly.
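import urllib.request

# Base URL; the numeric ID is appended for each page
url = 'https://asd.com/asdID='
for i in range(1, 5):
    print('     --> ID:', i)
    newurl = url + str(i)
    # read() returns bytes; write them in binary mode so the saved file
    # holds the real markup rather than the text of a bytes literal (b'...')
    with urllib.request.urlopen(newurl) as page:
        data = page.read()
    with open(str(i) + '.html', 'wb') as f:
        f.write(data)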
#2
You have to analyze the HTML in your browser (view the page source):

1 - Check where the image files are loaded from.
2 - Check all the CSS and JS files that are loaded to render the page properly.
2.1 - Check whether the table content comes from JS (the JS may be generating the content dynamically).
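
The checks above can also be scripted. Here is a minimal sketch, assuming the page is ordinary static HTML: it uses the standard-library html.parser to list the image, stylesheet, and script URLs that a page refers to. The names AssetLister and list_assets and the example.com URL are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AssetLister(HTMLParser):
    """Collects URLs of images, stylesheets and scripts referenced by a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.assets = []  # list of (kind, absolute_url) in document order

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'img' and attrs.get('src'):
            self._add('image', attrs['src'])
        elif tag == 'script' and attrs.get('src'):
            self._add('script', attrs['src'])
        elif tag == 'link' and attrs.get('href'):
            # Only <link rel="stylesheet"> is treated as CSS
            rel = (attrs.get('rel') or '').lower().split()
            if 'stylesheet' in rel:
                self._add('stylesheet', attrs['href'])

    def _add(self, kind, ref):
        # Resolve relative references against the page URL
        self.assets.append((kind, urljoin(self.base_url, ref)))


def list_assets(html_text, base_url):
    """Return (kind, absolute_url) pairs for the assets found in html_text."""
    parser = AssetLister(base_url)
    parser.feed(html_text)
    parser.close()
    return parser.assets


if __name__ == '__main__':
    # Hypothetical sample markup, only for demonstration
    sample = '<link rel="stylesheet" href="/css/main.css"><img src="img/a.png">'
    for kind, asset_url in list_assets(sample, 'https://example.com/pages/1'):
        print(kind, asset_url)
```

If the table rows you see in the browser are missing from the downloaded HTML text, the content is most likely generated by JS after the page loads, and urllib alone will not capture it.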
#3
That's because the images, CSS, and other resources the page needs are not part of the HTML document itself; the HTML only refers to them by URL.
You will need to scrape these as well.
Read the scraping tutorial here:
Web scraping Part1
Web Scraping Part2
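
As a rough idea of what "scraping these as well" can look like with only the standard library, here is a sketch that downloads the files referenced by img, script, and link tags and rewrites the references to point at the local copies. It is not a full "Save as" replacement: the regular expression is deliberately crude, every link tag is treated as a downloadable file, and resources requested from inside CSS or JS are not followed. The names localize_assets, fetch_bytes, and ASSET_ATTR are hypothetical, and the saved HTML file is assumed to sit next to the asset folder.

```python
import os
import re
import urllib.request
from urllib.parse import urljoin, urlparse

# Crude pattern (assumption: quoted attribute values) for src/href
# on <img>, <script> and <link> tags
ASSET_ATTR = re.compile(
    r'''(<(?:img|script|link)\b[^>]*?\s(?:src|href)\s*=\s*)(["'])(.*?)\2''',
    re.IGNORECASE | re.DOTALL,
)


def fetch_bytes(url):
    """Download a URL and return its raw bytes."""
    with urllib.request.urlopen(url) as resp:
        return resp.read()


def localize_assets(html_text, base_url, asset_dir, fetch=fetch_bytes):
    """Save referenced assets into asset_dir and return the rewritten HTML."""
    os.makedirs(asset_dir, exist_ok=True)
    folder = os.path.basename(os.path.normpath(asset_dir))
    saved = {}  # absolute URL -> local relative path

    def replace(match):
        prefix, quote, ref = match.groups()
        if ref.startswith(('data:', '#', 'javascript:')):
            return match.group(0)  # nothing to download
        absolute = urljoin(base_url, ref)
        if absolute not in saved:
            # Prefix a counter so equal file names do not overwrite each other
            name = os.path.basename(urlparse(absolute).path) or 'asset'
            name = f'{len(saved)}_{name}'
            try:
                data = fetch(absolute)
            except (OSError, ValueError):
                return match.group(0)  # download failed: keep the original link
            with open(os.path.join(asset_dir, name), 'wb') as f:
                f.write(data)
            saved[absolute] = f'{folder}/{name}'
        return f'{prefix}{quote}{saved[absolute]}{quote}'

    return ASSET_ATTR.sub(replace, html_text)
```

In the loop from the first post, you would decode the downloaded bytes, pass the text together with newurl and a folder such as str(i) + '_files' to localize_assets, and write the returned text to str(i) + '.html'. Passing fetch as a parameter keeps the function easy to try out without network access.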