a html_grabber

Blue Dog · (This post was last modified: Nov-12-2016, 02:25 AM by Blue Dog.)

Here is a bot that prints out and saves to a file the complete HTML of a web page. Wow, made by me, at last. I have been working on this for over a week now.

It works very well, made in Python 2.7.12 on Win 7. So let's have a look at it.

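import urllib

# raw_input returns exactly what was typed as a string (Python 2);
# input() would evaluate the text as Python code, so it needed quotes around it
urls = raw_input('Enter the web site address you want to scrape, like this: http://www.google.com ')
website = urls
filename = raw_input('Enter the file name you want, like this: MyFile.txt ')
htmlfile = urllib.urlopen(website)  # open the page
htmltext = htmlfile.read()  # read the complete html as one string
f = open(filename, "w")  # opens (or creates) the file with the name entered above
f.write(htmltext)
f.close()
print htmltext
raw_input("Press the enter key to exit")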

This is not much, but I hope to be able to give more back to the forum. I have gotten so much help here.
I want to thank you all for the help.
EDIT: I added some more code, so you can now name the file you are saving to.

**j.crater** · Nov-11-2016, 06:09 AM

Well done! It's great you got it working =) I can add a few comments though:

- A bit of a grammar schmazi thing is, "Inter" should be "Enter"
- There's no need for both the 'website' and 'urls' variables, since you are just assigning the same value from one to the other, so you can have just "website = input (.....)"
- For opening files I would recommend the "with open" statement; you can find plenty about its usage online
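
To make the last two points concrete, here is a minimal sketch, not Blue Dog's actual script. It assumes Python 3, where the call lives at urllib.request.urlopen rather than urllib.urlopen, and the function name grab_html is just a made-up name for illustration.

```python
import urllib.request


def grab_html(website, filename):
    """Download the page at `website` and save its raw bytes to `filename`.

    `grab_html` is a hypothetical helper name, not part of the original script.
    """
    # Each `with` block closes its resource automatically, even if an
    # error is raised halfway through, so no explicit close() is needed.
    with urllib.request.urlopen(website) as response:
        html = response.read()  # bytes (not str) in Python 3

    # Binary mode writes the bytes exactly as received, with no decoding step.
    with open(filename, "wb") as f:
        f.write(html)

    return html
```

You could call it with something like grab_html(input("Enter the web site address: "), input("Enter the file name: ")), so only one variable per value is needed.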

Keep up the good work ;)

wavic · Nov-11-2016, 03:52 PM

The first working script brings a lot of joy. :)
Keep learning
