getting started, again

***snippsat*** · (This post was last modified: Jul-21-2018, 06:50 AM by snippsat.)

(Jul-21-2018, 03:18 AM)bluedoor5 Wrote: Is this now ready to do a web scrape, or do I require anything else ?

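You can look at Web-Scraping part-1.
You will need Requests and BeautifulSoup (the beautifulsoup4 package, if it is not already installed), and optionally lxml (for Python 3.7 you will need a wheel from Gohlke).

pip install requests beautifulsoup4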

# E.g. lxml wheel for 32-bit Python 3.7
pip install lxml-4.2.3-cp37-cp37m-win32.whl
Processing c:\aaa\lxml-4.2.3-cp37-cp37m-win32.whl
Installing collected packages: lxml
Successfully installed lxml-4.2.3

Then you can test the first code: just copy it into PyScripter and push the run button.

import requests
from bs4 import BeautifulSoup
 
url = 'http://CNN.com'
url_get = requests.get(url)
soup = BeautifulSoup(url_get.content, 'html.parser')
print(soup.find('title').text)

Output:
CNN International - Breaking News, US News, World News and Video
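If you want to check the "pull out the title" step without touching the network, here is a rough sketch that uses only the standard library's html.parser instead of BeautifulSoup. Note that this is a swapped-in technique, not what the code above does, and the names TitleParser, extract_title and sample_html are made up for this example.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text inside the first <title> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self._parts = []

    def handle_starttag(self, tag, attrs):
        # Only start collecting for the first <title> we meet.
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)

    @property
    def title(self):
        # None when the page had no <title> (or an empty one).
        return "".join(self._parts).strip() or None


def extract_title(html):
    """Return the page title as a string, or None if there is none."""
    parser = TitleParser()
    parser.feed(html)
    return parser.title


# Made-up HTML, so no request is needed to try it.
sample_html = "<html><head><title> Sample Page </title></head><body><p>Hi</p></body></html>"
print(extract_title(sample_html))
print(extract_title("<p>no title here</p>"))
```

Returning None for a page without a title is one possible choice here. With BeautifulSoup, soup.find('title') also gives None in that case, so calling .text on it would raise an error.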

bluedoor5 Wrote: But there are variations, for example, "over-write" the existing text within, or in some cases append to the same txt file, so it fills up.
Where would I find examples, or are these already in those libraries?

Reading and writing to files is a standard part of Python.

import requests
from bs4 import BeautifulSoup

url = 'http://CNN.com'
url_get = requests.get(url)
soup = BeautifulSoup(url_get.content, 'html.parser')
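# 'a+' opens the file for appending (and creates it if missing),
# so each run adds one more line instead of replacing the old ones.
with open('title.txt', 'a+') as f_out:
    f_out.write(f"{soup.find('title').text}\n")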

So if I run the code 3 times, title.txt will look like this:

Output:
CNN International - Breaking News, US News, World News and Video
CNN International - Breaking News, US News, World News and Video
CNN International - Breaking News, US News, World News and Video
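For the over-write versus append question, the difference comes down to the mode given to open(): 'w' replaces whatever is in the file, 'a' adds to the end. Here is a minimal sketch, assuming a made-up helper save_line and a throwaway file in a temp folder (neither is from the code above), and using plain strings in place of the scraped title.

```python
import os
import tempfile


def save_line(path, text, overwrite=False):
    """Write one line to path (hypothetical helper).

    overwrite=True  -> mode 'w', old content is replaced
    overwrite=False -> mode 'a', the line is added at the end
    """
    mode = "w" if overwrite else "a"
    with open(path, mode, encoding="utf-8") as f_out:
        f_out.write(f"{text}\n")


# Demo in a temporary folder so no real file gets clobbered.
demo_path = os.path.join(tempfile.mkdtemp(), "title_demo.txt")

save_line(demo_path, "first title")    # append: the file is created
save_line(demo_path, "second title")   # append: the file now has 2 lines
with open(demo_path, encoding="utf-8") as f_in:
    appended = f_in.read().splitlines()

save_line(demo_path, "fresh title", overwrite=True)  # overwrite: old lines are gone
with open(demo_path, encoding="utf-8") as f_in:
    overwritten = f_in.read().splitlines()

print(appended)
print(overwritten)
```

In the scraping code you would pass soup.find('title').text as the text argument and pick overwrite=True or False depending on whether the file should be replaced or fill up.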
