(Jul-21-2018, 03:18 AM)bluedoor5 Wrote: Is this now ready to do a web scrape, or do I require anything else?

You can look at Web-Scraping part-1.
You will need Requests and, optionally, lxml (for Python 3.7 you will need a wheel from Gohlke).
pip install requests
# E.g. lxml wheel for 32-bit Python
pip install lxml-4.2.3-cp37-cp37m-win32.whl
Processing c:\aaa\lxml-4.2.3-cp37-cp37m-win32.whl
Installing collected packages: lxml
Successfully installed lxml-4.2.3

Then you can test the first code: just copy it into PyScripter and push the run button.
import requests
from bs4 import BeautifulSoup

url = 'http://CNN.com'
# Download the page
url_get = requests.get(url)
# Parse the HTML with Python's built-in parser
soup = BeautifulSoup(url_get.content, 'html.parser')
# Print the text of the <title> tag
print(soup.find('title').text)

Output:
CNN International - Breaking News, US News, World News and Video
bluedoor5 Wrote: But there are variations, for example, "over-write" the existing text within, or in some cases append to the same txt file, so it fills up.
Where would I find examples, or are these already in those libraries?

Reading and writing to files is a standard part of Python.
import requests
from bs4 import BeautifulSoup

url = 'http://CNN.com'
url_get = requests.get(url)
soup = BeautifulSoup(url_get.content, 'html.parser')
# 'a+' opens the file for appending and creates it if it does not exist
with open('title.txt', 'a+') as f_out:
    f_out.write(f"{soup.find('title').text}\n")

So if I run the code 3 times, title.txt would look like this.
Output:
CNN International - Breaking News, US News, World News and Video
CNN International - Breaking News, US News, World News and Video
CNN International - Breaking News, US News, World News and Video
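
The code above only shows the append case. For the "over-write" variation from the question, the only thing that changes is the mode given to open(): 'w' replaces the existing content, 'a' adds to the end. Here is a minimal sketch of both, using only the standard library. The helper name save_title, its overwrite flag, and the sample titles are made up for illustration; they do not come from Requests or BeautifulSoup.

```python
import tempfile
from pathlib import Path


def save_title(path, title, overwrite=False):
    """Write one title per line (hypothetical helper for illustration)."""
    # 'w' truncates the file before writing, 'a' appends to the end.
    # Both modes create the file if it does not exist yet.
    mode = 'w' if overwrite else 'a'
    with open(path, mode, encoding='utf-8') as f_out:
        f_out.write(f"{title}\n")


# Demo in a temporary folder so no real files are touched.
with tempfile.TemporaryDirectory() as tmp:
    out_file = Path(tmp) / 'title.txt'

    # Two appends: the file fills up, one line per call.
    save_title(out_file, 'First title')
    save_title(out_file, 'Second title')
    appended = out_file.read_text(encoding='utf-8')

    # One overwrite: everything written before is replaced.
    save_title(out_file, 'Third title', overwrite=True)
    overwritten = out_file.read_text(encoding='utf-8')

print(appended)
print(overwritten)
```

In the scraping code you would pass soup.find('title').text as the title and pick overwrite=True or the default depending on whether the file should keep its history.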