Jul-19-2018, 09:44 PM
Thanks
getting started, again
Jul-20-2018, 10:08 AM
My next attempt was to install BeautifulSoup.
I tried from both C:\python37 and C:\, using the instructions posted earlier in this thread. I got an error at the command prompt on both occasions. Quote:C:\python37>cd..
Jul-20-2018, 11:00 AM
It's
pip install beautifulsoup4
Jul-20-2018, 10:47 PM
OK, successfully installed BSoup 4
Two more things. First, I need to install an editor, so I guess I'll use the one mentioned here previously. Second, I need something (I read about it somewhere but can't think of the name) to do the web query/scrape. In the previous "hello world" level tutorials, I was hoping to find one that actually writes "hello world" to a txt file. But there are variations: for example, overwriting the existing text, or in some cases appending to the same txt file so it fills up. And if the txt file is missing or deleted, the script makes a new one in the particular folder. I will need these types of scripts down the track. Where would I find examples, or are these already in those libraries?
Jul-21-2018, 03:18 AM
Successfully installed PyScripter 3.4.1.0 x86
Is this now ready to do a web scrape, or do I require anything else? Thanks

(Jul-21-2018, 03:18 AM)bluedoor5 Wrote: Is this now ready to do a web scrape, or do I require anything else?
You can look at Web-Scraping part-1. You will need Requests and, optionally, lxml (for Python 3.7 you will need a wheel from gohlke).

    pip install requests

    # E.g. lxml wheel for 32-bit Python
    pip install lxml-4.2.3-cp37-cp37m-win32.whl
    Processing c:\aaa\lxml-4.2.3-cp37-cp37m-win32.whl
    Installing collected packages: lxml
    Successfully installed lxml-4.2.3

Then you can test the first code; just copy it into PyScripter and push the run button.

    import requests
    from bs4 import BeautifulSoup

    url = 'http://CNN.com'
    url_get = requests.get(url)
    soup = BeautifulSoup(url_get.content, 'html.parser')
    print(soup.find('title').text)
bluedoor5 Wrote: But there are variations, for example, "over-write" the existing text within, or in some cases append to the same txt file, so it fills up.
Reading and writing files is a standard part of Python.

    import requests
    from bs4 import BeautifulSoup

    url = 'http://CNN.com'
    url_get = requests.get(url)
    soup = BeautifulSoup(url_get.content, 'html.parser')
    with open('title.txt', 'a+') as f_out:
        f_out.write(f"{soup.find('title').text}\n")

So if I run the code 3 times, title.txt would look like this.
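As a rough offline sketch of that append behaviour: the title below is a made-up placeholder standing in for `soup.find('title').text` (not CNN's real title), `append_title` is a hypothetical helper name, and the file lives in a temporary folder so nothing real is touched.

```python
import tempfile
from pathlib import Path

# Placeholder standing in for soup.find('title').text; not a real page title.
PLACEHOLDER_TITLE = "Example page title"


def append_title(path, title):
    """Append one title per line; 'a+' creates the file if it is missing."""
    with open(path, "a+", encoding="utf-8") as f_out:
        f_out.write(f"{title}\n")


# Throwaway folder, used here only for the demonstration.
demo_dir = Path(tempfile.mkdtemp())
title_file = demo_dir / "title.txt"

# Simulate running the script three times.
for _ in range(3):
    append_title(title_file, PLACEHOLDER_TITLE)

lines = title_file.read_text(encoding="utf-8").splitlines()
print(lines)  # one entry per run, all the same title
```

Each run adds one more line to the end of the file, so after three runs the file holds the same title three times.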
Jul-21-2018, 02:44 PM
So if I run the txt file script, the way I read it, the script is looking for Quote:title.txt or whatever name I choose to call it, "my-text-file.txt". What I don't see in the script is which folder the txt file resides in. In this case I have a special folder called Quote:C:\MYPYTHONSKOOL
Inside the folder, for testing, can be my txt files (or .py scripts), individually named. The script ran fine with no error, but I could not find either Quote:title.txt or Quote:my-text-file.txt
Windows Search did not come up with anything.
---------
This script came up with a syntax error though: Quote:"SyntaxError: invalid syntax (<module2 >,line17
I understood the code being:

    pip install requests

Quote:# Eg lxml wheel for python 32-bit

(Jul-21-2018, 02:44 PM)bluedoor5 Wrote: or whatever name I choose to call it, "my-text-file.txt"
Call it what you want.

(Jul-21-2018, 02:44 PM)bluedoor5 Wrote: What I don't see in the script is what folder the txt file resides.
It will be in the folder you run the script from (the current working directory). Use File --> Save, e.g. save it as my_tite.py and point to a folder, e.g. C:\MYPYTHONSKOOL. When you run it now, my-text-file.txt will be in the folder C:\MYPYTHONSKOOL. Or give the path to the folder.

    with open(r'C:\MYPYTHONSKOOL\my-text-file.txt', 'a+') as f_out:
        f_out.write(f"{soup.find('title').text}\n")

Quote:"SyntaxError: invalid syntax (<module2 >,line17
There is no line 17 in my code.

Quote:I understood the code being:
You need to download the wheel from the gohlke link I gave you, then use pip install wheel_name.whl
Take it easy, and try to read it several times and understand what was posted; you seem to struggle with everything from very basic Python to basic OS understanding.
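If it is unclear where a file opened with a bare name will land, a small check like the following can help. It is only a sketch: `log_path` is a hypothetical helper, and the folder is the one mentioned in this thread.

```python
import os
from pathlib import Path

# A bare name such as 'my-text-file.txt' is resolved against this folder.
print("Current working directory:", os.getcwd())

# resolve() turns the relative name into the absolute path Python would use.
relative_target = Path("my-text-file.txt").resolve()
print("A relative name would end up at:", relative_target)


def log_path(folder, name="my-text-file.txt"):
    """Hypothetical helper: join a folder and a file name into one path."""
    return Path(folder) / name


# An explicit folder does not depend on where the script was started from.
explicit_target = log_path(r"C:\MYPYTHONSKOOL")
print("An explicit path points at:", explicit_target)
```

Printing `os.getcwd()` at the top of a script is a quick way to see which folder a relative file name will be written to.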
Jul-21-2018, 04:36 PM
Quote:with open(r'C:\MYPYTHONSKOOL\my-text-file.txt', 'a+') as f_out:
Worked first go. That's one way of logging I will be using in future: the add-to-next-line method. The other type of logging is the overwrite. Also, if "my-text-file.txt" does not exist, it makes one.
---------
Quote:There is no line 17 in my code.
That's because I have other bits of code # hashed out.
Quote:You need to download the wheel in link i gave you gohlke
Yep. Thanks.
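The two kinds of logging described above could be sketched roughly like this. `write_log` and its `overwrite` flag are hypothetical names chosen for illustration, and a temporary folder stands in for C:\MYPYTHONSKOOL.

```python
import tempfile
from pathlib import Path


def write_log(path, text, overwrite=False):
    """Write one line; 'w' replaces the file contents, 'a' adds to the end.

    Both modes create the file if it does not exist yet.
    """
    mode = "w" if overwrite else "a"
    with open(path, mode, encoding="utf-8") as f_out:
        f_out.write(text + "\n")


# Temporary folder used only for this demonstration.
log_file = Path(tempfile.mkdtemp()) / "my-text-file.txt"
existed_before = log_file.exists()  # the file has not been created yet

# Append style: each call adds a new line.
write_log(log_file, "first entry")
write_log(log_file, "second entry")
after_append = log_file.read_text(encoding="utf-8").splitlines()

# Overwrite style: earlier lines are discarded.
write_log(log_file, "fresh start", overwrite=True)
after_overwrite = log_file.read_text(encoding="utf-8").splitlines()

print(after_append)
print(after_overwrite)
```

Keeping the mode choice in one small function means the rest of a script only has to decide whether a given message should replace the log or extend it.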
Jul-23-2018, 06:00 PM
(Jul-21-2018, 04:36 PM)bluedoor5 Wrote: That's one way logging I will be using in future, add to next line method
(Jul-21-2018, 04:36 PM)bluedoor5 Wrote: with open(r'C:\MYPYTHONSKOOL\my-text-file.txt', 'a+') as f_out:
The second argument to open is the mode you're opening the file in. "r" is for read-only (you can't write to the file), "w" is for write (and it erases whatever used to be in the file), "a" is for append (you can write, and whatever you write is at the end of the file [...unless you call seek() first...]).
For future reference, the builtin function help can be used to see the docs for something. That normally explains the available options and what they mean. For example, here's open():

    >>> help(open)
    Help on built-in function open in module io:

    open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)
        Open file and return a stream.  Raise IOError upon failure.

        file is either a text or byte string giving the name (and the path if the file isn't in the current working directory) of the file to be opened or an integer file descriptor of the file to be wrapped. (If a file descriptor is given, it is closed when the returned I/O object is closed, unless closefd is set to False.)

        mode is an optional string that specifies the mode in which the file is opened. It defaults to 'r' which means open for reading in text mode.  Other common values are 'w' for writing (truncating the file if it already exists), 'x' for creating and writing to a new file, and 'a' for appending (which on some Unix systems, means that all writes append to the end of the file regardless of the current seek position). In text mode, if encoding is not specified the encoding used is platform dependent: locale.getpreferredencoding(False) is called to get the current locale encoding. (For reading and writing raw bytes use binary mode and leave encoding unspecified.) The available modes are:

        ========= ===============================================================
        Character Meaning
        --------- ---------------------------------------------------------------
        'r'       open for reading (default)
        'w'       open for writing, truncating the file first
        'x'       create a new file and open it for writing
        'a'       open for writing, appending to the end of the file if it exists
        'b'       binary mode
        't'       text mode (default)
        '+'       open a disk file for updating (reading and writing)
        'U'       universal newline mode (deprecated)
        ========= ===============================================================

        The default mode is 'rt' (open for reading text). For binary random access, the mode 'w+b' opens and truncates the file to 0 bytes, while 'r+b' opens the file without truncation. The 'x' mode implies 'w' and raises an `FileExistsError` if the file already exists.

        Python distinguishes between files opened in binary and text modes, even when the underlying operating system doesn't. Files opened in binary mode (appending 'b' to the mode argument) return contents as bytes objects without any decoding. In text mode (the default, or when 't' is appended to the mode argument), the contents of the file are returned as strings, the bytes having been first decoded using a platform-dependent encoding or using the specified encoding if given.

        'U' mode is deprecated and will raise an exception in future versions of Python.  It has no effect in Python 3.  Use newline to control universal newlines mode.

        buffering is an optional integer used to set the buffering policy. Pass 0 to switch buffering off (only allowed in binary mode), 1 to select line buffering (only usable in text mode), and an integer > 1 to indicate the size of a fixed-size chunk buffer.  When no buffering argument is given, the default buffering policy works as follows:

        * Binary files are buffered in fixed-size chunks; the size of the buffer is chosen using a heuristic trying to determine the underlying device's "block size" and falling back on `io.DEFAULT_BUFFER_SIZE`. On many systems, the buffer will typically be 4096 or 8192 bytes long.
    -- More --
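To see a few of the less common modes from that table in action, here is a small sketch. The file name is made up for the demonstration and the file is created in a temporary folder.

```python
import tempfile
from pathlib import Path

# Made-up file name in a throwaway folder.
demo_file = Path(tempfile.mkdtemp()) / "modes-demo.txt"

# 'x' creates the file, but refuses to touch one that already exists.
with open(demo_file, "x", encoding="utf-8") as f:
    f.write("created with x\n")

try:
    with open(demo_file, "x", encoding="utf-8") as f:
        pass
    second_x_failed = False
except FileExistsError:
    second_x_failed = True

# 'a+' appends and also allows reading. After the write the position is at
# the end of the file, so seek(0) is needed before reading everything back.
with open(demo_file, "a+", encoding="utf-8") as f:
    f.write("appended with a+\n")
    f.seek(0)
    contents = f.read()

# 'r+' opens an existing file for reading and writing without truncating it.
with open(demo_file, "r+", encoding="utf-8") as f:
    first_line = f.readline()

print(second_x_failed)
print(contents)
print(first_line)
```

The 'x' mode is handy when a script must never clobber an existing log, while 'a+' suits the case where the same handle both adds a line and reads the file back.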