Jul-19-2018, 09:44 PM
Pages: 1 2
Jul-20-2018, 10:08 AM
My next attempt was to try to install beautifulsoup.
I tried from C:\python37
and C:\
using the instructions posted earlier in this thread.
I get an error at the CMD prompt on both occasions.
Quote:C:\python37>cd..
C:\>pip install beautifulsoup
Collecting beautifulsoup
Using cached https://files.pythonhosted.org/packages/...a1a5a7accd783d0dfe14524867e31abb05b6c0eeceee49c759d/BeautifulSoup-3.2.1.tar.gz
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\Windows\TEMP\pip-install-h0hg8r5a\beautifulsoup\setup.py", line 22
print "Unit tests have failed!"
^
SyntaxError: Missing parentheses in call to 'print'. Did you mean print("Unit tests have failed!")?
----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in C:\Windows\TEMP\pip-install-h0hg8r5a\beautifulsoup\
C:\>
Jul-20-2018, 11:00 AM
It's
pip install beautifulsoup4
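The SyntaxError in the traceback comes from the old BeautifulSoup-3.2.1 setup.py, which uses the Python 2 print statement and so cannot run under Python 3.7. A minimal sketch of the difference, using only the built-in compile(); the helper name is_valid_py3 is made up for illustration:

```python
def is_valid_py3(source):
    """Return True if `source` compiles under the running Python 3 interpreter."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False


old_style = 'print "Unit tests have failed!"'    # Python 2 print statement
new_style = 'print("Unit tests have failed!")'   # Python 3 print function

print(is_valid_py3(old_style))
print(is_valid_py3(new_style))
```

Nothing is executed here; compile() only checks whether each line parses.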
Jul-20-2018, 10:47 PM
OK, successfully installed BSoup 4.
Two more things: I need to install an editor, so I guess the one mentioned here previously.
And secondly, ( I read it somewhere, can't think of the name ) something to do the web query / scrape.
In the previous tutorials, at the "hello world" level, I was hoping to find where it actually
produces "hello world" in a txt file.
But there are variations, for example, "over-write" the existing text within, or in some cases append to the same txt file, so it fills up. And if the txt file is missing or deleted, it makes a new one in the particular folder.
I will need these types of scripts down the track.
Where would I find examples, or are these already in those libraries ?
Jul-21-2018, 03:18 AM
Successfully installed PyScripter 3.4.1.0 x86
Is this now ready to do a web scrape, or do I require anything else ?
Thanks
Jul-21-2018, 06:50 AM
(Jul-21-2018, 03:18 AM)bluedoor5 Wrote: [ -> ]Is this now ready to do a web scrape, or do I require anything else ?
You can look at Web-Scraping part-1.
You will need Requests and, optionally, lxml (for Python 3.7 you will need a wheel from gohlke).
pip install requests
# E.g. lxml wheel for 32-bit Python
pip install lxml-4.2.3-cp37-cp37m-win32.whl
Processing c:\aaa\lxml-4.2.3-cp37-cp37m-win32.whl
Installing collected packages: lxml
Successfully installed lxml-4.2.3
Then you can test the first code: just copy it into PyScripter and push the run button.
import requests
from bs4 import BeautifulSoup

url = 'http://CNN.com'
url_get = requests.get(url)
soup = BeautifulSoup(url_get.content, 'html.parser')
print(soup.find('title').text)
Output:CNN International - Breaking News, US News, World News and Video
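Before running a scrape it can help to confirm which of the packages mentioned above the interpreter can actually see. A small sketch using only the standard library; the helper name is_installed is an assumption for illustration, and note that the names checked are import names, not pip names:

```python
import importlib.util

# Import names, not pip names: the pip package "beautifulsoup4" is imported as "bs4".
wanted = ["requests", "bs4", "lxml"]


def is_installed(module_name):
    """Return True if the current interpreter can find `module_name`."""
    return importlib.util.find_spec(module_name) is not None


for name in wanted:
    status = "installed" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

If a package shows as missing here but pip reports it as installed, pip and the editor are probably using different Python installations.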
bluedoor5 Wrote:But there are variations, for example, "over-write" the existing text within, or in some cases append to the same txt file, so it fills up.
Where would I find examples, or are these already in those libraries ?
Reading and writing to files is a standard part of Python.
import requests
from bs4 import BeautifulSoup

url = 'http://CNN.com'
url_get = requests.get(url)
soup = BeautifulSoup(url_get.content, 'html.parser')
with open('title.txt', 'a+') as f_out:
    f_out.write(f"{soup.find('title').text}\n")
So if I run the code 3 times, title.txt would look like this.
Output:CNN International - Breaking News, US News, World News and Video
CNN International - Breaking News, US News, World News and Video
CNN International - Breaking News, US News, World News and Video
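The append behaviour can be tried without a network connection. This is only a sketch: the title string stands in for soup.find('title').text, the helper append_line is a made-up name, and a temporary folder is used so no real files are touched:

```python
import os
import tempfile


def append_line(path, text):
    """Append `text` as one line; mode 'a' creates the file if it does not exist."""
    with open(path, "a", encoding="utf-8") as f_out:
        f_out.write(f"{text}\n")


# Stand-in for soup.find('title').text so the sketch runs offline.
title = "CNN International - Breaking News, US News, World News and Video"

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "title.txt")
    for _ in range(3):  # simulate running the script three times
        append_line(path, title)
    with open(path, encoding="utf-8") as f_in:
        lines = f_in.read().splitlines()

print(len(lines))
```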
Jul-21-2018, 02:44 PM
So if I run the txt file script,
the way I read it, the script is looking for
Quote:title.txt
or whatever name I choose to call it, "my-text-file.txt"
What I don't see in the script is which folder the txt file resides in.
In this case I have a special folder called
Quote:C:\MYPYTHONSKOOL
Inside the folder for testing can be my txt files ( or .py scripts ), individually named.
The script ran fine with no error, but I could not find either
Quote:title.txt
or
Quote:my-text-file.txt
Windows Search did not come up with anything.
---------
This script came up with a syntax error though
Quote:"SyntaxError: invalid syntax (<module2 >,line17
I understood the code being:
pip install requests
Quote:# Eg lxml wheel for python 32-bit
pip install lxml-4.2.3-cp37-cp37m-win32.whl
Jul-21-2018, 03:29 PM
(Jul-21-2018, 02:44 PM)bluedoor5 Wrote: [ -> ]or whatever name I choose to call it, "my-text-file.txt"
Call it what you want.
(Jul-21-2018, 02:44 PM)bluedoor5 Wrote: [ -> ]What I don't see in the script is what folder the txt file resides.
It will be in the folder you run the script from. Use File --> Save, e.g. save it as my_tite.py, and point to a folder, e.g. C:\MYPYTHONSKOOL.
When you run it now, my-text-file.txt will be in the folder C:\MYPYTHONSKOOL.
Or give the path to the folder.
with open(r'C:\MYPYTHONSKOOL\my-text-file.txt', 'a+') as f_out:
    f_out.write(f"{soup.find('title').text}\n")
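To see where a file with a plain name such as 'title.txt' would end up, print the current working directory. The sketch below also builds a full path with pathlib and creates the folder if it is missing. The helper append_in_folder is a hypothetical name, and the demo writes into a temporary folder rather than the real C:\MYPYTHONSKOOL:

```python
import os
import tempfile
from pathlib import Path

# A relative name such as 'title.txt' is resolved against this folder.
print("Current working directory:", os.getcwd())


def append_in_folder(folder, filename, text):
    """Append one line to folder/filename, creating the folder if needed."""
    target_dir = Path(folder)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / filename
    with open(target, "a", encoding="utf-8") as f_out:
        f_out.write(f"{text}\n")
    return target


# Demo in a temporary folder; on the poster's machine the folder
# would be the MYPYTHONSKOOL folder on drive C.
with tempfile.TemporaryDirectory() as tmp:
    written = append_in_folder(Path(tmp) / "MYPYTHONSKOOL", "my-text-file.txt", "hello world")
    content = written.read_text(encoding="utf-8")

print(written.name)
```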
Quote:"SyntaxError: invalid syntax (<module2 >,line17
There is no line 17 in my code.
Quote:I understood the code being:
# Eg lxml wheel for python 32-bit
pip install lxml-4.2.3-cp37-cp37m-win32.whl
You need to download the wheel from the gohlke link I gave you.
Then use
pip install wheel_name.whl
Take it easy, try reading it several times, and make sure you understand what was posted;
you seem to struggle with everything from very basic Python to basic OS understanding.

Jul-21-2018, 04:36 PM
Quote:with open(r'C:\MYPYTHONSKOOL\my-text-file.txt', 'a+') as f_out:
    f_out.write(f"{soup.find('title').text}\n")
Worked first go.
That's one way of logging I will be using in future, the add-to-next-line method.
The other type of logging is the over-write.
Also, if "my-text-file.txt" does not exist, it makes one.
---------
Quote:There is no line 17 in my code.
That's because I have other bits of code # hashed out.
Quote:You need to download the wheel in link i gave you gohlke
yep
Then use pip install wheel_name.whl
Thanks.
Jul-23-2018, 06:00 PM
(Jul-21-2018, 04:36 PM)bluedoor5 Wrote: [ -> ]That's one way of logging I will be using in future, the add-to-next-line method.
The other type of logging is the over-write.
Also, if "my-text-file.txt" does not exist, it makes one.
(Jul-21-2018, 04:36 PM)bluedoor5 Wrote: [ -> ]with open(r'C:\MYPYTHONSKOOL\my-text-file.txt', 'a+') as f_out:
The second argument to open is the mode you're opening the file in. "r" is for read-only (you can't write to the file), "w" is for write (and it erases whatever used to be in the file), "a" is for append (you can write, and whatever you write goes at the end of the file; on some systems that holds even if you call seek() first).
For future reference, the builtin function help can be used to see the docs for something. That normally explains the available options and what they mean. For example, here's open():
>>> help(open)
Help on built-in function open in module io:

open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)
    Open file and return a stream. Raise IOError upon failure.

    file is either a text or byte string giving the name (and the path
    if the file isn't in the current working directory) of the file to
    be opened or an integer file descriptor of the file to be
    wrapped. (If a file descriptor is given, it is closed when the
    returned I/O object is closed, unless closefd is set to False.)

    mode is an optional string that specifies the mode in which the file
    is opened. It defaults to 'r' which means open for reading in text
    mode. Other common values are 'w' for writing (truncating the file if
    it already exists), 'x' for creating and writing to a new file, and
    'a' for appending (which on some Unix systems, means that all writes
    append to the end of the file regardless of the current seek position).
    In text mode, if encoding is not specified the encoding used is platform
    dependent: locale.getpreferredencoding(False) is called to get the
    current locale encoding. (For reading and writing raw bytes use binary
    mode and leave encoding unspecified.) The available modes are:

    ========= ===============================================================
    Character Meaning
    --------- ---------------------------------------------------------------
    'r'       open for reading (default)
    'w'       open for writing, truncating the file first
    'x'       create a new file and open it for writing
    'a'       open for writing, appending to the end of the file if it exists
    'b'       binary mode
    't'       text mode (default)
    '+'       open a disk file for updating (reading and writing)
    'U'       universal newline mode (deprecated)
    ========= ===============================================================

    The default mode is 'rt' (open for reading text). For binary random
    access, the mode 'w+b' opens and truncates the file to 0 bytes, while
    'r+b' opens the file without truncation. The 'x' mode implies 'w' and
    raises an `FileExistsError` if the file already exists.

    Python distinguishes between files opened in binary and text modes,
    even when the underlying operating system doesn't. Files opened in
    binary mode (appending 'b' to the mode argument) return contents as
    bytes objects without any decoding. In text mode (the default, or when
    't' is appended to the mode argument), the contents of the file are
    returned as strings, the bytes having been first decoded using a
    platform-dependent encoding or using the specified encoding if given.

    'U' mode is deprecated and will raise an exception in future versions
    of Python. It has no effect in Python 3. Use newline to control
    universal newlines mode.

    buffering is an optional integer used to set the buffering policy.
    Pass 0 to switch buffering off (only allowed in binary mode), 1 to select
    line buffering (only usable in text mode), and an integer > 1 to indicate
    the size of a fixed-size chunk buffer. When no buffering argument is
    given, the default buffering policy works as follows:

    * Binary files are buffered in fixed-size chunks; the size of the buffer
      is chosen using a heuristic trying to determine the underlying device's
      "block size" and falling back on `io.DEFAULT_BUFFER_SIZE`.
      On many systems, the buffer will typically be 4096 or 8192 bytes long.
-- More --
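The modes described above can be checked with a short experiment. This is a sketch run in a temporary folder; the file name is borrowed from the thread, and the helper read_back and the variable names are only for illustration:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "my-text-file.txt")

    def read_back():
        """Return the whole file as one string."""
        with open(path, encoding="utf-8") as f:
            return f.read()

    # 'w' creates the file if it is missing and erases old content if it exists.
    with open(path, "w", encoding="utf-8") as f:
        f.write("first\n")
    with open(path, "w", encoding="utf-8") as f:
        f.write("second\n")
    after_w = read_back()   # only the second write survives

    # 'a' keeps what is there and adds to the end.
    with open(path, "a", encoding="utf-8") as f:
        f.write("third\n")
    after_a = read_back()

    # 'x' refuses to touch a file that already exists.
    try:
        with open(path, "x", encoding="utf-8") as f:
            f.write("never written\n")
        x_refused = False
    except FileExistsError:
        x_refused = True

    # 'r' (the default) fails if the file is missing.
    try:
        open(os.path.join(folder, "missing.txt"), "r").close()
        r_failed = False
    except FileNotFoundError:
        r_failed = True

print(repr(after_w), repr(after_a), x_refused, r_failed)
```

So for an over-write log use 'w', and for a log that fills up use 'a'; both create the file when it does not exist.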