Python Forum

Pages: 1 2

Thanks

My next attempt was to install beautifulsoup.

I tried from both C:\python37 and C:\, using the instructions posted earlier in this thread.

I get the same error at the CMD prompt both times:

Quote:C:\python37>cd..

C:\>pip install beautifulsoup
Collecting beautifulsoup
Using cached https://files.pythonhosted.org/packages/...a1a5a7accd783d0dfe14524867e31abb05b6c0eeceee49c759d/BeautifulSoup-3.2.1.tar.gz
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\Windows\TEMP\pip-install-h0hg8r5a\beautifulsoup\setup.py", line 22
print "Unit tests have failed!"
^
SyntaxError: Missing parentheses in call to 'print'. Did you mean print("Unit tests have failed!")?

----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in C:\Windows\TEMP\pip-install-h0hg8r5a\beautifulsoup\

C:\>

It's pip install beautifulsoup4
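
For context: the log above shows that the name beautifulsoup pulls in BeautifulSoup-3.2.1, whose setup.py uses the Python 2 print statement, which is why it fails on Python 3.7. Here is a small sketch, using only the standard library, to check that the right package ended up importable. The helper name is_installed is made up for illustration.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the import system can find module_name."""
    return importlib.util.find_spec(module_name) is not None


# The beautifulsoup4 package is imported under the name "bs4".
print("bs4 importable:", is_installed("bs4"))
```

If this prints False after a successful install, pip probably installed into a different Python than the one running the script.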

OK, successfully installed BSoup 4

Two more things. I need to install an editor, so I guess I'll use the one mentioned here previously.

Secondly, I need whatever library does the web query/scrape (I read about it somewhere but can't think of the name).

In the previous "hello world" level tutorials, I was hoping to find one that actually writes "hello world" to a txt file.
There are variations, though: for example, overwriting the existing text in the file, or appending to the same txt file so it fills up. And if the txt file is missing or deleted, the script should make a new one in the chosen folder.

I will need these types of scripts down the track.

Where would I find examples, or are these already in those libraries?

Successfully installed PyScripter 3.4.1.0 x86

Is this now ready to do a web scrape, or do I require anything else ?

Thanks

(Jul-21-2018, 03:18 AM)bluedoor5 Wrote: [ -> ]Is this now ready to do a web scrape, or do I require anything else ?

You can look at Web-Scraping part-1.
You will need Requests and, optionally, lxml (for Python 3.7 you will need a wheel from Gohlke).

pip install requests  

# Eg lxml wheel for python 32-bit
pip install lxml-4.2.3-cp37-cp37m-win32.whl
Processing c:\aaa\lxml-4.2.3-cp37-cp37m-win32.whl
Installing collected packages: lxml
Successfully installed lxml-4.2.3

Then you can test the first code: just copy it into PyScripter and push the run button.

import requests
from bs4 import BeautifulSoup
 
url = 'http://CNN.com'
url_get = requests.get(url)
soup = BeautifulSoup(url_get.content, 'html.parser')
print(soup.find('title').text)

Output:
CNN International - Breaking News, US News, World News and Video
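
To see what the soup.find('title').text step is doing without needing a network connection, here is a rough sketch that pulls the title out of a hard-coded HTML string. Note that it swaps in the standard library's html.parser.HTMLParser instead of BeautifulSoup, and that TitleParser, extract_title, and sample_html are made-up names for illustration only.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text found between <title> and </title>."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so collect the pieces.
        if self.in_title:
            self.title_parts.append(data)


def extract_title(html_text):
    parser = TitleParser()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.title_parts).strip()


sample_html = "<html><head><title>Example page</title></head><body><p>Hi</p></body></html>"
print(extract_title(sample_html))
```

BeautifulSoup does the same job in one line and copes much better with messy real-world pages, which is why the thread uses it.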

bluedoor5 Wrote:But there are variations, for example, "over-write" the existing text within, or in some cases append to the same txt file, so it fills up.
Where would I find examples, or are these already in those libraries ?

Reading and writing to files is a standard part of Python.

import requests
from bs4 import BeautifulSoup

url = 'http://CNN.com'
url_get = requests.get(url)
soup = BeautifulSoup(url_get.content, 'html.parser')
with open('title.txt', 'a+') as f_out:
    f_out.write(f"{soup.find('title').text}\n")

So if I run the code 3 times, title.txt will look like this.

Output:
CNN International - Breaking News, US News, World News and Video
CNN International - Breaking News, US News, World News and Video
CNN International - Breaking News, US News, World News and Video
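
The append and overwrite variations asked about can be sketched without any scraping at all. This is a minimal example, assuming a throwaway temporary folder so nothing on disk is touched for real. The helper names log_line and read_text are invented for illustration.

```python
import os
import tempfile


def log_line(path, text, overwrite=False):
    """Write one line to path: 'w' replaces the contents, 'a' adds to the end."""
    mode = "w" if overwrite else "a"
    with open(path, mode, encoding="utf-8") as f_out:
        f_out.write(text + "\n")


def read_text(path):
    with open(path, encoding="utf-8") as f_in:
        return f_in.read()


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "title.txt")
    existed_before = os.path.exists(path)  # the file is missing at first

    log_line(path, "first run")   # 'a' creates the missing file
    log_line(path, "second run")  # 'a' adds a new line at the end
    appended = read_text(path)

    log_line(path, "fresh start", overwrite=True)  # 'w' wipes the old text
    overwritten = read_text(path)

print(appended)
print(overwritten)
```

Both 'a' and 'w' create the file when it does not exist, so the "make a new one if it is missing" case needs no extra code.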

So if I run the txt file script,
the way I read it, the script is looking for

Quote:title.txt

or whatever name I choose to call it, "my-text-file.txt"

What I don't see in the script is which folder the txt file resides in.

In this case I have a special folder called

Quote:C:\MYPYTHONSKOOL

Inside that folder, for testing, I can keep my individually named txt files (or .py scripts).

The script ran fine with no error, but I could not find either

Quote:title.txt

Quote:my-text-file.txt

Windows Search did not come up with anything.

---------
This script came up with a syntax error though

Quote:"SyntaxError: invalid syntax (<module2 >,line17

I understood the code being:
pip install requests

Quote:# Eg lxml wheel for python 32-bit
pip install lxml-4.2.3-cp37-cp37m-win32.whl

(Jul-21-2018, 02:44 PM)bluedoor5 Wrote: [ -> ]or whatever name I choose to call it, "my-text-file.txt"

Call it what you want.

(Jul-21-2018, 02:44 PM)bluedoor5 Wrote: [ -> ]What I don't see in the script is which folder the txt file resides in.

It will be in the folder you run the script from. Use File --> Save, e.g. save it as my_title.py and point to a folder such as C:\MYPYTHONSKOOL.
When you run it now, my-text-file.txt will be in the folder C:\MYPYTHONSKOOL.
Or give the full path to the folder.

with open(r'C:\MYPYTHONSKOOL\my-text-file.txt', 'a+') as f_out:
    f_out.write(f"{soup.find('title').text}\n")
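
If a file written with a bare name like title.txt cannot be found, it can help to print where Python will actually put it. A relative file name is resolved against the current working directory, which the editor or command prompt decides. Here is a small sketch of that rule. The helper name where_will_it_go is made up for illustration.

```python
from pathlib import Path


def where_will_it_go(filename):
    """Return the full path that open(filename, ...) would use."""
    path = Path(filename)
    # Absolute paths are used as given; relative ones start from the
    # current working directory.
    return path if path.is_absolute() else Path.cwd() / path


print("Current working directory:", Path.cwd())
print("title.txt would be written to:", where_will_it_go("title.txt"))
```

Running this from the same editor as the scraping script shows which folder to look in.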

Quote:"SyntaxError: invalid syntax (<module2 >,line17

There is no line 17 in my code.

Quote:I understood the code being:

# Eg lxml wheel for python 32-bit
pip install lxml-4.2.3-cp37-cp37m-win32.whl

You need to download the wheel from the Gohlke link I gave you.
Then use pip install wheel_name.whl

Take it easy, try to read it several times, and understand what was posted.
You seem to struggle with everything from very basic Python to basic operating system understanding.

Quote:with open(r'C:\MYPYTHONSKOOL\my-text-file.txt', 'a+') as f_out:
    f_out.write(f"{soup.find('title').text}\n")

Worked first go.
That's one way of logging I will be using in future: the add-to-next-line method.
The other type of logging is the overwrite.
Also, if "my-text-file.txt" does not exist, it makes one.
---------

Quote:There is no line 17 in my code.

That's because I have other bits of code # hashed out.

Quote:You need to download the wheel from the Gohlke link I gave you.
Then use pip install wheel_name.whl

yep

Thanks.

(Jul-21-2018, 04:36 PM)bluedoor5 Wrote: [ -> ]That's one way of logging I will be using in future: the add-to-next-line method.
The other type of logging is the overwrite.
Also, if "my-text-file.txt" does not exist, it makes one.

(Jul-21-2018, 04:36 PM)bluedoor5 Wrote: [ -> ]
with open(r'C:\MYPYTHONSKOOL\my-text-file.txt', 'a+') as f_out:

The second argument to open is what mode you're opening the file in. "r" is for read-only (you can't write to the file), "w" is for write (and it erases whatever used to be in the file), "a" is for append (you can write, and whatever you write is at the end of the file [...unless you call seek() first...]).
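
A quick way to see two of these rules in action, sketched against a temporary file so it is safe to run anywhere. The variable names are made up for the example. Writing to a file opened with "r" is refused, and "a+" lets you append and then seek back to read the whole file.

```python
import io
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "demo.txt")

    with open(path, "w") as f_out:
        f_out.write("first\n")

    # "r" is read-only: trying to write raises io.UnsupportedOperation.
    write_refused = False
    with open(path, "r") as f_in:
        try:
            f_in.write("nope\n")
        except io.UnsupportedOperation:
            write_refused = True

    # "a+" appends, and the "+" also allows reading after seek(0).
    with open(path, "a+") as f_both:
        f_both.write("second\n")
        f_both.seek(0)
        contents = f_both.read()

print("write refused in 'r' mode:", write_refused)
print(contents)
```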

For future reference, the builtin function help can be used to see the docs for something. That normally explains the available options and what they mean. For example, here's open():

>>> help(open)
Help on built-in function open in module io:

open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)
    Open file and return a stream.  Raise IOError upon failure.

    file is either a text or byte string giving the name (and the path
    if the file isn't in the current working directory) of the file to
    be opened or an integer file descriptor of the file to be
    wrapped. (If a file descriptor is given, it is closed when the
    returned I/O object is closed, unless closefd is set to False.)

    mode is an optional string that specifies the mode in which the file
    is opened. It defaults to 'r' which means open for reading in text
    mode.  Other common values are 'w' for writing (truncating the file if
    it already exists), 'x' for creating and writing to a new file, and
    'a' for appending (which on some Unix systems, means that all writes
    append to the end of the file regardless of the current seek position).
    In text mode, if encoding is not specified the encoding used is platform
    dependent: locale.getpreferredencoding(False) is called to get the
    current locale encoding. (For reading and writing raw bytes use binary
    mode and leave encoding unspecified.) The available modes are:

    ========= ===============================================================
    Character Meaning
    --------- ---------------------------------------------------------------
    'r'       open for reading (default)
    'w'       open for writing, truncating the file first
    'x'       create a new file and open it for writing
    'a'       open for writing, appending to the end of the file if it exists
    'b'       binary mode
    't'       text mode (default)
    '+'       open a disk file for updating (reading and writing)
    'U'       universal newline mode (deprecated)
    ========= ===============================================================

    The default mode is 'rt' (open for reading text). For binary random
    access, the mode 'w+b' opens and truncates the file to 0 bytes, while
    'r+b' opens the file without truncation. The 'x' mode implies 'w' and
    raises an `FileExistsError` if the file already exists.

    Python distinguishes between files opened in binary and text modes,
    even when the underlying operating system doesn't. Files opened in
    binary mode (appending 'b' to the mode argument) return contents as
    bytes objects without any decoding. In text mode (the default, or when
    't' is appended to the mode argument), the contents of the file are
    returned as strings, the bytes having been first decoded using a
    platform-dependent encoding or using the specified encoding if given.

    'U' mode is deprecated and will raise an exception in future versions
    of Python.  It has no effect in Python 3.  Use newline to control
    universal newlines mode.

    buffering is an optional integer used to set the buffering policy.
    Pass 0 to switch buffering off (only allowed in binary mode), 1 to select
    line buffering (only usable in text mode), and an integer > 1 to indicate
    the size of a fixed-size chunk buffer.  When no buffering argument is
    given, the default buffering policy works as follows:

    * Binary files are buffered in fixed-size chunks; the size of the buffer
      is chosen using a heuristic trying to determine the underlying device's
      "block size" and falling back on `io.DEFAULT_BUFFER_SIZE`.
      On many systems, the buffer will typically be 4096 or 8192 bytes long.
-- More  --
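
The pager cuts the help output off at "-- More --". If you would rather have the same text as a plain string, for example to search it or print only part of it, one option is inspect.getdoc from the standard library. This is only a sketch of that alternative. The variable names are made up.

```python
import inspect

# getdoc returns the docstring that help(open) displays, as one string.
doc = inspect.getdoc(open)

# Print only the lines that mention appending.
append_lines = [line for line in doc.splitlines() if "append" in line]
for line in append_lines:
    print(line)
```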

bluedoor5

bluedoor5

snippsat

bluedoor5

bluedoor5

snippsat

bluedoor5

snippsat

bluedoor5

nilamo