Traceback error - Printable Version

+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/forum-7.html)
+--- Forum: Web Scraping & Web Development (https://python-forum.io/forum-13.html)
+--- Thread: Traceback error (/thread-10502.html)

Traceback error - tjnichols - May-23-2018

When I run the following code:

import requests
from bs4 import BeautifulSoup
from pathlib import Path

class GetCompletions:
    def __init__(self, infile):
        """Above will create a folder called comppdf, and geocorepdf wherever the WOGCC
           File Downloads file is run from as well as a text file for my api file to
           reside.
        """
        self.homepath = Path('.')
        self.completionspath = self.homepath / 'comppdf'
        self.completionspath.mkdir(exist_ok=True)
        self.geocorepdf = self.homepath / 'geocorepdf'
        self.geocorepdf.mkdir(exist_ok=True)
        self.textpath = self.homepath / 'text'
        self.text.mkdir(exist_ok=True)

        self.infile = self.textpath / infile
        self.api = []

        self.parse_and_save(getpdfs=True)



    def get_url(self):
        for entry in self.apis:
            yield (entry, "http://wogcc.state.wy.us/wyocomp.cfm?nAPI=[]".format(entry[3:10]))
            yield (entry, "http://wogcc.state.wy.us/whatupcores.cfm?autonum=[]".format(entry[3:10]))

        """Above will get the URL that matches my API numbers."""

    def parse_and_save(self, getpdfs=False):
        for file in filelist:
            with file.open('r') as f:
                soup = BeautifulSoup(f.read(), 'lxml')
            if getpdfs:
                links = soup.find_all('a')
                for link in links:
                    url in link['href']
                    if 'www' in url:
                        continue
                    print('downloading pdf at: {}'.format(url))
                    p = url.index('=')
                    response = requests.get(url, stream=True, allow_redirects=False)
                    if response.status_code == 200:
                        try:
                            header_info = response.headers['Content-Disposition']
                            idx = header_info.index('filename')
                            filename = self.log_pdfpath / header[idx+9:]
                        except ValueError:
                            filename = self.log_pdfpath / 'comp{}'.format(url[p+1:])
                            print("couldn't locate filename for {} will use: {}".format(file, filename))
                        except KeyError:
                            filename = self.log_pdfpath / 'comp{}.pdf'.format(url[p+1:])
                            print('got KeyError on {}, respnse.headers = {}'.format(file, response.headers))
                            print('will use name: {}'.format(filename))
                            print(repsonse.headers)
                        with filename.open('wb') as f:
                            f.write(respnse.content)

            sfname = self.textpath / 'summary_{}.txt'.format((file.name.split('_'))[1].split('.')[0][3:10])
            tds = soup.find_all('td')
            with sfname.open('w') as f:
                for td in tds:
                    if td.text:
                        if any(field in td.text for field in self.fields):
                            f.write('{}\n'.format(td.text))

if __name__ == '__main__':
    GetCompletions('api.txt')

It doesn't create this text folder:

 self.textpath = self.homepath / 'text'
        self.text.mkdir(exist_ok=True)

I get the following error:

Error: RESTART: C:\Users\toliver\AppData\Local\Programs\Python\Python36\WOGCC\WOGCC_File_Downloads.py 
Traceback (most recent call last):
  File "C:\Users\toliver\AppData\Local\Programs\Python\Python36\WOGCC\WOGCC_File_Downloads.py", line 71, in <module>
    GetCompletions('api.txt')
  File "C:\Users\toliver\AppData\Local\Programs\Python\Python36\WOGCC\WOGCC_File_Downloads.py", line 17, in __init__
    self.text.mkdir(exist_ok=True)
AttributeError: 'GetCompletions' object has no attribute 'text'

I appreciate any help I can get!

Thanks!

Tonya
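A side note on `get_url` in the code above: `str.format` only substitutes `{}` placeholders, so a template ending in `nAPI=[]` comes back unchanged and every request would go to the same literal URL. Also, `__init__` sets `self.api` while `get_url` loops over `self.apis`. The sketch below assumes the API numbers are strings and keeps the original slice `entry[3:10]`; the function name `build_urls` is hypothetical, not from the thread.

```python
# Hypothetical helper mirroring get_url(); API numbers are assumed to be strings.
COMP_URL = "http://wogcc.state.wy.us/wyocomp.cfm?nAPI={}"
CORE_URL = "http://wogcc.state.wy.us/whatupcores.cfm?autonum={}"


def build_urls(apis):
    """Yield (api, url) pairs; {} (not []) is the str.format placeholder."""
    for entry in apis:
        short_api = entry[3:10]  # same slice as the original code
        yield entry, COMP_URL.format(short_api)
        yield entry, CORE_URL.format(short_api)


if __name__ == "__main__":
    # With [] instead of {}, format() leaves the text unchanged:
    print("nAPI=[]".format("2512345"))  # prints nAPI=[]
    for api, url in build_urls(["4902512345"]):
        print(api, url)
```

Inside the class, the loop would then read from the same attribute that `__init__` fills in.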

RE: Traceback error - tjnichols - May-23-2018

When I change the following code, it makes the text folder, but I get more errors than I had before.

self.textpath = self.homepath / 'text'
        self.text.mkdir(exist_ok=True)

This is different from my previous version:

self.textpath = self.homepath / 'text'
        self.textpath.mkdir(exist_ok=True)

And the resulting error:

Error: RESTART: C:/Users/toliver/AppData/Local/Programs/Python/Python36/WOGCC/WOGCC_File_Downloads delete.py 
Traceback (most recent call last):
  File "C:/Users/toliver/AppData/Local/Programs/Python/Python36/WOGCC/WOGCC_File_Downloads delete.py", line 71, in <module>
    GetCompletions('api.txt')
  File "C:/Users/toliver/AppData/Local/Programs/Python/Python36/WOGCC/WOGCC_File_Downloads delete.py", line 22, in __init__
    self.parse_and_save(getpdfs=True)
  File "C:/Users/toliver/AppData/Local/Programs/Python/Python36/WOGCC/WOGCC_File_Downloads delete.py", line 34, in parse_and_save
    for file in filelist:
NameError: name 'filelist' is not defined

Thoughts? Ideas?

As always - your help is greatly appreciated!
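The `NameError` means `parse_and_save` reads `filelist` before anything has assigned it; it is not defined anywhere in the class as posted, so it has to be built first. A minimal sketch, assuming the pages to parse are saved as `.html` files in one folder (both the folder and the `*.html` pattern are assumptions, and `collect_pages` is a made-up name):

```python
from pathlib import Path


def collect_pages(folder, pattern="*.html"):
    """Return the files in `folder` matching `pattern`, sorted by name.

    `folder` and `pattern` are assumptions: point them at wherever the
    downloaded pages actually live.
    """
    return sorted(p for p in Path(folder).glob(pattern) if p.is_file())


# Inside parse_and_save, the list would be bound before the loop, e.g.:
#     filelist = collect_pages(self.completionspath)
#     for file in filelist:
#         ...
```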

RE: Traceback error - Larz60+ - May-23-2018

It should be like this (as in the original code):

self.textpath = self.homepath / 'text'
self.textpath.mkdir(exist_ok=True)
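For reference, the folder setup from `__init__` can be tried on its own. `mkdir(exist_ok=True)` is called on the `Path` object stored in `self.textpath`; there is no `self.text` attribute to call it on. A standalone sketch (the function name `make_folders` and its `base` argument are illustrative, not from the original code):

```python
from pathlib import Path


def make_folders(base="."):
    """Create comppdf, geocorepdf and text under `base`; return their paths."""
    home = Path(base)
    folders = {name: home / name for name in ("comppdf", "geocorepdf", "text")}
    for path in folders.values():
        # exist_ok=True: no error if the folder is already there
        path.mkdir(exist_ok=True)
    return folders
```

Calling `make_folders()` twice is safe; the second call finds the folders already present and does nothing.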

RE: Traceback error - tjnichols - May-23-2018

That is what it says now. I don't understand the extra spaces above; they aren't there in my .py file.

RE: Traceback error - Larz60+ - May-23-2018

Whatever editor you are using is not showing tabs. You should always indent with spaces; that way, even a bad editor will display the white space correctly.
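If you are not sure whether a file mixes tabs and spaces, a few lines of Python can report it. This is a sketch, not a PyCharm feature, and `tab_indented_lines`/`report` are made-up names. (Python's standard `tabnanny` module, run as `python -m tabnanny yourfile.py`, also checks for ambiguous indentation.)

```python
from pathlib import Path


def tab_indented_lines(source):
    """Return 1-based line numbers whose leading whitespace contains a tab."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        # Leading whitespace = everything before the first non-space, non-tab char
        indent = line[: len(line) - len(line.lstrip(" \t"))]
        if "\t" in indent:
            hits.append(number)
    return hits


def report(path):
    """Print every line of the file at `path` that is indented with a tab."""
    text = Path(path).read_text(encoding="utf-8")
    for n in tab_indented_lines(text):
        print(f"{path}: line {n} is indented with a tab")
```

For example, `report("WOGCC_File_Downloads.py")` lists any tab-indented lines in that script; tabs after the code on a line are ignored.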

When copying code, you need to copy and paste, not retype.
That way you know what you're getting.

I suggest you go back to the code under 'new code' here: https://python-forum.io/Thread-EOL-While...ral?page=4

Select the code by double-clicking in the code window (on any line of code).
Press Ctrl-C to copy the code, then Ctrl-V to paste it into a new PyCharm window.

Save the code.

With the cursor in the PyCharm window, click Run --> Run --> your program name from the menu.
It should run without errors (of course, you must be connected to the internet).

If copied and pasted correctly, this code works.

RE: Traceback error - tjnichols - May-23-2018

OK, when I run it I get: Fetching main page for entry "api#". It gets to the end, and then I get this error:

Error: Traceback (most recent call last):
  File "C:\Python Tutorials\LARS with API.py", line 97, in <module>
    GetCompletions('apis.txt')
  File "C:\Python Tutorials\LARS with API.py", line 26, in __init__
    self.parse_and_save(getpdfs=True)
  File "C:\Python Tutorials\LARS with API.py", line 47, in parse_and_save
    soup = BeautifulSoup(f.read(), 'lxml')
  File "C:\Users\toliver\AppData\Local\Programs\Python\Python36\lib\site-packages\bs4\__init__.py", line 165, in __init__
    % ",".join(features))
bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?
>>>

RE: Traceback error - nilamo - May-23-2018

(May-23-2018, 02:28 PM) tjnichols Wrote: AttributeError: 'GetCompletions' object has no attribute 'text'

That error tells you everything you need to know.

(May-23-2018, 02:28 PM) tjnichols Wrote:

        self.textpath = self.homepath / 'text'
        self.text.mkdir(exist_ok=True)

What do you think self.text is? The answer is nothing, since you never set it. And because it's nothing, it definitely doesn't have a .mkdir method. The error is letting you know that you're trying to use a thing that doesn't exist. So the solution depends entirely on what you're expecting to happen.
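A stripped-down illustration of the same error: Python raises `AttributeError` when you read an attribute that was never assigned on the object. The class below is a toy for demonstration, not the original `GetCompletions`:

```python
from pathlib import Path


class Folders:
    """Toy class: only `textpath` is ever assigned."""

    def __init__(self):
        self.textpath = Path(".") / "text"


folders = Folders()
print(folders.textpath)          # fine: the attribute was set in __init__
print(hasattr(folders, "text"))  # False: nothing ever set self.text

try:
    folders.text.mkdir(exist_ok=True)  # same mistake as in the original post
except AttributeError as err:
    print("AttributeError:", err)
```

The fix is to call `mkdir` on the attribute that actually exists, `self.textpath`.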

RE: Traceback error - nilamo - May-23-2018

(May-23-2018, 06:01 PM) tjnichols Wrote: bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?

Again, the error tells you everything you need to know. You're telling BeautifulSoup to use the lxml parser, but you don't have it installed. So install it: pip install lxml
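After `pip install lxml`, the original `BeautifulSoup(f.read(), 'lxml')` call works unchanged. If the script also has to run on machines where lxml may be missing, one option (an assumption about what you want, not something suggested in the thread) is to fall back to Python's built-in `html.parser`, which BeautifulSoup supports without extra installs. The helper name `pick_parser` is hypothetical:

```python
import importlib.util


def pick_parser(preferred="lxml", fallback="html.parser"):
    """Return `preferred` if its module is importable, else the built-in parser name."""
    if importlib.util.find_spec(preferred) is not None:
        return preferred
    return fallback


# Usage with BeautifulSoup (requires: pip install beautifulsoup4):
#     soup = BeautifulSoup(f.read(), pick_parser())
print("Using parser:", pick_parser())
```

The design choice is to check for the module up front rather than catch the exception, so the parser name is decided once and the parsing code stays the same.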

RE: Traceback error - tjnichols - May-23-2018

OK, scratch the last message; I got ahead of myself. I'm installing all of the add-ins now, and I'm also running it in PyCharm.

This is done! Thanks nilamo!

(May-23-2018, 06:05 PM) nilamo Wrote:
(May-23-2018, 06:01 PM) tjnichols Wrote: bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?
Again, the error tells you everything you need to know. You're telling BeautifulSoup to use the lxml parser, but you don't have it installed. So install it: pip install lxml

RE: Traceback error - tjnichols - May-23-2018

Larz60+, when I go to the link, I get a 404 error. Can you repost it?

Thanks again!