Python Forum
Traceback error
#1
When I run the following
import requests
from bs4 import BeautifulSoup
from pathlib import Path

class GetCompletions:
    def __init__(self, infile):
        """Above will create a folder called comppdf, and geocorepdf wherever the WOGCC
           File Downloads file is run from as well as a text file for my api file to
           reside.
        """
        self.homepath = Path('.')
        self.completionspath = self.homepath / 'comppdf'
        self.completionspath.mkdir(exist_ok=True)
        self.geocorepdf = self.homepath / 'geocorepdf'
        self.geocorepdf.mkdir(exist_ok=True)
        self.textpath = self.homepath / 'text'
        self.text.mkdir(exist_ok=True)

        self.infile = self.textpath / infile
        self.api = []

        self.parse_and_save(getpdfs=True)



    def get_url(self):
        for entry in self.apis:
            yield (entry, "http://wogcc.state.wy.us/wyocomp.cfm?nAPI=[]".format(entry[3:10]))
            yield (entry, "http://wogcc.state.wy.us/whatupcores.cfm?autonum=[]".format(entry[3:10]))

        """Above will get the URL that matches my API numbers."""

    def parse_and_save(self, getpdfs=False):
        for file in filelist:
            with file.open('r') as f:
                soup = BeautifulSoup(f.read(), 'lxml')
            if getpdfs:
                links = soup.find_all('a')
                for link in links:
                    url in link['href']
                    if 'www' in url:
                        continue
                    print('downloading pdf at: {}'.format(url))
                    p = url.index('=')
                    response = requests.get(url, stream=True, allow_redirects=False)
                    if response.status_code == 200:
                        try:
                            header_info = response.headers['Content-Disposition']
                            idx = header_info.index('filename')
                            filename = self.log_pdfpath / header[idx+9:]
                        except ValueError:
                            filename = self.log_pdfpath / 'comp{}'.format(url[p+1:])
                            print("couldn't locate filename for {} will use: {}".format(file, filename))
                        except KeyError:
                            filename = self.log_pdfpath / 'comp{}.pdf'.format(url[p+1:])
                            print('got KeyError on {}, respnse.headers = {}'.format(file, response.headers))
                            print('will use name: {}'.format(filename))
                            print(repsonse.headers)
                        with filename.open('wb') as f:
                            f.write(respnse.content)

            sfname = self.textpath / 'summary_{}.txt'.format((file.name.split('_'))[1].split('.')[0][3:10])
            tds = soup.find_all('td')
            with sfname.open('w') as f:
                for td in tds:
                    if td.text:
                        if any(field in td.text for field in self.fields):
                            f.write('{}\n'.format(td.text))

if __name__ == '__main__':
    GetCompletions('api.txt')
It doesn't create this text file.

 self.textpath = self.homepath / 'text'
        self.text.mkdir(exist_ok=True)
I get the following error

Error:
RESTART: C:\Users\toliver\AppData\Local\Programs\Python\Python36\WOGCC\WOGCC_File_Downloads.py
Traceback (most recent call last):
  File "C:\Users\toliver\AppData\Local\Programs\Python\Python36\WOGCC\WOGCC_File_Downloads.py", line 71, in <module>
    GetCompletions('api.txt')
  File "C:\Users\toliver\AppData\Local\Programs\Python\Python36\WOGCC\WOGCC_File_Downloads.py", line 17, in __init__
    self.text.mkdir(exist_ok=True)
AttributeError: 'GetCompletions' object has no attribute 'text'
I appreciate any help I can get!

Thanks!

Tonya
Reply
#2
When I change the following code, it creates the text folder, but I get more errors than I had before.

self.textpath = self.homepath / 'text'
        self.text.mkdir(exist_ok=True)
After the change, it reads:

self.textpath = self.homepath / 'text'
        self.textpath.mkdir(exist_ok=True)
And the resulting error:

Error:
RESTART: C:/Users/toliver/AppData/Local/Programs/Python/Python36/WOGCC/WOGCC_File_Downloads delete.py
Traceback (most recent call last):
  File "C:/Users/toliver/AppData/Local/Programs/Python/Python36/WOGCC/WOGCC_File_Downloads delete.py", line 71, in <module>
    GetCompletions('api.txt')
  File "C:/Users/toliver/AppData/Local/Programs/Python/Python36/WOGCC/WOGCC_File_Downloads delete.py", line 22, in __init__
    self.parse_and_save(getpdfs=True)
  File "C:/Users/toliver/AppData/Local/Programs/Python/Python36/WOGCC/WOGCC_File_Downloads delete.py", line 34, in parse_and_save
    for file in filelist:
NameError: name 'filelist' is not defined
Thoughts? Ideas?

As always - your help is greatly appreciated!
Reply
#3
It should be like this (as in the original code):
self.textpath = self.homepath / 'text'
self.textpath.mkdir(exist_ok=True)
Reply
#4
That is what it says now. I don't understand the extra spaces above; they aren't that way in my .py file.
Reply
#5
Whatever editor you are using is not showing tabs. You should always use spaces; that way,
even a bad editor will show the white space.
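
If you want to check a file for hidden tabs yourself, here is a rough sketch. The function name lines_with_tabs and the sample string are made up for illustration; to check the real script you would pass in the text read from your .py file.

```python
def lines_with_tabs(source):
    """Return 1-based line numbers whose leading whitespace contains a tab."""
    flagged = []
    for number, line in enumerate(source.splitlines(), start=1):
        # Everything before the first non-whitespace character is the indent.
        indent = line[:len(line) - len(line.lstrip())]
        if '\t' in indent:
            flagged.append(number)
    return flagged


# Hypothetical sample: line 2 is indented with a tab, line 3 with spaces.
sample = "class A:\n\tdef f(self):\n        return 1\n"
tab_lines = lines_with_tabs(sample)
print(tab_lines)
```

Any line number it reports is indented with a tab and can be re-indented with spaces.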

When copying code, you need to cut and paste, not retype.
That way you know what you're getting.

I suggest you go back to the code (under 'new code' here: https://python-forum.io/Thread-EOL-While...ral?page=4).

Select the code by double-clicking in the code window (on any line of code).
Press ctrl-C to copy the code, then ctrl-V to paste it into a new PyCharm window.

Save the code.

With the cursor in the PyCharm window, from the menu, click run-->run-->your program name.
It should run without error (of course, you must be connected to the internet).

If copied and pasted correctly, this code works.
Reply
#6
OK, when I run it, I get: Fetching main page for entry "api#". It gets to the end, then I get this error.

Error:
Traceback (most recent call last):
  File "C:\Python Tutorials\LARS with API.py", line 97, in <module>
    GetCompletions('apis.txt')
  File "C:\Python Tutorials\LARS with API.py", line 26, in __init__
    self.parse_and_save(getpdfs=True)
  File "C:\Python Tutorials\LARS with API.py", line 47, in parse_and_save
    soup = BeautifulSoup(f.read(), 'lxml')
  File "C:\Users\toliver\AppData\Local\Programs\Python\Python36\lib\site-packages\bs4\__init__.py", line 165, in __init__
    % ",".join(features))
bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?
>>>
Reply
#7
(May-23-2018, 02:28 PM)tjnichols Wrote: AttributeError: 'GetCompletions' object has no attribute 'text'

That error tells you everything you need to know.
(May-23-2018, 02:28 PM)tjnichols Wrote:
        self.textpath = self.homepath / 'text'
        self.text.mkdir(exist_ok=True)
What do you think self.text is? The answer is nothing, since you never set it. And because it's nothing, it definitely doesn't have a .mkdir method. The error is letting you know that you're trying to use a thing that doesn't exist. So the solution depends entirely on what you're expecting to happen.
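
To make that concrete, here is a minimal sketch. The Folders class and the temporary directory are stand-ins invented for illustration, not part of the original script; the point is only that mkdir has to be called on the attribute that was actually assigned.

```python
import tempfile
from pathlib import Path


class Folders:
    """Hypothetical stand-in for the GetCompletions constructor."""

    def __init__(self, home):
        self.homepath = Path(home)
        self.textpath = self.homepath / 'text'
        # Call mkdir on the attribute assigned on the line above.
        self.textpath.mkdir(exist_ok=True)


with tempfile.TemporaryDirectory() as tmp:
    folders = Folders(tmp)
    created = folders.textpath.is_dir()
    # 'text' was never assigned, so self.text would raise AttributeError.
    has_text = hasattr(folders, 'text')

print(created, has_text)
```

If the intent was to create the 'text' folder, self.textpath is the name to use; if something else was intended, self.text has to be assigned before it is used.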
Reply
#8
(May-23-2018, 06:01 PM)tjnichols Wrote: bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?
Again, the error tells you everything you need to know. You're telling BeautifulSoup to use the lxml parser, but you don't have that installed. So, install it: pip install lxml
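
If installing lxml is not an option, one possible workaround (a sketch, not what the original script does) is to fall back to Python's built-in 'html.parser'. The helper name pick_parser is made up here, and the built-in parser may handle badly formed HTML a little differently from lxml.

```python
import importlib.util


def pick_parser():
    """Return 'lxml' if that package is importable, else the built-in 'html.parser'."""
    if importlib.util.find_spec('lxml') is not None:
        return 'lxml'
    return 'html.parser'


parser = pick_parser()
print('BeautifulSoup parser to use:', parser)
# In the script, the parsing line would then become (assumption):
#     soup = BeautifulSoup(f.read(), pick_parser())
```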
Reply
#9
OK, scratch the last message. I got ahead of myself. I'm installing all of the add-ins now, and I'm also setting it up to run in PyCharm.

This is done! Thanks nilamo!
(May-23-2018, 06:05 PM)nilamo Wrote:
(May-23-2018, 06:01 PM)tjnichols Wrote: bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?
Again, the error tells you everything you need to know. You're telling BeautifulSoup to use the lxml parser, but you don't have that installed. So, install it: pip install lxml
Reply
#10
Larz60+ - when I go to the link - it gives me a 404 error. Can you repost it?

Thanks again!
Reply

