Python Forum
Traceback error - I don't get it - Printable Version

+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/forum-7.html)
+--- Forum: Web Scraping & Web Development (https://python-forum.io/forum-13.html)
+--- Thread: Traceback error - I don't get it (/thread-10549.html)



Traceback error - I don't get it - tjnichols - May-24-2018

This is the code I'm trying to run. It's very straight forward. Line 70 is looking for an 'api.txt' file that is in a 'text' folder - line 17 is actually created in line 16. It is created, I've actually done before I run the module so it has the api.txt to pull data from.

Here is the module:

import requests
from bs4 import BeautifulSoup
from pathlib import Path

class GetCompletions:
    def __init__(self, infile):
        """Above will create a folder called comppdf, and geocorepdf wherever the WOGCC
           File Downloads file is run from as well as a text file for my api file to
           reside.
        """
        self.homepath = Path('.')
        self.completionspath = self.homepath / 'comppdf'
        self.completionspath.mkdir(exist_ok=True)
        self.geocorepdf = self.homepath / 'geocorepdf'
        self.geocorepdf.mkdir(exist_ok=True)
        self.textpath = self.homepath / 'text'
        self.textpath.mkdir(exist_ok=True)

        self.infile = self.textpath / infile
        self.api = []


    def get_url(self):
        for entry in self.apis:
            yield (entry, "http://wogcc.state.wy.us/wyocomp.cfm?nAPI=[]".format(entry[3:10]))
            yield (entry, "http://wogcc.state.wy.us/whatupcores.cfm?autonum=[]".format(entry[3:10]))

        """Above will get the URL that matches my API numbers."""

    def parse_and_save(self, getpdfs=False):
        for file in filelist:
            with file.open('r') as f:
                soup = BeautifulSoup(f.read(), 'lxml')
            if getpdfs:
                links = soup.find_all('a')
                for link in links:
                    url in link['href']
                    if 'www' in url:
                        continue
                    print('downloading pdf at: {}'.format(url))
                    p = url.index('=')
                    response = requests.get(url, stream=True, allow_redirects=False)
                    if response.status_code == 200:
                        try:
                            header_info = response.headers['Content-Disposition']
                            idx = header_info.index('filename')
                            filename = self.log_pdfpath / header[idx+9:]
                        except ValueError:
                            filename = self.log_pdfpath / 'comp{}'.format(url[p+1:])
                            print("couldn't locate filename for {} will use: {}".format(file, filename))
                        except KeyError:
                            filename = self.log_pdfpath / 'comp{}.pdf'.format(url[p+1:])
                            print('got KeyError on {}, respnse.headers = {}'.format(file, response.headers))
                            print('will use name: {}'.format(filename))
                            print(repsonse.headers)
                        with filename.open('wb') as f:
                            f.write(respnse.content)

                self.parse_and_save(getpdfs=True)

            sfname = self.textpath / 'summary_{}.txt'.format((file.name.split('_'))[1].split('.')[0][3:10])
            tds = soup.find_all('td')
            with sfname.open('w') as f:
                for td in tds:
                    if td.text:
                        if any(field in td.text for field in self.fields):
                            f.write('{}\n'.format(td.text))

if __name__ == '__main__':
    GetCompletions('api.txt')
And this is the error:
Error:
RESTART: C:\Users\toliver\AppData\Local\Programs\Python\Python36\WOGCC_File_Downloadstest.py Traceback (most recent call last): File "C:\Users\toliver\AppData\Local\Programs\Python\Python36\WOGCC_File_Downloadstest.py", line 70, in <module> GetCompletions('api.txt') File "C:\Users\toliver\AppData\Local\Programs\Python\Python36\WOGCC_File_Downloadstest.py", line 17, in __init__ self.text.mkdir(exist_ok=True) AttributeError: 'GetCompletions' object has no attribute 'text'
It's been suggested that I use the initial file Larz60+ helped me create. The problem is it only covers one of the reports I need to gather. If I use it, I will have to run another version to go over the same wells and get the other reports I need. I really need to do them both at the same time. This is why I've tried to re-create the one he did.

When I run this in PyCharm, it comes back with no errors.

As always - any help you can provide will be most appreciated!


RE: Traceback error - I don't get it - buran - May-24-2018

Obviously this is not the code that produce THAT exact error.
line 17 in your code is
self.textpath.mkdir(exist_ok=True)
, while the traceback says it is self.text.mkdir(exist_ok=True) and your class GetCompletions has no attribute 'text' indeed...
you are running different code (WOGCC_File_Downloadstest.py) from the one you think/post here


RE: Traceback error - I don't get it - Larz60+ - May-24-2018

There are several threads on the same subject making it impossible to help properly.
We have dueling moderators and dueling code versions!