Python Forum
Traceback error - Printable Version


RE: Traceback error - buran - May-23-2018

I think this is it: https://python-forum.io/Thread-EOL-While-Scanning-String-Literal?page=4


RE: Traceback error - tjnichols - May-23-2018

Ok - I like the idea of going back to the Larz60+ file. The problem is it only gets me half of what I need.

        self.homepath = Path('.')
        self.completionspath = self.homepath / 'comppdf'
        self.completionspath.mkdir(exist_ok=True)
        self.geocorepdf = self.homepath / 'geocorepdf'
        self.geocorepdf.mkdir(exist_ok=True)
        self.textpath = self.homepath / 'text'
        self.textpath.mkdir(exist_ok=True)

While this only creates folders, they will hold completion reports for oil and gas wells. For example: go to http://wogcc.state.wy.us/legacywogcce.cfm, click on 'Wells', then click on 'By API Number' and enter 2521203 in the space provided. At the top right are 'Completions' and 'Cores/Pressures/Reports'. These are the two places I need reports from.

Larz60+'s file is the one that got me where I am right now. I'm new to this and he has helped me more than most can possibly imagine. That said - this is where we are.

import requests
from bs4 import BeautifulSoup
from pathlib import Path

class GetCompletions:
    def __init__(self, infile):
        """Create folders called comppdf and geocorepdf wherever the WOGCC
           File Downloads file is run from, as well as a text folder for my
           api file to reside in.
        """
        self.homepath = Path('.')
        self.completionspath = self.homepath / 'comppdf'
        self.completionspath.mkdir(exist_ok=True)
        self.geocorepdf = self.homepath / 'geocorepdf'
        self.geocorepdf.mkdir(exist_ok=True)
        self.textpath = self.homepath / 'text'
        self.textpath.mkdir(exist_ok=True)

        self.infile = self.textpath / infile
        self.apis = []

        self.parse_and_save(getpdfs=True)



    def get_url(self):
        """Yield the URLs that match my API numbers."""
        for entry in self.apis:
            # entry[3:10] is the 7-character slice of the API number used in the query string
            yield (entry, "http://wogcc.state.wy.us/wyocomp.cfm?nAPI={}".format(entry[3:10]))
            yield (entry, "http://wogcc.state.wy.us/whatupcores.cfm?autonum={}".format(entry[3:10]))

    def parse_and_save(self, getpdfs=False):
        for file in filelist:
            with file.open('r') as f:
                soup = BeautifulSoup(f.read(), 'lxml')
            if getpdfs:
                links = soup.find_all('a')
                for link in links:
                    url = link['href']
                    if 'www' in url:
                        continue
                    print('downloading pdf at: {}'.format(url))
                    p = url.index('=')
                    response = requests.get(url, stream=True, allow_redirects=False)
                    if response.status_code == 200:
                        try:
                            header_info = response.headers['Content-Disposition']
                            idx = header_info.index('filename')
                            filename = self.log_pdfpath / header_info[idx+9:]
                        except ValueError:
                            filename = self.log_pdfpath / 'comp{}'.format(url[p+1:])
                            print("couldn't locate filename for {} will use: {}".format(file, filename))
                        except KeyError:
                            filename = self.log_pdfpath / 'comp{}.pdf'.format(url[p+1:])
                            print('got KeyError on {}, response.headers = {}'.format(file, response.headers))
                            print('will use name: {}'.format(filename))
                            print(response.headers)
                        with filename.open('wb') as f:
                            f.write(response.content)

                self.parse_and_save(getpdfs=True)

            sfname = self.textpath / 'summary_{}.txt'.format((file.name.split('_'))[1].split('.')[0][3:10])
            tds = soup.find_all('td')
            with sfname.open('w') as f:
                for td in tds:
                    if td.text:
                        if any(field in td.text for field in self.fields):
                            f.write('{}\n'.format(td.text))

if __name__ == '__main__':
    GetCompletions('api.txt')
Ok - errors help me too!

Error:
RESTART: C:\Users\toliver\AppData\Local\Programs\Python\Python36\WOGCC\WOGCC_File_Downloads test 2.py
Traceback (most recent call last):
  File "C:\Users\toliver\AppData\Local\Programs\Python\Python36\WOGCC\WOGCC_File_Downloads test 2.py", line 73, in <module>
    GetCompletions('api.txt')
  File "C:\Users\toliver\AppData\Local\Programs\Python\Python36\WOGCC\WOGCC_File_Downloads test 2.py", line 22, in __init__
    self.parse_and_save(getpdfs=True)
  File "C:\Users\toliver\AppData\Local\Programs\Python\Python36\WOGCC\WOGCC_File_Downloads test 2.py", line 34, in parse_and_save
    for file in filelist:
NameError: name 'filelist' is not defined

I hope this all makes sense. I really appreciate your help!
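
For reference, the NameError is raised because parse_and_save loops over filelist, a name that is never assigned anywhere in the script. Below is a minimal sketch of one way to build such a list with pathlib. It assumes the saved pages sit in a folder supplied by the caller and end in .html; both are assumptions, and list_saved_pages is a hypothetical helper, not part of the Larz60+ file.

```python
from pathlib import Path


def list_saved_pages(folder):
    """Return the saved HTML pages in `folder`, sorted by name.

    The folder location and the '*.html' pattern are assumptions made
    for illustration; the thread does not say where the pages are kept.
    """
    folder = Path(folder)
    # glob() yields matching paths lazily; sorted() gives a stable, ordered list
    return sorted(folder.glob('*.html'))


# Hypothetical use inside the class: assign the list before looping over it, e.g.
#     filelist = list_saved_pages(self.homepath / 'html')
#     for file in filelist:
#         ...
```

Any name used in a method has to be assigned first, either as a local variable like this or as an attribute such as self.filelist set in __init__.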


RE: Traceback error - Larz60+ - May-23-2018

Here's that URL again, https://python-forum.io/Thread-EOL-While-Scanning-String-Literal?pid=45974#pid45974
Take the code under New code.

Double-click on code
Ctrl-c --> Copy
PyCharm (your project)
right-click on src directory label (left window)
choose new-->PythonFile whatever name you wish (do not add .py, it will do it for you)
Ctrl-v --> Paste
File --> save-all
with cursor in code window:
run --> run --> program name from above (with internet access available)
It's still going to expect an apis.txt file in the text directory, directly under the src directory (same as the original setup).
It will create other needed directories as required.
It will run and run properly.

Just remember that whatever api numbers you include in the apis.txt file will be downloaded each and every time.
To prevent this, rename apis.txt to apisMay23-2018.txt or some such name and start a brand new apis.txt file with the values you want to get.
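
The rename step could also be scripted. This is only a sketch under stated assumptions: archive_api_list is a hypothetical helper (it is not in the posted code), and the dated name simply follows the apisMay23-2018.txt example above.

```python
from datetime import date
from pathlib import Path


def archive_api_list(textdir, today=None):
    """Rename apis.txt to a dated name and start a new, empty apis.txt.

    Returns the archived path, or None if there was no apis.txt to archive.
    """
    textdir = Path(textdir)
    current = textdir / 'apis.txt'
    if not current.exists():
        return None
    today = today or date.today()
    # '%b%d-%Y' gives e.g. 'May23-2018' (month abbreviation depends on locale)
    archived = textdir / 'apis{}.txt'.format(today.strftime('%b%d-%Y'))
    if archived.exists():
        # Do not overwrite an archive made earlier the same day
        raise FileExistsError('{} already exists'.format(archived))
    current.rename(archived)
    current.touch()  # fresh, empty apis.txt for the next batch of API numbers
    return archived
```

Calling archive_api_list('text') after a successful run would leave an empty apis.txt ready for the next set of values.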


RE: Traceback error - tjnichols - May-24-2018

Ok Larz60+ - I followed the instructions above. I ran it in PyCharm as you suggested, and this is the error I received:

Error:
C:\Users\toliver\PycharmProject\Python3\StandardByExample\Chapter1\venv\Scripts\python.exe C:/Users/toliver/PycharmProject/Python3/Downloads/CompletionReports.py
Traceback (most recent call last):
  File "C:/Users/toliver/PycharmProject/Python3/Downloads/CompletionReports.py", line 1, in <module>
    import requests
ModuleNotFoundError: No module named 'requests'

Process finished with exit code 1
When I run it with IDLE it creates the folders but returns nothing. I still need to get the additional reports though.

Thanks for your help!!


RE: Traceback error - Larz60+ - May-24-2018

pip install requests



RE: Traceback error - tjnichols - May-24-2018

Yes - I saw that error too. It says the requirement has already been satisfied.


RE: Traceback error - Larz60+ - May-24-2018

The error was: ModuleNotFoundError: No module named 'requests'
Clearly the Python interpreter being used does not have the requests module installed.

In PyCharm, with the cursor in the code you wish to execute, click Run --> Edit Configurations.

Look at the Python interpreter and make sure it's the right one. If not, select the proper one and update;
otherwise just cancel. Also look at the working directory to make sure that's correct.

Finally, from cmder, type pip -V and make sure the right version of pip is being used.
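
A quick way to see which interpreter is actually running a script, and whether it can find requests, is to run a few lines like these from both PyCharm and IDLE and compare the output. This is a minimal sketch using only the standard library; module_available is a hypothetical helper name.

```python
import importlib.util
import sys


def module_available(name):
    """Return True if the running interpreter can locate the module `name`."""
    return importlib.util.find_spec(name) is not None


# sys.executable is the full path of the python.exe running this script
print('interpreter:', sys.executable)
print('requests installed:', module_available('requests'))
```

If the two environments print different interpreter paths, pip has most likely installed requests into one of them and not the other.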


RE: Traceback error - tjnichols - May-24-2018

Ok - that was cool! No errors but no results either.

When you say check the version of pip - I did that and it lists the things you can run with pip (well that's what I think it is). It doesn't give me an error or anything.

It does say this whenever I install something using pip:

Quote:You are using pip version 9.0.3, however version 10.0.1 is available.
You should consider upgrading via the 'python -m pip install --upgrade pip' command.

Should I upgrade? The last time I did, it caused all kinds of havoc so I'm reluctant to do it now.

Thanks for your help!


RE: Traceback error - Larz60+ - May-24-2018

this is what you should see:
Output:
λ pip -V
pip 10.0.1 from c:\python365\lib\site-packages\pip (python 3.6)
with, of course, your version numbers.

I am off to do some moving. I have a solid move date (from the moving company) of May 31.
So as of June 1, if the internet is hooked up properly, I should be back to a (fairly) normal schedule.


RE: Traceback error - tjnichols - May-24-2018

Ok - this is what I have:

Quote:λ pip -V
pip 9.0.3 from c:\users\toliver\appdata\local\programs\python\python36\lib\site-packages (python 3.6)

Does this mean you won't be back on here until June 1?

Quote:λ pip -V
pip 10.0.1 from c:\users\toliver\appdata\local\programs\python\python36\lib\site-packages\pip (python 3.6)

I did the upgrade.