Lars60+ - I appreciate your help! OK, here is the code we started with. What I am trying to do is get the completions reports and anything on the Cores / Pressures / Reports link (there is a rough guess at that near the end of the code below). I was simply trying to get rid of some of the things I don't need and add the things I do.
For example: these things are great to have, but they are already in my database, so there is just no way for me to use them (one way I might trim them out is sketched just below):

    self.apis.append(line.strip())
    self.fields = ['Spud Date', 'Total Depth', 'IP Oil Bbls', 'Reservoir Class', 'Completion Date',
                   'Plug Back', 'IP Gas Mcf', 'TD Formation', 'Formation', 'IP Water Bbls']
Thank you for your help!
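Here is the kind of trim I have in mind - just a rough, standalone sketch, and the names all_fields and already_in_db are placeholders I made up, not anything from the script below:

# Keep one master list, mark what the database already stores, and hand
# the remainder to the scraper as its fields list.
all_fields = ['Spud Date', 'Total Depth', 'IP Oil Bbls', 'Reservoir Class',
              'Completion Date', 'Plug Back', 'IP Gas Mcf', 'TD Formation',
              'Formation', 'IP Water Bbls']
already_in_db = ['IP Oil Bbls', 'IP Gas Mcf', 'IP Water Bbls']  # example entries only
fields = [f for f in all_fields if f not in already_in_db]
print(fields)

The resulting list could then be assigned to self.fields in __init__, or passed in as an extra constructor argument.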
Wavic - thank you - I will look and see where I need to make my changes! Thank you for your support!
nilamo - Thank you for your assistance! I need to look at this more closely. It is probably something I messed up when I was 'refurbishing'. I will look at it and get back to you! Thanks for your input!
import requests
from bs4 import BeautifulSoup
from pathlib import Path
class GetCompletions:
    def __init__(self, infile):
        # working folders; created on first run, reused afterwards
        self.homepath = Path('.')
        self.completionspath = self.homepath / 'xx_completions_xx'
        self.completionspath.mkdir(exist_ok=True)
        self.log_pdfpath = self.homepath / 'logpdfs'
        self.log_pdfpath.mkdir(exist_ok=True)
        self.textpath = self.homepath / 'text'
        self.textpath.mkdir(exist_ok=True)

        # input file holds one API number per line
        self.infile = self.textpath / infile
        self.apis = []
        with self.infile.open() as f:
            for line in f:
                self.apis.append(line.strip())

        # table cells whose text mentions any of these labels are copied
        # into the per-well summary file
        self.fields = ['Spud Date', 'Total Depth', 'IP Oil Bbls', 'Reservoir Class', 'Completion Date',
                       'Plug Back', 'IP Gas Mcf', 'TD Formation', 'Formation', 'IP Water Bbls']

        self.get_all_pages()
        self.parse_and_save(getpdfs=True)
    def get_url(self):
        # entry[3:10] is the seven-digit portion of the API number that the
        # state site expects in its query string
        for entry in self.apis:
            yield (entry, "http://wogcc.state.wy.us/wyocomp.cfm?nAPI={}".format(entry[3:10]))
    def get_all_pages(self):
        for entry, url in self.get_url():
            print('Fetching main page for entry: {}'.format(entry))
            response = requests.get(url)
            if response.status_code == 200:
                # save the raw HTML so it can be re-parsed without re-fetching
                filename = self.completionspath / 'api_{}.html'.format(entry)
                with filename.open('w') as f:
                    f.write(response.text)
            else:
                print('error downloading {}'.format(entry))
    def parse_and_save(self, getpdfs=False):
        filelist = [file for file in self.completionspath.iterdir() if file.is_file()]
        for file in filelist:
            with file.open('r') as f:
                soup = BeautifulSoup(f.read(), 'lxml')
            if getpdfs:
                for link in soup.find_all('a'):
                    # .get() instead of link['href'] avoids a KeyError on
                    # anchors that have no href attribute
                    url = link.get('href')
                    if not url or 'www' in url:
                        continue
                    print('downloading pdf at: {}'.format(url))
                    # the document number follows the '=' in the link
                    p = url.index('=')
                    filename = self.log_pdfpath / 'comp{}.pdf'.format(url[p + 1:])
                    response = requests.get(url, stream=True, allow_redirects=False)
                    if response.status_code == 200:
                        with filename.open('wb') as f:
                            # stream the body in chunks rather than loading
                            # the whole PDF into memory at once
                            for chunk in response.iter_content(chunk_size=8192):
                                f.write(chunk)
            # recover the API digits from the saved name 'api_<entry>.html'
            sfname = self.textpath / 'summary_{}.txt'.format(file.name.split('_')[1].split('.')[0][3:10])
            tds = soup.find_all('td')
            with sfname.open('w') as f:
                for td in tds:
                    if td.text and any(field in td.text for field in self.fields):
                        f.write('{}\n'.format(td.text))
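    # --- sketch, not part of the original script --------------------------
    # For the Cores / Pressures / Reports link I am guessing the same nAPI
    # pattern carries over; 'corespressures.cfm' is a placeholder page name
    # I made up, so the real URL needs to be copied from the browser.
    def get_report_url(self):
        for entry in self.apis:
            yield (entry, "http://wogcc.state.wy.us/corespressures.cfm?nAPI={}".format(entry[3:10]))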
if __name__ == '__main__':
    GetCompletions('apis.txt')
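The get_report_url sketch above would still need a matching get_all_report_pages() that mirrors get_all_pages(), ideally saving into its own folder so the completions HTML does not get mixed in - again, that is just the direction I am thinking, since I have not confirmed the actual Cores / Pressures / Reports URL yet.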