Python Forum
Unexpected indent / invalid syntax
#31
Well, you create GetCompletions, but 'log' is not an instance of some other class, it is not inherited, and it is not imported, so where does it come from?
"As they say in Mexico 'dosvidaniya'. That makes two vidaniyas."
https://freedns.afraid.org
#32
(May-16-2018, 03:13 PM)Larz60+ Wrote: Now I've got a traceback error.
@tjnichols, you must read the previous post by @Larz60+, #13.
Quote:There is no self.log_path, it's self.log_pdfpath
You have managed to change _ to .
The code is also correct in your first big post #11

You really struggle with basic Python understanding.
That's okay, we have all been there,
but try to read more carefully, or just copy code that has been posted before.
I guess that @Larz60+ has tested the code and that it works.
#33
I mentioned this before:
        self.log.pdfpath = self.homepath / 'comppdf'
        self.log.pdfpath.mkdir(exist_ok=True)
        self.log.pdfpath = self.homepath / 'geocorepdf'
        self.log.pdfpath.mkdir(exist_ok=True)
You are overwriting self.log.pdfpath
This was perfectly ok in the original code.
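The overwrite can be seen in isolation with a short sketch (the standalone variable names here are illustrative, not from the thread):

```python
from pathlib import Path

home = Path('.')

# The problem: reassigning one attribute discards the first directory path.
pdfpath = home / 'comppdf'
pdfpath = home / 'geocorepdf'   # the 'comppdf' Path object is now unreachable

# The fix: give each directory its own name so both remain addressable.
comp_pdfpath = home / 'comppdf'
geocore_pdfpath = home / 'geocorepdf'
print(comp_pdfpath.name, geocore_pdfpath.name)
```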
Again, especially with the plethora of errors (all introduced since the original was severely modified), perhaps it's time to go back to the original.
In case you've lost it, here is a copy (which I just ran without a hitch).
Just remember you must have the apis.txt file in the text directory; it contains the API numbers for the documents you want.
I am attaching the original file. (In order not to reload the same info, clean the file out and start with new numbers, or it will download the original.) This file directs the software: it will download whatever is specified in this file, nothing more, nothing less.
import requests
from bs4 import BeautifulSoup
from pathlib import Path
import sys

class GetCompletions:
    def __init__(self, infile):
        self.homepath = Path('.')
        self.completionspath = self.homepath / 'xx_completions_xx'
        self.completionspath.mkdir(exist_ok=True)
        self.log_pdfpath = self.homepath / 'logpdfs'
        self.log_pdfpath.mkdir(exist_ok=True)
        self.textpath = self.homepath / 'text'
        self.textpath.mkdir(exist_ok=True)

        self.infile = self.textpath / infile
        self.apis = []

        with self.infile.open() as f:
            for line in f:
                self.apis.append(line.strip())

        self.fields = ['Spud Date', 'Total Depth', 'IP Oil Bbls', 'Reservoir Class', 'Completion Date',
                       'Plug Back', 'IP Gas Mcf', 'TD Formation', 'Formation', 'IP Water Bbls']
        # self.get_all_pages()
        self.parse_and_save(getpdfs=True)

    def get_url(self):
        for entry in self.apis:
            yield (entry, "http://wogcc.state.wy.us/wyocomp.cfm?nAPI={}".format(entry[3:10]))

    def get_all_pages(self):
        for entry, url in self.get_url():
            print('Fetching main page for entry: {}'.format(entry))
            response = requests.get(url)
            if response.status_code == 200:
                filename = self.completionspath / 'api_{}.html'.format(entry)
                with filename.open('w') as f:
                    f.write(response.text)
            else:
                print('error downloading {}'.format(entry))

    def parse_and_save(self, getpdfs=False):
        filelist = [file for file in self.completionspath.iterdir() if file.is_file()]
        for file in filelist:
            with file.open('r') as f:
                soup = BeautifulSoup(f.read(), 'lxml')
            if getpdfs:
                links = soup.find_all('a')
                for link in links:
                    url = link['href']
                    if 'www' in url:
                        continue
                    print('downloading pdf at: {}'.format(url))
                    p = url.index('=')
                    response = requests.get(url, stream=True, allow_redirects=False)
                    if response.status_code == 200:
                        try:
                            header_info = response.headers['Content-Disposition']
                            idx = header_info.index('filename')
                            filename = self.log_pdfpath / header_info[idx+9:]
                        except ValueError:
                            filename = self.log_pdfpath / 'comp{}.pdf'.format(url[p + 1:])
                            print("couldn't locate filename for {} will use: {}".format(file, filename))
                        except KeyError:
                            filename = self.log_pdfpath / 'comp{}.pdf'.format(url[p + 1:])
                            print('got KeyError on {}, response.headers = {}'.format(file, response.headers))
                            print('will use name: {}'.format(filename))
                            print(response.headers)
                        with filename.open('wb') as f:
                            f.write(response.content)
            sfname = self.textpath / 'summary_{}.txt'.format((file.name.split('_'))[1].split('.')[0][3:10])
            tds = soup.find_all('td')
            with sfname.open('w') as f:
                for td in tds:
                    if td.text:
                        if any(field in td.text for field in self.fields):
                            f.write('{}\n'.format(td.text))

if __name__ == '__main__':
    GetCompletions('apis.txt')
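As an aside, the Content-Disposition slicing in parse_and_save can be checked in isolation (the header value below is invented for illustration):

```python
# 'filename' is 8 characters long, so idx + 9 skips past 'filename=':
header_info = 'attachment; filename=well_12345.pdf'
idx = header_info.index('filename')
name = header_info[idx + 9:]
print(name)  # well_12345.pdf
```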
#34
Yes I do! I just feel so rushed to understand it. The whole goal here is to populate a database en masse. I started this journey thinking I could learn this with one module on "Code Academy" and I would be good to go. Well, that was OK - it was kind of like dusting off a great big book. Then I tried "StackSkills". More dusting. Then I started getting books. That was like drinking water from a firehose. At least the ones I got from David Beazley helped me understand the concept of generators.

I think it will be better after I can get over this hump. There are other things I need to do with Python but not nearly as pressing.

The reason I've gotten some of the errors I have is I thought I knew what my issue was. I retyped it thinking I would get everything not realizing I would blow it up!

I do appreciate your patience though. All of you have been awesome!

Here's the latest issue and my understanding (however limited)...

Line 71 - I have the file 'api.txt' here: "C:\Users\toliver\AppData\Local\Programs\Python\Python36\text\". Previous discussions with Larz60+ have pointed me in this direction. Well, at least I think they have.

I went and checked this against the original file with the following, and it matches what we started with:

self.textpath = self.homepath / 'text'
self.textpath.mkdir(exist_ok=True)


Again - thank you for your help, time and patience!

Error:
RESTART: C:\Users\toliver\AppData\Local\Programs\Python\Python36\WOGCC_File_Downloads.py
Traceback (most recent call last):
  File "C:\Users\toliver\AppData\Local\Programs\Python\Python36\WOGCC_File_Downloads.py", line 71, in <module>
    GetCompletions('api.txt')
  File "C:\Users\toliver\AppData\Local\Programs\Python\Python36\WOGCC_File_Downloads.py", line 17, in __init__
    self.text.mkdir(exist_ok=True)
AttributeError: 'GetCompletions' object has no attribute 'text'
import requests
from bs4 import BeautifulSoup
from pathlib import Path

class GetCompletions:
    def __init__(self, infile):
        """Above will create a folder called comppdf, and geocorepdf wherever the WOGCC
           File Downloads file is run from as well as a text file for my api file to
           reside.
        """
        self.homepath = Path('.')
        self.log_pdfpath = self.homepath / 'comppdf'
        self.log_pdfpath.mkdir(exist_ok=True)
        self.log_pdfpath = self.homepath / 'geocorepdf'
        self.log_pdfpath.mkdir(exist_ok=True)
        self.textpath = self.homepath / 'text'
        self.text.mkdir(exist_ok=True)

        self.infile = self.textpath / infile
        self.api = []

        self.parse_and_save(getpdfs=True)



    def get_url(self):
        for entry in self.apis:
            yield (entry, "http://wogcc.state.wy.us/wyocomp.cfm?nAPI=[]".format(entry[3:10]))
            yield (entry, "http://wogcc.state.wy.us/whatupcores.cfm?autonum=[]".format(entry[3:10]))

        """Above will get the URL that matches my API numbers."""

    def parse_and_save(self, getpdfs=False):
        for file in filelist:
            with file.open('r') as f:
                soup = BeautifulSoup(f.read(), 'lxml')
            if getpdfs:
                links = soup.find_all('a')
                for link in links:
                    url in link['href']
                    if 'www' in url:
                        continue
                    print('downloading pdf at: {}'.format(url))
                    p = url.index('=')
                    response = requests.get(url, stream=True, allow_redirects=False)
                    if response.status_code == 200:
                        try:
                            header_info = response.headers['Content-Disposition']
                            idx = header_info.index('filename')
                            filename = self.log_pdfpath / header[idx+9:]
                        except ValueError:
                            filename = self.log_pdfpath / 'comp{}'.format(url[p+1:])
                            print("couldn't locate filename for {} will use: {}".format(file, filename))
                        except KeyError:
                            filename = self.log_pdfpath / 'comp{}.pdf'.format(url[p+1:])
                            print('got KeyError on {}, respnse.headers = {}'.format(file, response.headers))
                            print('will use name: {}'.format(filename))
                            print(repsonse.headers)
                        with filename.open('wb') as f:
                            f.write(respnse.content)

            sfname = self.textpath / 'summary_{}.txt'.format((file.name.split('_'))[1].split('.')[0][3:10])
            tds = soup.find_all('td')
            with sfname.open('w') as f:
                for td in tds:
                    if td.text:
                        if any(field in td.text for field in self.fields):
                            f.write('{}\n'.format(td.text))

if __name__ == '__main__':
    GetCompletions('api.txt')
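The AttributeError in the traceback above comes from the attribute being assigned as self.textpath but then used as self.text; a minimal reproduction with a hypothetical Demo class:

```python
from pathlib import Path

class Demo:
    def __init__(self):
        self.textpath = Path('text')        # attribute created under this name
        try:
            self.text.mkdir(exist_ok=True)  # wrong name -> AttributeError
        except AttributeError as e:
            print(e)
        # self.textpath.mkdir(exist_ok=True) would be the matching call

Demo()
```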

Larz60+ - OK - I downloaded yours, and it was awesome to have no errors! I didn't have any results either, though.

The goal with my 'refurbishing' of yours is to be able to download both completion reports as well as everything on the Cores/Pressures/Cores link.

Is there any way we can make what I have (yours and mine combined) work?

This is informative and valuable but I already have it in my database. Is there a reason to keep it beyond that?

        self.fields = ['Spud Date', 'Total Depth', 'IP Oil Bbls', 'Reservoir Class', 'Completion Date',
                       'Plug Back', 'IP Gas Mcf', 'TD Formation', 'Formation', 'IP Water Bbls']
#35
If you will go back and start in a brand new directory, with the original code, I'd be glad to walk you through any changes step by step.
Get original code here: https://python-forum.io/Thread-Unexpecte...2#pid47182
I will be in and out today (just a few hours between sessions), but if you are willing to do this, I will help to make any changes you wish, and explain each line of code if so desired.

Do this, and before running any code, make sure the following are done:

Output:
1. Your directory structure at start:
   OilWellCompletions
       src
           OilWellCompletions.py
       text
           apis.txt
Here's a copy of the original apis.txt file:

I will be back in about 2 to 3 hours

Attached file: apis.txt (Size: 1.56 KB)
#36
You are using an object called 'file' which can't exist at that point: it would be collected by the GC, since after the with statement on line 35 there is only one line of code. Yet you are using it on line 62 as if it were a string, and this string has an attribute called name?
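For what it's worth, the posted parse_and_save never defines filelist before looping over it; the original code built it from completionspath. A self-contained sketch of that pattern, using a temporary directory instead of the real one:

```python
import tempfile
from pathlib import Path

# Stand-in for self.completionspath (a throwaway temp directory here):
completionspath = Path(tempfile.mkdtemp())
(completionspath / 'api_4900521433.html').write_text('<html></html>')

# filelist must be built before the loop can use it:
filelist = [f for f in completionspath.iterdir() if f.is_file()]
for f in filelist:
    print(f.name)   # each f is a Path object, and .name is valid on it
```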

Want to populate a database? Go with SQLAlchemy. It's the right tool for that.
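A minimal SQLAlchemy sketch of what that could look like (table and column names are hypothetical, using the 1.4+ declarative API):

```python
from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.orm import declarative_base, Session

Base = declarative_base()

class Completion(Base):
    __tablename__ = 'completions'          # hypothetical table
    id = Column(Integer, primary_key=True)
    api = Column(String)
    spud_date = Column(String)

engine = create_engine('sqlite:///:memory:')  # throwaway in-memory database
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(Completion(api='4900521433', spud_date='01/01/2000'))
    session.commit()
```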
#37
Larz60+ - can we keep editing the one we've been working on? I've thought about going back to yours, but there has been a lot of work (not only between you and me but others as well) put into this. I don't want anyone to feel like their work / time has been for nothing.

I would still like you to participate if you are willing. After all, you are the one who got me started with this! I do appreciate your time, effort and help!

Thank you!
#38
wavic - This creates a file in the 'text' folder called 'summary_[last 8 #'s of the API].txt'. The files don't have anything in them, so I'm not sure what they do. It's probably something else in the code.

I don't know SQLAlchemy, and I'm afraid to say I'll learn it while I'm still struggling with Python itself.

I appreciate the thought though. Most of all, I appreciate your help!

Thanks!
#39
All - if you're game - I would like to continue with the collaborative effort! I really appreciate the time and energy everyone has given up until now. I think we are at the tail end of this!

I've made the change Larz60+ suggested when he said I was overwriting my file, so here is the latest...

Error:
RESTART: C:/Users/toliver/AppData/Local/Programs/Python/Python36/OilWellCompletions/OilWellCompletions.py
Traceback (most recent call last):
  File "C:/Users/toliver/AppData/Local/Programs/Python/Python36/OilWellCompletions/OilWellCompletions.py", line 81, in <module>
    GetCompletions('apis.txt')
  File "C:/Users/toliver/AppData/Local/Programs/Python/Python36/OilWellCompletions/OilWellCompletions.py", line 19, in __init__
    with self.infile.open() as f:
  File "C:\Users\toliver\AppData\Local\Programs\Python\Python36\lib\pathlib.py", line 1181, in open
    opener=self._opener)
  File "C:\Users\toliver\AppData\Local\Programs\Python\Python36\lib\pathlib.py", line 1035, in _opener
    return self._accessor.open(self, flags, mode)
  File "C:\Users\toliver\AppData\Local\Programs\Python\Python36\lib\pathlib.py", line 387, in wrapped
    return strfunc(str(pathobj), *args)
FileNotFoundError: [Errno 2] No such file or directory: 'text\\apis.txt'
This is what I've found so far on the error - "Obviously, based on the error message, mkdir returns None." I will keep looking.
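The FileNotFoundError fits with how relative paths work: Path('.') resolves against the current working directory, not the script's location, so 'text/apis.txt' must exist under wherever the script is launched from. A hedged sketch:

```python
from pathlib import Path

# The relative form used in the code resolves against the current working
# directory, not the script's folder:
infile = Path('.') / 'text' / 'apis.txt'
print(Path.cwd() / infile)   # the absolute location actually opened

# So either launch the script from the directory that contains 'text',
# or anchor paths explicitly, e.g. (an assumption, not from the thread):
#   homepath = Path(__file__).resolve().parent
```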

import requests
from bs4 import BeautifulSoup
from pathlib import Path
import sys
 
class GetCompletions:
    def __init__(self, infile):
        self.homepath = Path('.')
        self.completionspath = self.homepath / 'xx_completions_xx'
        self.completionspath.mkdir(exist_ok=True)
        self.log_pdfpath = self.homepath / 'logpdfs'
        self.log_pdfpath.mkdir(exist_ok=True)
        self.textpath = self.homepath / 'text'
        self.textpath.mkdir(exist_ok=True)
 
        self.infile = self.textpath / infile
        self.apis = []
 
        with self.infile.open() as f:
            for line in f:
                self.apis.append(line.strip())
 
        self.fields = ['Spud Date', 'Total Depth', 'IP Oil Bbls', 'Reservoir Class', 'Completion Date',
                       'Plug Back', 'IP Gas Mcf', 'TD Formation', 'Formation', 'IP Water Bbls']
        # self.get_all_pages()
        self.parse_and_save(getpdfs=True)
 
    def get_url(self):
        for entry in self.apis:
            yield (entry, "http://wogcc.state.wy.us/wyocomp.cfm?nAPI={}".format(entry[3:10]))
 
    def get_all_pages(self):
        for entry, url in self.get_url():
            print('Fetching main page for entry: {}'.format(entry))
            response = requests.get(url)
            if response.status_code == 200:
                filename = self.completionspath / 'api_{}.html'.format(entry)
                with filename.open('w') as f:
                    f.write(response.text)
            else:
                print('error downloading {}'.format(entry))
 
    def parse_and_save(self, getpdfs=False):
        filelist = [file for file in self.completionspath.iterdir() if file.is_file()]
        for file in filelist:
            with file.open('r') as f:
                soup = BeautifulSoup(f.read(), 'lxml')
            if getpdfs:
                links = soup.find_all('a')
                for link in links:
                    url = link['href']
                    if 'www' in url:
                        continue
                    print('downloading pdf at: {}'.format(url))
                    p = url.index('=')
                    response = requests.get(url, stream=True, allow_redirects=False)
                    if response.status_code == 200:
                        try:
                            header_info = response.headers['Content-Disposition']
                            idx = header_info.index('filename')
                            filename = self.log_pdfpath / header_info[idx+9:]
                        except ValueError:
                            filename = self.log_pdfpath / 'comp{}.pdf'.format(url[p + 1:])
                            print("couldn't locate filename for {} will use: {}".format(file, filename))
                        except KeyError:
                            filename = self.log_pdfpath / 'comp{}.pdf'.format(url[p + 1:])
                            print('got KeyError on {}, response.headers = {}'.format(file, response.headers))
                            print('will use name: {}'.format(filename))
                            print(response.headers)
                        with filename.open('wb') as f:
                            f.write(response.content)
            sfname = self.textpath / 'summary_{}.txt'.format((file.name.split('_'))[1].split('.')[0][3:10])
            tds = soup.find_all('td')
            with sfname.open('w') as f:
                for td in tds:
                    if td.text:
                        if any(field in td.text for field in self.fields):
                            f.write('{}\n'.format(td.text))
 
if __name__ == '__main__':
    GetCompletions('apis.txt')
Thanks again!

Alright, I think we may have a better idea of how to do this. For now, let's forgo the idea of fixing this one. More later! Thank you!

