Yes I do! I just feel so rushed to understand it. The whole goal here is to populate a database en masse. I started this journey thinking I could learn this with one module on "Code Academy" and be good to go. Well, that was OK; it was kind of like dusting off a great big book. Then I tried "StackSkills". More dusting. Then I started getting books. That was like drinking water from a firehose. At least the ones I got from David Beazley helped me understand the concept of generators.
I think it will get better once I get over this hump. There are other things I need to do with Python, but none nearly as pressing.
I got some of these errors because I thought I knew what my issue was. I retyped the code thinking I would get everything, not realizing I would blow it up!
I do appreciate your patience though. All of you have been awesome!
Here's the latest issue and my understanding (however limited)...
Line 71 - I have the file 'api.txt' here: "C:\Users\toliver\AppData\Local\Programs\Python\Python36\text\". Previous discussions with Lars60+ have pointed me in this direction. Well, at least I think they have.
I went and checked this against the original file, which has the following, and it matches what we started with:
self.textpath = self.homepath / 'text'
self.textpath.mkdir(exist_ok=True)
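As a small illustration of what those two lines do, here is a minimal sketch. The helper name `make_text_folder` and the temporary directory are assumptions for the demo, not part of the original script; the point is only that `mkdir(exist_ok=True)` can be called repeatedly without raising.

```python
import tempfile
from pathlib import Path


def make_text_folder(homepath):
    """Create homepath/'text' if it is missing and return its path (hypothetical helper)."""
    textpath = Path(homepath) / 'text'
    textpath.mkdir(exist_ok=True)  # no error if the folder already exists
    return textpath


# Demo in a throwaway directory rather than the real Python36 folder.
with tempfile.TemporaryDirectory() as tmp:
    first = make_text_folder(tmp)
    second = make_text_folder(tmp)  # second call is harmless
    print(first == second, first.is_dir())
```

Note that `Path('.')` is the directory the script is run from, so the 'text' folder ends up wherever the script is launched.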
Again - thank you for your help, time and patience!
Error: RESTART: C:\Users\toliver\AppData\Local\Programs\Python\Python36\WOGCC_File_Downloads.py
Traceback (most recent call last):
  File "C:\Users\toliver\AppData\Local\Programs\Python\Python36\WOGCC_File_Downloads.py", line 71, in <module>
    GetCompletions('api.txt')
  File "C:\Users\toliver\AppData\Local\Programs\Python\Python36\WOGCC_File_Downloads.py", line 17, in __init__
    self.text.mkdir(exist_ok=True)
AttributeError: 'GetCompletions' object has no attribute 'text'
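For what it's worth, the last line of the traceback can be reproduced in isolation. This is only a sketch with a made-up class `Demo` and helper `read_wrong_name`; it assumes the cause is the same as above, namely that the attribute was stored as `textpath` but then looked up as `text`.

```python
class Demo:
    def __init__(self):
        # The value is stored under the name 'textpath' ...
        self.textpath = 'text'


def read_wrong_name(obj):
    """Return the message raised when asking for a name that was never assigned."""
    try:
        obj.text  # ... but 'text' was never assigned, so this raises
    except AttributeError as err:
        return str(err)
    return None


message = read_wrong_name(Demo())
print(message)
```

If that is what is happening on line 17, changing `self.text.mkdir(...)` to `self.textpath.mkdir(...)` should clear this particular error, though other problems in the script may then surface.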
import requests
from bs4 import BeautifulSoup
from pathlib import Path


class GetCompletions:
    def __init__(self, infile):
        """Above will create a folder called comppdf, and geocorepdf wherever the WOGCC File Downloads file is
        run from as well as a text file for my api file to reside.
        """
        self.homepath = Path('.')
        self.log_pdfpath = self.homepath / 'comppdf'
        self.log_pdfpath.mkdir(exist_ok=True)
        self.log_pdfpath = self.homepath / 'geocorepdf'
        self.log_pdfpath.mkdir(exist_ok=True)
        self.textpath = self.homepath / 'text'
        self.text.mkdir(exist_ok=True)
        self.infile = self.textpath / infile
        self.api = []

        self.parse_and_save(getpdfs=True)

    def get_url(self):
        for entry in self.apis:
            yield (entry, "http://wogcc.state.wy.us/wyocomp.cfm?nAPI=[]".format(entry[3:10]))
            yield (entry, "http://wogcc.state.wy.us/whatupcores.cfm?autonum=[]".format(entry[3:10]))
        """Above will get the URL that matches my API numbers."""

    def parse_and_save(self, getpdfs=False):
        for file in filelist:
            with file.open('r') as f:
                soup = BeautifulSoup(f.read(), 'lxml')
            if getpdfs:
                links = soup.find_all('a')
                for link in links:
                    url in link['href']
                    if 'www' in url:
                        continue
                    print('downloading pdf at: {}'.format(url))
                    p = url.index('=')
                    response = requests.get(url, stream=True, allow_redirects=False)
                    if response.status_code == 200:
                        try:
                            header_info = response.headers['Content-Disposition']
                            idx = header_info.index('filename')
                            filename = self.log_pdfpath / header[idx+9:]
                        except ValueError:
                            filename = self.log_pdfpath / 'comp{}'.format(url[p+1:])
                            print("couldn't locate filename for {} will use: {}".format(file, filename))
                        except KeyError:
                            filename = self.log_pdfpath / 'comp{}.pdf'.format(url[p+1:])
                            print('got KeyError on {}, respnse.headers = {}'.format(file, response.headers))
                            print('will use name: {}'.format(filename))
                            print(repsonse.headers)
                        with filename.open('wb') as f:
                            f.write(respnse.content)
            sfname = self.textpath / 'summary_{}.txt'.format((file.name.split('_'))[1].split('.')[0][3:10])
            tds = soup.find_all('td')
            with sfname.open('w') as f:
                for td in tds:
                    if td.text:
                        if any(field in td.text for field in self.fields):
                            f.write('{}\n'.format(td.text))


if __name__ == '__main__':
    GetCompletions('api.txt')
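One detail in `get_url` that may matter once the AttributeError is out of the way: `str.format` only fills in `{}` placeholders, so the `[]` in the two URL strings would be left as-is. A minimal sketch of the intended substitution follows. The function name `urls_for` and the API number `'4900512345'` are made up for illustration, and the URL patterns are copied from the script above without being checked against the live site.

```python
# URL templates copied from the script, with {} in place of [] (not verified against the site).
COMPLETION_URL = "http://wogcc.state.wy.us/wyocomp.cfm?nAPI={}"
CORES_URL = "http://wogcc.state.wy.us/whatupcores.cfm?autonum={}"


def urls_for(entry):
    """Return (entry, url) pairs for one API number, using the same entry[3:10] slice as the script."""
    key = entry[3:10]
    return [(entry, COMPLETION_URL.format(key)), (entry, CORES_URL.format(key))]


# str.format only substitutes {} fields, so square brackets pass through unchanged.
unchanged = "nAPI=[]".format("0512345")

# '4900512345' is a made-up API number used only for illustration.
for entry, url in urls_for("4900512345"):
    print(entry, url)
```

The same method also loops over `self.apis`, while `__init__` creates `self.api`, so the two names would need to agree as well.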
Lars60+ - OK - I downloaded yours and it was awesome not to have any errors! I didn't get any results either, though.
The goal with my 'refurbishing' of yours is to be able to download both completion reports as well as everything on the Cores/Pressures/Cores link.
Is there any way we can make what I have (yours and mine combined) work?
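One possible piece of that, sketched under assumptions: in the combined script `self.log_pdfpath` is assigned twice, so the 'geocorepdf' path replaces the 'comppdf' one and every download would land in a single folder. Keeping two separate attributes and choosing between them by page might look like the following. The class `DownloadFolders`, the attribute names `comp_pdfpath` and `core_pdfpath`, and the method `folder_for` are all hypothetical, and matching on 'whatupcores' in the URL is just one way to tell the two pages apart.

```python
import tempfile
from pathlib import Path


class DownloadFolders:
    """Hypothetical helper that keeps one folder per report type."""

    def __init__(self, homepath):
        self.homepath = Path(homepath)
        # Two distinct attributes, so the second assignment no longer overwrites the first.
        self.comp_pdfpath = self.homepath / 'comppdf'
        self.core_pdfpath = self.homepath / 'geocorepdf'
        for folder in (self.comp_pdfpath, self.core_pdfpath):
            folder.mkdir(exist_ok=True)

    def folder_for(self, page_url):
        """Pick the target folder from the page name that appears in the URL."""
        if 'whatupcores' in page_url:
            return self.core_pdfpath
        return self.comp_pdfpath


# Demo in a throwaway directory; the URLs follow the patterns used in the script.
with tempfile.TemporaryDirectory() as tmp:
    folders = DownloadFolders(tmp)
    print(folders.folder_for("http://wogcc.state.wy.us/wyocomp.cfm?nAPI=0512345").name)
    print(folders.folder_for("http://wogcc.state.wy.us/whatupcores.cfm?autonum=0512345").name)
```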
This is informative and valuable but I already have it in my database. Is there a reason to keep it beyond that?
self.fields = ['Spud Date', 'Total Depth', 'IP Oil Bbls', 'Reservoir Class', 'Completion Date', 'Plug Back', 'IP Gas Mcf', 'TD Formation', 'Formation', 'IP Water Bbls']
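For context on what this list is used for: in `parse_and_save` it acts as a filter, so only table cells whose text mentions one of these labels get written to the summary file. Here is a minimal sketch of that filter on plain strings. The function name `keep_summary_cells` and the sample cell texts are made up; in the real script the texts would come from `soup.find_all('td')`.

```python
FIELDS = ['Spud Date', 'Total Depth', 'IP Oil Bbls', 'Reservoir Class', 'Completion Date',
          'Plug Back', 'IP Gas Mcf', 'TD Formation', 'Formation', 'IP Water Bbls']


def keep_summary_cells(cell_texts, fields=FIELDS):
    """Keep only non-empty cell texts that contain at least one of the field labels."""
    return [text for text in cell_texts
            if text and any(field in text for field in fields)]


# Made-up cell texts standing in for td.text values.
sample_cells = ['Spud Date: (sample)', '', 'Operator: (sample)', 'Total Depth: (sample)']
print(keep_summary_cells(sample_cells))
```

If the summary text files are not needed because the data is already in the database, both this list and the summary-writing part of `parse_and_save` could be dropped together; removing only the list would leave `self.fields` undefined where it is still referenced.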