May-16-2018, 03:26 PM
Well, you create GetCompletions, but 'log' is not an instance of some other class, it is not inherited, and it is not imported, so where does it come from?
Unexpected indent / invalid syntax
(May-16-2018, 03:13 PM)Larz60+ Wrote: Now I've got a traceback error.

@tjnichols, you must read the previous post by @Larz60+ (#13):
Quote: There is no self.log_path, it's self.log_pdfpath
You have managed to change _ to . The code is also correct in your first big post (#11). You really struggle with basic Python understanding. That's okay, we have all been there, but try to read more carefully, or just copy code that has been posted before. I guess that @Larz60+ has tested the code and that it works.
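The underscore-versus-dot mixup above is worth pinning down with a tiny runnable sketch (the class and attribute names here are illustrative only):

```python
class Demo:
    def __init__(self):
        # self.log_pdfpath is one plain attribute whose name happens to
        # contain an underscore; no object called "log" is involved.
        self.log_pdfpath = 'logpdfs'


d = Demo()
print(d.log_pdfpath)  # works: the attribute exists

# Changing the underscore to a dot changes the meaning entirely:
# d.log.pdfpath first looks up an attribute called "log" on d, which
# was never created, so it raises AttributeError.
try:
    print(d.log.pdfpath)
except AttributeError as e:
    print('AttributeError:', e)
```

That is why the traceback complains about 'log': Python never sees `log_pdfpath` at all once the name is split at the dot.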
May-16-2018, 05:03 PM
I mentioned this before:
self.log.pdfpath = self.homepath / 'comppdf'
self.log.pdfpath.mkdir(exist_ok=True)
self.log.pdfpath = self.homepath / 'geocorepdf'
self.log.pdfpath.mkdir(exist_ok=True)

You are overwriting self.log.pdfpath. This was perfectly OK in the original code. Again, especially with the plethora of errors (all caused since the original was severely modified), perhaps it's time to go back to the original. In case you've lost it, here is a copy (which I just ran without a hitch). Just remember you must have the apis.txt file in the text directory, and it contains the API numbers for the documents you want. I am attaching the original file (in order not to reload the same info, clean the file out and start with new numbers, or it will download the original). This file directs the software: it will download whatever is specified in this file, nothing more, nothing less.

import requests
from bs4 import BeautifulSoup
from pathlib import Path
import sys


class GetCompletions:
    def __init__(self, infile):
        self.homepath = Path('.')
        self.completionspath = self.homepath / 'xx_completions_xx'
        self.completionspath.mkdir(exist_ok=True)
        self.log_pdfpath = self.homepath / 'logpdfs'
        self.log_pdfpath.mkdir(exist_ok=True)
        self.textpath = self.homepath / 'text'
        self.textpath.mkdir(exist_ok=True)

        self.infile = self.textpath / infile
        self.apis = []

        with self.infile.open() as f:
            for line in f:
                self.apis.append(line.strip())

        self.fields = ['Spud Date', 'Total Depth', 'IP Oil Bbls', 'Reservoir Class',
                       'Completion Date', 'Plug Back', 'IP Gas Mcf', 'TD Formation',
                       'Formation', 'IP Water Bbls']

        # self.get_all_pages()
        self.parse_and_save(getpdfs=True)

    def get_url(self):
        for entry in self.apis:
            yield (entry, "http://wogcc.state.wy.us/wyocomp.cfm?nAPI={}".format(entry[3:10]))

    def get_all_pages(self):
        for entry, url in self.get_url():
            print('Fetching main page for entry: {}'.format(entry))
            response = requests.get(url)
            if response.status_code == 200:
                filename = self.completionspath / 'api_{}.html'.format(entry)
                with filename.open('w') as f:
                    f.write(response.text)
            else:
                print('error downloading {}'.format(entry))

    def parse_and_save(self, getpdfs=False):
        filelist = [file for file in self.completionspath.iterdir() if file.is_file()]
        for file in filelist:
            with file.open('r') as f:
                soup = BeautifulSoup(f.read(), 'lxml')
            if getpdfs:
                links = soup.find_all('a')
                for link in links:
                    url = link['href']
                    if 'www' in url:
                        continue
                    print('downloading pdf at: {}'.format(url))
                    p = url.index('=')
                    response = requests.get(url, stream=True, allow_redirects=False)
                    if response.status_code == 200:
                        try:
                            header_info = response.headers['Content-Disposition']
                            idx = header_info.index('filename')
                            filename = self.log_pdfpath / header_info[idx + 9:]
                        except ValueError:
                            filename = self.log_pdfpath / 'comp{}.pdf'.format(url[p + 1:])
                            print("couldn't locate filename for {} will use: {}".format(file, filename))
                        except KeyError:
                            filename = self.log_pdfpath / 'comp{}.pdf'.format(url[p + 1:])
                            print('got KeyError on {}, response.headers = {}'.format(file, response.headers))
                            print('will use name: {}'.format(filename))
                            print(response.headers)
                        with filename.open('wb') as f:
                            f.write(response.content)
            sfname = self.textpath / 'summary_{}.txt'.format((file.name.split('_'))[1].split('.')[0][3:10])
            tds = soup.find_all('td')
            with sfname.open('w') as f:
                for td in tds:
                    if td.text:
                        if any(field in td.text for field in self.fields):
                            f.write('{}\n'.format(td.text))


if __name__ == '__main__':
    GetCompletions('apis.txt')
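The rebinding problem called out above can be shown in isolation (directory names below mirror the thread's, but the temp location is made up for the sketch):

```python
from pathlib import Path
import tempfile

base = Path(tempfile.mkdtemp())

# Rebinding one name: after the second assignment, the first directory
# can no longer be reached through this name.
pdfpath = base / 'comppdf'
pdfpath.mkdir(exist_ok=True)
pdfpath = base / 'geocorepdf'   # overwrites the previous binding
pdfpath.mkdir(exist_ok=True)
print(pdfpath)                  # only geocorepdf is still reachable

# The fix: give each directory its own attribute/name.
comp_pdfpath = base / 'comppdf'
core_pdfpath = base / 'geocorepdf'
```

Both directories do get created on disk; what is lost is the *handle* to the first one, which is why later writes all land in the last path assigned.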
Yes I do! I just feel so rushed to understand it. The whole goal here is to populate a database en masse. I started this journey thinking I could learn this with one module on "Code Academy" and I would be good to go. Well, that was OK - it was kind of like dusting off a great big book. Then I tried "StackSkills". More dusting. Then I started getting books. That was like drinking water from a firehose. At least the ones I got from David Beazley helped me understand the concept of generators.
I think it will be better after I can get over this hump. There are other things I need to do with Python, but they're not nearly as pressing. The reason I've gotten some of the errors I have is that I thought I knew what my issue was. I retyped it thinking I would get everything, not realizing I would blow it up! I do appreciate your patience though. All of you have been awesome! Here's the latest issue and my understanding (however limited)...

Line 71 - I have the file 'api.txt' here: "C:\Users\toliver\AppData\Local\Programs\Python\Python36\text\". Previous discussions with Larz60+ have pointed me in this direction. Well, at least I think they have. I went and checked this against the original file, which has the following, and it matches what we started with:

self.textpath = self.homepath / 'text'
self.textpath.mkdir(exist_ok=True)

Again - thank you for your help, time and patience!
import requests
from bs4 import BeautifulSoup
from pathlib import Path


class GetCompletions:
    def __init__(self, infile):
        """Above will create a folder called comppdf, and geocorepdf
        wherever the WOGCC File Downloads file is run from, as well as
        a text folder for my api file to reside.
        """
        self.homepath = Path('.')
        self.log_pdfpath = self.homepath / 'comppdf'
        self.log_pdfpath.mkdir(exist_ok=True)
        self.log_pdfpath = self.homepath / 'geocorepdf'
        self.log_pdfpath.mkdir(exist_ok=True)
        self.textpath = self.homepath / 'text'
        self.text.mkdir(exist_ok=True)

        self.infile = self.textpath / infile
        self.api = []

        self.parse_and_save(getpdfs=True)

    def get_url(self):
        for entry in self.apis:
            yield (entry, "http://wogcc.state.wy.us/wyocomp.cfm?nAPI=[]".format(entry[3:10]))
            yield (entry, "http://wogcc.state.wy.us/whatupcores.cfm?autonum=[]".format(entry[3:10]))
        """Above will get the URL that matches my API numbers."""

    def parse_and_save(self, getpdfs=False):
        for file in filelist:
            with file.open('r') as f:
                soup = BeautifulSoup(f.read(), 'lxml')
            if getpdfs:
                links = soup.find_all('a')
                for link in links:
                    url in link['href']
                    if 'www' in url:
                        continue
                    print('downloading pdf at: {}'.format(url))
                    p = url.index('=')
                    response = requests.get(url, stream=True, allow_redirects=False)
                    if response.status_code == 200:
                        try:
                            header_info = response.headers['Content-Disposition']
                            idx = header_info.index('filename')
                            filename = self.log_pdfpath / header[idx+9:]
                        except ValueError:
                            filename = self.log_pdfpath / 'comp{}'.format(url[p+1:])
                            print("couldn't locate filename for {} will use: {}".format(file, filename))
                        except KeyError:
                            filename = self.log_pdfpath / 'comp{}.pdf'.format(url[p+1:])
                            print('got KeyError on {}, respnse.headers = {}'.format(file, response.headers))
                            print('will use name: {}'.format(filename))
                            print(repsonse.headers)
                        with filename.open('wb') as f:
                            f.write(respnse.content)
            sfname = self.textpath / 'summary_{}.txt'.format((file.name.split('_'))[1].split('.')[0][3:10])
            tds = soup.find_all('td')
            with sfname.open('w') as f:
                for td in tds:
                    if td.text:
                        if any(field in td.text for field in self.fields):
                            f.write('{}\n'.format(td.text))


if __name__ == '__main__':
    GetCompletions('api.txt')

Larz60+ - OK - I downloaded yours and it was awesome not to have any errors! I didn't have any results either, though. The goal with my 'refurbishing' of yours is to be able to download both completion reports as well as everything on the Cores/Pressures/Cores link. Is there any way we can make what I have (yours and mine combined) work?

This is informative and valuable, but I already have it in my database. Is there a reason to keep it beyond that?

self.fields = ['Spud Date', 'Total Depth', 'IP Oil Bbls', 'Reservoir Class',
               'Completion Date', 'Plug Back', 'IP Gas Mcf', 'TD Formation',
               'Formation', 'IP Water Bbls']
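One bug in the modified get_url above is easy to miss: str.format only substitutes into brace placeholders, so the `[]` in those URLs is passed through literally and the API number never reaches the server. A minimal sketch (the API number is invented):

```python
# str.format fills {} placeholders; square brackets are left untouched.
entry = '49009123450000'

wrong = "http://wogcc.state.wy.us/wyocomp.cfm?nAPI=[]".format(entry[3:10])
right = "http://wogcc.state.wy.us/wyocomp.cfm?nAPI={}".format(entry[3:10])

print(wrong)  # ...nAPI=[]  (literal brackets, argument silently ignored)
print(right)  # ...nAPI=0912345
```

No exception is raised in the wrong case, which is why this produces "no results" rather than a traceback.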
If you will go back and start in a brand new directory, with the original code, I'd be glad to walk you through any changes step by step.
Get the original code here: https://python-forum.io/Thread-Unexpecte...2#pid47182

I will be in and out today (just a few hours between sessions), but if you are willing to do this, I will help to make any changes you wish, and explain each line of code if so desired. Do this, and before running any code, make sure the following are done.

Here's a copy of the original apis.txt file. I will be back in about 2 to 3 hours.

Attached Files
May-16-2018, 05:40 PM
You are using an object called 'file' which can't exist at that point because it's been collected by the GC: after the with statement on line 35 there is only one line of code. Yet you are using it on line 62 as if it were a string, and this string has an attribute called 'name'?
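For what it's worth, the scoping here can be checked directly. In CPython, a with block and a for loop do not create a new scope, so the loop variable stays alive after them, and pathlib.Path objects do expose a .name attribute. A quick sketch (the file name and contents are invented):

```python
from pathlib import Path
import tempfile

tmp = Path(tempfile.mkdtemp())
sample = tmp / 'api_49009123450000.html'
sample.write_text('<html></html>')

for file in tmp.iterdir():
    with file.open('r') as f:
        text = f.read()
    # 'file' is still usable here: exiting the with block closes the
    # file handle f, but does not destroy the Path object 'file'.
    print(file.name)
```

So the real problem in the posted code is the undefined filelist (and the `url in link['href']` line), not the lifetime of 'file' itself.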
Want to populate a database? Go with SQLAlchemy. It's the right tool for that.
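SQLAlchemy is the suggestion above; as a dependency-free sketch of the same idea, the standard-library sqlite3 module shows what bulk-loading parsed rows looks like. The table and column names below are made up for illustration, not taken from the thread's data:

```python
import sqlite3

# In-memory database for the demo; a real run would use a file path.
conn = sqlite3.connect(':memory:')
conn.execute(
    'CREATE TABLE completions (api TEXT, spud_date TEXT, total_depth REAL)'
)

# Rows as scraped/parsed tuples (values here are invented).
rows = [
    ('49009123450000', '01/02/2001', 9500.0),
    ('49009543210000', '03/04/2002', 11200.0),
]
conn.executemany('INSERT INTO completions VALUES (?, ?, ?)', rows)
conn.commit()

count = conn.execute('SELECT COUNT(*) FROM completions').fetchone()[0]
print(count)  # 2
```

SQLAlchemy adds an ORM and engine management on top of this pattern, but the insert-many-rows shape is the same.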
Larz60+ - can we keep editing the one we've been working on? I've thought about going back to yours, but there has been a lot of work (not only between you and me but others as well) put into this. I don't want anyone to feel like their work / time has been for nothing.
I would still like you to participate if you are willing. After all, you are the one who got me started with this! I do appreciate your time, effort and help! Thank you!
May-16-2018, 06:45 PM
wavic - This creates a file in the 'text' folder called 'summary_[last 8 #'s of the api]'. The files don't have anything in them, so I'm not sure what they do. It's probably something else in the code.
I don't know SQLAlchemy, and I'm afraid to say I'll learn it when I'm still struggling with Python. I appreciate the thought though. Most of all, I appreciate your help! Thanks!
All - if you're game - I would like to continue with the collaborative effort! I really appreciate the time and energy everyone has given up until now. I think we are at the tail end of this!
I've made the change Larz60+ suggested when he said I was overwriting my file, so here is the latest... This is what I've found so far on the error: "Obviously, based on the error message, mkdir returns None." I will keep looking.

import requests
from bs4 import BeautifulSoup
from pathlib import Path
import sys


class GetCompletions:
    def __init__(self, infile):
        self.homepath = Path('.')
        self.completionspath = self.homepath / 'xx_completions_xx'
        self.completionspath.mkdir(exist_ok=True)
        self.log_pdfpath = self.homepath / 'logpdfs'
        self.log_pdfpath.mkdir(exist_ok=True)
        self.textpath = self.homepath / 'text'
        self.textpath.mkdir(exist_ok=True)

        self.infile = self.textpath / infile
        self.apis = []

        with self.infile.open() as f:
            for line in f:
                self.apis.append(line.strip())

        self.fields = ['Spud Date', 'Total Depth', 'IP Oil Bbls', 'Reservoir Class',
                       'Completion Date', 'Plug Back', 'IP Gas Mcf', 'TD Formation',
                       'Formation', 'IP Water Bbls']

        # self.get_all_pages()
        self.parse_and_save(getpdfs=True)

    def get_url(self):
        for entry in self.apis:
            yield (entry, "http://wogcc.state.wy.us/wyocomp.cfm?nAPI={}".format(entry[3:10]))

    def get_all_pages(self):
        for entry, url in self.get_url():
            print('Fetching main page for entry: {}'.format(entry))
            response = requests.get(url)
            if response.status_code == 200:
                filename = self.completionspath / 'api_{}.html'.format(entry)
                with filename.open('w') as f:
                    f.write(response.text)
            else:
                print('error downloading {}'.format(entry))

    def parse_and_save(self, getpdfs=False):
        filelist = [file for file in self.completionspath.iterdir() if file.is_file()]
        for file in filelist:
            with file.open('r') as f:
                soup = BeautifulSoup(f.read(), 'lxml')
            if getpdfs:
                links = soup.find_all('a')
                for link in links:
                    url = link['href']
                    if 'www' in url:
                        continue
                    print('downloading pdf at: {}'.format(url))
                    p = url.index('=')
                    response = requests.get(url, stream=True, allow_redirects=False)
                    if response.status_code == 200:
                        try:
                            header_info = response.headers['Content-Disposition']
                            idx = header_info.index('filename')
                            filename = self.log_pdfpath / header_info[idx + 9:]
                        except ValueError:
                            filename = self.log_pdfpath / 'comp{}.pdf'.format(url[p + 1:])
                            print("couldn't locate filename for {} will use: {}".format(file, filename))
                        except KeyError:
                            filename = self.log_pdfpath / 'comp{}.pdf'.format(url[p + 1:])
                            print('got KeyError on {}, response.headers = {}'.format(file, response.headers))
                            print('will use name: {}'.format(filename))
                            print(response.headers)
                        with filename.open('wb') as f:
                            f.write(response.content)
            sfname = self.textpath / 'summary_{}.txt'.format((file.name.split('_'))[1].split('.')[0][3:10])
            tds = soup.find_all('td')
            with sfname.open('w') as f:
                for td in tds:
                    if td.text:
                        if any(field in td.text for field in self.fields):
                            f.write('{}\n'.format(td.text))


if __name__ == '__main__':
    GetCompletions('apis.txt')

Thanks again!

Alright, I think we may have a better idea of how to do this. For now, let's forgo the idea of fixing this one. More later! Thank you!
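The quoted observation is correct and easy to demonstrate: Path.mkdir creates the directory as a side effect and returns None, so its result must never be assigned and used. A minimal sketch (the temp location is made up):

```python
from pathlib import Path
import tempfile

base = Path(tempfile.mkdtemp())

# Wrong pattern: mkdir() returns None, so 'p' is useless afterwards.
p = (base / 'text').mkdir(exist_ok=True)
print(p)  # None

# Right pattern: keep the Path object, then call mkdir on it.
textpath = base / 'text'
textpath.mkdir(exist_ok=True)
print(textpath.exists())  # True
```

This is why the original code always does the assignment and the mkdir call on two separate lines.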