Traceback error - Python Forum (https://python-forum.io) > Python Coding (https://python-forum.io/forum-7.html) > Web Scraping & Web Development (https://python-forum.io/forum-13.html) > Thread: Traceback error (/thread-10502.html)
RE: Traceback error - buran - May-23-2018

I think this is it: https://python-forum.io/Thread-EOL-While-Scanning-String-Literal?page=4

RE: Traceback error - tjnichols - May-23-2018

Ok, I like the idea of going back to the Larz60+ file. The problem is that it only gets me half of what I need.

self.homepath = Path('.')
self.completionspath = self.homepath / 'comppdf'
self.completionspath.mkdir(exist_ok=True)
self.geocorepdf = self.homepath / 'geocorepdf'
self.geocorepdf.mkdir(exist_ok=True)
self.textpath = self.homepath / 'text'
self.textpath.mkdir(exist_ok=True)

While this only creates folders, they will hold completion reports for oil and gas wells. For example, go to http://wogcc.state.wy.us/legacywogcce.cfm, click on 'Wells', then on 'By API Number', and enter 2521203 in the space provided. At the top right are 'Completions' and 'Cores/Pressures/Reports'. These are the two places I need reports from.

Larz60+'s file is the one that got me where I am right now. I'm new to this, and he has helped me more than most can possibly imagine. That said, this is where we are:

import requests
from bs4 import BeautifulSoup
from pathlib import Path


class GetCompletions:
    def __init__(self, infile):
        """Create the comppdf and geocorepdf folders wherever the WOGCC File
        Downloads file is run from, plus a text folder for my api file to
        reside in.
        """
        self.homepath = Path('.')
        self.completionspath = self.homepath / 'comppdf'
        self.completionspath.mkdir(exist_ok=True)
        self.geocorepdf = self.homepath / 'geocorepdf'
        self.geocorepdf.mkdir(exist_ok=True)
        self.textpath = self.homepath / 'text'
        self.textpath.mkdir(exist_ok=True)
        self.infile = self.textpath / infile
        self.apis = []
        self.parse_and_save(getpdfs=True)

    def get_url(self):
        """Get the URLs that match my API numbers."""
        for entry in self.apis:
            yield (entry, "http://wogcc.state.wy.us/wyocomp.cfm?nAPI={}".format(entry[3:10]))
            yield (entry, "http://wogcc.state.wy.us/whatupcores.cfm?autonum={}".format(entry[3:10]))

    def parse_and_save(self, getpdfs=False):
        # filelist, self.log_pdfpath and self.fields are not defined anywhere in this class yet
        for file in filelist:
            with file.open('r') as f:
                soup = BeautifulSoup(f.read(), 'lxml')
            if getpdfs:
                # href=True skips <a> tags that have no href attribute
                links = soup.find_all('a', href=True)
                for link in links:
                    url = link['href']
                    if 'www' in url:
                        continue
                    print('downloading pdf at: {}'.format(url))
                    p = url.index('=')
                    response = requests.get(url, stream=True, allow_redirects=False)
                    if response.status_code == 200:
                        try:
                            header_info = response.headers['Content-Disposition']
                            idx = header_info.index('filename')
                            filename = self.log_pdfpath / header_info[idx+9:]
                        except ValueError:
                            # header present, but it has no 'filename' in it
                            filename = self.log_pdfpath / 'comp{}'.format(url[p+1:])
                            print("couldn't locate filename for {} will use: {}".format(file, filename))
                        except KeyError:
                            # no Content-Disposition header at all
                            filename = self.log_pdfpath / 'comp{}.pdf'.format(url[p+1:])
                            print('got KeyError on {}, response.headers = {}'.format(file, response.headers))
                            print('will use name: {}'.format(filename))
                            print(response.headers)
                        with filename.open('wb') as f:
                            f.write(response.content)
            # write the table cells that match self.fields to a summary file
            sfname = self.textpath / 'summary_{}.txt'.format((file.name.split('_'))[1].split('.')[0][3:10])
            tds = soup.find_all('td')
            with sfname.open('w') as f:
                for td in tds:
                    if td.text:
                        if any(field in td.text for field in self.fields):
                            f.write('{}\n'.format(td.text))


if __name__ == '__main__':
    GetCompletions('api.txt')

Ok, errors help me too!
I hope this all makes sense. I really appreciate your help!
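The URL-building and file-naming steps in the code above can be sketched on their own as small functions that run without network access. This is a minimal sketch, not Larz60+'s code: the names read_apis, get_urls, filename_from_headers, COMPLETIONS_URL and CORES_URL are hypothetical, the URL templates are copied from the get_url() method above, and it assumes each line of the API file holds a full API number such as 4902521203, so that the slice [3:10] gives the 7-digit number (2521203) used on the WOGCC site. Whether the cores page accepts that same number as autonum is also an assumption carried over from the code above.

```python
import re
from pathlib import Path

# URL templates copied from get_url() above. Note the {} placeholders:
# "...nAPI=[]".format(x) would return the string unchanged.
COMPLETIONS_URL = "http://wogcc.state.wy.us/wyocomp.cfm?nAPI={}"
CORES_URL = "http://wogcc.state.wy.us/whatupcores.cfm?autonum={}"


def read_apis(infile):
    """Return the non-blank, stripped lines of infile (one API number per line)."""
    with Path(infile).open('r') as f:
        return [line.strip() for line in f if line.strip()]


def get_urls(apis):
    """Yield (api, url) pairs: one completions page and one cores page per API.

    Assumption: entries look like '4902521203', so entry[3:10] == '2521203'.
    """
    for entry in apis:
        key = entry[3:10]
        yield entry, COMPLETIONS_URL.format(key)
        yield entry, CORES_URL.format(key)


def filename_from_headers(headers, fallback):
    """Return the filename from a Content-Disposition header, else fallback."""
    disposition = headers.get('Content-Disposition', '')
    # Matches filename=report.pdf and filename="report.pdf"
    match = re.search(r'filename="?([^";]+)"?', disposition)
    return match.group(1) if match else fallback
```

Each (api, url) pair could then be fetched with requests.get(url), and a call such as filename_from_headers(response.headers, 'comp{}.pdf'.format(api[3:10])) gives a name to save the PDF under in comppdf. Keeping these pieces separate makes it easy to check the URLs before anything is downloaded.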
RE: Traceback error - Larz60+ - May-23-2018

Here's that URL again: https://python-forum.io/Thread-EOL-While-Scanning-String-Literal?pid=45974#pid45974

Take the code under "New code":

1. Double-click on the code, then Ctrl-c to copy it.
2. In PyCharm (your project), right-click on the src directory label (left window).
3. Choose New --> Python File and give it whatever name you wish (do not add .py, it will do that for you).
4. Ctrl-v to paste.
5. File --> Save All.
6. With the cursor in the code window: Run --> Run --> the program name from above (with internet access available).

It's still going to expect an apis.txt file in the text directory (directly under the src directory, same as the original setup). It will create the other directories as required, and it will run properly. Just remember that whatever API numbers you include in the apis.txt file will be downloaded each and every time. To prevent this, rename apis.txt to apisMay23-2018.txt (or some such name) and start a brand-new apis.txt file with the values you want to get.

RE: Traceback error - tjnichols - May-24-2018

Ok Larz60+, I followed the instructions above. I ran it in PyCharm as you suggested, and this is the error I received:

When I run it with IDLE, it creates the folders but returns nothing. I still need to get the additional reports, though. Thanks for your help!!

RE: Traceback error - Larz60+ - May-24-2018

pip install requests

RE: Traceback error - tjnichols - May-24-2018

Yes, I saw that error too. It says the requirement is already satisfied.

RE: Traceback error - Larz60+ - May-24-2018

The error was:

ModuleNotFoundError: No module named 'requests'

Clearly the version of Python being used does not have the requests module. In PyCharm, with the cursor in the code you wish to execute, click Run --> Edit Configurations and look at the Python interpreter ...
make sure it's the right one. If not, select the proper one and update; otherwise, just cancel. Also look at the working directory to make sure that's correct. Finally, from cmder, type pip -V and make sure the right version of pip is being used.

RE: Traceback error - tjnichols - May-24-2018

Ok, that was cool! No errors, but no results either. When you say check the version of pip, I did that and it lists the things you can run with pip (well, that's what I think it is). It doesn't give me an error or anything. It does say this whenever I install something using pip:

"You are using pip version 9.0.3, however version 10.0.1 is available."

Should I upgrade? The last time I did, it caused all kinds of havoc, so I'm reluctant to do it now. Thanks for your help!

RE: Traceback error - Larz60+ - May-24-2018

This is what you should see (with your own version numbers, of course):

I am off to do some moving. I have a solid move date (from the moving company) of May 31, so as of June 1, if the internet is hooked up properly, I should be back to a (fairly) normal schedule.

RE: Traceback error - tjnichols - May-24-2018

Ok, this is what I have:

λ pip -V

Does this mean you won't be back on here until June 1?

λ pip -V

I did the upgrade.
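The ModuleNotFoundError above, combined with pip reporting that requests is already installed, usually means PyCharm and cmder are running different Python interpreters. One way to check is a small diagnostic script run from both places and compared. This is a sketch, and the function name interpreter_report is hypothetical; it only uses the standard library (sys.executable and importlib.util.find_spec).

```python
import importlib.util
import sys


def interpreter_report(module_name='requests'):
    """Describe the running interpreter and whether module_name is importable."""
    # find_spec returns None when a top-level module cannot be found
    spec = importlib.util.find_spec(module_name)
    return {
        'executable': sys.executable,        # full path of this interpreter
        'version': sys.version.split()[0],   # e.g. '3.6.5'
        'module': module_name,
        'found': spec is not None,
        'location': spec.origin if spec is not None else None,
    }


if __name__ == '__main__':
    for key, value in interpreter_report().items():
        print('{}: {}'.format(key, value))
```

If the executable paths differ, either point PyCharm at the interpreter cmder uses (Run --> Edit Configurations), or install into the interpreter PyCharm uses with python -m pip install requests, run with that interpreter. Running pip as python -m pip ties it to that specific Python, and python -m pip -V shows which one it is.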