Thread: Traceback error (/thread-10502.html) - Python Forum (https://python-forum.io), Web Scraping & Web Development
Traceback error - tjnichols - May-23-2018

When I run the following:

    import requests
    from bs4 import BeautifulSoup
    from pathlib import Path


    class GetCompletions:
        def __init__(self, infile):
            """Above will create a folder called comppdf, and geocorepdf wherever the WOGCC
            File Downloads file is run from as well as a text file for my api file to reside.
            """
            self.homepath = Path('.')
            self.completionspath = self.homepath / 'comppdf'
            self.completionspath.mkdir(exist_ok=True)
            self.geocorepdf = self.homepath / 'geocorepdf'
            self.geocorepdf.mkdir(exist_ok=True)
            self.textpath = self.homepath / 'text'
            self.text.mkdir(exist_ok=True)

            self.infile = self.textpath / infile
            self.api = []

            self.parse_and_save(getpdfs=True)

        def get_url(self):
            for entry in self.apis:
                yield (entry, "http://wogcc.state.wy.us/wyocomp.cfm?nAPI=[]".format(entry[3:10]))
                yield (entry, "http://wogcc.state.wy.us/whatupcores.cfm?autonum=[]".format(entry[3:10]))

            """Above will get the URL that matches my API numbers."""

        def parse_and_save(self, getpdfs=False):
            for file in filelist:
                with file.open('r') as f:
                    soup = BeautifulSoup(f.read(), 'lxml')
                if getpdfs:
                    links = soup.find_all('a')
                    for link in links:
                        url in link['href']
                        if 'www' in url:
                            continue
                        print('downloading pdf at: {}'.format(url))
                        p = url.index('=')
                        response = requests.get(url, stream=True, allow_redirects=False)
                        if response.status_code == 200:
                            try:
                                header_info = response.headers['Content-Disposition']
                                idx = header_info.index('filename')
                                filename = self.log_pdfpath / header[idx+9:]
                            except ValueError:
                                filename = self.log_pdfpath / 'comp{}'.format(url[p+1:])
                                print("couldn't locate filename for {} will use: {}".format(file, filename))
                            except KeyError:
                                filename = self.log_pdfpath / 'comp{}.pdf'.format(url[p+1:])
                                print('got KeyError on {}, respnse.headers = {}'.format(file, response.headers))
                                print('will use name: {}'.format(filename))
                                print(repsonse.headers)
                            with filename.open('wb') as f:
                                f.write(respnse.content)
                sfname = self.textpath / 'summary_{}.txt'.format((file.name.split('_'))[1].split('.')[0][3:10])
                tds = soup.find_all('td')
                with sfname.open('w') as f:
                    for td in tds:
                        if td.text:
                            if any(field in td.text for field in self.fields):
                                f.write('{}\n'.format(td.text))


    if __name__ == '__main__':
        GetCompletions('api.txt')

It doesn't create this text file.

    self.textpath = self.homepath / 'text'
    self.text.mkdir(exist_ok=True)

I get the following error.

I appreciate any help I can get! Thanks!

Tonya

RE: Traceback error - tjnichols - May-23-2018

When I change the following code, it makes the text folder, but I come up with more errors than I had before.

    self.textpath = self.homepath / 'text'
    self.text.mkdir(exist_ok=True)

This is different from my previous version:

    self.textpath = self.homepath / 'text'
    self.textpath.mkdir(exist_ok=True)

And the resulting error:

Thoughts? Ideas? As always - your help is greatly appreciated!

RE: Traceback error - Larz60+ - May-23-2018

It should be like this (as in the original code):

    self.textpath = self.homepath / 'text'
    self.textpath.mkdir(exist_ok=True)

RE: Traceback error - tjnichols - May-23-2018

That is what it says now. I don't understand the extra spaces above; they aren't that way in my .py file.

RE: Traceback error - Larz60+ - May-23-2018

Whatever editor you are using is not showing tabs. You should always use spaces; that way, even a bad editor will show the white space.

When copying code, you need to cut and paste, not retype. That way you know what you're getting.

I suggest you go back to the code (under 'new code' here: https://python-forum.io/Thread-EOL-While...ral?page=4), select the code by double-clicking in the code window (on any line of code), press Ctrl-C to copy it, and press Ctrl-V to paste it into a new PyCharm window. Save the code. With the cursor in the PyCharm window, from the menu click Run --> Run --> your program name. It should run without error (of course, you must be connected to the internet).

If copied and pasted correctly, this code works.
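One way to check Larz60+'s point about tabs hiding in a file is to scan the source for lines whose indentation contains a tab character. The sketch below is only an illustration: `lines_with_tab_indent` is a hypothetical helper name, not something from the thread or from PyCharm.

```python
def lines_with_tab_indent(source):
    """Return the 1-based numbers of lines whose leading whitespace contains a tab."""
    flagged = []
    for number, line in enumerate(source.splitlines(), start=1):
        # Everything before the first non-whitespace character is the indent.
        indent = line[:len(line) - len(line.lstrip())]
        if "\t" in indent:
            flagged.append(number)
    return flagged


if __name__ == "__main__":
    # Line 2 is indented with a tab, line 3 with four spaces.
    sample = "def f():\n\treturn 1\n    x = 2\n"
    print(lines_with_tab_indent(sample))
```

To use it on a real script, read the file's text (for example with `Path('myscript.py').read_text()`, where the file name is a placeholder) and pass it to the function.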
RE: Traceback error - tjnichols - May-23-2018

OK - when I run it, I get: Fetching main page for entry "api#". It gets to the end, then I get this error.
RE: Traceback error - nilamo - May-23-2018

(May-23-2018, 02:28 PM)tjnichols Wrote: AttributeError: 'GetCompletions' object has no attribute 'text'

That error tells you everything you need to know.

(May-23-2018, 02:28 PM)tjnichols Wrote:
    self.textpath = self.homepath / 'text'
    self.text.mkdir(exist_ok=True)

What do you think self.text is? The answer is nothing, since you never set it. And because it's nothing, it definitely doesn't have a .mkdir method. The error is letting you know that you're trying to use a thing that doesn't exist. So the solution depends entirely on what you're expecting to happen.
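A minimal sketch of the point above, assuming the goal is simply to create the 'text' folder: the attribute that was assigned is `textpath`, so that is the name `.mkdir` has to be called on. The `FolderSetup` class and its method names are made up for illustration and are not part of the original script.

```python
import tempfile
from pathlib import Path


class FolderSetup:
    """Hypothetical stand-in for the folder setup in GetCompletions.__init__."""

    def __init__(self, home):
        self.homepath = Path(home)
        self.textpath = self.homepath / 'text'

    def broken(self):
        # self.text was never assigned, so this raises AttributeError.
        self.text.mkdir(exist_ok=True)

    def fixed(self):
        # Call mkdir on the attribute that actually exists.
        self.textpath.mkdir(exist_ok=True)
        return self.textpath


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        setup = FolderSetup(tmp)
        try:
            setup.broken()
        except AttributeError as err:
            print('broken version:', err)
        print('fixed version created a folder:', setup.fixed().is_dir())
```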
RE: Traceback error - nilamo - May-23-2018

(May-23-2018, 06:01 PM)tjnichols Wrote: bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?

Again, the error tells you everything you need to know. You're telling BeautifulSoup to use the lxml parser, but you don't have that installed. So, install it:

    pip install lxml
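If installing lxml is not an option right away, a script can check whether the module is importable and otherwise fall back to 'html.parser', the parser name BeautifulSoup accepts for Python's built-in HTML parser. This is a sketch of a different approach from the one recommended above, and `pick_parser` is a hypothetical helper name. Results can differ slightly between parsers on malformed HTML.

```python
import importlib.util


def pick_parser(preferred='lxml', fallback='html.parser'):
    """Return `preferred` if a module of that name is importable, else `fallback`."""
    if importlib.util.find_spec(preferred) is not None:
        return preferred
    return fallback


if __name__ == '__main__':
    # The chosen name would then be passed as the second argument,
    # e.g. BeautifulSoup(html_text, pick_parser()).
    print('parser to use:', pick_parser())
```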
RE: Traceback error - tjnichols - May-23-2018

Ok - scratch the last message. I got ahead of myself. All of the add-ins - I'm doing that now. As well as running it in PyCharm. Also - doing that now.

This is done! Thanks nilamo!

RE: Traceback error - tjnichols - May-23-2018

Larz60+ - when I go to the link, it gives me a 404 error. Can you repost it? Thanks again!