Python Forum
Unexpected indent / invalid syntax
#1
When I run this, I get an "unexpected indent" error at 'def __init__(self, infile):'. When I remove the marked space, I get an "invalid syntax" error.

I 'refurbished' this from code Lars60+ wrote for me, so it's just a starting point.

As always, any help is much appreciated!

Thank you!

import requests
from bs4 import BeautifulSoup
from pathlib import path

class GetCompletions:
    def __init__(self, infile):
        self.homepath = Path('.')
        self.log.pdfpath = self.homepath / 'comppdf'
        self.log.pdfpath.mkdir(exist_ok=True)
        self.log.pdfpath = self.homepath / 'geocorepdf'
        self.log.pdfpath.mkdir(exist_ok=True)
        self.textpath = self.homepath / 'text'
        self.text.mkdir(exist_ok=True)

        self.infile = self.textpath / infile
        self.api = []

        self.parse_and_save(getpdfs=True)

"""Above will create a folder called comppdf, and wsgeo wherever the WOGCC
File Downloads file is run from as well as a text file for my api file to
reside."""

    def get_url(self):
        for entry in self.apis:
            yield (entry, "http://wogcc.state.wy.us/wyocomp.cfm?nAPI=[]".format(entry[3:10]))
            yield (entry, "http://wogcc.state.wy.us/whatupcores.cfm?autonum=[]".format(entry[3:10]))

"""Above will get the URL that matches my API numbers."""

    def parse_and_save(self, getpdfs=False):
        for file in filelist:
            with file.open('r') as f:
                soup = BeautifulSoup(f.read(), 'lxml')
            if getpdfs:
                links = soup.find_all('a')
                for link in links:
                    url in link['href']
                    if 'www' in url:
                        continue
                    print('downloading pdf at: {}'.format(url))
                    p = url.index('=')
                    response = requests.get(url, stream=True, allow_redirects=False)
                    if response.status_code == 200:
                          try:
                              header_info = response.headers['Content-Disposition']
                              idx = header_info.index('filename')
                              filename = self.log_pdfpath / header[idx+9:]
                          except ValueError:
                              filename = self.log_path / 'comp{}'.format(url[p 1:])
                              print("couldn't locate filename for {} will use: {}".format(file, filename))
                          except KeyError
                              filename = self.log_pdfpath / 'comp{}.pdf'.format(url[p + 1:])
                              print('got KeyError on {}, respnse.headers = {}'.format(file, response.headers))
                              print('will use name: {}'.format(filename))
                              print(repsonse.headers)
                          with filename.open('wb') as f:
                              f.write(respnse.content)

            sfname = self.textpath / 'summary_{}.txt'.format((file.name.split('_'))[1].split('.')[0][3:10])
            tds = soup.find_all('td')
            with sfname.open('w') as f:
                for td in tds:
                    if td.txt
                        if any(field in td.text for field in self.fields):
                            f.write('{}\n'.format(td.text)

if__name__ == '__main__':
    GetCompletions('api.txt')
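One more bug in this draft: str.format only fills {} placeholders, so "...nAPI=[]".format(entry[3:10]) returns the URL with a literal [] and no API number. Below is a minimal sketch of the URL generator with {} placeholders, written as a standalone function. The name get_urls and the sample API number are hypothetical, and the [3:10] slice is copied from the original without checking it against real WOGCC API numbers.

```python
from typing import Iterable, Iterator, Tuple

# URL templates from the post, with {} as the placeholder str.format fills in.
COMPLETION_URL = "http://wogcc.state.wy.us/wyocomp.cfm?nAPI={}"
CORE_URL = "http://wogcc.state.wy.us/whatupcores.cfm?autonum={}"


def get_urls(apis: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (api, url) pairs: one completion URL and one core URL per API."""
    for entry in apis:
        short_api = entry[3:10]  # same slice as the original code
        yield entry, COMPLETION_URL.format(short_api)
        yield entry, CORE_URL.format(short_api)


# "[]" is not a placeholder, so format() leaves it alone:
print("nAPI=[]".format("0512345"))  # nAPI=[]

for api, url in get_urls(["4900512345"]):  # hypothetical API number
    print(api, url)
```

In the class version, the loop would iterate over self.apis. Note that __init__ currently creates self.api (no "s"), so one of the two names needs to change.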
#2
You can solve this by indenting the string directly above the line where the error occurs. A string at column 0 ends the class body, so the indented 'def' that follows it is unexpected.
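To see why indenting the string helps, here is a trimmed-down reproduction of the original layout, checked with the built-in compile() so nothing actually runs. The helper name check and the stub method bodies are hypothetical.

```python
def check(source: str) -> str:
    """Compile source without running it and describe any syntax error."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError as exc:  # IndentationError is a subclass of SyntaxError
        return f"{type(exc).__name__} on line {exc.lineno}: {exc.msg}"
    return "OK"


# Trimmed-down copy of the original layout: the string sits at column 0.
broken = '''
class GetCompletions:
    def __init__(self, infile):
        self.infile = infile

"""Module-level string: this line ends the class body."""

    def get_url(self):
        pass
'''

# Same code with the string indented, so it stays inside the class.
fixed = '''
class GetCompletions:
    def __init__(self, infile):
        self.infile = infile

    """Indented string: still part of the class body."""

    def get_url(self):
        pass
'''

print(check(broken))
print(check(fixed))  # OK
```

The broken version fails on the indented 'def get_url' line, not on the string itself, which is why the error can seem to point at the wrong place.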
#3
Quote:
"""Above will create a folder called comppdf, and wsgeo wherever the WOGCC
File Downloads file is run from as well as a text file for my api file to
reside."""
This kind of comment (a docstring) should come immediately after the 'def' line of the method it describes, indented to match the method body.
For example, the code above should look like this:
import requests
from bs4 import BeautifulSoup
from pathlib import Path


class GetCompletions:
    def __init__(self, infile):
        """Above will create a folder called comppdf, and wsgeo wherever the WOGCC
           File Downloads file is run from as well as a text file for my api file to
           reside.
        """
Also, in the line:
from pathlib import path
"path" needs to be upper case:
from pathlib import Path
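Putting both fixes together, here is a minimal sketch of __init__ with the docstring directly under the def line and Path imported with a capital P. It also fixes three attribute problems the next error messages would likely hit: self.log does not exist (so self.log.pdfpath raises AttributeError), the second folder reuses the same attribute name, and self.text should be self.textpath. The names comp_pdfpath and core_pdfpath and the homepath parameter are hypothetical, and the call to parse_and_save is left out.

```python
import tempfile
from pathlib import Path  # capital P: the class is Path


class GetCompletions:
    def __init__(self, infile, homepath='.'):
        """Create the comppdf, geocorepdf and text folders under homepath
        (the current directory by default) and note where the API file lives.
        """
        self.homepath = Path(homepath)
        # One attribute per folder, so the second doesn't overwrite the first.
        self.comp_pdfpath = self.homepath / 'comppdf'
        self.core_pdfpath = self.homepath / 'geocorepdf'
        self.textpath = self.homepath / 'text'
        for folder in (self.comp_pdfpath, self.core_pdfpath, self.textpath):
            folder.mkdir(exist_ok=True)

        self.infile = self.textpath / infile
        self.apis = []  # same name get_url loops over


# Try it in a throwaway directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    GetCompletions('api.txt', homepath=tmp)
    print(sorted(p.name for p in Path(tmp).iterdir()))  # ['comppdf', 'geocorepdf', 'text']
```

Because the docstring is now the first statement in the method, Python stores it as GetCompletions.__init__.__doc__, and help(GetCompletions) will show it.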
#4
OK, cool, that got rid of the 'unexpected indent' errors. Now it's just 'invalid syntax' errors.

Is there a place I can go to find out why this might be happening? I really appreciate your help, so please don't get me wrong; I'm just wondering about other resources.

Thanks for all of your help!


import requests
from bs4 import BeautifulSoup
from pathlib import Path

class GetCompletions:
    def __init__(self, infile):
        """Above will create a folder called comppdf, and wsgeo wherever the WOGCC
           File Downloads file is run from as well as a text file for my api file to
           reside.
        """
        self.homepath = Path('.')
        self.log.pdfpath = self.homepath / 'comppdf'
        self.log.pdfpath.mkdir(exist_ok=True)
        self.log.pdfpath = self.homepath / 'geocorepdf'
        self.log.pdfpath.mkdir(exist_ok=True)
        self.textpath = self.homepath / 'text'
        self.text.mkdir(exist_ok=True)

        self.infile = self.textpath / infile
        self.api = []

        self.parse_and_save(getpdfs=True)



    def get_url(self):
        for entry in self.apis:
            yield (entry, "http://wogcc.state.wy.us/wyocomp.cfm?nAPI=[]".format(entry[3:10]))
            yield (entry, "http://wogcc.state.wy.us/whatupcores.cfm?autonum=[]".format(entry[3:10]))

        """Above will get the URL that matches my API numbers."""

    def parse_and_save(self, getpdfs=False):
        for file in filelist:
            with file.open('r') as f:
                soup = BeautifulSoup(f.read(), 'lxml')
            if getpdfs:
                links = soup.find_all('a')
                for link in links:
                    url in link['href']
                    if 'www' in url:
                        continue
                    print('downloading pdf at: {}'.format(url))
                    p = url.index('=')
                    response = requests.get(url, stream=True, allow_redirects=False)
                    if response.status_code == 200:
                          try:
                              header_info = response.headers['Content-Disposition']
                              idx = header_info.index('filename')
                              filename = self.log_pdfpath / header[idx+9:]
                          except ValueError:
                              filename = self.log_path / 'comp{}'.format(url[p 1:])
                              print("couldn't locate filename for {} will use: {}".format(file, filename))
                          except KeyError
                              filename = self.log_pdfpath / 'comp{}.pdf'.format(url[p + 1:])
                              print('got KeyError on {}, respnse.headers = {}'.format(file, response.headers))
                              print('will use name: {}'.format(filename))
                              print(repsonse.headers)
                          with filename.open('wb') as f:
                              f.write(respnse.content)

            sfname = self.textpath / 'summary_{}.txt'.format((file.name.split('_'))[1].split('.')[0][3:10])
            tds = soup.find_all('td')
            with sfname.open('w') as f:
                for td in tds:
                    if td.txt
                        if any(field in td.text for field in self.fields):
                            f.write('{}\n'.format(td.text)

if__name__ == '__main__':
    GetCompletions('api.txt')
#5
I haven't run your code, but it looks like you forgot a space after "if" on line 70.

As far as resources, what are you looking for, specifically? If you're just trying to deal with syntax issues, the interpreter is usually really good about it. How are you running your code?
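For syntax errors specifically, you can ask Python to check a file without running it, so nothing gets downloaded or created. From the command line, python -m py_compile yourfile.py reports the first syntax error with its line number. The same check from inside Python uses the built-in compile(). Here is a small sketch, where check_syntax is a hypothetical helper and the sample source is made up.

```python
def check_syntax(source: str, filename: str = "<string>") -> str:
    """Compile source without executing it; describe the first syntax error."""
    try:
        compile(source, filename, "exec")
    except SyntaxError as exc:
        # The parser reports where it gave up; the real mistake may be on
        # that line or the one before it (e.g. an unclosed parenthesis).
        line = (exc.text or "").rstrip()
        return f"{exc.filename}, line {exc.lineno}: {exc.msg}\n    {line}"
    return "no syntax errors"


# Made-up example with a missing colon after the if condition.
print(check_syntax("x = 1\nif x == 1\n    print(x)\n"))
print(check_syntax("x = 1\n"))  # no syntax errors
```

To check a real file, pass in its contents and name, for example check_syntax(Path('WOGCC_File_Downloads.py').read_text(), 'WOGCC_File_Downloads.py') after from pathlib import Path. Only the first error is reported, so fix it and run the check again.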
#6
(May-14-2018, 06:02 PM) tjnichols wrote: Now it's just 'invalid syntax' errors.

Please share the entire traceback, so we don't have to read through every line of your code and guess what the issue could be :p
#7
You have to learn to read the tea leaves in the traceback or SyntaxError message. The error is usually on the line number given in the message. Sometimes, when something is missing (a colon or a closing parenthesis, for example), the error is actually on the line before the one reported.

Output:
  File "temp.py", line 52
    filename = self.log_path / 'comp{}'.format(url[p 1:])
                                                     ^
SyntaxError: invalid syntax
Looking at your code and similar lines, you probably want the slice to start just after the '=' sign, the way other constructions in your code do:
Try:
 filename = self.log_path / 'comp{}'.format(url[p + 1:])
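To see what the + 1 does, here is the slice applied to a made-up URL in the same format as the ones the script requests.

```python
# Hypothetical URL in the same format as the script's completion links.
url = "http://wogcc.state.wy.us/wyocomp.cfm?nAPI=0512345"

p = url.index('=')                       # index of the '=' character
print(url[p:])                           # =0512345  (still includes the '=')
print(url[p + 1:])                       # 0512345   (starts just after it)
print('comp{}.pdf'.format(url[p + 1:]))  # comp0512345.pdf
```

url.index raises ValueError when the string has no '='. url.partition('=')[2] returns the same text after the first '=', or an empty string if there is none.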
The next error is:
Output:
  File "temp.py", line 54
    except KeyError
                   ^
SyntaxError: invalid syntax
Try to figure this one out yourself. Look around at similar constructions to see what is different.
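The general rule behind this one is that every clause header of a try statement (try, except, else, finally) ends with a colon, just like if, for and def. Below is a minimal sketch of the file-naming logic as a standalone function. The name pdf_filename is hypothetical, and headers can be any dict-like mapping such as requests' response.headers. Stripping the quotes around the file name is an assumption about how the server formats Content-Disposition. The sketch also uses header_info where the original has header, which is never defined.

```python
from pathlib import Path
from typing import Mapping


def pdf_filename(headers: Mapping[str, str], url: str, folder: Path) -> Path:
    """Choose a save path: the server's filename if given, else one built from the URL."""
    fallback = 'comp{}.pdf'.format(url[url.index('=') + 1:])
    try:
        header_info = headers['Content-Disposition']
        idx = header_info.index('filename')
        # 'filename=' is 9 characters, so the name starts at idx + 9.
        name = header_info[idx + 9:].strip('"')
    except KeyError:    # no Content-Disposition header at all
        name = fallback
    except ValueError:  # header present, but no 'filename' in it
        name = fallback
    return folder / name


folder = Path('comppdf')
url = 'http://wogcc.state.wy.us/wyocomp.cfm?nAPI=0512345'  # made-up URL
print(pdf_filename({'Content-Disposition': 'attachment; filename="well.pdf"'}, url, folder))
print(pdf_filename({}, url, folder))
```

Since both clauses fall back to the same name, they could also be combined as except (KeyError, ValueError):.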


When you get to the syntax error at the bottom of the file, it gets a little tricky, because you first have to fix the error on the top line:
    f.write('{}\n'.format(td.text)
if__name__ == '__main__':
   GetCompletions('api.txt')
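which looks like it occurs on the next line, because Python keeps reading into that line while looking for the closing parenthesis of f.write(. The next line also has an error of its own, which makes it a little tricky for a beginner. Clue to the last error: there is a missing space.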
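The missing space matters because Python's tokenizer reads if__name__ as a single identifier rather than the keyword if followed by __name__. Here is a quick check with the standard tokenize module, where first_token is a hypothetical helper.

```python
import io
import tokenize


def first_token(line: str) -> str:
    """Return the first token Python's tokenizer reads from a line of code."""
    tokens = tokenize.generate_tokens(io.StringIO(line).readline)
    return next(tokens).string


print(first_token("if__name__ == '__main__':\n"))   # if__name__
print(first_token("if __name__ == '__main__':\n"))  # if
```

With the space, the line becomes the usual main guard: if __name__ == '__main__':, so GetCompletions('api.txt') runs only when the file is executed directly.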

Lewis
To paraphrase: 'Throw out your dead' code. https://www.youtube.com/watch?v=grbSQ6O6kbs Forward to 1:00
#8
Lewis, I understand what you're saying, and I've seen errors like that before. This is different. I wish I could post a picture.

It's a pop-up window. On the left is a piece of paper with a Python logo on it, on the right is a white X to close the window, in the middle is a large white X in a red circle, and at the bottom right is an OK button.

There is nothing to tell me where my error is. Thanks for the missing-space hint! If I understand correctly, I should have added one before "Get".

Any other ideas? I appreciate your help!

Thanks!
#9
It sounds like you are running the script from an IDE (Integrated Development Environment). Try using the command line.

Open a command prompt (terminal), go to the folder (directory) that contains your source file, and type:
python yourfilename.py
#10
OK, I missed the space after "if"; thanks for that! I also removed the space before "Get".

Well, that was REALLY HELPFUL!!! At least now we have an error message. I've looked it up, but I don't understand it. I'd appreciate help, but can you also point me to where I can learn about this error?

Thanks!

Error:
>>> WOGCC_File_Downloads.py
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'WOGCC_File_Downloads' is not defined
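This traceback comes from typing the file name at Python's interactive >>> prompt rather than at the operating system's command prompt. At >>>, WOGCC_File_Downloads.py is read as Python code that looks up a variable called WOGCC_File_Downloads and takes its py attribute. No such variable exists, hence the NameError. The small sketch below reproduces it with eval(). The helper run_at_prompt is hypothetical.

```python
def run_at_prompt(text: str) -> str:
    """Evaluate text as Python code, the way the >>> prompt would."""
    try:
        eval(text, {})  # empty globals: only built-in names are defined
    except NameError as exc:
        return f"NameError: {exc}"
    return "ok"


print(run_at_prompt("WOGCC_File_Downloads.py"))
```

To run the script instead, leave the interpreter first (exit()), then type python WOGCC_File_Downloads.py at the terminal in the folder that contains the file.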

