Python Forum
"EOL While Scanning String Literal"
#21
Read the error message!
Quote:FileNotFoundError: [Errno 2] No such file or directory: 'text\\apis.txt'
Is there a text directory directly below the source, and did you download the file attached to my post and put it there as requested?

Do so and try again.
#22
Ok - I've read the error. The problem is this 'text\\apis.txt': it shows \\. I am unable to add a \ to the file name, nor am I able to create a file without a name, so I am stuck.

I’ve copied and pasted your text file into a text document and pasted it into the appropriate place. There is no option to simply save it (it could also be operator error). For that matter this is all operator error I’m sure. I’ve tried clicking it, right clicking it – still the same options.

I thought this was obvious but maybe not so much. I did try placing the api.txt file in the text folder and ran it. I still got the error.
#23
the file name should be 'apis.txt' and should be in the text folder.

Quote:I did try placing the api.txt file in the text folder and ran it. I still got the error.
check spelling, 'apis.txt' not 'api.txt'
Look at line 97
#24
Ok - Larz60+ - I feel like an idiot! Sometimes I can't see the forest for the trees! For this, I sincerely apologize!

I ran the code and it worked great for all of the APIs through "Fetching main page for entry: <api>" - at that point it stops. The xx_completions_xx folder has the files that were obtained with the above code, as well as the summary reports. I just don't get the completion reports.

I'm sorry this has taken so long - I've tried several things. I thought the problem might be on my end. The weird thing is most of your code matches what you had before.

Let me know what you think.

Thanks!

T
#25
your directory structure should look like:
Output:
parent/
    src/
        OilWells.py (or whatever you named the script)
    logpdfs/
        all pdf files
    text/
        apis.txt
        all summary files
    xx_completions_xx/
        all completion html files (double clicking on any brings up browser with report)
This has to be what you have as well, if the program ran without errors.
You should dig into the code and understand what is happening.
If you don't understand part of it, ask.
#26
Larz60+ - I'm sorry. I've been working on other projects so I've had to step away from this one. I've gotten this to work beautifully but I have some questions...

The major one is that when the PDFs are downloaded, they don't have API numbers that match the website numbers or the API text list. Can you tell me where they are coming from? Is there any way we can make them match? If there are multiple completions, are they all downloaded? I've looked for a way to attach one of the files, but I can't find it. If you can tell me how, I will add it as a reference.

The other data you've included is awesome! I wouldn't have thought of it. Can we make a report of it at the end and include the results of the completions? Whether there was a completion(s) or not? If we can't include your awesome data, it's OK. I can get it from another source.

I'm really getting into David Beazley's teachings. I really appreciate that lead! I've gotten his Python Cookbook. I'm looking forward to the next one, considering this one was written in 2013. I like his videos on YouTube too.

At any rate, I really appreciate your help!
#27
Refresh my memory (this was half a month ago, can't remember that far back)
Please provide:
  • Current Name of pdf
  • Desired name of pdf
#28
When it's downloaded, it's renamed to "comp105709.pdf".
I would like the name to stay the same - 921263002.pdf. For this particular well there are two completion reports, so the first would be 921263001.pdf.

It looks like the number behind 'comp' is maybe random?

Thanks!
#29
Quote:I would like the name to stay the same - 921263002.pdf for this particular one there are two completion reports so the first would be 921263001.pdf.
This statement is confusing.
What does 921263001.pdf have to do with 921263002.pdf?

meanwhile, I need to find where I put the code!
#30
import requests
from bs4 import BeautifulSoup
from pathlib import Path
import sys
 
class GetCompletions:
    def __init__(self, infile):
        self.homepath = Path('.')
        self.completionspath = self.homepath / 'xx_completions_xx'
        self.completionspath.mkdir(exist_ok=True)
        self.log_pdfpath = self.homepath / 'logpdfs'
        self.log_pdfpath.mkdir(exist_ok=True)
        self.textpath = self.homepath / 'text'
        self.textpath.mkdir(exist_ok=True)
 
        self.infile = self.textpath / infile
        self.apis = []
 
        with self.infile.open() as f:
            for line in f:
                self.apis.append(line.strip())
 
        self.fields = ['Spud Date', 'Total Depth', 'IP Oil Bbls', 'Reservoir Class', 'Completion Date',
                       'Plug Back', 'IP Gas Mcf', 'TD Formation', 'Formation', 'IP Water Bbls']
        self.get_all_pages()
        self.parse_and_save(getpdfs=True)
 
    def get_url(self):
        for entry in self.apis:
            yield (entry, "http://wogcc.state.wy.us/wyocomp.cfm?nAPI={}".format(entry[3:10]))
 
    def get_all_pages(self):
        for entry, url in self.get_url():
            print('Fetching main page for entry: {}'.format(entry))
            response = requests.get(url)
            if response.status_code == 200:
                filename = self.completionspath / 'api_{}.html'.format(entry)
                with filename.open('w') as f:
                    f.write(response.text)
            else:
                print('error downloading {}'.format(entry))
 
    def parse_and_save(self, getpdfs=False):
        filelist = [file for file in self.completionspath.iterdir() if file.is_file()]
        for file in filelist:
            with file.open('r') as f:
                soup = BeautifulSoup(f.read(), 'lxml')
            if getpdfs:
                links = soup.find_all('a')
                for link in links:
                    url = link.get('href', '')
                    if not url or 'www' in url:
                        continue
                    print('downloading pdf at: {}'.format(url))
                    p = url.index('=')
                    filename = self.log_pdfpath / 'comp{}.pdf'.format(url[p+1:])
                    response = requests.get(url, stream=True, allow_redirects=False)
                    if response.status_code == 200:
                        with filename.open('wb') as f:
                            f.write(response.content)
            sfname = self.textpath / 'summary_{}.txt'.format((file.name.split('_'))[1].split('.')[0][3:10])
            tds = soup.find_all('td')
            with sfname.open('w') as f:
                for td in tds:
                    if td.text:
                        if any(field in td.text for field in self.fields):
                            f.write('{}\n'.format(td.text))
 
 
        # write the api list back out (self.apis, not the undefined name apis)
        with self.infile.open('w') as f:
            for item in self.apis:
                f.write(f'{item}\n')
 
if __name__ == '__main__':
    GetCompletions('apis.txt')
I think this is what you need.

What does 921263001.pdf have to do with 921263002.pdf? - Some of the wells have more than one completion report. I just need to make sure I get them all.
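For the naming question, a minimal sketch of building API-based file names so multiple completion reports per well don't collide (the helper name and the two-digit suffix are my assumptions, based on the 921263001/921263002 examples in this thread, not part of the script above):

```python
def report_name(api, seq):
    # Build a name like '921263001.pdf' from the API number
    # and the completion report's sequence number on the page.
    # A two-digit suffix is assumed from the thread's examples.
    return '{}{:02d}.pdf'.format(api, seq)
```

In parse_and_save this could replace the 'comp{}.pdf' name, counting each page's pdf links with enumerate(links, start=1) so every report for a well gets saved under its own sequence number.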

