Python Forum
I have no idea
#3
So here is my latest draft:

from bs4 import BeautifulSoup
import requests
import re

apis = ['49005253730000','49005255270000']


def wogcc_completions_scraper():
    x = 0
    while x < len(apis):
        wogcc_url = 'http://pipeline.wyo.gov/whatups/whatupcomps.cfm?nautonum={}'.format(apis[x][3:10])
        print (apis[x])
        wogcc_request = requests.get(wogcc_url)
        soup = BeautifulSoup(wogcc_request.content, "html.parser")
        href_tags = soup.find_all('a')
    x = 1
    print('Found completions report.')

   ### This section of code will data scrape the WOGCC for the completion report

        link_regex = "http://pipeline.wyo.gov/wellapi.cfm?nAPIno={}"
        link_pattern = re.compile(link_regex)
        link_file = re.findall(link_pattern, str(soup))

        new_urls = []
        y = 0

        print(link_file)
        if len(link_file) == 0:
            print (str(apis[x]) + " No")
        else:
            print (str(apis[x]) + " Yes")
            for link in link_file:
                link1 = "http://pipeline.wyo.gov/" + str(link)
                new_urls.append(link1)

            while y < len(new_urls):
                download = requests.get(new_urls[y])
                with open((str(apis[x]) + "_" + "Completion_report" + ".pdf"), "wb") as code:
                    code.write(download.content)
            y += 1
    

        
wogcc_completions_scraper()
Here is the error:

Error:
C:\Python365\python.exe C:/scrapingEnv/WOGCC_Well_Completions_Lil_Scraper_12b_TJN_10062018.py
  File "C:/scrapingEnv/WOGCC_Well_Completions_Lil_Scraper_12b_TJN_10062018.py", line 22
    link_regex = "http://pipeline.wyo.gov/wellapi.cfm?nAPIno={}"
    ^
IndentationError: unexpected indent

Process finished with exit code 1
I hope this shows that I have made progress on my issues rather than fallen behind. I think the error comes either from the regex issue we talked about before or simply from an indentation problem. I just don't know what to replace it with. Can anyone point me toward the proper way to do this (or just the way that works for you)?

I appreciate any / all help I can get!
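
For reference, here is a minimal sketch of how the two suspected problems could be separated. On the indentation side: in the draft above, `x = 1` and the `print` that follows sit at function level, which ends the `while` body, so the deeper-indented `link_regex` line has no block to belong to and Python reports "unexpected indent". On the regex side: `.` and `?` are regex metacharacters and need escaping, and the `{}` placeholder is never filled in. The sketch assumes the report links appear in the page as relative hrefs of the form `wellapi.cfm?nAPIno=<digits>`; that form, the helper names, and the sample HTML are all hypothetical and have not been checked against the real site. It works on HTML text that has already been downloaded, so it runs without network access.

```python
import re

# Assumption: report links look like "wellapi.cfm?nAPIno=<digits>".
# '.' and '?' are escaped so they match literally instead of acting
# as regex operators.
LINK_PATTERN = re.compile(r"wellapi\.cfm\?nAPIno=\d+")

BASE_URL = "http://pipeline.wyo.gov/"


def find_report_links(html):
    """Return absolute URLs for every matching link found in html."""
    return [BASE_URL + match for match in LINK_PATTERN.findall(html)]


def report_filename(api):
    """Build the output file name used for a given API number."""
    return "{}_Completion_report.pdf".format(api)


def summarize(apis, pages):
    """Map each API number to the links found in its page.

    pages is a dict of API number -> HTML text (hypothetical input;
    in the real script this text would come from requests.get(...).text).
    """
    results = {}
    # A for loop replaces the manual x counter, so there is no
    # increment line that can end up at the wrong indentation level.
    for api in apis:
        links = find_report_links(pages.get(api, ""))
        print(api, "Yes" if links else "No")
        results[api] = links
    return results


# Made-up sample page for illustration only.
sample_html = '<a href="wellapi.cfm?nAPIno=0525373">Completion report</a>'
apis = ["49005253730000", "49005255270000"]
found = summarize(apis, {"49005253730000": sample_html})
```

Everything inside a loop body stays at one indentation level here, which is the property the draft loses at `x = 1`. Note that the draft's `y += 1` also sits outside its `while` loop, so that loop would never end even once the indentation error is fixed.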

Messages In This Thread
I have no idea - by tjnichols - Oct-05-2018, 05:01 PM
RE: I have no idea - by buran - Oct-05-2018, 07:07 PM
RE: I have no idea - by tjnichols - Oct-07-2018, 04:00 PM
RE: I have no idea - by buran - Oct-07-2018, 04:05 PM