Oct-07-2018, 04:00 PM
So here is my latest draft:
from bs4 import BeautifulSoup
import requests
import re

apis = ['49005253730000','49005255270000']

def wogcc_completions_scraper():
    x = 0
    while x < len(apis):
        wogcc_url = 'http://pipeline.wyo.gov/whatups/whatupcomps.cfm?nautonum={}'.format(apis[x][3:10])
        print (apis[x])
        wogcc_request = requests.get(wogcc_url)
        soup = BeautifulSoup(wogcc_request.content, "html.parser")
        href_tags = soup.find_all('a')
        x = 1
        print('Foubd completions repott.')
        ### This section of code will data scrape the WOGCC for the completion report
            link_regex = "http://pipeline.wyo.gov/wellapi.cfm?nAPIno={}"
            link_pattern = re.compile(link_regex)
            link_file = re.findall(link_pattern, str(soup))
            new_urls = []
            y = 0
            print(link_file)
            if len(link_file) == 0:
                print (str(apis[x]) + " No")
            else:
                print (str(apis[x]) + " Yes")
                for link in link_file:
                    link1 = "http://pipeline.wyo.gov/" + str(link)
                    new_urls.append(link1)
                while y < len(new_urls):
                    download = requests.get(new_urls[y])
                    with open((str(apis[x]) + "_" + "Completion_report" + ".pdf"), "wb") as code:
                        code.write(download.content)
                    y += 1

wogcc_completions_scraper()

Here is the error:
C:\Python365\python.exe C:/scrapingEnv/WOGCC_Well_Completions_Lil_Scraper_12b_TJN_10062018.py
  File "C:/scrapingEnv/WOGCC_Well_Completions_Lil_Scraper_12b_TJN_10062018.py", line 22
    link_regex = "http://pipeline.wyo.gov/wellapi.cfm?nAPIno={}"
    ^
IndentationError: unexpected indent

Process finished with exit code 1
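For reference, a minimal sketch of what this kind of error means: Python reports IndentationError while compiling the file, before any statement (including the regex lines) has run. The two-line snippet and the helper name `indentation_problem` below are hypothetical stand-ins, not the script itself.

```python
# Hypothetical stand-in for the script: the second line is indented deeper
# than the line before it, with no block opener (if/for/while/def) above it.
bad_snippet = "x = 1\n    link_regex = 'pattern'\n"
good_snippet = "x = 1\nlink_regex = 'pattern'\n"

def indentation_problem(source):
    """Return (line number, message) if source has an indentation error, else None."""
    try:
        # compile() only parses the text; none of the code is executed.
        compile(source, "<snippet>", "exec")
    except IndentationError as err:
        return err.lineno, err.msg
    return None

print(indentation_problem(bad_snippet))
print(indentation_problem(good_snippet))
```

If the same check passes on the real file once the indentation is consistent, any remaining problem would have to come from the code that actually runs, such as the regex.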
I hope this shows that I have made progress on my issues rather than falling behind. I think this error has to do with either the regex issue we talked about before or simply an indentation problem. I just don't know what to replace it with. Can anyone point me toward the proper way to do this (or just whatever you know works for you)? I appreciate any / all help I can get!
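On the regex side, here is a rough sketch of one possible replacement, under assumptions: in a pattern, `.` and `?` are special characters and `{}` is not a placeholder, so the string above would not match the link text literally. The sample HTML, the assumption that the link ends in digits, and the helper name `find_well_links` are all hypothetical; the real page markup may differ.

```python
import re

# Hypothetical HTML fragment; the markup on the real page may differ.
sample_html = (
    '<a href="http://pipeline.wyo.gov/wellapi.cfm?nAPIno=525373">Well</a>'
    '<a href="http://pipeline.wyo.gov/other.cfm?id=1">Other</a>'
)

# re.escape makes '.' and '?' match literally; (\d+) captures the number
# that follows (assumed here to be digits only).
prefix = "http://pipeline.wyo.gov/wellapi.cfm?nAPIno="
link_pattern = re.compile(re.escape(prefix) + r"(\d+)")

def find_well_links(html):
    """Return the full URLs in html that match the well-API link pattern."""
    # finditer + group(0) gives the whole match; findall would return
    # only the captured digits because the pattern has a group.
    return [match.group(0) for match in link_pattern.finditer(html)]

print(find_well_links(sample_html))
```

Because each match here is already a full URL, it would not need "http://pipeline.wyo.gov/" prepended before being passed to `requests.get`.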