Oct-05-2018, 05:01 PM
Ok - I'm trying (and failing) to write a script that downloads completion files from our oil and gas commission. I admit it's very much cobbled together: a mashup of what I've learned here (my job), YouTube videos, and several books/chapters on web scraping.
They all have their own way of doing things, and I get that. I just don't know which one is best, so guidance on that would be helpful in addition to this problem.
I am running this in a virtual environment.
from bs4 import BeautifulSoup
import requests
import re

apis = ['49005253730000', '49005255270000']

def wogcc_completions_scraper():
    x = 0
    while x < len(apis):
        wogcc_url = 'http://pipeline.wyo.gov/whatups/whatupcomps.cfm?nautonum={}' + str(apis[x][3:10])
        print(str(apis[x]))
        las_only = []
        wogcc_request = requests.get(wogcc_url)
        soup = BeautifulSoup(wogcc_request.content, "html.parser")
        href_tags = soup.find_all('a')

        ### This section of code will data scrape the WOGCC for the completion report
        link_regex = "http://pipeline.wyo.gov/wellapi.cfm?nAPIno={}"
        link_pattern = re.compile(link_regex)
        link_file = re.findall(link_pattern, str(soup))
        new_urls = []
        y = 0
        print(link_file)
        if len(link_file) == 0:
            print(str(apis[x]) + " No")
        else:
            print(str(apis[x]) + " Yes")
        for link in link_file:
            link1 = "http://pipeline.wyo.gov/" + str(link)
            new_urls.append(link1)
        while y < len(new_urls):
            download = requests.get(new_urls[y])
            with open((str(apis[x]) + "_" + "Completion_report" + ".pdf"), "wb") as code:
                code.write(download.content)
            y += 1

wogcc_completions_scraper()

Here's the issue: 1 - it keeps running 49005253730000 over and over. There is a completion report to download here, so it should be saying Yes and moving on to the next API (rather than No, as shown below). There is also another issue that I simply don't understand.
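For reference, here is a minimal sketch of the two spots I suspect are wrong. The API numbers and the URL templates are taken from the script above; the helper names and the sample HTML are hypothetical, made up just for illustration. Two things to note: `x` is never incremented inside the `while` loop, so the first API is retried forever (a `for` loop avoids that), and `'...nautonum={}' + value` leaves the literal `{}` in the URL, whereas `str.format()` substitutes into the placeholder. The empty `[]` likely comes from the regex: `?` is a regex quantifier, and the literal placeholder text `{}` never appears in the page source, so `findall` can never match.

```python
import re

apis = ['49005253730000', '49005255270000']

def build_completions_url(api):
    # str.format() fills the {} placeholder; plain concatenation
    # ('...nautonum={}' + value) leaves a literal "{}" in the URL.
    return 'http://pipeline.wyo.gov/whatups/whatupcomps.cfm?nautonum={}'.format(api[3:10])

# A for loop visits each API exactly once; the original while loop
# never increments x, so it retries the first API indefinitely.
for api in apis:
    print(build_completions_url(api))

# Escape the fixed part of the link and capture the API number,
# instead of searching for the unformatted "...nAPIno={}" template.
sample_html = '<a href="http://pipeline.wyo.gov/wellapi.cfm?nAPIno=49005253730000">report</a>'
link_pattern = re.compile(re.escape('http://pipeline.wyo.gov/wellapi.cfm?nAPIno=') + r'(\d+)')
print(link_pattern.findall(sample_html))  # ['49005253730000']
```

This is only a sketch under those assumptions, not a drop-in replacement; the download section of your function would still run inside the `for` loop.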
Error:
Python 3.6.5 (v3.6.5:f59c0932b4, Mar 28 2018, 17:00:18) [MSC v.1900 64 bit (AMD64)] on win32
Type "copyright", "credits" or "license()" for more information.
>>>
RESTART: C:\scrapingEnv\WOGCC_Well_Completions_Lil_Scraper_12b_TJN_EDITS.py
Warning (from warnings module):
File "C:\Python365\lib\site-packages\requests\__init__.py", line 91
RequestsDependencyWarning)
RequestsDependencyWarning: urllib3 (dev) or chardet (3.0.4) doesn't match a supported version!
49005253730000
[]
49005253730000 No
49005253730000
[]
49005253730000 No
49005253730000
[]
49005253730000 No
[...the same three lines repeat over and over until I stop it...]
As always - any help you can provide will be most appreciated!