Jul-17-2023, 01:50 AM
I want to scrape the latitude and longitude of cell towers from the FCC website. I am writing in Python, using the requests package.
The problem is that I’m not getting back any data. Here is the site:
https://wireless2.fcc.gov/UlsApp/AsrSear...Search.jsp
When I enter the data manually, I get my data.
Here is the manual query.
https://drive.google.com/file/d/1GsswwgB...sp=sharing
Some of the data I got back.
https://drive.google.com/file/d/1GdKW_zHdNDJlE6Ddo5DIaHNaOKHjnwfH/view?usp=sharing
https://drive.google.com/file/d/12SH98yg...sp=sharing
# Scrape Fed website
#
from datetime import datetime
import pandas as pd
import requests
from bs4 import BeautifulSoup
# ------------------------------>
# https://stackoverflow.com/questions/16337511/log-all-requests-from-the-python-requests-module
# turns on debugging.
#import requests #already imported
import logging
import http.client
http.client.HTTPConnection.debuglevel = 1
logging.basicConfig()
logging.getLogger().setLevel(logging.DEBUG)
requests_log = logging.getLogger("requests.packages.urllib3")
requests_log.setLevel(logging.DEBUG)
requests_log.propagate = True
# <-------------- end of debug code --------
# try : # why hide the errors?
# datetime object containing current date and time
now = datetime.now()
# mm/dd/YYYY HH:MM:SS
dt_string = now.strftime("%m/%d/%Y %H:%M:%S")
print("|| dt_string is",dt_string)
print("||-----------------------> ",dt_string+" ---------------")
# post your link
url ="https://wireless2.fcc.gov/UlsApp/AsrSearch/asrRegistrationSearch.jsp"
# Get the page
page = requests.get(url)
print("||type of page is ", type(page))
print("||encoding of page is ", page.encoding)
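# Note: str(page) is only the text "<Response [200]>", so the length printed
# below (16) is not the size of the HTML body; len(page.text) would be.
print("||convert page to string is ", str(page) )
print("||length of converted string is ", len(str(page)) )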
#print("||page.content is ", page.content)
#print("||web page,\n",page)
# "w" or"a"
# The with-blocks close the files; the bare "f.close" (no parentheses) never did.
with open("webPageQuery.html", "w") as f:
    f.write("<!-- Now the file has more content! -->")
    f.write(page.text)
with open("webPageQuery.txt", "w") as f:
    f.write("<!-- Now the file has more content for sure. -->")
    f.write(page.text)
print("||page.status_code is ", page.status_code)
# fill in the data
varN = "N"
varW = "W"
varKilometers = "Kilometers"
varState = "state"
varAL = "AL"
var01001 = "01001"
varSubmit = "Submit"
varAny = "any"
varMeters = "Meters"
vartrue = "true"
# form fields sent with the search request
payload = {
    "fiLatDeg": "",
    "fiLatMin": "",
    "fiLatSec": "",
    "fiLatDir": varN,
    "fiLongDeg": "",
    "fiLongMin": "",
    "fiLongSec": "",
    "fiLongDir": varW,
    "fiRadius": "",
    "fiRadiusMetricType": varKilometers,
    "locatechoice": varState,
    "asr_r_city": "",
    "asr_r_state": varAL,
    "asr_r_county": var01001,
    "asr_r_structure_zipcode": "",
    "Submit": varSubmit,
    "fiExactMatchInd": varN,
    "fiHeightChoice": varAny,
    "fiOverallHgtAGL": "",
    "fiOverallAGLExactMetricType": varMeters,
    "fiLowerOverallHgtAGL": "",
    "fiUpperOverallHgtAGL": "",
    "fiOverallAGLRangeMetricType": varMeters,
    "jsValidated": vartrue,
}
print("||payload is ", payload)
# submit the search form
r = requests.post(url, data=payload)
print("||type of page is ", type(r))
print("||encoding of page is ", r.encoding)
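# Note: str(r) is only the text "<Response [200]>", so the length printed
# below (16) is not the size of the HTML body; len(r.text) would be.
print("||convert page to string is ", str(r) )
print("||length of converted string is ", len(str(r)) )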
#print("||page.content is ", r.content)
#print("||web page,\n",r)
# "w" or"a"
# The with-blocks close the files; the bare "f.close" (no parentheses) never did.
with open("webPageFedQuery.html", "w") as f:
    f.write("<!-- Now the file has more content! -->")
    f.write(r.text)
with open("webPageFedQuery.txt", "w") as f:
    f.write("<!-- Now the file has more content for sure. -->")
    f.write(r.text)
print("||page.status_code is ", r.status_code)
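As a sanity check on the payload, here is a minimal sketch that encodes the same fields with the standard library, the way a form post is encoded, so the result can be compared with the request body that the debug log shows for the POST. It uses only urllib.parse and makes no network call; the dict is a copy of the payload in my script without the intermediate variables.

```python
from urllib.parse import urlencode

# Same form fields as in the script, written as literals.
payload = {
    "fiLatDeg": "",
    "fiLatMin": "",
    "fiLatSec": "",
    "fiLatDir": "N",
    "fiLongDeg": "",
    "fiLongMin": "",
    "fiLongSec": "",
    "fiLongDir": "W",
    "fiRadius": "",
    "fiRadiusMetricType": "Kilometers",
    "locatechoice": "state",
    "asr_r_city": "",
    "asr_r_state": "AL",
    "asr_r_county": "01001",
    "asr_r_structure_zipcode": "",
    "Submit": "Submit",
    "fiExactMatchInd": "N",
    "fiHeightChoice": "any",
    "fiOverallHgtAGL": "",
    "fiOverallAGLExactMetricType": "Meters",
    "fiLowerOverallHgtAGL": "",
    "fiUpperOverallHgtAGL": "",
    "fiOverallAGLRangeMetricType": "Meters",
    "jsValidated": "true",
}

# application/x-www-form-urlencoded body, keys in insertion order
body = urlencode(payload)
print(len(body))
print(body)
```

If the fields match what was actually sent, the printed length should equal the Content-Length: 414 reported in the log for the POST.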
Next are the results. I don't get any of the cell tower locations. I enabled debug logging, and I put || at the start of each of my print statements.
Parallels-HS-user-mac:python mac$ python3 scrapeState\&Local.py
|| foo
|| <module>
|| dt_string is 07/13/2023 15:32:35
||-----------------------> 07/13/2023 15:32:35 ---------------
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): wireless2.fcc.gov:443
send: b'GET /UlsApp/AsrSearch/asrRegistrationSearch.jsp HTTP/1.1\r\nHost: wireless2.fcc.gov\r\nUser-Agent: python-requests/2.31.0\r\nAccept-Encoding: gzip, deflate\r\nAccept: */*\r\nConnection: keep-alive\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
header: Date: Thu, 13 Jul 2023 19:32:35 GMT
header: Content-Type: text/html; charset=ISO-8859-1
header: Transfer-Encoding: chunked
header: Connection: keep-alive
header: Set-Cookie: AWSALB=4P9ulojhmIW4BvMIfRFbMw6h05JCR21eZi3qC+3I/WhLdl0XMwptO5bRI0zceveiX1LQaXDa4CCT2Cb0+/VGIP2GlV8gfijFF3ppfswVipx7c16iIb9kKYWIPLVu; Expires=Thu, 20 Jul 2023 19:32:35 GMT; Path=/
header: Set-Cookie: AWSALBCORS=4P9ulojhmIW4BvMIfRFbMw6h05JCR21eZi3qC+3I/WhLdl0XMwptO5bRI0zceveiX1LQaXDa4CCT2Cb0+/VGIP2GlV8gfijFF3ppfswVipx7c16iIb9kKYWIPLVu; Expires=Thu, 20 Jul 2023 19:32:35 GMT; Path=/; SameSite=None; Secure
header: Server: Apache
header: Strict-Transport-Security: max-age=31536000; includeSubDomains
header: X-Frame-Options: SAMEORIGIN
header: Set-Cookie: JSESSIONID_ASRSEARCH=Uj5Qva28W93KrBEbU3rIeeqMLLLfs0T06ToO_ERht3w_SNktaEcl!-188465283!1997095822; path=/UlsApp/; HttpOnly
header: X-OneAgent-JS-Injection: true
header: X-ruxit-JS-Agent: true
header: Server-Timing: dtSInfo;desc="0", dtRpid;desc="-176933270"
header: Set-Cookie: dtCookie=v_4_srv_3_sn_B2EC06D566E8DB06EB91CF9992E6E07C_perc_100000_ol_0_mul_1_app-3A12116eb046fa524b_1; Path=/; Domain=.fcc.gov
DEBUG:urllib3.connectionpool:https://wireless2.fcc.gov:443 "GET /UlsApp/AsrSearch/asrRegistrationSearch.jsp HTTP/1.1" 200 None
||type of page is <class 'requests.models.Response'>
||encoding of page is ISO-8859-1
||convert page to string is <Response [200]>
||length of converted string is 16
||page.status_code is 200
||------------------- First data page ----------------------------
||payload is {'fiLatDeg': '', 'fiLatMin': '', 'fiLatSec': '', 'fiLatDir': 'N', 'fiLongDeg': '', 'fiLongMin': '', 'fiLongSec': '', 'fiLongDir': 'W', 'fiRadius': '', 'fiRadiusMetricType': 'Kilometers', 'locatechoice': 'state', 'asr_r_city': '', 'asr_r_state': 'AL', 'asr_r_county': '01001', 'asr_r_structure_zipcode': '', 'Submit': 'Submit', 'fiExactMatchInd': 'N', 'fiHeightChoice': 'any', 'fiOverallHgtAGL': '', 'fiOverallAGLExactMetricType': 'Meters', 'fiLowerOverallHgtAGL': '', 'fiUpperOverallHgtAGL': '', 'fiOverallAGLRangeMetricType': 'Meters', 'jsValidated': 'true'}
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): wireless2.fcc.gov:443
send: b'POST /UlsApp/AsrSearch/asrRegistrationSearch.jsp HTTP/1.1\r\nHost: wireless2.fcc.gov\r\nUser-Agent: python-requests/2.31.0\r\nAccept-Encoding: gzip, deflate\r\nAccept: */*\r\nConnection: keep-alive\r\nContent-Length: 414\r\nContent-Type: application/x-www-form-urlencoded\r\n\r\n'
send: b'fiLatDeg=&fiLatMin=&fiLatSec=&fiLatDir=N&fiLongDeg=&fiLongMin=&fiLongSec=&fiLongDir=W&fiRadius=&fiRadiusMetricType=Kilometers&locatechoice=state&asr_r_city=&asr_r_state=AL&asr_r_county=01001&asr_r_structure_zipcode=&Submit=Submit&fiExactMatchInd=N&fiHeightChoice=any&fiOverallHgtAGL=&fiOverallAGLExactMetricType=Meters&fiLowerOverallHgtAGL=&fiUpperOverallHgtAGL=&fiOverallAGLRangeMetricType=Meters&jsValidated=true'
reply: 'HTTP/1.1 200 OK\r\n'
header: Date: Thu, 13 Jul 2023 19:32:35 GMT
header: Content-Type: text/html; charset=ISO-8859-1
header: Transfer-Encoding: chunked
header: Connection: keep-alive
header: Set-Cookie: AWSALB=bSnLvJkOlLdcM7LDq2Hgmjuuin9ams5fy2ohNGoGjUod3iIG2ZDtVG+0V0KyLgwrKg6AJ57wR3ESCppOyw0vwIZRqAmJMDOW2lQhvRLYR5ftAdIjtFreqCqJ6Yiz; Expires=Thu, 20 Jul 2023 19:32:35 GMT; Path=/
header: Set-Cookie: AWSALBCORS=bSnLvJkOlLdcM7LDq2Hgmjuuin9ams5fy2ohNGoGjUod3iIG2ZDtVG+0V0KyLgwrKg6AJ57wR3ESCppOyw0vwIZRqAmJMDOW2lQhvRLYR5ftAdIjtFreqCqJ6Yiz; Expires=Thu, 20 Jul 2023 19:32:35 GMT; Path=/; SameSite=None; Secure
header: Server: Apache
header: Strict-Transport-Security: max-age=31536000; includeSubDomains
header: X-Frame-Options: SAMEORIGIN
header: Set-Cookie: JSESSIONID_ASRSEARCH=p9lQva9ZjHOzmvlXxki25yTiTd2CEwYBeIBieLFy8j0533OmwmiP!-188465283!1997095822; path=/UlsApp/; HttpOnly
header: X-OneAgent-JS-Injection: true
header: X-ruxit-JS-Agent: true
header: Server-Timing: dtSInfo;desc="0", dtRpid;desc="-1328681484"
header: Set-Cookie: dtCookie=v_4_srv_5_sn_A49E4149C72FD6723F1F2F9C9DDEED70_perc_100000_ol_0_mul_1_app-3A12116eb046fa524b_1; Path=/; Domain=.fcc.gov
DEBUG:urllib3.connectionpool:https://wireless2.fcc.gov:443 "POST /UlsApp/AsrSearch/asrRegistrationSearch.jsp HTTP/1.1" 200 None
||type of page is <class 'requests.models.Response'>
||encoding of page is ISO-8859-1
||convert page to string is <Response [200]>
||length of converted string is 16
||page.status_code is 200
Parallels-HS-user-mac:python mac$
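One thing the log above shows: the POST request headers contain no Cookie header, and the POST response sets a new JSESSIONID_ASRSEARCH value, so the cookies from the GET were not sent back. I have not confirmed that this is why no results come back, but a requests.Session would carry the cookies from the GET into the POST. Below is a minimal sketch of that idea. The name run_search is made up for illustration, and posting to the same URL as the form page is an assumption carried over from my script (the form's action attribute may point somewhere else).

```python
URL = "https://wireless2.fcc.gov/UlsApp/AsrSearch/asrRegistrationSearch.jsp"


def run_search(session, url, payload):
    """GET the form page, then POST the search on the same session.

    `session` is expected to behave like requests.Session, so any cookies
    set by the GET are sent along with the POST automatically.
    """
    form_page = session.get(url)
    form_page.raise_for_status()
    result = session.post(url, data=payload)
    result.raise_for_status()
    return result


def main(payload):
    # Imported here so run_search can be tried without the network.
    import requests

    with requests.Session() as session:
        result = run_search(session, URL, payload)
    # len(result.text) is the size of the HTML body, unlike len(str(result)).
    print("status:", result.status_code, "body length:", len(result.text))
    return result
```

Calling main(payload) with the payload dict from the script would run the two requests against the live site.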
This is the data I should get from the request: https://drive.google.com/file/d/12SH98yg...sp=sharing
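To tell whether the saved webPageFedQuery.html is a results page or just the search form again, one rough check is to list the form field names found in it. If names from my payload such as fiLatDeg and asr_r_state show up, the server probably sent the form back. This is only a sketch using the standard library's html.parser. It assumes the search form uses input and select tags carrying those names, which I have not verified, and FormFieldCollector and form_field_names are names made up for illustration.

```python
from html.parser import HTMLParser


class FormFieldCollector(HTMLParser):
    """Collect the name attribute of every <input> and <select> tag."""

    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag in ("input", "select"):
            name = dict(attrs).get("name")
            if name:
                self.names.append(name)


def form_field_names(html_text):
    """Return the form field names in document order."""
    collector = FormFieldCollector()
    collector.feed(html_text)
    return collector.names


# Example use on the saved response:
# with open("webPageFedQuery.html") as f:
#     print(form_field_names(f.read()))
```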