It's a bit tricky because the site uses JavaScript. But you are in luck: closer inspection of the requests being sent from the browser reveals that the page gets its data from an API in JSON format. So we can call that API directly, replicating the request headers as closely as possible.
Here is something to start with:
import time

import requests


def get_json(query):
    # Headers copied from the browser request to the API.
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:65.0) Gecko/20100101 Firefox/65.0',
        'Accept': 'application/json, text/plain, */*',
        'Accept-Language': 'en-US,en;q=0.5',
        'Referer': 'https://ucr.gov/enforcement/{}'.format(query),
        'Cache-Control': 'no-cache,no-store,must-revalidate,max-age=0,private',
        'UCR-UI-Version': '19.2.1',
        'Origin': 'https://ucr.gov',
        'Connection': 'keep-alive',
    }
    s = requests.Session()
    params = (
        ('pageNumber', '0'),
        ('itemsPerPage', '15'),
    )
    url = 'https://admin.ucr.gov/api/enforcement/{}'.format(query)
    response = s.get(url, headers=headers, params=params)
    return response.json()


if __name__ == '__main__':
    dots = [192123, 1921, 192, 1712583]
    for query in dots:
        data = get_json(query=query)
        print('DOT: {}'.format(query))
        # A valid DOT number returns a 'carrier' entry.
        if data.get('carrier'):
            for registration in data['history']['registrations']:
                print('Year: {year}, Status: {status}'.format(**registration))
        else:
            print('Not valid DOT')
        print('\n-----------\n')
        # implement 0.5 second delay between requests
        time.sleep(0.5)
Output:
DOT: 192123
Year: 2019, Status: unregistered
Year: 2018, Status: unregistered
Year: 2017, Status: unregistered
-----------
DOT: 1921
Year: 2019, Status: unregistered
Year: 2018, Status: unregistered
Year: 2017, Status: unregistered
-----------
DOT: 192
Not valid DOT
-----------
DOT: 1712583
Year: 2019, Status: unregistered
Year: 2018, Status: unregistered
Year: 2017, Status: unregistered
-----------
You can print the entire JSON to see all the information available.
It's probably a good idea to implement User-Agent and proxy rotation if you are going to make a lot of requests, in order to avoid detection.
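As a rough idea of what that rotation could look like, here is a minimal sketch. The `USER_AGENTS` and `PROXIES` lists and the `next_request_kwargs` helper are made up for illustration (the proxy addresses are placeholders, not real servers), so swap in your own values.

```python
import itertools
import random

# Hypothetical pools: replace with current User-Agent strings and your own proxies.
USER_AGENTS = [
    'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:65.0) Gecko/20100101 Firefox/65.0',
    'Mozilla/5.0 (X11; Linux x86_64; rv:65.0) Gecko/20100101 Firefox/65.0',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:65.0) Gecko/20100101 Firefox/65.0',
]
PROXIES = [
    'http://proxy1.example.com:8080',  # placeholder address
    'http://proxy2.example.com:8080',  # placeholder address
]

# Walk through the proxies in round-robin order.
proxy_cycle = itertools.cycle(PROXIES)


def next_request_kwargs(base_headers):
    """Return keyword arguments for requests with a random User-Agent and the next proxy."""
    headers = dict(base_headers)  # copy, so the caller's dict is not modified
    headers['User-Agent'] = random.choice(USER_AGENTS)
    proxy = next(proxy_cycle)
    return {'headers': headers, 'proxies': {'http': proxy, 'https': proxy}}
```

Inside `get_json` the call would then become something like `s.get(url, params=params, **next_request_kwargs(headers))`, so every request goes out with a different User-Agent and through the next proxy in the list.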