It's a bit tricky because the site renders its content with JavaScript. Luckily, a closer inspection of the requests the browser sends reveals that the data comes from an API in JSON format, so we can call that API directly and replicate the request headers as closely as possible.
Here is something to start with:
import time

import requests


def get_json(query):
    # Mimic the browser's XHR headers so the API accepts the request
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:65.0) Gecko/20100101 Firefox/65.0',
        'Accept': 'application/json, text/plain, */*',
        'Accept-Language': 'en-US,en;q=0.5',
        'Referer': 'https://ucr.gov/enforcement/{}'.format(query),
        'Cache-Control': 'no-cache,no-store,must-revalidate,max-age=0,private',
        'UCR-UI-Version': '19.2.1',
        'Origin': 'https://ucr.gov',
        'Connection': 'keep-alive',
    }
    s = requests.Session()
    params = (
        ('pageNumber', '0'),
        ('itemsPerPage', '15'),
    )
    url = 'https://admin.ucr.gov/api/enforcement/{}'.format(query)
    response = s.get(url, headers=headers, params=params)
    return response.json()


if __name__ == '__main__':
    dots = [192123, 1921, 192, 1712583]
    for query in dots:
        data = get_json(query=query)
        print('DOT: {}'.format(query))
        if data.get('carrier'):
            # Registration history lives under history -> registrations
            for registration in data['history']['registrations']:
                print('Year: {year}, Status: {status}'.format(**registration))
        else:
            print('Not valid DOT')
        print('\n-----------\n')
        # 0.5 second delay between requests to be polite to the server
        time.sleep(0.5)
Output:
DOT: 192123
Year: 2019, Status: unregistered
Year: 2018, Status: unregistered
Year: 2017, Status: unregistered
-----------
DOT: 1921
Year: 2019, Status: unregistered
Year: 2018, Status: unregistered
Year: 2017, Status: unregistered
-----------
DOT: 192
Not valid DOT
-----------
DOT: 1712583
Year: 2019, Status: unregistered
Year: 2018, Status: unregistered
Year: 2017, Status: unregistered
-----------
You can print the entire JSON response to see all the information available.
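For example, a quick way to inspect everything the API returns for one DOT number, reusing the get_json function from the script above:

import json

data = get_json(query=192123)
# Pretty-print every field in the response
print(json.dumps(data, indent=2, sort_keys=True))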
If you are going to make a lot of requests, it's probably a good idea to rotate user agents and proxies to avoid getting blocked.
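A minimal sketch of what that rotation could look like, assuming you supply your own list of proxies (the proxy addresses below are placeholders, not real endpoints):

import random

import requests

# Pool of User-Agent strings to cycle through (add as many as you like)
USER_AGENTS = [
    'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:65.0) Gecko/20100101 Firefox/65.0',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_3) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.0.3 Safari/605.1.15',
]

# Placeholder proxy addresses -- replace with proxies you actually control
PROXIES = [
    'http://10.10.1.10:3128',
    'http://10.10.1.11:3128',
]


def get_json_rotated(query):
    headers = {
        'User-Agent': random.choice(USER_AGENTS),  # different UA per request
        'Accept': 'application/json, text/plain, */*',
        'Referer': 'https://ucr.gov/enforcement/{}'.format(query),
        'Origin': 'https://ucr.gov',
    }
    proxy = random.choice(PROXIES)  # different proxy per request
    url = 'https://admin.ucr.gov/api/enforcement/{}'.format(query)
    response = requests.get(
        url,
        headers=headers,
        params={'pageNumber': '0', 'itemsPerPage': '15'},
        proxies={'http': proxy, 'https': proxy},
        timeout=10,
    )
    return response.json()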