Python Forum
Python + request from specific website - please help
#4
It's a bit tricky because the site uses JavaScript. But you are lucky: closer inspection of the requests being sent from the browser reveals that it gets the data from an API in JSON format. So we can call that API directly, replicating the request headers as closely as possible.
Here is something to start with:

import time

import requests

def get_json(query):
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:65.0) Gecko/20100101 Firefox/65.0',
        'Accept': 'application/json, text/plain, */*',
        'Accept-Language': 'en-US,en;q=0.5',
        'Referer': 'https://ucr.gov/enforcement/{}'.format(query),
        'Cache-Control': 'no-cache,no-store,must-revalidate,max-age=0,private',
        'UCR-UI-Version': '19.2.1',
        'Origin': 'https://ucr.gov',
        'Connection': 'keep-alive',
    }
 
    s = requests.Session()
     
    params = (
        ('pageNumber', '0'),
        ('itemsPerPage', '15'),
    )

    url = 'https://admin.ucr.gov/api/enforcement/{}'.format(query)
    # timeout so a stalled connection does not hang the script forever
    response = s.get(url, headers=headers, params=params, timeout=10)
    return response.json()


if __name__ == '__main__':
    dots = [192123, 1921, 192, 1712583]
    for query in dots:
        data = get_json(query=query)
        print('DOT: {}'.format(query))
        if data.get('carrier'):
            for registration in data['history']['registrations']:
                print('Year: {year}, Status: {status}'.format(**registration))
        else:
            print('Not valid DOT')
        print('\n-----------\n')
        
        # implement 0.5 delay between requests
        time.sleep(0.5)
Output:
DOT: 192123
Year: 2019, Status: unregistered
Year: 2018, Status: unregistered
Year: 2017, Status: unregistered

-----------

DOT: 1921
Year: 2019, Status: unregistered
Year: 2018, Status: unregistered
Year: 2017, Status: unregistered

-----------

DOT: 192
Not valid DOT

-----------

DOT: 1712583
Year: 2019, Status: unregistered
Year: 2018, Status: unregistered
Year: 2017, Status: unregistered

-----------
You can print the entire JSON response to see all the information available.
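For example, a small helper along these lines makes the nested response easier to read. This is just a sketch: `dump_json` is a name I made up, and `sample` is a stand-in containing only the keys used in the script above (the value under 'carrier' is a placeholder, not real API output). In practice you would pass it the result of get_json().

```python
import json


def dump_json(data):
    """Return parsed JSON as an indented, key-sorted string for inspection."""
    return json.dumps(data, indent=2, sort_keys=True)


# Stand-in data: only the keys the script above relies on, values are placeholders
sample = {
    'carrier': {'placeholder': True},
    'history': {
        'registrations': [
            {'year': 2019, 'status': 'unregistered'},
        ],
    },
}

print(dump_json(sample))
```

With the real data that would be print(dump_json(get_json(192123))).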
It's probably a good idea to implement user-agent and proxy rotation if you are going to make a lot of requests, in order to avoid detection.
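A rough sketch of what such rotation could look like, under some assumptions: the proxy addresses are placeholders (you need your own working proxies), the user-agent list is just a few example strings, and `next_identity` is a hypothetical helper name. It picks a random user agent and cycles through the proxies in order.

```python
import itertools
import random

# Example user-agent strings; extend with whatever browsers you want to mimic
USER_AGENTS = [
    'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:65.0) Gecko/20100101 Firefox/65.0',
    'Mozilla/5.0 (X11; Linux x86_64; rv:65.0) Gecko/20100101 Firefox/65.0',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:65.0) Gecko/20100101 Firefox/65.0',
]

# Placeholder addresses, not real proxies
PROXIES = [
    'http://proxy1.example.com:8080',
    'http://proxy2.example.com:8080',
]

_proxy_cycle = itertools.cycle(PROXIES)


def next_identity():
    """Return (headers, proxies) for the next request.

    The user agent is chosen at random; proxies are used round-robin.
    """
    proxy = next(_proxy_cycle)
    headers = {'User-Agent': random.choice(USER_AGENTS)}
    proxies = {'http': proxy, 'https': proxy}
    return headers, proxies
```

In get_json() you would then update the headers dict with the returned headers and pass the second value along, e.g. s.get(url, headers=headers, params=params, proxies=proxies).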
Messages In This Thread
RE: Python + request from specific website - please help - by buran - Feb-06-2019, 08:28 AM
