Python Forum

Full Version: Link implementation
Total newbie here. As a first scraping project, I am faced with two websites that implement their links differently. On the first, I can collect all the links in a table element, close the browser window, and still be fine, because the href of each link takes me to its destination anyway. On the other website, those links only take me back to the disclaimer page. What steps are involved in getting around this behavior, if any, and what are the best practices for scraping sites like this? I am using Scrapy and Selenium.
Please provide something to work with.
What are the URLs?
(Aug-30-2022, 07:49 PM)Larz60+ Wrote: Please provide something to work with,
what are the URLs?

This is the one with links that always work:
https://www.mshp.dps.missouri.gov/HP71/search.jsp

This is the one with a disclaimer page. Its links, if not clicked directly from the site's own response pages, only reload the homepage:
https://casesearch.courts.state.md.us/ca...-index.jsp
Thanks for the URLs.
Just logged in (1:53 A.M. EST). I will take a look in my morning if this has not already been answered by someone else.
Took a quick look at the pages you provided.
The first page has a query form that can be partially filled in and then submitted.

I would use Selenium for both pages:
  1. 1st URL, because of the query form, in which you need to at least select the date and select the Troop from the map.
    I noticed that the date range is limited (by pull-down) to the last few days; you may be able to work around this.

  2. 2nd URL, because you must verify that you have read the conditions.
    Once this has been done, you will be directed to the search page, which contains a form that must be filled out.
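
A rough sketch of how the second case might look with Selenium: accept the disclaimer, fill in the search form, and read the result links, all within one browser session. Every selector passed in is a hypothetical placeholder, not taken from the real pages, and the start URL has to be supplied by you. `accept_and_search` and `run_in_chrome` are illustrative names only.

```python
CSS = "css selector"  # the string value of selenium's By.CSS_SELECTOR


def accept_and_search(driver, start_url, accept_css, continue_css,
                      form_values, submit_css, link_css):
    """Accept the disclaimer, fill the search form, return result hrefs.

    Every selector argument is a placeholder (assumption): inspect the
    real pages to find the actual checkbox, button, field and link selectors.
    """
    driver.get(start_url)

    # Step 1: confirm that the conditions were read, then continue.
    driver.find_element(CSS, accept_css).click()
    driver.find_element(CSS, continue_css).click()

    # Step 2: fill the search form; form_values maps selector -> text.
    for field_css, text in form_values.items():
        driver.find_element(CSS, field_css).send_keys(text)
    driver.find_element(CSS, submit_css).click()

    # Step 3: read the links while the same browser session is still open.
    anchors = driver.find_elements(CSS, link_css)
    return [a.get_attribute("href") for a in anchors]


def run_in_chrome(start_url, **selectors):
    """Run the flow in a real Chrome session (requires selenium installed)."""
    from selenium import webdriver  # lazy import: only needed for a real run

    driver = webdriver.Chrome()
    driver.implicitly_wait(10)  # wait up to 10 s for elements to appear
    try:
        return accept_and_search(driver, start_url, **selectors)
    finally:
        driver.quit()
```

The point of keeping everything in one function is that the links are followed (or at least collected) while the session that accepted the disclaimer is still alive. If the site really ties the links to that session, opening them in a fresh browser or a separate tool would likely land on the disclaimer again.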

Selenium can automate this process. I don't use Scrapy, so I can't say whether it is capable of this or not.
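
For what it's worth, one option that might work when combining the two tools: let Selenium get past the disclaimer, then hand its session cookies to Scrapy. This is only a sketch and rests on an untested assumption, namely that the site remembers the accepted disclaimer through cookies; if it also checks something else (the Referer header, for example), this alone will not be enough. `cookies_for_scrapy` and `build_request` are hypothetical helper names.

```python
def cookies_for_scrapy(selenium_cookies):
    """Convert driver.get_cookies() output into a {name: value} dict.

    Selenium returns a list of dicts with keys such as "name", "value"
    and "domain"; only the name/value pair is kept here.
    """
    return {c["name"]: c["value"] for c in selenium_cookies}


def build_request(url, selenium_cookies, callback=None):
    """Build a Scrapy request that carries the Selenium session cookies."""
    import scrapy  # lazy import: only needed when Scrapy is installed

    return scrapy.Request(
        url,
        cookies=cookies_for_scrapy(selenium_cookies),
        callback=callback,
    )
```

Inside a spider this could be used as `yield build_request(href, driver.get_cookies(), self.parse)` for each link collected after the disclaimer was accepted.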

I would suggest the following quick tutorials (on this forum):

Web-Scraping part-1
Web-scraping part-2