Python Forum
Scraping Images from Missing/ Exploited Children Site for Use with Rekognition
#1
Hello,

I'm relatively new to Python and am trying to get some help scraping images from the following site: https://api.missingkids.org/missingkids/...ssState=SC (National Center for Missing and Exploited Children)

I would then like to upload them into an AWS S3 bucket for comparison against images in another bucket using Rekognition.

I've tried numerous tutorials with no luck. Any tips or advice, even just a pointer to a useful tutorial, would be very much appreciated! We're trying to locate child victims of human trafficking.

Thanks!

Cody
#2
Look at Web-Scraping part-1 and part-2

Some hints: find the name and the NCMC number.
With the NCMC number you can build the URL for the large image directly, so there is no need to follow the link to get it.
If a person has two images, the suffix after the NCMC number is c1 for the first image and e1 for the second.
Quick example for the first person:
import requests
from bs4 import BeautifulSoup

url = 'https://api.missingkids.org/missingkids/servlet/PubCaseSearchServlet?act=usMapSearch&missState=SC'
response = requests.get(url)
soup = BeautifulSoup(response.content, 'html.parser')
first_pers = soup.find('td', width="40%")  # use find_all to get every person
Usage test:
>>> name = first_pers.find_all('b')[0].text
>>> name
'FRANCISCO ALBERTO ALVARADO'
>>> ncmc = first_pers.find_all('b')[1].text
>>> ncmc
'NCMC1373468'
>>> 
>>> # Make url for large image
>>> img_ncmc_url = f'http://api.missingkids.org/photographs/{ncmc}c1.jpg'
>>> img_ncmc_url
'http://api.missingkids.org/photographs/NCMC1373468c1.jpg' 
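
To go from the first person to everyone on the page, the same idea can be wrapped in a loop. The following is only a rough sketch, not a tested scraper for this site: it assumes every result sits in a td with width="40%" whose first two b tags are the name and the NCMC number (as in the example above), and that a missing e1 image simply returns a non-200 status. The function names extract_cases, image_urls and download_images are made up for illustration.

```python
import requests
from bs4 import BeautifulSoup

# Same photo path as in the example above.
PHOTO_BASE = 'http://api.missingkids.org/photographs'


def extract_cases(html):
    """Return a list of (name, ncmc) tuples, one per result cell."""
    soup = BeautifulSoup(html, 'html.parser')
    cases = []
    for cell in soup.find_all('td', width="40%"):
        bold = cell.find_all('b')
        if len(bold) < 2:
            continue  # cell does not have the assumed name/NCMC layout
        name = bold[0].get_text(strip=True)
        ncmc = bold[1].get_text(strip=True)
        cases.append((name, ncmc))
    return cases


def image_urls(ncmc):
    """Build the candidate large-image URLs: c1 first image, e1 second."""
    return [f'{PHOTO_BASE}/{ncmc}{suffix}.jpg' for suffix in ('c1', 'e1')]


def download_images(search_url):
    """Yield (name, ncmc, url, image_bytes) for each image that exists."""
    session = requests.Session()
    page = session.get(search_url, timeout=30)
    page.raise_for_status()
    for name, ncmc in extract_cases(page.text):
        for url in image_urls(ncmc):
            resp = session.get(url, timeout=30)
            if resp.status_code != 200:
                continue  # assumed: no second photo for this person
            yield name, ncmc, url, resp.content
```

Call download_images(url) with the search url from the example and loop over the result. Each item carries the raw image bytes, which you could write to disk or, as one option, hand to boto3's S3 client with put_object(Bucket=..., Key=..., Body=...) using your own bucket name and key.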