Python Forum

Scraping Images from Missing/Exploited Children Site for Use with Rekognition
Hello,

I'm relatively new to Python and am trying to get some help scraping images from the following site: https://api.missingkids.org/missingkids/...ssState=SC (National Center for Missing and Exploited Children)

I would then like to upload them into an AWS S3 bucket for comparison against images in another bucket using Rekognition.

I've tried numerous tutorials with no luck. Any tips or advice, even just pointing me to a useful tutorial, would be very much appreciated! We're trying to locate child victims of human trafficking.

Thanks!

Cody
Look at Web-Scraping part-1 and part-2

Some hints: find the name and the NCMC number.
With the NCMC number you can build the URL for the large image, so you do not need to follow the link to get it.
If a person has two images, the suffix after the NCMC number is c1 for the first image and e1 for the second.
Quick example for the first person:
import requests
from bs4 import BeautifulSoup

# Search results page for South Carolina (missState=SC)
url = 'https://api.missingkids.org/missingkids/servlet/PubCaseSearchServlet?act=usMapSearch&missState=SC'
response = requests.get(url)
soup = BeautifulSoup(response.content, 'html.parser')
# find() returns the first matching cell; use find_all() to get every person
first_pers = soup.find('td', width="40%")
Usage test:
>>> name = first_pers.find_all('b')[0].text
>>> name
'FRANCISCO ALBERTO ALVARADO'
>>> ncmc = first_pers.find_all('b')[1].text
>>> ncmc
'NCMC1373468'
>>> 
>>> # Make url for large image
>>> img_ncmc_url = f'http://api.missingkids.org/photographs/{ncmc}c1.jpg'
>>> img_ncmc_url
'http://api.missingkids.org/photographs/NCMC1373468c1.jpg'
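
The URL rule above (NCMC number plus c1 for the first image, e1 for the second) can be wrapped in a small helper. This is only a sketch: the function name photo_urls and the constant PHOTO_BASE are made up for illustration, and the e1 URL is just a candidate, since it should only exist when the person has a second image.

```python
# Base path copied from the example URL above.
PHOTO_BASE = 'http://api.missingkids.org/photographs'


def photo_urls(ncmc, suffixes=('c1', 'e1')):
    """Build candidate large-image URLs for one NCMC number.

    Hypothetical helper: 'c1' is the first image and 'e1' the second,
    as described in the hints. The 'e1' URL may not exist if the
    person has only one image, so check the HTTP status when fetching.
    """
    ncmc = ncmc.strip()  # drop stray whitespace picked up from the HTML
    return [f'{PHOTO_BASE}/{ncmc}{suffix}.jpg' for suffix in suffixes]


for image_url in photo_urls('NCMC1373468'):
    print(image_url)
```

To cover every person on the page, the same helper could be called inside a loop over soup.find_all('td', width="40%"), reading the NCMC number from the second b tag of each cell as in the usage test.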