Scraping Images from Missing/Exploited Children Site for Use with Rekognition
#1
Hello,

I'm relatively new to Python and am trying to get some help scraping images from the following site: https://api.missingkids.org/missingkids/...ssState=SC (National Center for Missing and Exploited Children)

I would then like to upload them into an AWS S3 bucket for comparison against images in another bucket using Rekognition.

I've tried numerous tutorials with no luck. Any tips or advice, even just pointing me to a useful tutorial, would be very much appreciated! We're trying to locate child victims of human trafficking.

Thanks!

Cody
#2
Look at Web-Scraping part-1 and part-2

Some hints: find the name and the NCMC number.
With the NCMC number you can build the URL for the large image directly, so you do not need to follow the link to get it.
If there are two images of a person, the suffix after the NCMC number is c1 for the first image and e1 for the second.
Quick example for the first person:
import requests
from bs4 import BeautifulSoup

url = 'https://api.missingkids.org/missingkids/servlet/PubCaseSearchServlet?act=usMapSearch&missState=SC'
response = requests.get(url)
soup = BeautifulSoup(response.content, 'html.parser')
first_pers = soup.find('td', width="40%")  # first person's cell; use find_all to get every person
Usage test:
>>> name = first_pers.find_all('b')[0].text
>>> name
'FRANCISCO ALBERTO ALVARADO'
>>> ncmc = first_pers.find_all('b')[1].text
>>> ncmc
'NCMC1373468'
>>> 
>>> # Make url for large image
>>> img_ncmc_url = f'http://api.missingkids.org/photographs/{ncmc}c1.jpg'
>>> img_ncmc_url
'http://api.missingkids.org/photographs/NCMC1373468c1.jpg' 
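
To extend this from the first person to everyone on the page, something like the following could work. It is only a rough sketch: the helper names (extract_people, build_image_urls, download_images) and the images output folder are made up for illustration, and it assumes that a person without a second photo simply gets a non-200 response for the e1 URL, which I have not verified against the site.

```python
from pathlib import Path

# Same base URL as in the example above
PHOTO_BASE = 'http://api.missingkids.org/photographs'


def extract_people(soup):
    """Return (name, ncmc) pairs for every person cell in a parsed search page."""
    people = []
    for cell in soup.find_all('td', width="40%"):
        bold = cell.find_all('b')
        if len(bold) >= 2:  # skip cells that lack a name or an NCMC number
            people.append((bold[0].text.strip(), bold[1].text.strip()))
    return people


def build_image_urls(ncmc):
    """Build candidate large-image URLs: c1 is the first photo, e1 the second."""
    return [f'{PHOTO_BASE}/{ncmc}{suffix}.jpg' for suffix in ('c1', 'e1')]


def download_images(people, out_dir='images'):
    """Save every photo that can be fetched and return the saved paths."""
    import requests  # imported here so the helpers above work without it

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    for _name, ncmc in people:
        for url in build_image_urls(ncmc):
            response = requests.get(url, timeout=30)
            if response.ok:  # assumption: a missing e1 photo is not a 200
                target = out / url.rsplit('/', 1)[-1]
                target.write_bytes(response.content)
                saved.append(target)
    return saved
```

With the soup object from the example above, usage would be roughly people = extract_people(soup) followed by download_images(people). The files are named after the NCMC number plus suffix, so they can be matched back to a person later.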