
Scraping Images from a Missing/Exploited Children Site for Use with Rekognition

I'm relatively new to Python and would like some help scraping images from the following site: (National Center for Missing and Exploited Children)

I would then like to upload them into an AWS S3 bucket for comparison against images in another bucket using Rekognition.

I've tried numerous tutorials with no luck. Any tips or advice, even just a pointer to a useful tutorial, would be very much appreciated! We're trying to locate children who are victims of human trafficking.
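The upload-and-compare step described in the question could be sketched with boto3 roughly like this. It is a minimal sketch, assuming AWS credentials are already configured; the bucket names, the `scraped/` key prefix, the helper names and the 80.0 similarity threshold are all hypothetical choices, not anything from the thread. `put_object` and `compare_faces` are real boto3 client methods; boto3 is imported inside the function so the pure helpers can be tried without it.

```python
def s3_key(image_name, prefix='scraped/'):
    """Hypothetical key layout: one S3 object per scraped image file."""
    return f'{prefix}{image_name}'


def compare_faces_request(source_bucket, source_key, target_bucket, target_key,
                          threshold=80.0):
    """Build the keyword arguments for Rekognition's compare_faces call."""
    return {
        'SourceImage': {'S3Object': {'Bucket': source_bucket, 'Name': source_key}},
        'TargetImage': {'S3Object': {'Bucket': target_bucket, 'Name': target_key}},
        'SimilarityThreshold': threshold,  # arbitrary cut-off for this sketch
    }


def upload_and_compare(image_bytes, image_name, scraped_bucket,
                       target_bucket, target_key):
    """Upload one scraped image, then compare it with one image in another bucket.

    Returns the Similarity score (0-100) of every face match Rekognition reports.
    """
    import boto3  # imported lazily so the helpers above work without boto3

    s3 = boto3.client('s3')
    key = s3_key(image_name)
    s3.put_object(Bucket=scraped_bucket, Key=key, Body=image_bytes)

    rekognition = boto3.client('rekognition')
    response = rekognition.compare_faces(
        **compare_faces_request(scraped_bucket, key, target_bucket, target_key))
    return [match['Similarity'] for match in response['FaceMatches']]
```

`compare_faces` compares one source image against one target image, so checking every scraped image against a whole bucket means one call per pair. For larger sets, Rekognition face collections (`index_faces` to store faces, `search_faces_by_image` to query them) may scale better; that is a design suggestion, not something tested here.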


Look at the Web-Scraping part-1 and part-2 tutorials.

Some hints: find the name and the NCMC number.
With the NCMC number you can build the URL for the large image, so you don't need to follow the link to get it.
If a person has 2 images, the file name is the NCMC number followed by c1 for the first image and e1 for the second.
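The c1/e1 naming rule above can be turned into a small helper. This is a sketch, assuming the file names follow exactly that pattern; the function names and the `base_url` parameter are hypothetical, since the site's image URL is not given here, and the NCMC value in the test is made up.

```python
def image_names(ncmc, suffixes=('c1', 'e1')):
    """Build candidate image file names for one NCMC number.

    'c1' is the first image and 'e1' the second; not every person has a
    second image, so the 'e1' file may not exist on the server.
    """
    return [f'{ncmc}{suffix}.jpg' for suffix in suffixes]


def image_urls(base_url, ncmc):
    """Join a (hypothetical) base URL with each candidate file name."""
    return [base_url.rstrip('/') + '/' + name for name in image_names(ncmc)]


print(image_names('NCMC0000001'))
```

Because the second image is optional, one option is to request each URL and skip any that don't return status 200 before uploading.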
Quick example for the first person:
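import requests
from bs4 import BeautifulSoup

url = ''  # the site's listing page URL goes here
response = requests.get(url)
response.raise_for_status()  # stop early on HTTP errors
soup = BeautifulSoup(response.content, 'html.parser')
# Assumes each person sits in a <td width="40%"> cell; use find_all() to get all of them
first_pers = soup.find('td', width="40%")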
Usage test:
>>> name = first_pers.find_all('b')[0].text
>>> name
>>> ncmc = first_pers.find_all('b')[1].text
>>> ncmc
>>> # Make the URL for the large (first) image
>>> img_ncmc_url = f'{ncmc}c1.jpg'
>>> img_ncmc_url
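Extending the example above from the first person to everyone means switching `find` to `find_all` and looping. This is a sketch, assuming every person cell is a `<td width="40%">` holding the name in the first `<b>` tag and the NCMC number in the second, as the code above does; the function name and the sample markup are made up for illustration.

```python
from bs4 import BeautifulSoup

# Made-up sample markup, assumed to mirror the structure used above:
# one <td width="40%"> per person, name and NCMC number in <b> tags.
SAMPLE_HTML = """
<table><tr>
  <td width="40%"><b>PERSON ONE</b><br><b>NCMC0000001</b></td>
  <td width="40%"><b>PERSON TWO</b><br><b>NCMC0000002</b></td>
</tr></table>
"""


def extract_people(html):
    """Return a list of (name, ncmc) tuples, one per person cell."""
    soup = BeautifulSoup(html, 'html.parser')
    people = []
    for cell in soup.find_all('td', width="40%"):
        bold = cell.find_all('b')
        if len(bold) < 2:
            # Skip cells that don't carry both a name and an NCMC number
            continue
        name = bold[0].get_text(strip=True)
        ncmc = bold[1].get_text(strip=True)
        people.append((name, ncmc))
    return people


print(extract_people(SAMPLE_HTML))
```

With the real page, pass `response.content` instead of the sample string; each NCMC number can then be turned into image file names with the c1/e1 rule.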
