Thread: Scraping Images from Missing/Exploited Children Site for Use with Rekognition (/thread-24381.html), Python Forum (https://python-forum.io), Web Scraping & Web Development
Scraping Images from Missing/Exploited Children Site for Use with Rekognition - codytsterling - Feb-11-2020

Hello,

I'm relatively new to Python and am trying to get some help scraping images from the following site: https://api.missingkids.org/missingkids/servlet/PubCaseSearchServlet?act=usMapSearch&missState=SC (National Center for Missing and Exploited Children)

I would then like to upload them into an AWS S3 bucket for comparison against images in another bucket using Rekognition. I've tried numerous tutorials with no luck. Any tips or advice, even just pointing me to a useful tutorial, would be very much appreciated! We're trying to locate child victims of human trafficking.

Thanks!
Cody

RE: Scraping Images from Missing/Exploited Children Site for Use with Rekognition - snippsat - Feb-11-2020

Look at Web-Scraping part-1 and part-2.

Some hints: find the name and the NCMC number. With the NCMC number you can build the URL for the large image, so you do not need to follow the link to get it. If there are 2 images of a person, the suffix after the NCMC number is c1 for the first image and e1 for the second image.

Quick example for the first person.

```python
import requests
from bs4 import BeautifulSoup

url = 'https://api.missingkids.org/missingkids/servlet/PubCaseSearchServlet?act=usMapSearch&missState=SC'
response = requests.get(url)
soup = BeautifulSoup(response.content, 'html.parser')
# First matching cell only; use find_all to get every person
first_pers = soup.find('td', width="40%")
```

Usage test:

```python
>>> name = first_pers.find_all('b')[0].text
>>> name
'FRANCISCO ALBERTO ALVARADO'
>>> ncmc = first_pers.find_all('b')[1].text
>>> ncmc
'NCMC1373468'
>>>
>>> # Make url for large image
>>> img_ncmc_url = f'http://api.missingkids.org/photographs/{ncmc}c1.jpg'
>>> img_ncmc_url
'http://api.missingkids.org/photographs/NCMC1373468c1.jpg'
```
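
The URL pattern from the reply above can be wrapped in a small helper that returns the candidate URLs for both image suffixes. This is only a minimal sketch: `build_image_urls` and `BASE_URL` are hypothetical names introduced here, and the `c1`/`e1` suffixes and the `.jpg` path are taken from the reply rather than from any documented API. A second image may not exist for every case, so each URL should be checked before use.

```python
# Base path as used in the reply above (an assumption, not a documented API).
BASE_URL = 'http://api.missingkids.org/photographs'


def build_image_urls(ncmc, suffixes=('c1', 'e1')):
    """Return candidate large-image URLs for one NCMC number.

    Hypothetical helper: 'c1' is the first image and 'e1' the second,
    following the pattern described in the thread.
    """
    ncmc = ncmc.strip()  # scraped text may carry stray whitespace
    return [f'{BASE_URL}/{ncmc}{suffix}.jpg' for suffix in suffixes]


urls = build_image_urls('NCMC1373468')
print(urls[0])
```

Because the helper only builds strings, it makes no network requests; whether a given URL actually resolves still has to be confirmed, for example by checking the status code of a `requests.get` call.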