Python Forum

Downloading images from webpages
Hi all,

I have thousands of images to download, so I need to automate it. I tried a number of solutions I found on the internet, but they all produce an empty file.
Each link looks like this: http://g2w.ubi.com/farcry2/thumb.php?id=...4d995a558a
There is just one small image in the middle of the page, but the link does not point directly to a file (which I'm guessing is the problem). In the browser the image can be downloaded as a PNG file, and saving the whole page also produces that PNG image. Would anyone have an idea how Python could grab and download that image?
That image is on a non-secure site, so it shouldn't be opened.
The site uses JavaScript to generate the image's source link when a link is clicked.
You could use Selenium for this; the other way is to look at the source code to see what's going on.
I can give a quick demo, as this may not be so easy if you're new to this.
In the downloaded page source (the soup), all the ids sit in a JavaScript array, so a regex can grab them all.
Then make a new request to a new URL that takes the id as a parameter.
import re

import requests
from bs4 import BeautifulSoup

url = 'http://g2w.ubi.com/farcry2/?page=1'
response = requests.get(url)
response.raise_for_status()
soup = BeautifulSoup(response.content, 'lxml')
# Get all ids for page 1; the non-greedy (.*?) stops at the first "';"
thumb_ids = re.findall(r"id=(.*?)';", str(soup))
# First one
params = {'id': thumb_ids[0]}
# Download the image through thumb.php, passing the id as a query parameter
response = requests.get('http://g2w.ubi.com/farcry2/thumb.php', params=params)
response.raise_for_status()
with open('thumb.png', 'wb') as f:
    f.write(response.content)
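Since the question is about thousands of images, the demo above could be stretched into a loop over every id on a page. This is only a rough sketch, not tested against the site. The names `extract_ids`, `download_thumbs`, and the `thumbs` folder are made up for illustration, and it assumes each id is safe to use as a file name. The session is passed in, so any object with a requests-style `get()` will do.

```python
import re
from pathlib import Path

THUMB_URL = 'http://g2w.ubi.com/farcry2/thumb.php'


def extract_ids(html):
    """Return every id found in text such as "thumb.php?id=abc123';"."""
    # Non-greedy, so several ids on one line are captured separately.
    return re.findall(r"id=(.*?)';", html)


def download_thumbs(session, ids, out_dir='thumbs'):
    """Fetch each id with session.get() and save it as <out_dir>/<id>.png."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    for thumb_id in ids:
        response = session.get(THUMB_URL, params={'id': thumb_id}, timeout=30)
        # Skip failed or empty responses rather than writing empty files.
        if not response.ok or not response.content:
            continue
        path = out / f'{thumb_id}.png'  # assumes the id is a valid file name
        path.write_bytes(response.content)
        saved.append(path)
    return saved


# Usage (needs the requests package and network access):
#   import requests
#   with requests.Session() as session:
#       page = session.get('http://g2w.ubi.com/farcry2/?page=1', timeout=30)
#       download_thumbs(session, extract_ids(page.text))
```

Reusing one session keeps the connection open between requests, which should help a little when there are thousands of small files. For the other pages, changing the `page` number in the URL would presumably work the same way, but that is a guess from the page 1 URL.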