Posts: 8
Threads: 1
Joined: Jun 2024
Jun-17-2024, 02:43 PM
(This post was last modified: Jun-21-2024, 05:36 PM by charled.)
Hi everyone,
This is my first post here. I have never used Python before.
I have 12,000 images to download from a server. Each image is named by an ID.
For this job, we've made a csv file with 2 columns:
- the absolute url of each file,
- the new name, so that /id.ext becomes /name.ext
So the script has to:
- read each url in turn, starting with the first,
- download the file to disk, renaming it on the way.
I suppose it is very simple to do with Python. Where to find the help or some scripts ?
Thanks for your help.
Posts: 7,320
Threads: 123
Joined: Sep 2016
(Jun-17-2024, 02:43 PM)charled Wrote: I suppose it is very simple to do with Python. Where to find the help or some scripts ?
We usually like to see some effort; otherwise it's more of a small job description.
It's not a hard task, but it can be if you have never used Python.
To get started, read the .csv file first: run the code below and check that the output looks right before downloading anything.
And use a smaller sample; do not test with all 12 000.
#import requests
import csv
from pathlib import Path

csv_file = Path('your.csv')
#output_dir = Path('downloaded_images')
# Create the directory if it doesn't exist
#output_dir.mkdir(parents=True, exist_ok=True)

# Read the CSV file and download images(not finish "Requests")
with csv_file.open(mode='r', newline='') as fp:
    reader = csv.reader(fp)
    # Skip the header row
    header = next(reader)
    for row in reader:
        url = row[0]
        new_name = row[1]
        print(url, new_name)
Output:
https://example.com/image1.jpg image1_new_name.jpg
https://example.com/image2.png image2_new_name.png
https://example.com/image3.gif image3_new_name.gif
https://example.com/image4.jpg image4_new_name.jpg
https://example.com/image5.png image5_new_name.png
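One way to follow the "smaller sample" advice without editing the csv file is to cap the number of rows read. This is only a minimal sketch: the helper name sample_rows and its limit parameter are made up for illustration, and the delimiter is left as a parameter because it depends on how the file was exported.

```python
import csv
from itertools import islice


def sample_rows(lines, limit=5, delimiter=","):
    """Return at most `limit` data rows, skipping the header row.

    `lines` can be any iterable of text lines, for example an open file.
    """
    reader = csv.reader(lines, delimiter=delimiter)
    next(reader, None)  # skip the header row, if there is one
    return list(islice(reader, limit))
```

With a real file this could be called as sample_rows(fp, limit=5) inside the `with csv_file.open(...) as fp:` block, so only the first five rows are printed or downloaded during testing.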
Posts: 8
Threads: 1
Joined: Jun 2024
Hi Snippsat.
I would have preferred to take the time to learn Python properly, but my client asked me for help yesterday, with a deadline of June 30... of course. After that, the photos will be erased.
So thanks for your help. I'll try it immediately.
Posts: 8
Threads: 1
Joined: Jun 2024
So I tried the code and got this error:
Error:
python3 '/home/jluc/Documents/CLIENTS/Découvertes/Ezus/recup_images_ezus.py'
Traceback (most recent call last):
  File "/home/jluc/Documents/CLIENTS/Découvertes/Ezus/recup_images_ezus.py", line 17, in <module>
    new_name = row[1]
IndexError: list index out of range
It looks like a problem with the csv file. The field delimiter is ; (semicolon). I tried some other delimiters in the file but got the same error. Here is the content:
"url";"new_name"
"https://ezus-cmtyhfgfzxdnjtahdgpfmgfj.s3.amazonaws.com/media/1675942561904.jpeg";"AL_Colmar_1"
"https://ezus-cmtyhfgfzxdnjtahdgpfmgfj.s3.amazonaws.com/media/1675942561915.jpeg";"AL_Colmar_2"
"https://ezus-cmtyhfgfzxdnjtahdgpfmgfj.s3.amazonaws.com/media/1675942561919.jpeg";"AL_Colmar_3"
"https://ezus-cmtyhfgfzxdnjtahdgpfmgfj.s3.amazonaws.com/media/1675942561923.jpeg";"AL_Colmar_4"
"https://ezus-cmtyhfgfzxdnjtahdgpfmgfj.s3.amazonaws.com/media/1677684808533.jpeg";"AL_Colmar_5"
"https://ezus-cmtyhfgfzxdnjtahdgpfmgfj.s3.amazonaws.com/media/1677684808539.jpeg";"AL_Colmar_6"
"https://ezus-cmtyhfgfzxdnjtahdgpfmgfj.s3.amazonaws.com/media/1677684808591.jpeg";"AL_Colmar_7"
"https://ezus-cmtyhfgfzxdnjtahdgpfmgfj.s3.amazonaws.com/media/1677684808629.jpeg";"AL_Colmar_8"
"https://ezus-cmtyhfgfzxdnjtahdgpfmgfj.s3.amazonaws.com/media/1678457387970.jpg";"AL_Domremy_1"
"https://ezus-cmtyhfgfzxdnjtahdgpfmgfj.s3.amazonaws.com/media/1678457387974.jpg";"AL_Domremy_2"
"https://ezus-cmtyhfgfzxdnjtahdgpfmgfj.s3.amazonaws.com/media/1678457387978.jpg";"AL_Domremy_3"
"https://ezus-cmtyhfgfzxdnjtahdgpfmgfj.s3.amazonaws.com/media/1678457387985.jpg";"AL_Domremy_4"
"https://ezus-cmtyhfgfzxdnjtahdgpfmgfj.s3.amazonaws.com/media/1678785998140.jpg";"AL_Luxembourg_1"
"https://ezus-cmtyhfgfzxdnjtahdgpfmgfj.s3.amazonaws.com/media/1678785998145.jpg";"AL_Luxembourg_2"
"https://ezus-cmtyhfgfzxdnjtahdgpfmgfj.s3.amazonaws.com/media/1678785998167.jpg";"AL_Luxembourg_3"
"https://ezus-cmtyhfgfzxdnjtahdgpfmgfj.s3.amazonaws.com/media/1678785998211.jpg";"AL_Luxembourg_4"
"https://ezus-cmtyhfgfzxdnjtahdgpfmgfj.s3.amazonaws.com/media/1675938805803.jpg";"AL_Metz_1"
"https://ezus-cmtyhfgfzxdnjtahdgpfmgfj.s3.amazonaws.com/media/1675938805807.jpg";"AL_Metz_2"
"https://ezus-cmtyhfgfzxdnjtahdgpfmgfj.s3.amazonaws.com/media/1675938805810.jpeg";"AL_Metz_3"
Just in case, here is my code:
#import requests
import csv
from pathlib import Path

csv_file = Path('/home/jluc/Documents/CLIENTS/Découvertes/Ezus/testrecup.csv')
#output_dir = Path('/home/jluc/Documents/CLIENTS/Découvertes/Ezus/images')
# Create the directory if it doesn't exist
#output_dir.mkdir(parents=True, exist_ok=True)

# Read the CSV file and download images(not finish "Requests")
with csv_file.open(mode='r', newline='') as fp:
    reader = csv.reader(fp)
    # Skip the header row
    header = next(reader)
    for row in reader:
        url = row[0]
        new_name = row[1]
        print(url, new_name)
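The IndexError is what you would expect when csv.reader splits on its default comma while the file uses semicolons: each line comes back as a single field, so row[1] does not exist. The sketch below shows the difference on one line shaped like the file above; the example.com url and the helper name split_row are placeholders for illustration, not part of the original script.

```python
import csv

# One line in the same shape as the posted file (the url is a placeholder).
line = '"https://example.com/media/1.jpeg";"AL_Colmar_1"'


def split_row(text, delimiter=","):
    """Parse a single csv line and return its list of fields."""
    return next(csv.reader([text], delimiter=delimiter))


default_fields = split_row(line)                   # default delimiter is a comma
semicolon_fields = split_row(line, delimiter=";")  # matches the file's delimiter

print(len(default_fields))  # 1 field: the whole line, so row[1] raises IndexError
print(semicolon_fields)     # two fields: the url and the new name
```

In the script itself, the equivalent change would be passing delimiter=';' to csv.reader.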
Posts: 1,094
Threads: 143
Joined: Jul 2017
This gets your files OK. The problem, as I see it, is that the files may have different extensions: .jpg, .jpeg, .png.
Better to fetch the files under their original names first, then rename them afterwards with a loop, if you really need to!
import requests
import csv
from pathlib import Path

path2csv = '/home/pedro/myPython/requests/csv/french_photos.csv'
savepath = '/home/pedro/myPython/requests/csv/downloaded_images'
savep = Path(savepath)
# from snippsat with small changes by me
csv_file = Path(path2csv)
output_dir = Path(savepath)
# Create the directory if it doesn't exist
output_dir.mkdir(parents=True, exist_ok=True)

# Read the CSV file and download images(not finish "Requests")
with csv_file.open(mode='r', newline='') as fp:
    # your csv delimiter is ;
    reader = csv.reader(fp, delimiter=';')
    # Skip the header row
    header = next(reader)
    for row in reader:
        url = row[0]
        savename = url.split('/')[-1]
        save_file = savep / savename
        #new_name = row[1]
        print(url)
        print(save_file)
        with open(save_file, 'wb') as f:
            f.write(requests.get(url).content)
# now run a loop to rename the files if you wish
Hope the client is happy!
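The rename loop mentioned above could look roughly like this. It is a sketch under assumptions: the function names planned_renames and apply_renames are invented here, the rows are the (url, new_name) pairs from the csv, and each file is assumed to have been saved under the last part of its url path, as in the download code above.

```python
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse


def planned_renames(rows):
    """Pair each downloaded file name with its new name, keeping the extension."""
    pairs = []
    for url, new_name in rows:
        original = PurePosixPath(urlparse(url).path).name  # last part of the url path
        suffix = PurePosixPath(original).suffix            # extension, dot included
        pairs.append((original, new_name + suffix))
    return pairs


def apply_renames(directory, pairs):
    """Rename files inside `directory`; files that are missing are skipped."""
    directory = Path(directory)
    for old, new in pairs:
        source = directory / old
        if source.exists():
            source.rename(directory / new)
```

Keeping the planning step separate from the renaming step makes it possible to print the pairs and check them before any file is touched.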
Posts: 2,126
Threads: 11
Joined: May 2017
Just for fun.
import csv
from bisect import bisect_left as bisect
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlopen


def read_csv(file):
    with open(file, newline="", encoding="ascii") as fd:
        reader = csv.reader(fd, delimiter=";")
        # skipping header
        next(reader)
        yield from reader


def transform_rows(csv_file):
    for url, name in read_csv(csv_file):
        source_file = Path(urlparse(url).path)
        yield url, Path(name).with_suffix(source_file.suffix.lower())


def get_size(response):
    headers = dict(response.getheaders())
    return int(headers["Content-Length"]) if "Content-Length" in headers else None


class Progress:
    def __init__(self, response):
        self.size = get_size(response)
        self.last_msg = ""
        self.percentages = [0.25, 0.5, 0.75, 1.0]

    def update(self, transferred):
        if self.size is None:
            return
        relative = transferred / self.size
        value = self.percentages[bisect(self.percentages, relative)]
        current_msg = f"{value:.0%}"
        if self.last_msg != current_msg:
            print(current_msg, end=" ", flush=True)
            self.last_msg = current_msg


def download(url, target_dir, target_file):
    with open(target_dir / target_file, "wb") as fd:
        with urlopen(url) as response:
            transferred = 0
            progress = Progress(response)
            while chunk := response.read(1024):
                transferred += len(chunk)
                fd.write(chunk)
                progress.update(transferred)


def main(csv_file, target_dir):
    target_dir = Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    for url, file in transform_rows(csv_file):
        print(f"Downloading {file}", end=" ")
        download(url, target_dir, file)
        print()


if __name__ == "__main__":
    main(
        r"C:\Users\YOUR_USER\Desktop\testrecup.csv",
        r"C:\Users\YOUR_USER\Desktop\XYZFK",
    )
Posts: 8
Threads: 1
Joined: Jun 2024
Thanks Pedro. I don't understand why I can't save the files directly under the right name.
Posts: 7,320
Threads: 123
Joined: Sep 2016
Here is working code based on the .csv you posted.
import requests
import csv
from pathlib import Path

csv_file = Path('url_am.csv')
output_dir = Path('downloaded_images')
# Create the directory if it doesn't exist
output_dir.mkdir(parents=True, exist_ok=True)
with csv_file.open(mode='r', newline='') as file:
    reader = csv.reader(file, delimiter=';')
    header = next(reader)
    for row in reader:
        url = row[0]
        new_name = row[1]
        #print(url, new_name)
        response = requests.get(url)
        if response.status_code == 200:
            # Create the full path for the new image
            new_name = f'{new_name}.jpg'
            file_path = output_dir / new_name
            # Save the image to disk
            with file_path.open('wb') as image_file:
                image_file.write(response.content)
            print(f'Successfully downloaded --> {new_name}')
        else:
            print(f'Failed to download {url} - Status code: {response.status_code}')
Output:
Successfully downloaded --> AL_Colmar_1.jpg
Successfully downloaded --> AL_Colmar_2.jpg
Successfully downloaded --> AL_Colmar_3.jpg
Successfully downloaded --> AL_Colmar_4.jpg
Successfully downloaded --> AL_Colmar_5.jpg
Successfully downloaded --> AL_Colmar_6.jpg
Successfully downloaded --> AL_Colmar_7.jpg
Successfully downloaded --> AL_Colmar_8.jpg
Successfully downloaded --> AL_Domremy_1.jpg
Successfully downloaded --> AL_Domremy_2.jpg
Successfully downloaded --> AL_Domremy_3.jpg
Successfully downloaded --> AL_Domremy_4.jpg
Successfully downloaded --> AL_Luxembourg_1.jpg
Successfully downloaded --> AL_Luxembourg_2.jpg
Successfully downloaded --> AL_Luxembourg_3.jpg
Successfully downloaded --> AL_Luxembourg_4.jpg
Successfully downloaded --> AL_Metz_1.jpg
Successfully downloaded --> AL_Metz_2.jpg
Successfully downloaded --> AL_Metz_3.jpg
Posts: 8
Threads: 1
Joined: Jun 2024
Thanks Snippsat. It works well.
Just one more thing : not all images are jpg, some are .png. Is it possible to copy the right extension ?
Thanks.
Posts: 7,320
Threads: 123
Joined: Sep 2016
Jun-19-2024, 07:17 PM
(This post was last modified: Jun-19-2024, 07:18 PM by snippsat.)
(Jun-19-2024, 06:55 PM)charled Wrote: Just one more thing : not all images are jpg, some are .png. Is it possible to copy the right extension ?
Change line 19 (the one that sets new_name = f'{new_name}.jpg') to this.
new_name = f"{new_name}.{response.url.split('.')[-1]}"
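The one-liner above takes whatever follows the last dot in the url, which works for the urls posted in this thread. If some urls carried a query string or had no extension at all, a slightly more defensive variant might be useful. This is only a sketch: the helper name target_name and the ".jpg" fallback are assumptions, not something from the original script.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def target_name(url, new_name, default_suffix=".jpg"):
    """Build the output file name from the new name and the url's extension."""
    # Look only at the path part of the url, so a query string is ignored.
    suffix = PurePosixPath(urlparse(url).path).suffix.lower()
    # Fall back to the default when the url path has no extension.
    return f"{new_name}{suffix or default_suffix}"
```

Inside the download loop this could replace the assignment with new_name = target_name(response.url, new_name).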