Python Forum
How to sort image files according to a metadata file?


How to sort image files according to a metadata file? - Brahmslove - Dec-05-2019

I would like to sort image files that share the same tag into separate folders.
(The dataset is HAM10000, https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/DBW86T)

The metadata file (HAM10000_metadata.csv) looks like this:

lesion_id    image_id      dx   dx_type    age  sex     localization
HAM_0000550  ISIC_0024306  nv   follow_up  45   male    trunk
HAM_0003577  ISIC_0024307  nv   follow_up  50   male    lower extremity
HAM_0001477  ISIC_0024308  nv   follow_up  55   female  trunk
HAM_0000484  ISIC_0024309  nv   follow_up  40   male    trunk
HAM_0003350  ISIC_0024310  mel  histo      60   male    chest
HAM_0000981  ISIC_0024311  nv   follow_up  75   female  back
HAM_0001359  ISIC_0024312  bkl  histo      75   male    lower extremity
HAM_0002869  ISIC_0024313  mel  histo      50   female  back
HAM_0002198  ISIC_0024314  nv   histo      75   male    lower extremity
... (and so on)
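As a small sketch of how this sample relates to the task, the standard library csv module can turn the metadata into an image_id to dx lookup. This assumes the file is comma-separated (as a .csv file normally is) with the header row shown above; load_dx_map is a hypothetical helper name, not something from the thread.

```python
import csv


def load_dx_map(csv_path):
    """Return a dict mapping image_id -> dx from the metadata CSV.

    Assumption: comma-separated file whose header row contains
    'image_id' and 'dx' columns, as in the sample above.
    """
    with open(csv_path, newline="", encoding="utf-8") as f:
        # DictReader uses the header row as keys for every data row.
        return {row["image_id"]: row["dx"] for row in csv.DictReader(f)}
```

Passing the result's values to collections.Counter would then show how many images fall under each dx value, which is a quick way to confirm the seven categories before moving any files.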

All image files are named after their image_id (ISIC_XXXXXXX.jpeg).
What I want is to sort these image files according to the variable "dx" (nv, mel, bkl, ...). In the metadata file (HAM10000_metadata.csv), "dx" takes seven different values (akiec, bcc, bkl, mel, df, vasc, nv).
In other words, I want to put the 10,000 image files into 7 different folders, one per "dx" value, using the metadata file to look up the "dx" value of each image.

How can I write a script to do this?
(All files are located in C:\, and I would like to create seven new folders named after the "dx" values.)

Thank you for your help!!


RE: How to sort image files according to a metadata file? - scidam - Dec-05-2019

You can use pandas.read_csv to load the CSV file into memory as a pandas.DataFrame. Then use the DataFrame's groupby method, passing the column you want to group by ('dx' in your case). Finally, iterate over the groups, take the file names belonging to each group from its image_id column, and copy those files into a directory named after the group.
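A minimal sketch of the groupby approach described above. It assumes pandas is installed, the CSV is comma-separated with the header shown in the question, and every image is named <image_id>.jpeg directly inside one folder. The function name sort_images_by_dx, its parameters, and the choices to copy (rather than move) and to skip missing files are illustrative assumptions, not part of the original reply.

```python
import shutil
from pathlib import Path

import pandas as pd


def sort_images_by_dx(metadata_csv, image_dir, output_dir, ext=".jpeg"):
    """Copy each image into output_dir/<dx>/ according to the metadata.

    Hypothetical helper. Assumes images are named '<image_id><ext>' and
    live directly in image_dir. Images with no matching file are skipped
    and their names returned, instead of raising an error.
    """
    image_dir = Path(image_dir)
    output_dir = Path(output_dir)
    df = pd.read_csv(metadata_csv)

    missing = []
    # Each group holds all metadata rows sharing one dx value.
    for dx, group in df.groupby("dx"):
        target = output_dir / str(dx)
        target.mkdir(parents=True, exist_ok=True)
        for image_id in group["image_id"]:
            src = image_dir / f"{image_id}{ext}"
            if src.exists():
                # copy2 also preserves file timestamps; the original stays in place.
                shutil.copy2(src, target / src.name)
            else:
                missing.append(src.name)
    return missing
```

For the layout in the question, a call might look like sort_images_by_dx(r"C:\HAM10000_metadata.csv", "C:\\", r"C:\sorted"), where C:\sorted is a hypothetical output folder; the returned list names any image without a matching file. If the files use a different extension, pass it through ext, and swapping shutil.copy2 for shutil.move would relocate the images instead of duplicating them.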