Python Forum

How to sort image files according to a metadata file?
I would like to sort image files that share the same tag into their own folders.
(The HAM 10000 dataset, https://dataverse.harvard.edu/dataset.xh...DVN/DBW86T)

The metadata file (HAM10000_metadata.csv) looks like this:

lesion_id    image_id      dx   dx_type    age  sex     localization
HAM_0000550  ISIC_0024306  nv   follow_up  45   male    trunk
HAM_0003577  ISIC_0024307  nv   follow_up  50   male    lower extremity
HAM_0001477  ISIC_0024308  nv   follow_up  55   female  trunk
HAM_0000484  ISIC_0024309  nv   follow_up  40   male    trunk
HAM_0003350  ISIC_0024310  mel  histo      60   male    chest
HAM_0000981  ISIC_0024311  nv   follow_up  75   female  back
HAM_0001359  ISIC_0024312  bkl  histo      75   male    lower extremity
HAM_0002869  ISIC_0024313  mel  histo      50   female  back
HAM_0002198  ISIC_0024314  nv   histo      75   male    lower extremity
... and so on

All image files are named after their image_id (ISIC_XXXXXXX.jpeg).
What I want is to sort these image files (ISIC_XXXXXXX.jpeg) by the "dx" variable (nv, mel, bkl, ...). The metadata file (HAM10000_metadata.csv) contains seven different "dx" values: akiec, bcc, bkl, mel, df, vasc, and nv.
In other words, I want to put the 10,000 image files into 7 folders, one per "dx" value, using the metadata file to look up the "dx" value of each image.
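To make the lookup described above concrete, here is a minimal standard-library sketch that reads the metadata into a dictionary from image_id to dx. It assumes the CSV is comma-separated with a header row containing the columns shown above; the function names load_dx_lookup and count_dx are hypothetical, chosen only for illustration.

```python
import csv
from collections import Counter


def load_dx_lookup(metadata_path):
    """Map each image_id to its dx label.

    Assumes a comma-separated file with a header row that includes
    'image_id' and 'dx' columns.
    """
    lookup = {}
    with open(metadata_path, newline="") as f:
        for row in csv.DictReader(f):
            lookup[row["image_id"]] = row["dx"]
    return lookup


def count_dx(lookup):
    """Count how many images carry each dx label (7 labels expected here)."""
    return Counter(lookup.values())
```

With the file in C:\, something like count_dx(load_dx_lookup("C:/HAM10000_metadata.csv")) should list seven labels if the metadata is as described.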

How can I write a script to do this?
(All files are located in C:\, and I would like to create seven new folders, each named after one of the "dx" values.)

Thank you for your help!!

You can use pandas.read_csv to load the CSV file into memory as a pandas.DataFrame. Then use the DataFrame's groupby method, passing the column you want to group by. In your case that is 'dx'; 'dx_type' only records how the diagnosis was confirmed (e.g. histo, follow_up). Finally, iterate over the groups, take the file names for each group from its image_id column, and copy those files into a directory named after the group.
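Put together, that approach might look like the sketch below. It assumes the CSV has the image_id and dx columns shown in the question, that every image is stored as <image_id>.jpeg in a single folder, and that copying (rather than moving) is acceptable. sort_images_by_dx and its parameters are hypothetical names, not part of pandas.

```python
import shutil
from pathlib import Path

import pandas as pd


def sort_images_by_dx(metadata_csv, image_dir, output_dir, ext=".jpeg"):
    """Copy each image into output_dir/<dx>/ according to the metadata file.

    Assumes the CSV has 'image_id' and 'dx' columns and that each image
    is stored as <image_id><ext> inside image_dir. Returns the names of
    images listed in the metadata but not found on disk.
    """
    image_dir = Path(image_dir)
    output_dir = Path(output_dir)
    meta = pd.read_csv(metadata_csv)

    missing = []
    # One group per distinct value of the 'dx' column (akiec, bcc, bkl, ...).
    for dx, group in meta.groupby("dx"):
        target = output_dir / dx
        target.mkdir(parents=True, exist_ok=True)
        for image_id in group["image_id"]:
            src = image_dir / f"{image_id}{ext}"
            if src.exists():
                # copy2 also preserves the file's timestamps.
                shutil.copy2(src, target / src.name)
            else:
                missing.append(src.name)
    return missing
```

For the layout in the question, sort_images_by_dx("C:/HAM10000_metadata.csv", "C:/", "C:/") would create C:\nv, C:\mel and so on. Replace shutil.copy2 with shutil.move if you want the files moved instead of copied, and check the returned list for image IDs whose files were not found (for example, because of a different file extension).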