New column based on list

Olga · (This post was last modified: May-04-2018, 04:13 PM by Olga.)

I have a csv file (VV_AL_3T3_P3.csv) in which each row corresponds to a tiff image of plankton. It looks like this:

Particle_ID Diameter Image_File Lenght ....etc
1 15.36 VV_AL_3T3_P3_R3_000001.tif 18.09
2 17.39 VV_AL_3T3_P3_R3_000001.tif 19.86
3 17.21 VV_AL_3T3_P3_R3_000001.tif 21.77
4 9.42 VV_AL_3T3_P3_R3_000001.tif 9.83

The images were originally all together in one folder and were then sorted by shape into separate folders. The name of each tiff image is formed from the Image_File value plus the Particle_ID; for example, for the first row: VV_AL_3T3_P3_R3_000001_1.tiff
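To make the naming rule concrete, here is a minimal sketch. It assumes the '.tif' extension in the Image_File column is dropped before the Particle_ID is appended, which is what the example name suggests; `tiff_name` is a hypothetical helper, not part of my script.

```python
import os

def tiff_name(image_file, particle_id):
    """Build the per-particle file name from one table row (assumed rule)."""
    # Drop the '.tif' extension of the collage image name.
    stem, _ext = os.path.splitext(image_file)
    # Append the particle id and the '.tiff' extension used in the folders.
    return f"{stem}_{particle_id}.tiff"

print(tiff_name("VV_AL_3T3_P3_R3_000001.tif", 1))
# → VV_AL_3T3_P3_R3_000001_1.tiff
```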

Now, I want to add a new column called 'Class' into the csv file that I already have (VV_AL_3T3_P3.csv) with the name of the folder where each .tiff file is located (the class) using python; like this:

Particle_ID Diameter Image_File Lenght Class
1 15.36 VV_AL_3T3_P3_R3_000001.tif 18.09 Spherical
2 17.39 VV_AL_3T3_P3_R3_000001.tif 19.86 Elongated
3 17.21 VV_AL_3T3_P3_R3_000001.tif 21.77 Pennates
4 9.42 VV_AL_3T3_P3_R3_000001.tif 9.83 Others

So far, I have a list with the names of the folders where every tiff file is located. This list will become the new column. However, how can I match every folder to its row? In other words, how do I match the 'Class' with the 'Particle ID' and 'Image File'?

For now:
## Load modules:
import os
import pandas as pd
import numpy as np
import cv2

## Function to recursively list files in dir by extension
def file_match(path, extension):
    cfiles = []
    # Walk the requested path (the original walked './' and ignored `path`)
    for root, dirs, files in os.walk(path):
        for file in files:
            if file.endswith(extension):
                cfiles.append(os.path.join(root, file))
    return cfiles

## Load all image file at all folders:
image_files = file_match(path='./',extension='.tiff')

## List of directories where each image was found:
img_dir = [os.path.dirname(one_img)[2:] for one_img in image_files]
len(img_dir)

## List of images:
# Image file column in csv files:
img_file = [os.path.basename(one_img)[:22] for one_img in image_files]
len(img_file)
# Particle id column in csv files:
part_id = [os.path.basename(one_img)[23:][:-5] for one_img in image_files]
len(part_id)

## I now have the collage picture name, the particle id and the classification folder for every image.
# Now I need to create a loop where this information is merged...

## Load csv file:
data = pd.read_csv('VV_AL_3T3.csv')
sample_file = data['Image File'] # Column name
sample_id = data['Particle ID'] # Particle ID

I have seen a similar case here: Create new column in dataframe with match values from other dataframe

but I don't really know how to use 'map' together with 'set_index', and that example has two data frames whereas I have just one.
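For reference, a minimal sketch of how `map` and `set_index` could fit this case. The list of found files can be turned into a second, small data frame, so there are two tables after all. The in-memory tables below are hypothetical stand-ins for the loaded file and for the `os.walk` results, and the column names 'folder' and 'tiff' are made up for the illustration.

```python
import pandas as pd

# Hypothetical stand-in for the table loaded from the data file.
data = pd.DataFrame({
    "Particle ID": [1, 2, 3, 4],
    "Image File": ["VV_AL_3T3_P3_R3_000001.tif"] * 4,
})

# Hypothetical stand-in for the files found on disk, deliberately
# listed in a different order from the table rows.
found = pd.DataFrame({
    "folder": ["Pennates", "Spherical", "Others", "Elongated"],
    "tiff": [
        "VV_AL_3T3_P3_R3_000001_3.tiff",
        "VV_AL_3T3_P3_R3_000001_1.tiff",
        "VV_AL_3T3_P3_R3_000001_4.tiff",
        "VV_AL_3T3_P3_R3_000001_2.tiff",
    ],
})

# set_index turns the file names into the index, so the resulting
# Series works as a lookup table: tiff name -> folder.
lookup = found.set_index("tiff")["folder"]

# Rebuild each row's tiff name, then look every name up with map.
stem = data["Image File"].str.replace(r"\.tif$", "", regex=True)
tiff = stem + "_" + data["Particle ID"].astype(str) + ".tiff"
data["Class"] = tiff.map(lookup)

print(data)
```

Because the lookup goes by file name rather than by position, the order in which the files were found no longer matters. Note that `map` with a Series needs the file names in the lookup to be unique.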

**nilamo** · May-04-2018, 04:41 PM

So what is it you're trying to accomplish? Adding a column to the space-separated file (it's not csv, please don't call it that, you'll only confuse people)? Or have you already added that info to the file, and now are trying to use it to open files?

Olga · (This post was last modified: May-07-2018, 07:47 AM by Olga.)

What I want to do is create a new column in the space-separated file with the 'Class', which is the name of the folder where each .tiff file is located.
I have a bunch of .tiff files contained in several folders named after different classes, like 'Elongated', 'Spherical', etc. The name of each file is composed of the 'Image File' and the 'Particle ID' (e.g. VV_AL_3T3_P3_R1_0001_20, where the last part, '20', is the Particle ID).
I also have a space-separated file with the information about those tiff files. Image File and Particle ID are two different columns. Now I want a new column with the classes. I have managed to create that new column. However, the classes don't line up with the tiff files they belong to.
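A likely reason for the mismatch is that the folder list is in the order the files were found on disk, while the table rows are in their own order, so assigning the list directly as a column pairs them by position only. One possible way around this, sketched under assumptions, is to build a small table of (Image File, Particle ID, Class) from the file paths and merge it on both key columns. The paths and table below are hypothetical examples, and the sketch assumes the particle id is everything after the last underscore in the file name.

```python
import os
import pandas as pd

# Hypothetical paths, in the arbitrary order a directory walk might give.
image_files = [
    "./Pennates/VV_AL_3T3_P3_R3_000001_3.tiff",
    "./Spherical/VV_AL_3T3_P3_R3_000001_1.tiff",
    "./Elongated/VV_AL_3T3_P3_R3_000001_2.tiff",
]

rows = []
for path in image_files:
    folder = os.path.basename(os.path.dirname(path))
    stem = os.path.splitext(os.path.basename(path))[0]
    # Assumption: the particle id follows the last underscore.
    image_stem, particle_id = stem.rsplit("_", 1)
    rows.append({
        "Image File": image_stem + ".tif",
        "Particle ID": int(particle_id),
        "Class": folder,
    })
classes = pd.DataFrame(rows)

# Hypothetical stand-in for the table loaded from the data file.
data = pd.DataFrame({
    "Particle ID": [1, 2, 3, 4],
    "Diameter": [15.36, 17.39, 17.21, 9.42],
    "Image File": ["VV_AL_3T3_P3_R3_000001.tif"] * 4,
})

# A left merge keeps every table row; rows with no image get NaN.
merged = data.merge(classes, on=["Image File", "Particle ID"],
                    how="left", validate="one_to_one")

# Rows whose image was not found in any class folder.
unmatched = merged[merged["Class"].isna()]
print(merged)
```

Checking `unmatched` afterwards shows which particles have no image in any folder. For the real file, something like `pd.read_csv(..., sep=r"\s+")` may be needed if the columns are separated by whitespace rather than commas.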
