Array manipulation

0zMeister · Feb-09-2020, 05:52 AM

Hi,

I've got some data in a csv file in the following format:
ID Number
a 1
a 2
a 3
a 4
b 5
b 6
b 7
b 8
c 9
c 10
c 11
c 12

I need it in a matrix/tabular format i.e.:
a b c
1 5 9
2 6 10
3 7 11
4 8 12

I can read the csv into a dataframe, etc. I'm struggling to convert the array into the matrix using numpy and/or panda array manipulations.

I can isolate a specific set of data using something like data.Number[data.ID='a']. I can also get the unique ID's data.ID.unique.

I was thinking of using a for uniqueID in data.ID.unique loop to then append the data to a new array in amatrix format.

Any sugestions on how to append the arrays and allocate headers to form the matrix?

Cheers,

OM

**Larz60+** · Feb-09-2020, 07:25 AM

what have you tried so far? show code working or not

**perfringo** · Feb-09-2020, 07:56 AM

I would read it into Python dictionary (where key is letter and value is list of digits). From that it's easy to convert to dataframe of required format if needed.

0zMeister · Feb-09-2020, 08:30 AM

I've found a 'work around' method. It works in this case, because each 'ID' has the same amount of values. It's not ideal since you might have a case where not all ID's have the same amount of values:

df = pd.read_csv("MBV2rawdata.csv") #csv file with all the data
cpid = df.CPID.unique() # get unique IDs
xnp = df.to_numpy()
xval = xnp[:,2] # only need the values in column 2 of the dataframe
xrs = np.reshape(xval, (round(len(df)/len(cpid)), len(cpid)))

jefsummers · Feb-09-2020, 01:42 PM

OK, don't have time this am to debug this - I KNOW it will not work as written, but posting it to convey the ideas behind it.
Import the csv into a pandas dataframe, then create 3 dataframes from that using conditionals to get your a, b, c frames. Then reassemble.
I doubt I am doing the conditionals correctly - leave that to you to debug. Adding the columns should work.

import pandas as pd
import numpy as np
# read the csv file into a dataframe
df = pd.read_csv('frame.csv')
#create 3 dataframes for the 3 letters
dfA = df[df.iloc[0] == 'a']
dfB = df[df.iloc[0] == 'b']
dfC = df[df.iloc[0] == 'c']
#now combine the dataframes
dfA['b'] = dfB
dfA['c'] = dfC

0zMeister · Feb-09-2020, 07:23 PM

Thanks jefsummers, that's exactly the idea I have in terms of concept. The only difference being that in this case I've got about 40 'a', 'b', 'c's etc. Thus I'd like to do it with a for loop instead of manually. Although I've been programming since forever, I'm new to Python and struggling a bit with the syntax, commands, etc.

baquerik · Feb-09-2020, 08:57 PM

You could create a set extracting all the unique IDs.

Then create a new dataframe and in a loop add every new column quering the old dataframe for every element in the set.

0zMeister · Feb-15-2020, 12:34 AM

Ok, so I've managed to get it working, although it might not be the most elegant solution. I build the new dataframe based on data extracted from the old dataframe. Innitially I got a lot of 'nan's in the new dataframe. I traced it to an indexing issue where the data I extract from the old dataframe keep it's index, which causes a mismatch with the new dataframe's index, therefore causing the 'nan'. So before adding the data extracted from the old dataframe, I needed to reindex it to align with the new dataframe's index. See code below. As I'm new to python I welcome comments on how to do this better, more efficiently. For some reason the reindexing takes quite a bit of processing time.

# read csv into dataframe
df = pd.read_csv("MBV2rawdata.csv") #CPID, Date, Value, Class

# I need this in the format Date, CPID#1, CPID#2, ... , CPID#n, Class


cpid = df.CPID.unique()


sf = df.Date[df.CPID == 24021] #Sf is the newly created dataframe

sf = pd.DataFrame(sf)


i=0
for colname in cpid:
    col = pd.DataFrame(df.Value[df.CPID==colname]) #get the column from df   
    for j in range(len(col)):   #have to reindex col to align with sf's index
        col.rename(index = { j + len(col)*i : j}, inplace=True)
    sf.insert(i+1, colname, col, True) #add col to sf
    i+=1


sf['Class']=df.Class[df.CPID == 24021] # add the 'Class' vlaues to sf

Possibly Related Threads…
Thread		Author	Replies	Views	Last Post
	array manipulation	divon	3	1,740	Sep-03-2021, 09:44 AM Last Post: divon
	Help with array manipulation	pberrett	4	2,381	Mar-30-2020, 03:02 AM Last Post: pberrett
	N-Dim array manipulation in a loop, getting IndexError: too many indices for array	cesardepaula	1	4,441	Mar-13-2019, 01:39 AM Last Post: scidam

Array manipulation

User Panel Messages

Announcements