Python Forum
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Array manipulation
#1
Hi,

I've got some data in a csv file in the following format:
ID Number
a 1
a 2
a 3
a 4
b 5
b 6
b 7
b 8
c 9
c 10
c 11
c 12

I need it in a matrix/tabular format i.e.:
a b c
1 5 9
2 6 10
3 7 11
4 8 12

I can read the csv into a dataframe, etc. I'm struggling to convert the array into the matrix using numpy and/or panda array manipulations.

I can isolate a specific set of data using something like data.Number[data.ID='a']. I can also get the unique ID's data.ID.unique.

I was thinking of using a for uniqueID in data.ID.unique loop to then append the data to a new array in amatrix format.

Any sugestions on how to append the arrays and allocate headers to form the matrix?

Cheers,

OM
Reply
#2
what have you tried so far? show code working or not
Reply
#3
I would read it into Python dictionary (where key is letter and value is list of digits). From that it's easy to convert to dataframe of required format if needed.
I'm not 'in'-sane. Indeed, I am so far 'out' of sane that you appear a tiny blip on the distant coast of sanity. Bucky Katt, Get Fuzzy

Da Bishop: There's a dead bishop on the landing. I don't know who keeps bringing them in here. ....but society is to blame.
Reply
#4
I've found a 'work around' method. It works in this case, because each 'ID' has the same amount of values. It's not ideal since you might have a case where not all ID's have the same amount of values:

df = pd.read_csv("MBV2rawdata.csv") #csv file with all the data
cpid = df.CPID.unique() # get unique IDs
xnp = df.to_numpy()
xval = xnp[:,2] # only need the values in column 2 of the dataframe
xrs = np.reshape(xval, (round(len(df)/len(cpid)), len(cpid)))
Reply
#5
OK, don't have time this am to debug this - I KNOW it will not work as written, but posting it to convey the ideas behind it.
Import the csv into a pandas dataframe, then create 3 dataframes from that using conditionals to get your a, b, c frames. Then reassemble.
I doubt I am doing the conditionals correctly - leave that to you to debug. Adding the columns should work.
import pandas as pd
import numpy as np
# read the csv file into a dataframe
df = pd.read_csv('frame.csv')
#create 3 dataframes for the 3 letters
dfA = df[df.iloc[0] == 'a']
dfB = df[df.iloc[0] == 'b']
dfC = df[df.iloc[0] == 'c']
#now combine the dataframes
dfA['b'] = dfB
dfA['c'] = dfC
Reply
#6
Thanks jefsummers, that's exactly the idea I have in terms of concept. The only difference being that in this case I've got about 40 'a', 'b', 'c's etc. Thus I'd like to do it with a for loop instead of manually. Although I've been programming since forever, I'm new to Python and struggling a bit with the syntax, commands, etc.
Reply
#7
You could create a set extracting all the unique IDs.

Then create a new dataframe and in a loop add every new column quering the old dataframe for every element in the set.
Reply
#8
Ok, so I've managed to get it working, although it might not be the most elegant solution. I build the new dataframe based on data extracted from the old dataframe. Innitially I got a lot of 'nan's in the new dataframe. I traced it to an indexing issue where the data I extract from the old dataframe keep it's index, which causes a mismatch with the new dataframe's index, therefore causing the 'nan'. So before adding the data extracted from the old dataframe, I needed to reindex it to align with the new dataframe's index. See code below. As I'm new to python I welcome comments on how to do this better, more efficiently. For some reason the reindexing takes quite a bit of processing time.

# read csv into dataframe
df = pd.read_csv("MBV2rawdata.csv") #CPID, Date, Value, Class

# I need this in the format Date, CPID#1, CPID#2, ... , CPID#n, Class


cpid = df.CPID.unique()


sf = df.Date[df.CPID == 24021] #Sf is the newly created dataframe

sf = pd.DataFrame(sf)


i=0
for colname in cpid:
    col = pd.DataFrame(df.Value[df.CPID==colname]) #get the column from df   
    for j in range(len(col)):   #have to reindex col to align with sf's index
        col.rename(index = { j + len(col)*i : j}, inplace=True)
    sf.insert(i+1, colname, col, True) #add col to sf
    i+=1


sf['Class']=df.Class[df.CPID == 24021] # add the 'Class' vlaues to sf
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  array manipulation divon 3 1,740 Sep-03-2021, 09:44 AM
Last Post: divon
  Help with array manipulation pberrett 4 2,381 Mar-30-2020, 03:02 AM
Last Post: pberrett
  N-Dim array manipulation in a loop, getting IndexError: too many indices for array cesardepaula 1 4,441 Mar-13-2019, 01:39 AM
Last Post: scidam

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020