 Dataframe not appending correctly
#1
I have a dictionary of files in this format:

{'filea': ['test/folder2/filea', 'test/folder3/filea', 'test/folder1/filea'],
'fileb': ['test/folder2/fileb', 'test/folder3/fileb', 'test/folder1/fileb'],
'filec': ['test/folder2/filec', 'test/folder3/filec', 'test/folder1/filec']}

I have created a for loop that goes through each key and builds a dataframe combining the files listed under that key. When I run the loop, however, the data for the next key (fileb in this case) gets appended to the dataframe created for filea. I spent a few hours trying to fix this without success, probably because the code in between is very long and I can't tell where my indentation mistake is. My code is below:

Let's say the dictionary above is called file_list.
for key,files in file_list.items():
    #dataset = pd.DataFrame()
    for i in files: #loop over the files in each key
       #do something....

    df = pd.DataFrame({'A':B,'C':D,'E':F})
    print('This dataframe has the shape:',df.shape)

    #save dataframe
    df.to_hdf('xxx.hdf'.format(key[0:-4]),mode='w', key='df')

I still can't see where my mistake is: when the loop processes the files for fileb, their data is appended to the dataframe that already holds the data from filea, instead of going into a new dataframe for fileb.
Any help on this is much appreciated!
#2
There are lots of questions here: What are B, D, F in line #6? What does 'xxx.hdf' mean? Why did you use key[0:-4]; do all of your keys have length 5? Did you use pd.concat somewhere in your code? Should 'xxx.hdf'.format(key[0:-4]) be something like '{}.hdf'.format(key)?
#3
(Jul-10-2019, 12:17 AM)scidam Wrote: There are lots of questions here: What are B, D, F in line #6? What does 'xxx.hdf' mean? Why did you use key[0:-4]; do all of your keys have length 5? Did you use pd.concat somewhere in your code? Should 'xxx.hdf'.format(key[0:-4]) be something like '{}.hdf'.format(key)?

B, D and F are separate arrays, one for each feature, built up by concatenating and appending values. 'A', 'C' and 'E' are the column names. Sorry for the vague names; they are only there to preserve data confidentiality.

xxx.hdf is the file name I want to give each dataframe; the name is derived from the key in the dictionary. The call is supposed to be df.to_hdf('C:\...\....\{}.hdf'.format(key[0:-4]),mode='w', key='df'). key[0:-4] extracts the key name up to, but not including, the last 4 characters (to remove the '.csv' at the end of the key). So originally the keys are filea.csv, fileb.csv and filec.csv.
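To make the file-name question concrete, here is a minimal sketch comparing the three spellings discussed in this thread. The helper name `hdf_name` and the use of `os.path.splitext` are assumptions for illustration, not part of the original code; the keys are the ones given above.

```python
import os


def hdf_name(key):
    """Hypothetical helper: turn a key such as 'filea.csv' into 'filea.hdf'."""
    stem, _ext = os.path.splitext(key)  # splits off the extension, whatever its length
    return "{}.hdf".format(stem)


keys = ["filea.csv", "fileb.csv", "filec.csv"]

# No '{}' placeholder: str.format ignores the argument, so every key maps to the same name.
no_placeholder = ["xxx.hdf".format(k[0:-4]) for k in keys]

# With a placeholder, the sliced key ends up in the name.
sliced = ["{}.hdf".format(k[0:-4]) for k in keys]

# Same result without hard-coding the 4-character extension length.
via_splitext = [hdf_name(k) for k in keys]

print(no_placeholder)
print(sliced)
print(via_splitext)
```

With mode='w', writing every dataframe to the same name would overwrite the previous file each time, so the placeholder matters. The slice and `os.path.splitext` agree for '.csv' keys; the latter would also cope with extensions of other lengths.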

I hope this explains it.

I think what is happening is that the arrays being appended to are not getting cleared between keys. Would it be right to add the code below after saving the dataframe:

del(B)
del(D)
del(F)
Or is there a more efficient way to reset these arrays?
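On the `del` idea, a minimal sketch using a plain list in place of the real arrays (the names `values`, `groups`, `run_with_del` and `run_with_rebind` are made up for illustration). `del` removes the name itself rather than emptying the array, so the next pass of the loop fails unless the array is created again. Rebinding the name to a new empty list does reset it.

```python
def run_with_del(groups):
    """Accumulate per group, then `del` the list after each group."""
    values = []  # created once, before the loop
    sizes = {}
    for name, items in groups.items():
        try:
            values.extend(items)
        except NameError:
            # After `del values` the name no longer exists on the next pass.
            sizes[name] = None
            continue
        sizes[name] = len(values)
        del values
    return sizes


def run_with_rebind(groups):
    """Accumulate per group, then rebind the name to a fresh empty list."""
    values = []
    sizes = {}
    for name, items in groups.items():
        values.extend(items)
        sizes[name] = len(values)
        values = []  # reset for the next group
    return sizes


groups = {"filea": [1, 2, 3], "fileb": [4, 5]}
print(run_with_del(groups))     # {'filea': 3, 'fileb': None}
print(run_with_rebind(groups))  # {'filea': 3, 'fileb': 2}
```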

Alright, I have solved my problem. I moved the creation of the empty arrays from before the first for loop to inside it, so that a new set of empty arrays is created on every iteration. I guess I just needed a break and your questions to think it through! Thanks!
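For anyone hitting the same symptom, here is a minimal sketch of the difference, under some loud assumptions: a plain list `col_a` stands in for the real arrays, each file contributes exactly one row, and the function name `combine` and the flag `reset_per_key` are invented for the demonstration. The real loop body was not posted.

```python
def combine(file_list, reset_per_key):
    """Return {key: row count} for the table built from each key's files."""
    col_a = []  # created once, before the outer loop (the original placement)
    row_counts = {}
    for key, files in file_list.items():
        if reset_per_key:
            col_a = []  # fresh accumulator for this key (the fix described above)
        for path in files:
            # Stand-in for reading one file: each file adds one row.
            col_a.append(path)
        # In the real code, pd.DataFrame({...}) would be built from the arrays here.
        row_counts[key] = len(col_a)
    return row_counts


file_list = {
    "filea": ["test/folder2/filea", "test/folder3/filea", "test/folder1/filea"],
    "fileb": ["test/folder2/fileb", "test/folder3/fileb", "test/folder1/fileb"],
    "filec": ["test/folder2/filec", "test/folder3/filec", "test/folder1/filec"],
}

print(combine(file_list, reset_per_key=False))  # {'filea': 3, 'fileb': 6, 'filec': 9}
print(combine(file_list, reset_per_key=True))   # {'filea': 3, 'fileb': 3, 'filec': 3}
```

Without the reset, each key's table also contains every earlier key's rows, which matches the growing dataframe described in the first post.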