Python Forum

I am new to deep learning and am trying to build a feature matrix with Python. My sample data is structured as follows:

dataset.csv

State   Earnings   Hispanic   Indian   Asian   Black   White   people_in_poverty

Alabama   0.2        0.4        0.6     0.6      0.2    0.8          a.csv
Florida   0.5        0.6        0.4     0.1      0.6    0.7          b.csv
Kentucky  0.7        0.7        0.9     0.8      0.3    0.6          c.csv
Minnesota 0.3        0.1        0.2     0.5      0.2    0.7          d.csv
....

The columns [Earnings, Hispanic, Indian, Asian, Black, White] are the attributes (features), and people_in_poverty is the class (target) of the feature matrix. When the values of people_in_poverty are numeric, the Python code is simple.

people_in_poverty
0.7
0.3
0.2
0.6

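import pandas as pd
# header=0 tells pandas to replace the file's own header row with these names
# instead of reading that row as data.
df = pd.read_csv('dataset.csv', header=0, names=['state', 'Earnings', 'Hispanic', 'Indian', 'Asian', 'Black', 'White', 'people_in_poverty'])
dataset = df.values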

However, in my case, each value in the class column is the name of a CSV file that contains time series data.

people_in_poverty
a.csv
b.csv
c.csv
d.csv

a.csv
2010-08-27        0.2
2010-09-27        0.7
2010-10-27        0.6
2010-11-27        0.9
2010-12-27        0.4
2011-01-27        0.8
2011-02-27        0.5
2011-03-27        0.3

So I would like to know how to modify my pd.read_csv() code. The class of the feature matrix is not a numeric value but a CSV file containing time series values. Any advice would be appreciated. Thanks in advance.

What shape do you envision for your DataFrame?

I am trying to build a model that predicts 'people_in_poverty' for each 'State' from the inputs 'Earnings' and the race columns. So my feature matrix would look like this:

State   Earnings   Hispanic   Indian   Asian   Black   White                             people_in_poverty
 
Alabama   0.2        0.4        0.6     0.6      0.2    0.8     [[2010-08-27,0.2], [2010-09-27,0.7], [2010-10-27,0.6], [2010-11-27,0.9]]
Florida   0.5        0.6        0.4     0.1      0.6    0.7     [[2010-08-27,0.5], [2010-09-27,0.6], [2010-10-27,0.2], [2010-11-27,0.8]]
Kentucky  0.7        0.7        0.9     0.8      0.3    0.6     [[2010-08-27,0.1], [2010-09-27,0.4], [2010-10-27,0.5], [2010-11-27,0.5]]
Minnesota 0.3        0.1        0.2     0.5      0.2    0.7     [[2010-08-27,0.6], [2010-09-27,0.3], [2010-10-27,0.7], [2010-11-27,0.3]]
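
A layout like the one above could be built roughly as follows. This is only a sketch under some assumptions that the thread does not confirm: dataset.csv is comma-separated with a header row, and each per-state file (a.csv, b.csv, ...) holds two comma-separated columns, date and value, with no header. The helper load_pairs and the temporary stand-in files are made up for illustration so that the example runs on its own.

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd

# Stand-in files so the sketch is runnable; values are taken from the thread.
workdir = Path(tempfile.mkdtemp())
(workdir / "dataset.csv").write_text(
    "State,Earnings,Hispanic,Indian,Asian,Black,White,people_in_poverty\n"
    "Alabama,0.2,0.4,0.6,0.6,0.2,0.8,a.csv\n"
    "Florida,0.5,0.6,0.4,0.1,0.6,0.7,b.csv\n"
)
(workdir / "a.csv").write_text(
    "2010-08-27,0.2\n2010-09-27,0.7\n2010-10-27,0.6\n2010-11-27,0.9\n"
)
(workdir / "b.csv").write_text(
    "2010-08-27,0.5\n2010-09-27,0.6\n2010-10-27,0.2\n2010-11-27,0.8\n"
)


def load_pairs(filename):
    """Hypothetical helper: read one time series file into [[date, value], ...].

    Assumes two columns (date, value) and no header row.
    """
    ts = pd.read_csv(workdir / filename, header=None, names=["Date", "Value"])
    return ts.values.tolist()


df = pd.read_csv(workdir / "dataset.csv")

# Replace each file name with the time series stored in that file.
df["people_in_poverty"] = [load_pairs(name) for name in df["people_in_poverty"]]

# For a model, plain arrays are usually easier to work with than nested lists.
feature_cols = ["Earnings", "Hispanic", "Indian", "Asian", "Black", "White"]
X = df[feature_cols].to_numpy()  # shape: (n_states, n_features)
# Assumes every series has the same number of time steps.
y = np.array([[value for _, value in pairs] for pairs in df["people_in_poverty"]])

print(df)
print(X.shape, y.shape)
```

The nested column matches the layout shown above, while X and y are probably closer to what a deep learning library expects: one row of features per state and one row of target values per state.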

Hello, how about this Python code using the pandas median function? Because each entry in the 'people_in_poverty' column is a list of values, I use the pandas median function to reduce it to a single value for the feature matrix:

import pandas as pd

data_col1 = [['2010-08-27',0.2], ['2010-09-27',0.7], ['2010-10-27',0.6], ['2010-11-27',0.9]]
data_col2 = [['2010-08-27',0.5], ['2010-09-27',0.6], ['2010-10-27',0.2], ['2010-11-27',0.8]]
data_col3 = [['2010-08-27',0.1], ['2010-09-27',0.4], ['2010-10-27',0.5], ['2010-11-27',0.5]]
data_col4 = [['2010-08-27',0.6], ['2010-09-27',0.3], ['2010-10-27',0.7], ['2010-11-27',0.3]]

df1 = pd.DataFrame(data_col1, columns=['Date', 'Value'])
df2 = pd.DataFrame(data_col2, columns=['Date', 'Value'])
df3 = pd.DataFrame(data_col3, columns=['Date', 'Value'])
df4 = pd.DataFrame(data_col4, columns=['Date', 'Value'])

print(df1['Value'].median())
print(df2['Value'].median())
print(df3['Value'].median())
print(df4['Value'].median())

Any advice will be deeply appreciated. Thanks.
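
The same median idea can be written without repeating the code once per state. The sketch below assumes the four lists belong to Alabama, Florida, Kentucky and Minnesota in that order (as in the table earlier in the thread); the names series_by_state and poverty_median are made up for illustration.

```python
import pandas as pd

# Time series per state, copied from the post above.
series_by_state = {
    "Alabama": [['2010-08-27', 0.2], ['2010-09-27', 0.7], ['2010-10-27', 0.6], ['2010-11-27', 0.9]],
    "Florida": [['2010-08-27', 0.5], ['2010-09-27', 0.6], ['2010-10-27', 0.2], ['2010-11-27', 0.8]],
    "Kentucky": [['2010-08-27', 0.1], ['2010-09-27', 0.4], ['2010-10-27', 0.5], ['2010-11-27', 0.5]],
    "Minnesota": [['2010-08-27', 0.6], ['2010-09-27', 0.3], ['2010-10-27', 0.7], ['2010-11-27', 0.3]],
}

# Compute one median per state instead of building df1..df4 by hand.
medians = {
    state: pd.DataFrame(rows, columns=["Date", "Value"])["Value"].median()
    for state, rows in series_by_state.items()
}

# A Series indexed by state can be joined back onto the feature table.
poverty_median = pd.Series(medians, name="people_in_poverty")
print(poverty_median)
```

Note that reducing each series to its median discards the time ordering, so this only fits if a single summary value per state is an acceptable target.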

If you want to isolate a certain element of your DataFrame, I found that a good way to do this is to select the column, convert it to a list, and then index into that list to get the element.

lst = df['ColumnTitle'].tolist()  # replace ColumnTitle with your own column title
element = lst[x]  # where x is a zero-based index

Hopefully this helps!
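
For reference, here is a small runnable sketch of picking out one element, using a toy table with values from the thread. The position x and the variable names are made up for illustration; pandas can also index by position or label directly, without going through a list.

```python
import pandas as pd

# Toy table with values from the thread.
df = pd.DataFrame({
    "State": ["Alabama", "Florida", "Kentucky", "Minnesota"],
    "Earnings": [0.2, 0.5, 0.7, 0.3],
})

x = 2  # hypothetical zero-based position

# Convert the column to a list, then index it.
as_list = df["Earnings"].tolist()
from_list = as_list[x]

# Index by position directly on the column.
by_position = df["Earnings"].iloc[x]

# Index by label after making State the index.
by_label = df.set_index("State").at["Kentucky", "Earnings"]

print(from_list, by_position, by_label)
```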

aupres

jefsummers

aupres

aupres

MattKahn13