How to handle time series data files with Python?
#1
I am a newbie in deep learning and am trying to build a feature matrix with Python. My sample data is structured like this:

dataset.csv

State   Earnings   Hispanic   Indian   Asian   Black   White   people_in_poverty

Alabama   0.2        0.4        0.6     0.6      0.2    0.8          a.csv
Florida   0.5        0.6        0.4     0.1      0.6    0.7          b.csv
Kentucky  0.7        0.7        0.9     0.8      0.3    0.6          c.csv
Minnesota 0.3        0.1        0.2     0.5      0.2    0.7          d.csv
....
The columns [Earnings, Hispanic, Indian, Asian, Black, White] are the attributes, and people_in_poverty is the class of the feature matrix. When the value of people_in_poverty is numeric, the Python code is simple.

people_in_poverty
0.7
0.3
0.2
0.6
import pandas as pd
df = pd.read_csv('dataset.csv', names=['state', 'Earnings', 'Hispanic', 'Indian', 'Asian', 'Black', 'White', 'people_in_poverty'])
dataset = df.values
However, in my case, the class column of the feature matrix holds the name of a CSV file that contains time series data.


people_in_poverty
a.csv
b.csv
c.csv
d.csv

Contents of a.csv:
2010-08-27        0.2
2010-09-27        0.7
2010-10-27        0.6
2010-11-27        0.9
2010-12-27        0.4
2011-01-27        0.8
2011-02-27        0.5
2011-03-27        0.3
So I want to know how to modify my pd.read_csv() code. The class of the feature matrix is not a numeric value but a CSV file containing the time series values. Any advice is welcome. Thanks in advance.
#2
What shape do you envision for your DataFrame?
#3
I am trying to build a model that predicts 'people_in_poverty' for each 'state' from the 'earnings' and race inputs. So my feature matrix would look like this:

State   Earnings   Hispanic   Indian   Asian   Black   White                             people_in_poverty
 
Alabama   0.2        0.4        0.6     0.6      0.2    0.8     [[2010-08-27,0.2], [2010-09-27,0.7], [2010-10-27,0.6], [2010-11-27,0.9]]
Florida   0.5        0.6        0.4     0.1      0.6    0.7     [[2010-08-27,0.5], [2010-09-27,0.6], [2010-10-27,0.2], [2010-11-27,0.8]]
Kentucky  0.7        0.7        0.9     0.8      0.3    0.6     [[2010-08-27,0.1], [2010-09-27,0.4], [2010-10-27,0.5], [2010-11-27,0.5]]
Minnesota 0.3        0.1        0.2     0.5      0.2    0.7     [[2010-08-27,0.6], [2010-09-27,0.3], [2010-10-27,0.7], [2010-11-27,0.3]]
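
One way to get close to the shape above is sketched here. It keeps the attributes and the time series as two aligned arrays rather than nesting lists inside a DataFrame cell, which is usually easier to feed to a model. The sketch assumes comma-separated files and that each series file has no header row; the sample files, the temporary directory, and the `load_series` helper are hypothetical and exist only to make the example runnable.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical sample files, written to a temp directory so the sketch runs as-is.
tmp = Path(tempfile.mkdtemp())
(tmp / "a.csv").write_text(
    "2010-08-27,0.2\n2010-09-27,0.7\n2010-10-27,0.6\n2010-11-27,0.9\n"
)
(tmp / "b.csv").write_text(
    "2010-08-27,0.5\n2010-09-27,0.6\n2010-10-27,0.2\n2010-11-27,0.8\n"
)
(tmp / "dataset.csv").write_text(
    "State,Earnings,Hispanic,Indian,Asian,Black,White,people_in_poverty\n"
    "Alabama,0.2,0.4,0.6,0.6,0.2,0.8,a.csv\n"
    "Florida,0.5,0.6,0.4,0.1,0.6,0.7,b.csv\n"
)


def load_series(path):
    """Read one headerless date,value file into a Series indexed by date."""
    ts = pd.read_csv(path, header=None, names=["Date", "Value"], parse_dates=["Date"])
    return ts.set_index("Date")["Value"]


df = pd.read_csv(tmp / "dataset.csv")

# One row per state, one column per date.
targets = pd.DataFrame(
    {
        state: load_series(tmp / name)
        for state, name in zip(df["State"], df["people_in_poverty"])
    }
).T

attribute_cols = ["Earnings", "Hispanic", "Indian", "Asian", "Black", "White"]
X = df[attribute_cols].to_numpy()  # attributes, shape (n_states, 6)
Y = targets.to_numpy()             # time series, shape (n_states, n_dates)

print(X.shape, Y.shape)
```

Row i of X and row i of Y belong to the same state, so the pair can be passed to a model that predicts the whole series from the attributes.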
#4
Hello, how about this Python code using the pandas median function? Because the 'people_in_poverty' column holds list values, I use the pandas median function to reduce each list to a single value for the feature matrix:

import pandas as pd

data_col1 = [['2010-08-27',0.2], ['2010-09-27',0.7], ['2010-10-27',0.6], ['2010-11-27',0.9]]
data_col2 = [['2010-08-27',0.5], ['2010-09-27',0.6], ['2010-10-27',0.2], ['2010-11-27',0.8]]
data_col3 = [['2010-08-27',0.1], ['2010-09-27',0.4], ['2010-10-27',0.5], ['2010-11-27',0.5]]
data_col4 = [['2010-08-27',0.6], ['2010-09-27',0.3], ['2010-10-27',0.7], ['2010-11-27',0.3]]

df1 = pd.DataFrame(data_col1, columns=['Date', 'Value'])
df2 = pd.DataFrame(data_col2, columns=['Date', 'Value'])
df3 = pd.DataFrame(data_col3, columns=['Date', 'Value'])
df4 = pd.DataFrame(data_col4, columns=['Date', 'Value'])

print(df1['Value'].median())
print(df2['Value'].median())
print(df3['Value'].median())
print(df4['Value'].median())
Any advice will be deeply appreciated. Thanks.
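
The same idea can be applied to the whole feature matrix in one step, as in the minimal sketch below. The `series_by_file` dictionary is a hypothetical stand-in for reading each file with pd.read_csv(), and `poverty_median` is an illustrative column name, not something from the original data.

```python
import pandas as pd

features = pd.DataFrame(
    {
        "State": ["Alabama", "Florida", "Kentucky", "Minnesota"],
        "Earnings": [0.2, 0.5, 0.7, 0.3],
        "people_in_poverty": ["a.csv", "b.csv", "c.csv", "d.csv"],
    }
)

# Stand-in for pd.read_csv(name): the values from the post above, keyed by file name.
series_by_file = {
    "a.csv": pd.Series([0.2, 0.7, 0.6, 0.9]),
    "b.csv": pd.Series([0.5, 0.6, 0.2, 0.8]),
    "c.csv": pd.Series([0.1, 0.4, 0.5, 0.5]),
    "d.csv": pd.Series([0.6, 0.3, 0.7, 0.3]),
}

# Map each file name to the median of its series: one numeric class per state.
features["poverty_median"] = features["people_in_poverty"].map(
    lambda name: series_by_file[name].median()
)

print(features)
```

Note that a median discards the ordering of the observations, so this only fits if a single summary value per state is enough for the model.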
#5
If you want to isolate a certain element of your dataframe, I found that a great way to do this is to isolate a given column, convert that column to a list, then use list[index] to isolate the given element.
lst = df['ColumnTitle'].tolist()  # replace ColumnTitle with your own column title
element = lst[x]  # where x is the index of the element you want

Hopefully this helps!