how to handling time series data file with Python?
Python Forum (https://python-forum.io), Forum: Data Science (https://python-forum.io/forum-44.html), Thread: /thread-28643.html
how to handling time series data file with Python? - aupres - Jul-28-2020

I am a newbie in deep learning and am trying to build a feature matrix with Python. My sample data is structured like this:

dataset.csv

    State      Earnings  Hispanic  Indian  Asian  Black  White  people_in_poverty
    Alabama    0.2       0.4       0.6     0.6    0.2    0.8    a.csv
    Florida    0.5       0.6       0.4     0.1    0.6    0.7    b.csv
    Kentucky   0.7       0.7       0.9     0.8    0.3    0.6    c.csv
    Minnesota  0.3       0.1       0.2     0.5    0.2    0.7    d.csv
    ....

The columns [Earnings, Hispanic, Indian, Asian, Black, White] are the attributes, and people_in_poverty is the class of the feature matrix. When the value of people_in_poverty is numeric, the Python code is simple.

    people_in_poverty
    0.7
    0.3
    0.2
    0.6

    import pandas as pd

    # The file has no header row, so the column names are supplied here.
    df = pd.read_csv('dataset.csv', names=['state', 'Earnings', 'Hispanic', 'Indian', 'Asian', 'Black', 'White', 'people_in_poverty'])
    dataset = df.values

However, in my case, the class column holds the name of a csv file that contains time series data.

    people_in_poverty
    a.csv
    b.csv
    c.csv
    d.csv

a.csv

    2010-08-27  0.2
    2010-09-27  0.7
    2010-10-27  0.6
    2010-11-27  0.9
    2010-12-27  0.4
    2011-01-27  0.8
    2011-02-27  0.5
    2011-03-27  0.3

So I want to know how to modify my pd.read_csv() code. The class of the feature matrix is not a numeric value but a csv file containing the time series values. Any advice is welcome. Thanks in advance.
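One possible way to approach the question above is to read dataset.csv as before and then make a second pd.read_csv() call for every file named in the people_in_poverty column. The sketch below is only an illustration under several assumptions that the post does not confirm: the files are comma-separated, they have no header row, and the series files sit in the same directory as dataset.csv. The helper name load_with_series is hypothetical, and the demo writes two of the thread's sample rows to a temporary directory so that it runs on its own.

```python
import tempfile
from pathlib import Path

import pandas as pd

COLUMNS = ["State", "Earnings", "Hispanic", "Indian", "Asian", "Black",
           "White", "people_in_poverty"]


def load_with_series(dataset_path):
    """Read the main table, then load the series file named in each row.

    Hypothetical helper: the file layout is an assumption, see comments.
    """
    dataset_path = Path(dataset_path)
    # Assumption: comma-separated with no header row (hence names=...).
    df = pd.read_csv(dataset_path, names=COLUMNS, skipinitialspace=True)

    series_by_state = {}
    for state, file_name in zip(df["State"], df["people_in_poverty"]):
        # Assumption: each series file has two unnamed columns (date, value)
        # and lives next to the main table.
        ts = pd.read_csv(dataset_path.parent / file_name,
                         names=["Date", "Value"], parse_dates=["Date"],
                         skipinitialspace=True)
        series_by_state[state] = ts.set_index("Date")["Value"]
    return df, series_by_state


# Demo with a few sample rows from the thread, written to a temporary folder.
with tempfile.TemporaryDirectory() as folder:
    root = Path(folder)
    (root / "dataset.csv").write_text(
        "Alabama,0.2,0.4,0.6,0.6,0.2,0.8,a.csv\n"
        "Florida,0.5,0.6,0.4,0.1,0.6,0.7,b.csv\n")
    (root / "a.csv").write_text(
        "2010-08-27,0.2\n2010-09-27,0.7\n2010-10-27,0.6\n2010-11-27,0.9\n")
    (root / "b.csv").write_text(
        "2010-08-27,0.5\n2010-09-27,0.6\n2010-10-27,0.2\n2010-11-27,0.8\n")
    df, series_by_state = load_with_series(root / "dataset.csv")

# Attribute matrix: the six numeric columns between State and the file name.
X = df[COLUMNS[1:-1]].to_numpy()
print(X.shape)
print(series_by_state["Alabama"])
```

Keeping the series in a dict keyed by state, instead of nesting lists inside a DataFrame cell, leaves the attribute matrix purely numeric. How the series then become a training target (full sequence, a summary statistic, or something else) is a modelling decision that this sketch does not make.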
RE: how to handling time series data file with Python? - jefsummers - Jul-28-2020

What shape do you envision for your DataFrame?

RE: how to handling time series data file with Python? - aupres - Jul-29-2020

I am trying to build a model that predicts 'people_in_poverty' for each 'state' from the 'earning' and 'races' inputs. So my feature matrix would look like this:

    State      Earnings  Hispanic  Indian  Asian  Black  White  people_in_poverty
    Alabama    0.2       0.4       0.6     0.6    0.2    0.8    [[2010-08-27,0.2], [2010-09-27,0.7], [2010-10-27,0.6], [2010-11-27,0.9]]
    Florida    0.5       0.6       0.4     0.1    0.6    0.7    [[2010-08-27,0.5], [2010-09-27,0.6], [2010-10-27,0.2], [2010-11-27,0.8]]
    Kentucky   0.7       0.7       0.9     0.8    0.3    0.6    [[2010-08-27,0.1], [2010-09-27,0.4], [2010-10-27,0.5], [2010-11-27,0.5]]
    Minnesota  0.3       0.1       0.2     0.5    0.2    0.7    [[2010-08-27,0.6], [2010-09-27,0.3], [2010-10-27,0.7], [2010-11-27,0.3]]

RE: how to handling time series data file with Python? - aupres - Aug-07-2020

Hello, how about this Python code using the pandas median function? Because the 'people_in_poverty' column holds list values, I use median to reduce each list to a single value for the feature matrix.

    import pandas as pd

    # One list of [date, value] pairs per state (Alabama, Florida, Kentucky, Minnesota).
    data_col1 = [['2010-08-27',0.2], ['2010-09-27',0.7], ['2010-10-27',0.6], ['2010-11-27',0.9]]
    data_col2 = [['2010-08-27',0.5], ['2010-09-27',0.6], ['2010-10-27',0.2], ['2010-11-27',0.8]]
    data_col3 = [['2010-08-27',0.1], ['2010-09-27',0.4], ['2010-10-27',0.5], ['2010-11-27',0.5]]
    data_col4 = [['2010-08-27',0.6], ['2010-09-27',0.3], ['2010-10-27',0.7], ['2010-11-27',0.3]]

    df1 = pd.DataFrame(data_col1, columns=['Date', 'Value'])
    df2 = pd.DataFrame(data_col2, columns=['Date', 'Value'])
    df3 = pd.DataFrame(data_col3, columns=['Date', 'Value'])
    df4 = pd.DataFrame(data_col4, columns=['Date', 'Value'])

    # Median of each state's series.
    print(df1['Value'].median())
    print(df2['Value'].median())
    print(df3['Value'].median())
    print(df4['Value'].median())

Any advice will be deeply appreciated. Thanks

RE: how to handling time series data file with Python?
- MattKahn13 - Aug-10-2020

If you want to isolate a certain element of your DataFrame, a good way I found is to isolate a given column, convert that column to a list, then use list[index] to pick out the element.

    column = df.ColumnTitle.to_string(index=False)  # replace ColumnTitle with your own column title
    lst = column.split()  # the column's values as a list of strings
    element = lst[x]  # where x represents an index

Hopefully this helps!
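The two ideas from the last posts, reducing each series to its median and picking out a single element of a column, can both be sketched with ordinary pandas indexing. This is a hedged illustration and not the posters' own method: it uses Series.iloc and Series.tolist() in place of the string round trip, which keeps the values numeric. The names raw, frames and medians are made up for the example, and the numbers are the ones given in the thread.

```python
import pandas as pd

# [date, value] pairs copied from the thread, keyed by state.
raw = {
    "Alabama": [["2010-08-27", 0.2], ["2010-09-27", 0.7],
                ["2010-10-27", 0.6], ["2010-11-27", 0.9]],
    "Florida": [["2010-08-27", 0.5], ["2010-09-27", 0.6],
                ["2010-10-27", 0.2], ["2010-11-27", 0.8]],
    "Kentucky": [["2010-08-27", 0.1], ["2010-09-27", 0.4],
                 ["2010-10-27", 0.5], ["2010-11-27", 0.5]],
    "Minnesota": [["2010-08-27", 0.6], ["2010-09-27", 0.3],
                  ["2010-10-27", 0.7], ["2010-11-27", 0.3]],
}

# One small DataFrame per state, built in a loop instead of df1..df4.
frames = {state: pd.DataFrame(rows, columns=["Date", "Value"])
          for state, rows in raw.items()}

# One median per state, usable as a numeric people_in_poverty column.
medians = pd.Series({state: frame["Value"].median()
                     for state, frame in frames.items()},
                    name="people_in_poverty")
print(medians)

# Element access without converting to strings.
values = frames["Alabama"]["Value"]
third = values.iloc[2]       # element at position 2 (zero-based)
as_list = values.tolist()    # plain Python list of floats
print(third, as_list)
```

Whether the median is a sensible stand-in for a whole time series depends on the model being built. It discards the ordering of the values, so it may not be a good fit if the trend over time is what should be predicted.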