ML - Printable Version

ML - rezapci - Sep-25-2019

Hello fellow scientists,
I'm a bit confused about this project, so I thought I'd ask here.
I have a CSV dataset with 30001 rows and 14 columns, covering 08/03/2018 1:00 to 05/01/2019 13:26.
Using two or more clustering algorithms, I need to build an unsupervised time-series classifier that identifies characteristic day-length patterns in the attached data. Note that each column in the provided dataset contains sensor measurements of the same kind: light in a room (units in lux).
I also need to use appropriate quantitative metrics to determine the number of time-series clusters and to evaluate their quality.
Optional: In light of the data and the differences between algorithms, speculate on why a given method yielded quantitatively better clusters.
Code can be written in Python or R, but any other language is allowed as long as I provide the code.
Interactive notebooks (like Jupyter) including the code, comments and visualizations are preferred.
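
As a rough illustration of the task described above, here is a minimal sketch that compares two clustering algorithms and picks the number of clusters with the silhouette score. Everything in it is an assumption: the data is synthetic (24 hourly lux values per day, generated by a hypothetical `synthetic_day` helper), and KMeans, Ward agglomerative clustering and the silhouette score are just one reasonable choice of algorithms and metric, not something the assignment prescribes.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
hours = np.arange(24)

def synthetic_day(sunrise, sunset):
    """One made-up day of hourly lux values: bright between sunrise and sunset."""
    lux = np.where((hours >= sunrise) & (hours < sunset), 300.0, 0.0)
    noisy = lux + rng.normal(0.0, 5.0, size=24)
    return np.clip(noisy, 0.0, None)  # lux cannot be negative

# Synthetic stand-in for the real data: 30 short days and 30 long days.
X = np.array(
    [synthetic_day(8, 16) for _ in range(30)]
    + [synthetic_day(5, 21) for _ in range(30)]
)

def best_k(make_model, X, k_values=range(2, 6)):
    """Fit one model per candidate k and return the k with the highest silhouette."""
    scores = {}
    for k in k_values:
        labels = make_model(k).fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    return max(scores, key=scores.get), scores

kmeans_k, kmeans_scores = best_k(
    lambda k: KMeans(n_clusters=k, n_init=10, random_state=0), X
)
agglo_k, agglo_scores = best_k(
    lambda k: AgglomerativeClustering(n_clusters=k), X
)

print("KMeans best k:", kmeans_k)
print("Agglomerative best k:", agglo_k)
```

With the real data, each row of `X` would have to be built from the sensor columns first, for example one row per day and sensor; how best to do that depends on the sampling interval and on gaps in the file.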

Thank you!

RE: ML - Larz60+ - Sep-26-2019

Certainly you have experimented with a few test cases.
Please show what you have tried, and any error tracebacks encountered.

RE: ML - rezapci - Sep-26-2019

For example:

import operator

def sort_by_column(csv_cont, col, reverse=False):
    """ 
    Sorts CSV contents by column name (if col argument is type <str>) 
    or column index (if col argument is type <int>). 
    
    """
    header = csv_cont[0]
    body = csv_cont[1:]
    if isinstance(col, str):  
        col_index = header.index(col)
    else:
        col_index = col
    body = sorted(body, 
           key=operator.itemgetter(col_index), 
           reverse=reverse)
    body.insert(0, header)
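    return body

# csv_to_list, print_csv and convert_cells_to_floats are helper functions
# that are not shown in this post.
csv_cont = csv_to_list('dataset.csv')

print('\nOriginal CSV file:')
print_csv(csv_cont)

print('\nCSV sorted by col "d13":')
convert_cells_to_floats(csv_cont)
# The column name passed here must match an entry in the header row exactly.
csv_sorted = sort_by_column(csv_cont, '13')
print_csv(csv_sorted)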
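
The snippet above calls `csv_to_list`, `convert_cells_to_floats` and `print_csv` without showing them. The following is only a guess at what such helpers might look like, using the standard `csv` module; the bodies and the `max_rows` parameter are assumptions, and the tiny demo file is made up. The header row is deliberately left as strings so that a later lookup by column name still works.

```python
import csv
import os
import tempfile

def csv_to_list(path, delimiter=","):
    """Read a CSV file into a list of rows, each row a list of strings."""
    with open(path, newline="") as f:
        return [row for row in csv.reader(f, delimiter=delimiter)]

def convert_cells_to_floats(csv_cont):
    """Convert body cells to float in place; leave the header and
    non-numeric cells (such as timestamps) unchanged."""
    for row in csv_cont[1:]:
        for i, cell in enumerate(row):
            try:
                row[i] = float(cell)
            except ValueError:
                pass  # not a number, keep the original string

def print_csv(csv_cont, max_rows=5):
    """Print the first max_rows rows, tab-separated."""
    for row in csv_cont[:max_rows]:
        print("\t".join(str(cell) for cell in row))

# Demo on a made-up file (column names and values are invented).
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="") as tmp:
    tmp.write("time,13\n08/03/2018 1:00,12.5\n08/03/2018 1:01,3\n")
demo = csv_to_list(tmp.name)
os.remove(tmp.name)

convert_cells_to_floats(demo)
print_csv(demo)
```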

RE: ML - rezapci - Sep-29-2019

from __future__ import print_function, division
import numpy as np
import pandas as pd

train = pd.read_csv("dataset.csv", nrows=299999,
                    dtype={'sensor_data': np.int16, 'time_off': np.float64})
train.head(5)

train.rename({"sensor_data": "signal", "time_off": "day_light"}, axis="columns", inplace=True)
train.head(5)

for n in range(5):
    print(train.day_light.values[n])
---
Here I got an error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-27-277ffea17200> in <module>
      1 for n in range(5):
----> 2     print(train.day_light.values[n])

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\generic.py in __getattr__(self, name)
   5065             if self._info_axis._can_hold_identifiers_and_holds_name(name):
   5066                 return self[name]
-> 5067             return object.__getattribute__(self, name)
   5068 
   5069     def __setattr__(self, name, value):

AttributeError: 'DataFrame' object has no attribute 'day_light'

---
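
A likely cause of the AttributeError above is that the CSV has no column called `time_off`: by default `DataFrame.rename` silently skips labels it cannot find, so `day_light` never gets created. The sketch below shows how that could be checked. The two-column CSV text is made up, since the real header of dataset.csv is not shown, and `errors="raise"` requires pandas 0.25 or later.

```python
import io
import pandas as pd

# Made-up CSV; the real header names in dataset.csv are unknown.
csv_text = "time,d1\n08/03/2018 1:00,120\n08/03/2018 1:01,118\n"
train = pd.read_csv(io.StringIO(csv_text))

# Step 1: look at the column names that were actually read.
print(train.columns.tolist())

# Step 2: a default rename ignores labels that are absent, so nothing changes.
renamed = train.rename(columns={"time_off": "day_light"})
print("day_light" in renamed.columns)

# Step 3: errors="raise" turns the silent no-op into a KeyError.
try:
    train.rename(columns={"time_off": "day_light"}, errors="raise")
    rename_failed = False
except KeyError as exc:
    rename_failed = True
    print("rename failed:", exc)
```

Printing `train.columns` on the real file should show which names to use in the rename mapping.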

And I want to see the times at which the sensor readings increase and decrease.
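
One possible way to find when a sensor's reading rises and falls is to threshold the lux values and look at where the light/dark state changes. This is only a sketch under assumptions: the readings, the column name `lux` and the threshold of 50 are all invented for illustration and would need to be adapted to the real data.

```python
import pandas as pd

# Made-up readings for one hypothetical sensor column named "lux".
readings = pd.DataFrame(
    {
        "time": pd.to_datetime(
            [
                "2018-08-03 05:00", "2018-08-03 06:00", "2018-08-03 07:00",
                "2018-08-03 19:00", "2018-08-03 20:00", "2018-08-03 21:00",
            ]
        ),
        "lux": [0, 0, 250, 260, 0, 0],
    }
).set_index("time")

threshold = 50  # assumed cut-off between "dark" and "light"
is_light = readings["lux"] > threshold

# diff() on the 0/1 state is +1 where light switches on and -1 where it switches off.
change = is_light.astype(int).diff()
rise_times = change[change == 1].index
fall_times = change[change == -1].index

# Length of the lit period for this single made-up day.
day_length = fall_times[0] - rise_times[0]

print("light detected from:", rise_times[0])
print("light lost at:", fall_times[0])
print("day length:", day_length)
```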