ML - Python Forum (https://python-forum.io) > Forum: Python Coding (https://python-forum.io/forum-7.html) > Forum: Data Science (https://python-forum.io/forum-44.html) > Thread: ML (/thread-21352.html)
ML - rezapci - Sep-25-2019

Hello fellow scientists,

I am kind of confused by this project, so I thought I would ask here. I have a CSV dataset with 30001 rows and 14 columns, with timestamps running from 08/03/2018 1:00 to 05/01/2019 13:26. I need to use two or more clustering algorithms to build an unsupervised time-series classifier that identifies characteristic day-length patterns in the attached data. Note that each column in the data set contains sensor measurements of the same kind: light in a room (units in lux). I have to use appropriate quantitative metrics to determine the number of time-series clusters and to evaluate their quality.

Optional: in light of the data and the differences between the algorithms, speculate on why a given method yielded quantitatively better clusters.

The code can be written in Python or R, but any other language is allowed as long as I provide the code. Interactive notebooks (like Jupyter) including the code, comments and visualizations are preferred.

Thank you indeed

RE: ML - Larz60+ - Sep-26-2019

Certainly you have experimented with a few test cases. Please show what you have tried, and any error tracebacks encountered.

RE: ML - rezapci - Sep-26-2019

For example:

import csv
import operator

# csv_to_list, convert_cells_to_floats and print_csv were not shown in the post;
# these minimal versions are hypothetical stand-ins so the snippet runs.
def csv_to_list(path):
    with open(path, newline="") as f:
        return [row for row in csv.reader(f)]

def convert_cells_to_floats(csv_cont):
    # Convert body cells to float in place where possible; keep other cells as strings.
    for row in csv_cont[1:]:
        for i, cell in enumerate(row):
            try:
                row[i] = float(cell)
            except ValueError:
                pass

def print_csv(csv_cont):
    for row in csv_cont:
        print(row)

def sort_by_column(csv_cont, col, reverse=False):
    """
    Sorts CSV contents by column name (if col argument is type <str>)
    or column index (if col argument is type <int>).
    """
    header = csv_cont[0]
    body = csv_cont[1:]
    if isinstance(col, str):
        col_index = header.index(col)
    else:
        col_index = col
    body = sorted(body, key=operator.itemgetter(col_index), reverse=reverse)
    body.insert(0, header)
    return body

csv_cont = csv_to_list('dataset.csv')
print('\nOriginal CSV file:')
print_csv(csv_cont)

print('\nCSV sorted by col "d13":')
convert_cells_to_floats(csv_cont)
csv_sorted = sort_by_column(csv_cont, '13')
print_csv(csv_sorted)

RE: ML - rezapci - Sep-29-2019

from __future__ import print_function, division
import numpy as np
import pandas as pd

train = pd.read_csv("dataset.csv", nrows=299999,
                    dtype={'sensor_data': np.int16, 'time_off': np.float64})
train.head(5)

train.rename({"sensor_data": "signal", "time_off": "day_light"}, axis="columns", inplace=True)
train.head(5)

for n in range(5):
    print(train.day_light.values[n])

Here I got an error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-27-277ffea17200> in <module>
      1 for n in range(5):
----> 2     print(train.day_light.values[n])

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\generic.py in __getattr__(self, name)
   5065             if self._info_axis._can_hold_identifiers_and_holds_name(name):
   5066                 return self[name]
-> 5067             return object.__getattribute__(self, name)
   5068
   5069     def __setattr__(self, name, value):

AttributeError: 'DataFrame' object has no attribute 'day_light'

I also want to see when the sensor readings increase and decrease over time.