Hey everybody,
This is my first post! *excited*
I'm currently plotting COVID-19 data provided by Johns Hopkins University.
I read the CSV files from their GitHub repository.
I now want to add the "day since first infection" (for each country) to every dataset.
I have prepared a minimal example:
Content of the CSV file "test.csv":
Caution: the data is unsorted and there are gaps in the dates.
date,country,patients
2020-01-02,germany,0
2020-01-04,swiss,0
2020-01-06,germany,5
2020-01-01,germany,0
2020-01-03,germany,1
2020-01-05,swiss,0
2020-01-03,france,0
2020-01-07,swiss,5
2020-01-05,germany,4
2020-01-02,france,0
My current code:
#!/usr/bin/env python3

import io
import pandas as pd

DATA = pd.read_csv('test.csv', parse_dates=True)
#print(DATA.head())


def add_days(dataframe):
    print("add_days started")
    dates = dataframe.index.get_level_values(1)
    return dataframe.assign(days=dates - dates[0] + pd.Timedelta(days=1))

df = DATA.set_index(["country", "date"]).sort_index()
print(df)
result_df = df[df["patients"] > 0].groupby(level=0).apply(add_days)
print(result_df)
I am getting the following output:
                    patients
country date
france  2020-01-02         0
        2020-01-03         0
germany 2020-01-01         0
        2020-01-02         0
        2020-01-03         1
        2020-01-05         4
        2020-01-06         5
swiss   2020-01-04         0
        2020-01-05         0
        2020-01-07         5
add_days started
add_days started
Error:
TypeError Traceback (most recent call last)
~\Anaconda3\lib\site-packages\pandas\core\groupby\groupby.py in apply(self, func, *args, **kwargs)
688 try:
--> 689 result = self._python_apply_general(f)
690 except Exception:
~\Anaconda3\lib\site-packages\pandas\core\groupby\groupby.py in _python_apply_general(self, f)
706 keys, values, mutated = self.grouper.apply(f, self._selected_obj,
--> 707 self.axis)
708
~\Anaconda3\lib\site-packages\pandas\core\groupby\ops.py in apply(self, f, data, axis)
189 group_axes = _get_axes(group)
--> 190 res = f(group)
191 if not _is_indexed_like(res, group_axes):
<ipython-input-1-4e67c13c33df> in add_days(dataframe)
12 dates = dataframe.index.get_level_values(1)
---> 13 return dataframe.assign(days=dates - dates[0] + pd.Timedelta(days=1))
14
~\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in __sub__(self, other)
2213 def __sub__(self, other):
-> 2214 return Index(np.array(self) - other)
2215
TypeError: unsupported operand type(s) for -: 'str' and 'str'
During handling of the above exception, another exception occurred:
TypeError Traceback (most recent call last)
<ipython-input-1-4e67c13c33df> in <module>
15 df = DATA.set_index(["country", "date"]).sort_index()
16 print(df)
---> 17 result_df = df[df["patients"] > 0].groupby(level=0).apply(add_days)
18 print(result_df)
~\Anaconda3\lib\site-packages\pandas\core\groupby\groupby.py in apply(self, func, *args, **kwargs)
699
700 with _group_selection_context(self):
--> 701 return self._python_apply_general(f)
702
703 return result
~\Anaconda3\lib\site-packages\pandas\core\groupby\groupby.py in _python_apply_general(self, f)
705 def _python_apply_general(self, f):
706 keys, values, mutated = self.grouper.apply(f, self._selected_obj,
--> 707 self.axis)
708
709 return self._wrap_applied_output(
~\Anaconda3\lib\site-packages\pandas\core\groupby\ops.py in apply(self, f, data, axis)
188 # group might be modified
189 group_axes = _get_axes(group)
--> 190 res = f(group)
191 if not _is_indexed_like(res, group_axes):
192 mutated = True
<ipython-input-1-4e67c13c33df> in add_days(dataframe)
11 print("add_days started")
12 dates = dataframe.index.get_level_values(1)
---> 13 return dataframe.assign(days=dates - dates[0] + pd.Timedelta(days=1))
14
15 df = DATA.set_index(["country", "date"]).sort_index()
~\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in __sub__(self, other)
2212
2213 def __sub__(self, other):
-> 2214 return Index(np.array(self) - other)
2215
2216 def __rsub__(self, other):
TypeError: unsupported operand type(s) for -: 'str' and 'str'
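Reading the traceback from the bottom up, the subtraction in add_days is being applied to two str values, so the "date" values were apparently never parsed into datetimes. As far as I understand the pandas docs, parse_dates=True only asks read_csv to try parsing the index, not ordinary columns. Here is a small sketch to check this; CSV_TEXT is a made-up inline stand-in for test.csv, and as_text / as_dates are just illustrative names:

```python
import io
import pandas as pd

# Hypothetical inline stand-in for a few rows of test.csv.
CSV_TEXT = """date,country,patients
2020-01-02,germany,0
2020-01-06,germany,5
2020-01-03,germany,1
"""

# parse_dates=True: pandas only tries to parse the index, so the
# "date" column is left as plain strings.
as_text = pd.read_csv(io.StringIO(CSV_TEXT), parse_dates=True)

# Naming the column explicitly asks pandas to parse it as datetimes.
as_dates = pd.read_csv(io.StringIO(CSV_TEXT), parse_dates=["date"])

print(as_text["date"].dtype)   # a string/object dtype
print(as_dates["date"].dtype)  # a datetime64 dtype

# Subtraction is only defined for the parsed variant.
print(as_dates["date"] - as_dates["date"].min())
```

With the column parsed, "dates - dates[0]" in add_days should have timestamps to work with instead of strings.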
What I would actually like to get is an "output.csv" similar to this (it could of course be sorted differently):
date,country,patients,day
2020-01-02,germany,0,0
2020-01-04,swiss,0,0
2020-01-06,germany,5,4
2020-01-01,germany,0,0
2020-01-03,germany,1,1
2020-01-05,swiss,0,0
2020-01-03,france,0,0
2020-01-07,swiss,5,1
2020-01-05,germany,4,3
2020-01-02,france,0,0
The label "Day 0" would actually be unnecessary.
Please feel free to suggest any other or better way of solving this problem.
Looking forward to your answers,
Jonas
Oh guys, I did it.
I had to call astype('datetime64') on my date column. Now it is working.
Thanks a lot anyway!
Here's the working code:
#!/usr/bin/env python3

import io
import pandas as pd

DATA = pd.read_csv('test2.csv', parse_dates=True)
#print(DATA.head())
DATA['date'] = DATA['date'].astype('datetime64')


def add_days(dataframe):
    print("add_days started")
    dates = dataframe.index.get_level_values(1)
    return dataframe.assign(days=dates - dates[0] + pd.Timedelta(days=1))

df = DATA.set_index(["country", "date"]).sort_index()
print(df)
result_df = df[df["patients"] > 0].groupby(level=0).apply(add_days)
print(result_df)
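In case somebody wants the "day" column from the desired output.csv, including the 0 for rows before the first infection, here is one possible sketch without groupby().apply(). It is only an alternative under a few assumptions: the CSV content is inlined as CSV_TEXT instead of being read from the file, the date column is parsed with parse_dates=["date"] (newer pandas versions may reject the unit-less 'datetime64' in astype), and infected_dates, first_infection and days are names I made up for illustration.

```python
import io
import pandas as pd

# Inline copy of the test.csv content from the question.
CSV_TEXT = """date,country,patients
2020-01-02,germany,0
2020-01-04,swiss,0
2020-01-06,germany,5
2020-01-01,germany,0
2020-01-03,germany,1
2020-01-05,swiss,0
2020-01-03,france,0
2020-01-07,swiss,5
2020-01-05,germany,4
2020-01-02,france,0
"""

# Parse the "date" column explicitly so date arithmetic works.
df = pd.read_csv(io.StringIO(CSV_TEXT), parse_dates=["date"])

# Keep the date only where patients > 0 (NaT elsewhere), then take the
# earliest such date per country and broadcast it back to every row.
infected_dates = df["date"].where(df["patients"] > 0)
first_infection = infected_dates.groupby(df["country"]).transform("min")

# Days since the first infection, counting that date itself as day 1.
days = (df["date"] - first_infection).dt.days + 1

# Rows before the first infection, and countries without any
# infection, get day 0.
df["day"] = days.where(days > 0, 0).astype(int)

print(df.sort_values(["country", "date"]).to_string(index=False))
```

Since transform keeps the original row order, sorting and gaps in the dates should not matter here, and df.to_csv("output.csv", index=False) would write the result to a file.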