Python Forum
Day X since Infection
#1
Hey everybody,

this is my first post! *excited*
I'm currently plotting COVID-19 data provided by Johns Hopkins University.

I read the CSV files from their GitHub repository.
I now want to add the "day since first infection" (for each country) to every dataset.
I have prepared a minimal example:

Content of the CSV file "test.csv":
Caveat: the data is unsorted and there are gaps in the dates.


Output:
date,country,patients
2020-01-02,germany,0
2020-01-04,swiss,0
2020-01-06,germany,5
2020-01-01,germany,0
2020-01-03,germany,1
2020-01-05,swiss,0
2020-01-03,france,0
2020-01-07,swiss,5
2020-01-05,germany,4
2020-01-02,france,0

My current code:
#!/usr/bin/env python3
import io

import pandas as pd

DATA = pd.read_csv ('test.csv', parse_dates=True)

#print(DATA.head())

def add_days(dataframe):
    print("add_days started")
    dates = dataframe.index.get_level_values(1)
    return dataframe.assign(days=dates - dates[0] + pd.Timedelta(days=1))

df = DATA.set_index(["country", "date"]).sort_index()
print(df)
result_df = df[df["patients"] > 0].groupby(level=0).apply(add_days)
print(result_df)
I am getting the following output:

Output:
                    patients
country date
france  2020-01-02         0
        2020-01-03         0
germany 2020-01-01         0
        2020-01-02         0
        2020-01-03         1
        2020-01-05         4
        2020-01-06         5
swiss   2020-01-04         0
        2020-01-05         0
        2020-01-07         5
add_days started
add_days started
Error:
TypeError                                 Traceback (most recent call last)
~\Anaconda3\lib\site-packages\pandas\core\groupby\groupby.py in apply(self, func, *args, **kwargs)
    688             try:
--> 689                 result = self._python_apply_general(f)
    690             except Exception:

~\Anaconda3\lib\site-packages\pandas\core\groupby\groupby.py in _python_apply_general(self, f)
    706         keys, values, mutated = self.grouper.apply(f, self._selected_obj,
--> 707                                                    self.axis)
    708

~\Anaconda3\lib\site-packages\pandas\core\groupby\ops.py in apply(self, f, data, axis)
    189             group_axes = _get_axes(group)
--> 190             res = f(group)
    191             if not _is_indexed_like(res, group_axes):

<ipython-input-1-4e67c13c33df> in add_days(dataframe)
     12     dates = dataframe.index.get_level_values(1)
---> 13     return dataframe.assign(days=dates - dates[0] + pd.Timedelta(days=1))
     14

~\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in __sub__(self, other)
   2213     def __sub__(self, other):
-> 2214         return Index(np.array(self) - other)
   2215

TypeError: unsupported operand type(s) for -: 'str' and 'str'

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
<ipython-input-1-4e67c13c33df> in <module>
     15 df = DATA.set_index(["country", "date"]).sort_index()
     16 print(df)
---> 17 result_df = df[df["patients"] > 0].groupby(level=0).apply(add_days)
     18 print(result_df)

~\Anaconda3\lib\site-packages\pandas\core\groupby\groupby.py in apply(self, func, *args, **kwargs)
    699
    700         with _group_selection_context(self):
--> 701             return self._python_apply_general(f)
    702
    703         return result

~\Anaconda3\lib\site-packages\pandas\core\groupby\groupby.py in _python_apply_general(self, f)
    705     def _python_apply_general(self, f):
    706         keys, values, mutated = self.grouper.apply(f, self._selected_obj,
--> 707                                                    self.axis)
    708
    709         return self._wrap_applied_output(

~\Anaconda3\lib\site-packages\pandas\core\groupby\ops.py in apply(self, f, data, axis)
    188             # group might be modified
    189             group_axes = _get_axes(group)
--> 190             res = f(group)
    191             if not _is_indexed_like(res, group_axes):
    192                 mutated = True

<ipython-input-1-4e67c13c33df> in add_days(dataframe)
     11     print("add_days started")
     12     dates = dataframe.index.get_level_values(1)
---> 13     return dataframe.assign(days=dates - dates[0] + pd.Timedelta(days=1))
     14
     15 df = DATA.set_index(["country", "date"]).sort_index()

~\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in __sub__(self, other)
   2212
   2213     def __sub__(self, other):
-> 2214         return Index(np.array(self) - other)
   2215
   2216     def __rsub__(self, other):

TypeError: unsupported operand type(s) for -: 'str' and 'str'
I would actually like to get an "output.csv" similar to this (it could of course be sorted differently):

Output:
date,country,patients,day
2020-01-02,germany,0,0
2020-01-04,swiss,0,0
2020-01-06,germany,5,4
2020-01-01,germany,0,0
2020-01-03,germany,1,1
2020-01-05,swiss,0,0
2020-01-03,france,0,0
2020-01-07,swiss,5,1
2020-01-05,germany,4,3
2020-01-02,france,0,0

The label "Day 0" would actually be unnecessary.

Please feel free to suggest any other or better way to solve this problem.

Looking forward to your answers,
Jonas

Oh guys, I did it.

I had to call astype('datetime64') on my date column. Now it is working.
Thanks a lot anyway!
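
For anyone hitting the same TypeError: the traceback ends in a subtraction of 'str' from 'str', which suggests the date column was still text when add_days ran. As far as I understand it, parse_dates=True only asks pandas to try parsing the index, and no index column is set here. A minimal sketch of the difference, assuming the test.csv rows from the first post (inlined through io.StringIO and the made-up name CSV_TEXT so it runs on its own):

```python
import io

import pandas as pd

# Same rows as test.csv in the first post, inlined so the sketch is self-contained.
CSV_TEXT = """date,country,patients
2020-01-02,germany,0
2020-01-04,swiss,0
2020-01-06,germany,5
2020-01-01,germany,0
2020-01-03,germany,1
2020-01-05,swiss,0
2020-01-03,france,0
2020-01-07,swiss,5
2020-01-05,germany,4
2020-01-02,france,0
"""

# parse_dates=True targets the index only; the date column stays as strings.
as_strings = pd.read_csv(io.StringIO(CSV_TEXT), parse_dates=True)

# Naming the column makes pandas parse it while reading the file.
as_dates = pd.read_csv(io.StringIO(CSV_TEXT), parse_dates=["date"])

print(as_strings["date"].dtype)
print(as_dates["date"].dtype)
```

With parse_dates=["date"] the later astype conversion should no longer be needed, although keeping it does no harm.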

Here's the working code:

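#!/usr/bin/env python3
import pandas as pd

# parse_dates=True only parses the index, so the date column is still read as strings.
DATA = pd.read_csv('test2.csv', parse_dates=True)

#print(DATA.head())

# Convert the date strings to datetimes. Newer pandas versions reject the
# unit-less 'datetime64', so the unit is spelled out here.
DATA['date'] = DATA['date'].astype('datetime64[ns]')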

def add_days(dataframe):
    print("add_days started")
    dates = dataframe.index.get_level_values(1)
    return dataframe.assign(days=dates - dates[0] + pd.Timedelta(days=1))

df = DATA.set_index(["country", "date"]).sort_index()
print(df)
result_df = df[df["patients"] > 0].groupby(level=0).apply(add_days)
print(result_df)
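
The code above keeps only the rows with patients > 0 and stores the day count as a Timedelta. The "output.csv" layout asked for in the first post keeps every row and uses a plain integer, with 0 for rows without patients. One possible way to get there, sketched under the assumption that "day 1" means the first date with patients > 0 in each country (the names CSV_TEXT, first_infection and first_for_row are made up for this example):

```python
import io

import pandas as pd

# Same rows as test.csv in the first post, inlined so the sketch is self-contained.
CSV_TEXT = """date,country,patients
2020-01-02,germany,0
2020-01-04,swiss,0
2020-01-06,germany,5
2020-01-01,germany,0
2020-01-03,germany,1
2020-01-05,swiss,0
2020-01-03,france,0
2020-01-07,swiss,5
2020-01-05,germany,4
2020-01-02,france,0
"""

df = pd.read_csv(io.StringIO(CSV_TEXT), parse_dates=["date"])

# Earliest date with at least one patient, per country.
# Countries without any patients (france here) do not appear in this Series.
first_infection = df.loc[df["patients"] > 0].groupby("country")["date"].min()

# Look up that date for every row; countries without patients get NaT.
first_for_row = df["country"].map(first_infection)

# The first infected day counts as day 1; rows without patients are set to 0.
days = (df["date"] - first_for_row).dt.days + 1
df["day"] = days.where(df["patients"] > 0, 0).astype(int)

# Print the CSV text; pass a path such as "output.csv" to write a file instead.
print(df.to_csv(index=False))
```

This keeps the original row order and needs no sorting or index, because the first infection date is looked up per row instead of being taken from the first element of a sorted group.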