Day X since Infection

mueller · (This post was last modified: Mar-15-2020, 03:40 AM by Larz60+.)

Hey everybody,

this is my first post! *excited*
I'm currently plotting COVID-19 data provided by Johns Hopkins University.

I read the CSV files from their GitHub repository.
I now want to add the "day since first infection" (for each country) to every row of the data.
I have prepared a minimal example:

Content of the CSV-file "test.csv":
Caution: the data is unsorted and there are gaps in the dates.

Output:
date,country,patients
2020-01-02,germany,0
2020-01-04,swiss,0
2020-01-06,germany,5
2020-01-01,germany,0
2020-01-03,germany,1
2020-01-05,swiss,0
2020-01-03,france,0
2020-01-07,swiss,5
2020-01-05,germany,4
2020-01-02,france,0

My current Code:

#!/usr/bin/env python3
import io

import pandas as pd

DATA = pd.read_csv ('test.csv', parse_dates=True)

#print(DATA.head())

def add_days(dataframe):
    print("add_days started")
    dates = dataframe.index.get_level_values(1)
    return dataframe.assign(days=dates - dates[0] + pd.Timedelta(days=1))

df = DATA.set_index(["country", "date"]).sort_index()
print(df)
result_df = df[df["patients"] > 0].groupby(level=0).apply(add_days)
print(result_df)

I am getting the following output:

Output:
                    patients
country date                
france  2020-01-02         0
        2020-01-03         0
germany 2020-01-01         0
        2020-01-02         0
        2020-01-03         1
        2020-01-05         4
        2020-01-06         5
swiss   2020-01-04         0
        2020-01-05         0
        2020-01-07         5
add_days started
add_days started

Error:
TypeError                                 Traceback (most recent call last)
~\Anaconda3\lib\site-packages\pandas\core\groupby\groupby.py in apply(self, func, *args, **kwargs)
    688             try:
--> 689                 result = self._python_apply_general(f)
    690             except Exception:

~\Anaconda3\lib\site-packages\pandas\core\groupby\groupby.py in _python_apply_general(self, f)
    706         keys, values, mutated = self.grouper.apply(f, self._selected_obj,
--> 707                                                    self.axis)
    708 

~\Anaconda3\lib\site-packages\pandas\core\groupby\ops.py in apply(self, f, data, axis)
    189             group_axes = _get_axes(group)
--> 190             res = f(group)
    191             if not _is_indexed_like(res, group_axes):

<ipython-input-1-4e67c13c33df> in add_days(dataframe)
     12     dates = dataframe.index.get_level_values(1)
---> 13     return dataframe.assign(days=dates - dates[0] + pd.Timedelta(days=1))
     14 

~\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in __sub__(self, other)
   2213     def __sub__(self, other):
-> 2214         return Index(np.array(self) - other)
   2215 

TypeError: unsupported operand type(s) for -: 'str' and 'str'

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
<ipython-input-1-4e67c13c33df> in <module>
     15 df = DATA.set_index(["country", "date"]).sort_index()
     16 print(df)
---> 17 result_df = df[df["patients"] > 0].groupby(level=0).apply(add_days)
     18 print(result_df)

~\Anaconda3\lib\site-packages\pandas\core\groupby\groupby.py in apply(self, func, *args, **kwargs)
    699 
    700                 with _group_selection_context(self):
--> 701                     return self._python_apply_general(f)
    702 
    703         return result

~\Anaconda3\lib\site-packages\pandas\core\groupby\groupby.py in _python_apply_general(self, f)
    705     def _python_apply_general(self, f):
    706         keys, values, mutated = self.grouper.apply(f, self._selected_obj,
--> 707                                                    self.axis)
    708 
    709         return self._wrap_applied_output(

~\Anaconda3\lib\site-packages\pandas\core\groupby\ops.py in apply(self, f, data, axis)
    188             # group might be modified
    189             group_axes = _get_axes(group)
--> 190             res = f(group)
    191             if not _is_indexed_like(res, group_axes):
    192                 mutated = True

<ipython-input-1-4e67c13c33df> in add_days(dataframe)
     11     print("add_days started")
     12     dates = dataframe.index.get_level_values(1)
---> 13     return dataframe.assign(days=dates - dates[0] + pd.Timedelta(days=1))
     14 
     15 df = DATA.set_index(["country", "date"]).sort_index()

~\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in __sub__(self, other)
   2212 
   2213     def __sub__(self, other):
-> 2214         return Index(np.array(self) - other)
   2215 
   2216     def __rsub__(self, other):

TypeError: unsupported operand type(s) for -: 'str' and 'str'

What I would actually like to get is an "output.csv" similar to this (it could of course be sorted differently):

Output:
date,country,patients,day
2020-01-02,germany,0,0
2020-01-04,swiss,0,0
2020-01-06,germany,5,4
2020-01-01,germany,0,0
2020-01-03,germany,1,1
2020-01-05,swiss,0,0
2020-01-03,france,0,0
2020-01-07,swiss,5,1
2020-01-05,germany,4,3
2020-01-02,france,0,0

The label "Day 0" would actually be unnecessary.

Please feel free to suggest any other or better way for solving this problem.

Looking forward to your answers,
Jonas

Oh guys, I did it.

I had to convert my date column with astype('datetime64'). Now it is working.
Thanks a lot anyway!
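A likely reason the conversion was needed: parse_dates=True only asks read_csv to parse the index, so the "date" column stays text and subtracting two dates fails with the 'str' error above. The following is a minimal sketch of an alternative, assuming the column is named "date" as in the sample. The inlined CSV_TEXT is a hypothetical stand-in for the real file.

```python
import io

import pandas as pd

# A few rows of the sample data, inlined so the sketch is self-contained.
# CSV_TEXT is a stand-in for 'test.csv' and not part of the original code.
CSV_TEXT = """date,country,patients
2020-01-06,germany,5
2020-01-03,germany,1
2020-01-07,swiss,5
"""

# parse_dates=True only tries to parse the index, so "date" stays a text column.
as_text = pd.read_csv(io.StringIO(CSV_TEXT), parse_dates=True)

# Naming the column makes pandas convert it to datetimes while reading.
as_dates = pd.read_csv(io.StringIO(CSV_TEXT), parse_dates=["date"])

print(as_text["date"].dtype)
print(as_dates["date"].dtype)

# With real datetimes, date arithmetic works as expected.
print(as_dates["date"] - as_dates["date"].min())
```

With the column parsed while reading, the separate astype step should no longer be needed.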

Here's the working code:

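#!/usr/bin/env python3
import pandas as pd

# parse_dates=True only affects the index, so "date" is still read as text here.
DATA = pd.read_csv('test2.csv', parse_dates=True)

#print(DATA.head())

# Convert the text dates to real datetimes so they can be subtracted.
# (Newer pandas releases reject the unit-less 'datetime64' spelling.)
DATA['date'] = DATA['date'].astype('datetime64[ns]')

def add_days(dataframe):
    print("add_days started")
    # Index level 1 holds the dates of one country, already sorted.
    dates = dataframe.index.get_level_values(1)
    # Day 1 is the first date with patients in this country.
    return dataframe.assign(days=dates - dates[0] + pd.Timedelta(days=1))

df = DATA.set_index(["country", "date"]).sort_index()
print(df)
# Keep only rows with patients, then number the days per country.
result_df = df[df["patients"] > 0].groupby(level=0).apply(add_days)
print(result_df)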
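
The code above drops every row with 0 patients, so it does not quite produce the "output.csv" from the question. One possible way to keep all rows and fill in 0 before the first infection is sketched below. It is only an illustration, not the code used above. The inlined CSV_TEXT and the names infected_dates, first_infection and days are made up for this example.

```python
import io

import pandas as pd

# The sample data from the question, inlined so the sketch is self-contained.
CSV_TEXT = """date,country,patients
2020-01-02,germany,0
2020-01-04,swiss,0
2020-01-06,germany,5
2020-01-01,germany,0
2020-01-03,germany,1
2020-01-05,swiss,0
2020-01-03,france,0
2020-01-07,swiss,5
2020-01-05,germany,4
2020-01-02,france,0
"""

df = pd.read_csv(io.StringIO(CSV_TEXT), parse_dates=["date"])

# Keep the date only on rows with at least one patient (NaT elsewhere),
# then broadcast each country's earliest such date back onto all of its rows.
infected_dates = df["date"].where(df["patients"] > 0)
first_infection = infected_dates.groupby(df["country"]).transform("min")

# Day 1 is the first day with patients. Rows before that day, and countries
# without any patients (first_infection is NaT), get 0.
days = (df["date"] - first_infection).dt.days + 1
df["day"] = days.where(days > 0, 0).astype(int)

print(df.to_csv(index=False))
```

This keeps the original row order and works with unsorted data and gaps in the dates, because the day number comes from date subtraction and not from counting rows. Writing the file itself would be df.to_csv("output.csv", index=False).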
