I have data which includes id , gender , collected time test name and Test values , Units of measurement
Test Names will include all tests that a patient taken and Value col will have its corresponding test result.
I want to analysis on only certain tests and retrieve corresponding test values from "value" col . The analysis will be on those tests and their values , so I thought it would be good idea to pivot on those test names and test values. However when I add TS col I get an error and adding any other test name in the multiindex code does not throw an error.
Steps:
Steps:
df_s.head(30).dropna()
Here in the below screenshot we can see there multiple test taken for each requisition id:
In the below code Iam only getting tests which I want to do analysis 1#
Question2: When I proceeded to go ahead without tsh, after conversion of test col- rows to cols, I get blank values in respective test col ( example T4 col) because 1) the person has taken the test but there is no value in the dataset(python is treating it as Null value and can be imputed/rejected - no issue 2) the patient has not taken this test but has taken atleast one other tests may be T3, hcg etc but not this test- this is considered as string '' . I want to get rid of these rows for amy analysis .. is there an approach while transforming the data to take care of so that I only want the result of the code to have T4 and its value( numeric or null). I do not want a scenario where the person has not taken test at all. OR is there a way to impute these values so I will know the person has taken T4, T3 but not Hcg , bilrubin etc?
Please advise. Long questions but I hope it this explanatory
Test Names will include all tests that a patient taken and Value col will have its corresponding test result.
I want to analysis on only certain tests and retrieve corresponding test values from "value" col . The analysis will be on those tests and their values , so I thought it would be good idea to pivot on those test names and test values. However when I add TS col I get an error and adding any other test name in the multiindex code does not throw an error.
Steps:
Steps:
df_s.head(30).dropna()
Here in the below screenshot we can see there multiple test taken for each requisition id:
In the below code Iam only getting tests which I want to do analysis 1#
df_s2 = df_s[df_s['Test'].isin(['TOTAL TRIIODOTHYRONINE (T3)','TOTAL THYROXINE (T4)','FREE THYROID 3','FREE THYROID 4','Human Chorionic Gonadotropin (hCG)','BILRUBIN'])]2# Resetting the index:
df_s3=df_s2.set_index(['ID', 'Name', 'Age', 'Sex', 'CT', 'RT', 'Test', 'Test_Result', 'Units']).reset_index()3# applything multiindex
idx = pd.MultiIndex.from_arrays([df_s3['ID'], df_s3['Name'], df_s3['Age'], df_s3['Sex'], df_s3['CT'],df_s3['RT'], df_s3['Units'], df_s3['Test'], ]) #, df_s3['Unit of Measure'] df_s5 = df_s3.set_index(idx).Test_Result.unstack(fill_value='') df_s5.columns.name = None df_s6= df_s5.reset_index() df_s6.head(100)I get this result if do not add TSH (from Test Col)
Output: ID Name Age Sex CT RT Units BILRUBIN FREE THYROID 3 FREE THYROID 4 Human Chorionic Gonadotropin (hCG) TOTAL THYROXINE (T4) TOTAL TRIIODOTHYRONINE (T3)
0 RQ0556048140 Madhuri Dev 28 Years Female 8/01/2019 10:30 8/01/2019 14:39 ng/dl 1.93
1 RQ0556048140 Madhuri Dev 28 Years Female 8/01/2019 10:30 8/01/2019 14:39 μg/dl 18.70
2 RQ06916497688 B/O BARSARANI BISWAL 10 Month(s) 15 Day(s) Female 17/06/2019 12:00 17/06/2019 14:41 ng/dl 179.82
3 RQ06916497688 B/O BARSARANI BISWAL 10 Month(s) 15 Day(s) Female 17/06/2019 12:00 17/06/2019 14:41 μg/dl 18.30
4 RQ09026492462 Sri Hemanta Sarkar 46 Years Male 30/04/2018 08:35 30/04/2018 14:30 ng/dl 1.15
5 RQ09026492462 Sri Hemanta Sarkar 46 Years Male 30/04/2018 08:35 30/04/2018 14:30 μg/dl 9.20
6 RQ1001489038840 RENUKA MAHAPATRA 65 Years Female 28/07/2019 08:20 28/07/2019 13:16 ng/dl 90
7 RQ1001489038840 RENUKA MAHAPATRA 65 Years Female 28/07/2019 08:20 28/07/2019 13:16 μg/dl 7.40
8 RQ1004195473943 Mrs Mamata Samantray 45 Years Female 23/09/2017 11:40 23/09/2017 13:13 ng/dl 1.58
9 RQ1004195473943 Mrs Mamata Samantray 45 Years Female 23/09/2017 11:40 23/09/2017 13:13 μg/dl 15.60
10 RQ1009478939089 Sabita Lenka 30 Years Female 11/06/2017 13:00 12/06/2017 10:10 ng/dl 1.78
11 RQ1009478939089 Sabita Lenka 30 Years Female 11/06/2017 13:00 12/06/2017 10:10 μg/dl 12.50
12 RQ1012532242276 Sanjukta Mishra 47 Years Female 19/03/2018 11:30 19/03/2018 16:35 ng/dl 0.66
13 RQ1012532242276 Sanjukta Mishra 47 Years Female 19/03/2018 11:30 19/03/2018 16:35 μg/dl 6.40
14 RQ1013250484240 Mrs Abha Kansari 45 Years Female 27/07/2017 11:20 27/07/2017 12:42 ng/dl NaN
15 RQ1013250484240 Mrs Abha Kansari 45 Years Female 27/07/2017 11:20 27/07/2017 12:42 μg/dl NaN
16 RQ1013716969697 Madhusmita Sahu 17 Years Female 31/07/2017 11:40 31/07/2017 13:38 ng/dl 0.29
17 RQ1013716969697 Madhusmita Sahu 17 Years Female 31/07/2017 11:40 31/07/2017 13:38 μg/dl 0.70
18 RQ10189073348 Sumati Mahapatra 55 Years Female 11/02/2017 09:30 11/02/2017 14:14 ng/dl 0.90
19 RQ10189073348 Sumati Mahapatra 55 Years Female 11/02/2017 09:30 11/02/2017 14:14 μg/dl 9.10
20 RQ101981055296 NARMADA GUPTA 50 Years Female 23/08/2019 09:45 23/08/2019 17:25 ng/dl 105
21 RQ101981055296 NARMADA GUPTA 50 Years Female 23/08/2019 09:45 23/08/2019 17:25 μg/dl 7.10
22 RQ102281766132 Pyari Xalxo 39 Years Female 4/03/2017 10:10 4/03/2017 13:10 mIU/ml 28640
23 RQ102281766132 Pyari Xalxo 39 Years Female 4/03/2017 10:10 4/03/2017 13:10 ng/dl 1.67
24 RQ102281766132 Pyari Xalxo 39 Years Female 4/03/2017 10:10 4/03/2017 13:10 μg/dl 13.10
25 RQ1023270930913 VICTORIA KISPOTTA 42 Years Female 9/09/2019 11:50 9/09/2019 12:22 ng/dl 82
26 RQ1023270930913 VICTORIA KISPOTTA 42 Years Female 9/09/2019 11:50 9/09/2019 12:22 μg/dl 5.78
27 RQ1026366989473 PRATIMA PATNAIK 38 Years Female 8/07/2019 01:15 8/07/2019 16:37 ng/dl 88.55
28 RQ1026366989473 PRATIMA PATNAIK 38 Years Female 8/07/2019 01:15 8/07/2019 16:37 μg/dl 10.70
29 RQ1028984992315 Agastin Horo 40 Years Female 8/02/2017 10:20 8/02/2017 16:44 ng/dl 0.81
... ... ... ... ... ... ... ... ... ... ... ... ... ...
70 RQ1076842665319 Puspalata Behera 23 Years Female 28/02/2017 11:30 28/02/2017 13:29 μg/dl 7.90
71 RQ1078176595194 Pramadini Chhatoi 37 Years Female 31/03/2018 10:00 31/03/2018 13:54 ng/dl 0.96
72 RQ1078176595194 Pramadini Chhatoi 37 Years Female 31/03/2018 10:00 31/03/2018 13:54 μg/dl 9.80
73 RQ1082829630987 Reena Acharya 52 Years Female 13/11/2017 09:10 13/11/2017 11:49 ng/dl 1.07
74 RQ1082829630987 Reena Acharya 52 Years Female 13/11/2017 09:10 13/11/2017 11:49 μg/dl 11.30
75 RQ1084664624181 NIBEDITA KAR 36 Years Female 29/05/2019 10:14 29/05/2019 10:14 ng/dl 97.25
76 RQ1084664624181 NIBEDITA KAR 36 Years Female 29/05/2019 10:14 29/05/2019 10:14 μg/dl 8.10
77 RQ108506693161 Vinod Hemrom 40 Years Male 24/10/2018 12:00 24/10/2018 16:24 ng/dl 1.26
78 RQ108506693161 Vinod Hemrom 40 Years Male 24/10/2018 12:00 24/10/2018 16:24 μg/dl 11.80
79 RQ109122773470 Tara Bhadur 23 Years Female 23/06/2018 11:30 23/06/2018 15:11 ng/dl 1.35
80 RQ109122773470 Tara Bhadur 23 Years Female 23/06/2018 11:30 23/06/2018 15:11 μg/dl 7.70
81 RQ109263648697 Jyoti Thakur 35 Years Female 15/09/2018 11:30 15/09/2018 16:22 ng/dl 1.01
82 RQ109263648697 Jyoti Thakur 35 Years Female 15/09/2018 11:30 15/09/2018 16:22 μg/dl 9.50
83 RQ1093448652128 PUSPITA MISHRA 23 Years Female 30/04/2019 09:45 30/04/2019 19:37 ng/dl 83
84 RQ1093448652128 PUSPITA MISHRA 23 Years Female 30/04/2019 09:45 30/04/2019 19:37 μg/dl 6.10
85 RQ109359752914 HIRAMANI KACHAP 30 Years Female 14/08/2019 03:00 14/08/2019 18:26 ng/dl 88
86 RQ109359752914 HIRAMANI KACHAP 30 Years Female 14/08/2019 03:00 14/08/2019 18:26 μg/dl 6.50
87 RQ1097475978863 CHULESWARI PATRA 18 Years Female 1/09/2019 10:30 1/09/2019 11:08 ng/dl 88
88 RQ1097475978863 CHULESWARI PATRA 18 Years Female 1/09/2019 10:30 1/09/2019 11:08 μg/dl 6.90
89 RQ1098576134741 S PATEL 29 Years Female 30/07/2017 10:30 31/07/2017 15:17 ng/dl 1.14
90 RQ1098576134741 S PATEL 29 Years Female 30/07/2017 10:30 31/07/2017 15:17 μg/dl 12.70
91 RQ1098887741955 L MUKHI 32 Years Female 5/08/2019 05:00 5/08/2019 18:02 ng/dl 118
92 RQ1098887741955 L MUKHI 32 Years Female 5/08/2019 05:00 5/08/2019 18:02 μg/dl 8.50
93 RQ1099369598030 Amit Sahoo 38 Years Male 28/03/2019 09:14 28/03/2019 09:51 ng/dl 1.88
94 RQ1099369598030 Amit Sahoo 38 Years Male 28/03/2019 09:14 28/03/2019 09:51 μg/dl 10.70
95 RQ1101382949711 MEHROOM NISHA 50 Years Female 29/04/2019 08:55 29/04/2019 10:43 ng/dl 172.77
96 RQ1101382949711 MEHROOM NISHA 50 Years Female 29/04/2019 08:55 29/04/2019 10:43 μg/dl 10.10
97 RQ1103159227767 Mamata Das Mahapatra 30 Years Female 5/01/2019 11:20 5/01/2019 11:20 ng/dl 1.21
98 RQ1103159227767 Mamata Das Mahapatra 30 Years Female 5/01/2019 11:20 5/01/2019 11:20 μg/dl 9.10
99 RQ1114005283147 Manji Kaur 47 Years Female 19/06/2017 09:00 19/06/2017 10:50 ng/dl
Code with TSH test: Retry 1# with TSHdf_s2 = df_s[df_s['Test'].isin(['TOTAL TRIIODOTHYRONINE (T3)','TOTAL THYROXINE (T4)','THYROID STIMULATING HORMONE (TSH)','FREE THYROID 3','FREE THYROID 4','Human Chorionic Gonadotropin (hCG)','BILRUBIN'])] df_s3=df_s2.set_index(['ID', 'Name', 'Age', 'Sex', 'CT', 'RT', 'Test', 'Test_Result', 'Units']).reset_index() idx = pd.MultiIndex.from_arrays([df_s3['ID'], df_s3['Name'], df_s3['Age'], df_s3['Sex'], df_s3['CT'],df_s3['RT'], df_s3['Units'], df_s3['Test'], ]) #, df_s3['Unit of Measure'] df_s5 = df_s3.set_index(idx).Test_Result.unstack(fill_value='') df_s5.columns.name = None df_s6= df_s5.reset_index() df_s6.head(100)
Output:---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-63-82c092f6cd99> in <module>
1 idx = pd.MultiIndex.from_arrays([df_s3['ID'], df_s3['Name'], df_s3['Age'], df_s3['Sex'], df_s3['CT'],df_s3['RT'], df_s3['Units'], df_s3['Test'], ])
----> 2 df_s5 = df_s3.set_index(idx).Test_Result.unstack(fill_value='')
3 df_s5.columns.name = None
4 df_s6= df_s5.reset_index()
5 df_s6.head(100)
C:\BhargaviM\MyAnaconda\lib\site-packages\pandas\core\series.py in unstack(self, level, fill_value)
2897 """
2898 from pandas.core.reshape.reshape import unstack
-> 2899 return unstack(self, level, fill_value)
2900
2901 # ----------------------------------------------------------------------
C:\BhargaviM\MyAnaconda\lib\site-packages\pandas\core\reshape\reshape.py in unstack(obj, level, fill_value)
499 unstacker = _Unstacker(obj.values, obj.index, level=level,
500 fill_value=fill_value,
--> 501 constructor=obj._constructor_expanddim)
502 return unstacker.get_result()
503
C:\BhargaviM\MyAnaconda\lib\site-packages\pandas\core\reshape\reshape.py in __init__(self, values, index, level, value_columns, fill_value, constructor)
135
136 self._make_sorted_values_labels()
--> 137 self._make_selectors()
138
139 def _make_sorted_values_labels(self):
C:\BhargaviM\MyAnaconda\lib\site-packages\pandas\core\reshape\reshape.py in _make_selectors(self)
173
174 if mask.sum() < len(self.index):
--> 175 raise ValueError('Index contains duplicate entries, '
176 'cannot reshape')
177
ValueError: Index contains duplicate entries, cannot reshape
Question1 (Retry 1# with TSH ): Please help me with the correct approach, what I understand the error is because once it convert it is not finding any unique index but not sure how to resolve it.Question2: When I proceeded to go ahead without tsh, after conversion of test col- rows to cols, I get blank values in respective test col ( example T4 col) because 1) the person has taken the test but there is no value in the dataset(python is treating it as Null value and can be imputed/rejected - no issue 2) the patient has not taken this test but has taken atleast one other tests may be T3, hcg etc but not this test- this is considered as string '' . I want to get rid of these rows for amy analysis .. is there an approach while transforming the data to take care of so that I only want the result of the code to have T4 and its value( numeric or null). I do not want a scenario where the person has not taken test at all. OR is there a way to impute these values so I will know the person has taken T4, T3 but not Hcg , bilrubin etc?
Please advise. Long questions but I hope it this explanatory
Output: