ValueError: Index contains duplicate entries, cannot reshape” error when I try to use

Smiling29 · (This post was last modified: Oct-13-2019, 01:55 PM by Smiling29.)

I have data which includes id , gender , collected time test name and Test values , Units of measurement

Test Names will include all tests that a patient taken and Value col will have its corresponding test result.

I want to analysis on only certain tests and retrieve corresponding test values from "value" col . The analysis will be on those tests and their values , so I thought it would be good idea to pivot on those test names and test values. However when I add TS col I get an error and adding any other test name in the multiindex code does not throw an error.

Steps:
Steps:

df_s.head(30).dropna()
Here in the below screenshot we can see there multiple test taken for each requisition id:
In the below code Iam only getting tests which I want to do analysis 1#

df_s2 = df_s[df_s['Test'].isin(['TOTAL TRIIODOTHYRONINE (T3)','TOTAL THYROXINE (T4)','FREE THYROID 3','FREE THYROID 4','Human Chorionic Gonadotropin (hCG)','BILRUBIN'])]

2# Resetting the index:

df_s3=df_s2.set_index(['ID', 'Name', 'Age', 'Sex', 'CT', 'RT', 'Test', 'Test_Result', 'Units']).reset_index()

3# applything multiindex

idx = pd.MultiIndex.from_arrays([df_s3['ID'], df_s3['Name'], df_s3['Age'], df_s3['Sex'], df_s3['CT'],df_s3['RT'], df_s3['Units'], df_s3['Test'],  ])
#, df_s3['Unit of Measure']
df_s5 = df_s3.set_index(idx).Test_Result.unstack(fill_value='')
df_s5.columns.name = None
df_s6= df_s5.reset_index()
df_s6.head(100)

I get this result if do not add TSH (from Test Col)

Output:	ID	Name	Age	Sex	CT	RT	Units	BILRUBIN	FREE THYROID 3	FREE THYROID 4	Human Chorionic Gonadotropin (hCG)	TOTAL THYROXINE (T4)	TOTAL TRIIODOTHYRONINE (T3)
0	RQ0556048140	Madhuri Dev	28 Years	Female	8/01/2019 10:30	8/01/2019 14:39	ng/dl						1.93
1	RQ0556048140	Madhuri Dev	28 Years	Female	8/01/2019 10:30	8/01/2019 14:39	μg/dl					18.70	
2	RQ06916497688	B/O BARSARANI BISWAL	10 Month(s) 15 Day(s)	Female	17/06/2019 12:00	17/06/2019 14:41	ng/dl						179.82
3	RQ06916497688	B/O BARSARANI BISWAL	10 Month(s) 15 Day(s)	Female	17/06/2019 12:00	17/06/2019 14:41	μg/dl					18.30	
4	RQ09026492462	Sri Hemanta Sarkar	46 Years	Male	30/04/2018 08:35	30/04/2018 14:30	ng/dl						1.15
5	RQ09026492462	Sri Hemanta Sarkar	46 Years	Male	30/04/2018 08:35	30/04/2018 14:30	μg/dl					9.20	
6	RQ1001489038840	RENUKA MAHAPATRA	65 Years	Female	28/07/2019 08:20	28/07/2019 13:16	ng/dl						90
7	RQ1001489038840	RENUKA MAHAPATRA	65 Years	Female	28/07/2019 08:20	28/07/2019 13:16	μg/dl					7.40	
8	RQ1004195473943	Mrs Mamata Samantray	45 Years	Female	23/09/2017 11:40	23/09/2017 13:13	ng/dl						1.58
9	RQ1004195473943	Mrs Mamata Samantray	45 Years	Female	23/09/2017 11:40	23/09/2017 13:13	μg/dl					15.60	
10	RQ1009478939089	Sabita Lenka	30 Years	Female	11/06/2017 13:00	12/06/2017 10:10	ng/dl						1.78
11	RQ1009478939089	Sabita Lenka	30 Years	Female	11/06/2017 13:00	12/06/2017 10:10	μg/dl					12.50	
12	RQ1012532242276	Sanjukta Mishra	47 Years	Female	19/03/2018 11:30	19/03/2018 16:35	ng/dl						0.66
13	RQ1012532242276	Sanjukta Mishra	47 Years	Female	19/03/2018 11:30	19/03/2018 16:35	μg/dl					6.40	
14	RQ1013250484240	Mrs Abha Kansari	45 Years	Female	27/07/2017 11:20	27/07/2017 12:42	ng/dl						NaN
15	RQ1013250484240	Mrs Abha Kansari	45 Years	Female	27/07/2017 11:20	27/07/2017 12:42	μg/dl					NaN	
16	RQ1013716969697	Madhusmita Sahu	17 Years	Female	31/07/2017 11:40	31/07/2017 13:38	ng/dl						0.29
17	RQ1013716969697	Madhusmita Sahu	17 Years	Female	31/07/2017 11:40	31/07/2017 13:38	μg/dl					0.70	
18	RQ10189073348	Sumati Mahapatra	55 Years	Female	11/02/2017 09:30	11/02/2017 14:14	ng/dl						0.90
19	RQ10189073348	Sumati Mahapatra	55 Years	Female	11/02/2017 09:30	11/02/2017 14:14	μg/dl					9.10	
20	RQ101981055296	NARMADA GUPTA	50 Years	Female	23/08/2019 09:45	23/08/2019 17:25	ng/dl						105
21	RQ101981055296	NARMADA GUPTA	50 Years	Female	23/08/2019 09:45	23/08/2019 17:25	μg/dl					7.10	
22	RQ102281766132	Pyari Xalxo	39 Years	Female	4/03/2017 10:10	4/03/2017 13:10	mIU/ml				28640		
23	RQ102281766132	Pyari Xalxo	39 Years	Female	4/03/2017 10:10	4/03/2017 13:10	ng/dl						1.67
24	RQ102281766132	Pyari Xalxo	39 Years	Female	4/03/2017 10:10	4/03/2017 13:10	μg/dl					13.10	
25	RQ1023270930913	VICTORIA KISPOTTA	42 Years	Female	9/09/2019 11:50	9/09/2019 12:22	ng/dl						82
26	RQ1023270930913	VICTORIA KISPOTTA	42 Years	Female	9/09/2019 11:50	9/09/2019 12:22	μg/dl					5.78	
27	RQ1026366989473	PRATIMA PATNAIK	38 Years	Female	8/07/2019 01:15	8/07/2019 16:37	ng/dl						88.55
28	RQ1026366989473	PRATIMA PATNAIK	38 Years	Female	8/07/2019 01:15	8/07/2019 16:37	μg/dl					10.70	
29	RQ1028984992315	Agastin Horo	40 Years	Female	8/02/2017 10:20	8/02/2017 16:44	ng/dl						0.81
...	...	...	...	...	...	...	...	...	...	...	...	...	...
70	RQ1076842665319	Puspalata Behera	23 Years	Female	28/02/2017 11:30	28/02/2017 13:29	μg/dl					7.90	
71	RQ1078176595194	Pramadini Chhatoi	37 Years	Female	31/03/2018 10:00	31/03/2018 13:54	ng/dl						0.96
72	RQ1078176595194	Pramadini Chhatoi	37 Years	Female	31/03/2018 10:00	31/03/2018 13:54	μg/dl					9.80	
73	RQ1082829630987	Reena Acharya	52 Years	Female	13/11/2017 09:10	13/11/2017 11:49	ng/dl						1.07
74	RQ1082829630987	Reena Acharya	52 Years	Female	13/11/2017 09:10	13/11/2017 11:49	μg/dl					11.30	
75	RQ1084664624181	NIBEDITA KAR	36 Years	Female	29/05/2019 10:14	29/05/2019 10:14	ng/dl						97.25
76	RQ1084664624181	NIBEDITA KAR	36 Years	Female	29/05/2019 10:14	29/05/2019 10:14	μg/dl					8.10	
77	RQ108506693161	Vinod Hemrom	40 Years	Male	24/10/2018 12:00	24/10/2018 16:24	ng/dl						1.26
78	RQ108506693161	Vinod Hemrom	40 Years	Male	24/10/2018 12:00	24/10/2018 16:24	μg/dl					11.80	
79	RQ109122773470	Tara Bhadur	23 Years	Female	23/06/2018 11:30	23/06/2018 15:11	ng/dl						1.35
80	RQ109122773470	Tara Bhadur	23 Years	Female	23/06/2018 11:30	23/06/2018 15:11	μg/dl					7.70	
81	RQ109263648697	Jyoti Thakur	35 Years	Female	15/09/2018 11:30	15/09/2018 16:22	ng/dl						1.01
82	RQ109263648697	Jyoti Thakur	35 Years	Female	15/09/2018 11:30	15/09/2018 16:22	μg/dl					9.50	
83	RQ1093448652128	PUSPITA MISHRA	23 Years	Female	30/04/2019 09:45	30/04/2019 19:37	ng/dl						83
84	RQ1093448652128	PUSPITA MISHRA	23 Years	Female	30/04/2019 09:45	30/04/2019 19:37	μg/dl					6.10	
85	RQ109359752914	HIRAMANI KACHAP	30 Years	Female	14/08/2019 03:00	14/08/2019 18:26	ng/dl						88
86	RQ109359752914	HIRAMANI KACHAP	30 Years	Female	14/08/2019 03:00	14/08/2019 18:26	μg/dl					6.50	
87	RQ1097475978863	CHULESWARI PATRA	18 Years	Female	1/09/2019 10:30	1/09/2019 11:08	ng/dl						88
88	RQ1097475978863	CHULESWARI PATRA	18 Years	Female	1/09/2019 10:30	1/09/2019 11:08	μg/dl					6.90	
89	RQ1098576134741	S PATEL	29 Years	Female	30/07/2017 10:30	31/07/2017 15:17	ng/dl						1.14
90	RQ1098576134741	S PATEL	29 Years	Female	30/07/2017 10:30	31/07/2017 15:17	μg/dl					12.70	
91	RQ1098887741955	L MUKHI	32 Years	Female	5/08/2019 05:00	5/08/2019 18:02	ng/dl						118
92	RQ1098887741955	L MUKHI	32 Years	Female	5/08/2019 05:00	5/08/2019 18:02	μg/dl					8.50	
93	RQ1099369598030	Amit Sahoo	38 Years	Male	28/03/2019 09:14	28/03/2019 09:51	ng/dl						1.88
94	RQ1099369598030	Amit Sahoo	38 Years	Male	28/03/2019 09:14	28/03/2019 09:51	μg/dl					10.70	
95	RQ1101382949711	MEHROOM NISHA	50 Years	Female	29/04/2019 08:55	29/04/2019 10:43	ng/dl						172.77
96	RQ1101382949711	MEHROOM NISHA	50 Years	Female	29/04/2019 08:55	29/04/2019 10:43	μg/dl					10.10	
97	RQ1103159227767	Mamata Das Mahapatra	30 Years	Female	5/01/2019 11:20	5/01/2019 11:20	ng/dl						1.21
98	RQ1103159227767	Mamata Das Mahapatra	30 Years	Female	5/01/2019 11:20	5/01/2019 11:20	μg/dl					9.10	
99	RQ1114005283147	Manji Kaur	47 Years	Female	19/06/2017 09:00	19/06/2017 10:50	ng/dl

Code with TSH test: Retry 1# with TSH

df_s2 = df_s[df_s['Test'].isin(['TOTAL TRIIODOTHYRONINE (T3)','TOTAL THYROXINE (T4)','THYROID STIMULATING HORMONE (TSH)','FREE THYROID 3','FREE THYROID 4','Human Chorionic Gonadotropin (hCG)','BILRUBIN'])]

df_s3=df_s2.set_index(['ID', 'Name', 'Age', 'Sex', 'CT', 'RT', 'Test', 'Test_Result', 'Units']).reset_index()

idx = pd.MultiIndex.from_arrays([df_s3['ID'], df_s3['Name'], df_s3['Age'], df_s3['Sex'], df_s3['CT'],df_s3['RT'], df_s3['Units'], df_s3['Test'],  ])
#, df_s3['Unit of Measure']
df_s5 = df_s3.set_index(idx).Test_Result.unstack(fill_value='')
df_s5.columns.name = None
df_s6= df_s5.reset_index()
df_s6.head(100)

Output:---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-63-82c092f6cd99> in <module>
      1 idx = pd.MultiIndex.from_arrays([df_s3['ID'], df_s3['Name'], df_s3['Age'], df_s3['Sex'], df_s3['CT'],df_s3['RT'], df_s3['Units'], df_s3['Test'],  ])
----> 2 df_s5 = df_s3.set_index(idx).Test_Result.unstack(fill_value='')
      3 df_s5.columns.name = None
      4 df_s6= df_s5.reset_index()
      5 df_s6.head(100)

C:\BhargaviM\MyAnaconda\lib\site-packages\pandas\core\series.py in unstack(self, level, fill_value)
   2897         """
   2898         from pandas.core.reshape.reshape import unstack
-> 2899         return unstack(self, level, fill_value)
   2900 
   2901     # ----------------------------------------------------------------------

C:\BhargaviM\MyAnaconda\lib\site-packages\pandas\core\reshape\reshape.py in unstack(obj, level, fill_value)
    499         unstacker = _Unstacker(obj.values, obj.index, level=level,
    500                                fill_value=fill_value,
--> 501                                constructor=obj._constructor_expanddim)
    502         return unstacker.get_result()
    503 

C:\BhargaviM\MyAnaconda\lib\site-packages\pandas\core\reshape\reshape.py in __init__(self, values, index, level, value_columns, fill_value, constructor)
    135 
    136         self._make_sorted_values_labels()
--> 137         self._make_selectors()
    138 
    139     def _make_sorted_values_labels(self):

C:\BhargaviM\MyAnaconda\lib\site-packages\pandas\core\reshape\reshape.py in _make_selectors(self)
    173 
    174         if mask.sum() < len(self.index):
--> 175             raise ValueError('Index contains duplicate entries, '
    176                              'cannot reshape')
    177 

ValueError: Index contains duplicate entries, cannot reshape

Question1 (Retry 1# with TSH ): Please help me with the correct approach, what I understand the error is because once it convert it is not finding any unique index but not sure how to resolve it.

Question2: When I proceeded to go ahead without tsh, after conversion of test col- rows to cols, I get blank values in respective test col ( example T4 col) because 1) the person has taken the test but there is no value in the dataset(python is treating it as Null value and can be imputed/rejected - no issue 2) the patient has not taken this test but has taken atleast one other tests may be T3, hcg etc but not this test- this is considered as string '' . I want to get rid of these rows for amy analysis .. is there an approach while transforming the data to take care of so that I only want the result of the code to have T4 and its value( numeric or null). I do not want a scenario where the person has not taken test at all. OR is there a way to impute these values so I will know the person has taken T4, T3 but not Hcg , bilrubin etc?

Please advise. Long questions but I hope it this explanatory

Output:

**Larz60+** · Oct-13-2019, 06:21 PM

On lines 6 and 7, the only column that is different is THYROXINE and (T4).
Thus a duplicate index.

Smiling29 · Oct-14-2019, 11:29 AM

(Oct-13-2019, 06:21 PM)Larz60+ Wrote: On lines 6 and 7, the only column that is different is THYROXINE and (T4). Thus a duplicate index.

Thank you Larz60! I think ID is not unique to consider so I added and new ID1 col to have a unique value by adding this code
to the initial dataframe (first one after importing data ) and the rest of the are same

df['ID1'] = range(1, len(df.index)+1)

Output:---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-504-78aab1b20f9e> in <module>
----> 1 idx = pd.MultiIndex.from_arrays(df_s3['ID1'] ,[df_s3['ID'], df_s3['Age'], df_s3['Sex'], df_s3['Collected Time'],df_s3['Received Time'], df_s3['Units'], df_s3['Test'],  ])
      2 # #, df_s3['Units']
      3 df_s5 = df_s3.set_index(idx).Test_Result.unstack(fill_value='')
      4 df_s5.columns.name = None
      5 df_s6= df_s5.reset_index()

C:\BhargaviM\MyAnaconda\lib\site-packages\pandas\core\indexes\multi.py in from_arrays(cls, arrays, sortorder, names)
   1267         # raise ValueError, if not
   1268         for i in range(1, len(arrays)):
-> 1269             if len(arrays[i]) != len(arrays[i - 1]):
   1270                 raise ValueError('all arrays must be same length')
   1271 

TypeError: object of type 'numpy.int64' has no len()

Canyou please help.

**Larz60+** · Oct-14-2019, 04:30 PM

Need more information if you want detailed help.

Source of original data, or per-processed data (including 'ID1')
Enough code to be able to run from origin to error point

Smiling29 · (This post was last modified: Oct-14-2019, 05:19 PM by Smiling29.)

Here you see few records of the data:
Data-Click here

that is loaded in 'df_s' dataframe. After that I create new column :

df_s['ID1'] = range(1, len(df_s.index)+1)

df_s2 = df_s[df_s['Test'].isin(['TOTAL TRIIODOTHYRONINE (T3)','TOTAL THYROXINE (T4)','THYROID STIMULATING HORMONE (TSH)','FREE THYROID 3','FREE THYROID 4','Human Chorionic Gonadotropin (hCG)','BILRUBIN'])]
 
df_s3=df_s2.set_index(['ID', 'Name', 'Age', 'Sex', 'CT', 'RT', 'Test', 'Test_Result', 'Units']).reset_index()
 
idx = pd.MultiIndex.from_arrays([df_s3['ID'], df_s3['Name'], df_s3['Age'], df_s3['Sex'], df_s3['CT'],df_s3['RT'], df_s3['Units'], df_s3['Test'],  ])
#, df_s3['Unit of Measure']
df_s5 = df_s3.set_index(idx).Test_Result.unstack(fill_value='')
df_s5.columns.name = None
df_s6= df_s5.reset_index()
df_s6.head(100)

Thank you for your response. I added the full code and data used.Hope this helps.Please let me know if you need more details from me.

**Larz60+** · Oct-14-2019, 07:18 PM

Got the data,

Code is not runnable as presented, don't have time to figure out unless so.

Smiling29 · (This post was last modified: Oct-14-2019, 07:49 PM by Smiling29.)

import pandas as pd
import numpy as np
import seaborn as sns
%matplotlib inline
import matplotlib.pyplot as plt
from matplotlib.ticker import StrMethodFormatter
from IPython.display import display
import re
import datetime
from _datetime import *

df = pd.read_excel (r'<file path and filename.xlsx')
df_s = df.copy()[['ID1','ID', 'Age', 'Sex', 'CT', 'RT', 'Test', 'Test_Name', 'Units']]
df_s['ID1'] = range(1, len(df_s.index)+1)
df_s2 = df_s[df_s['Test'].isin(['TOTAL TRIIODOTHYRONINE (T3)','TOTAL THYROXINE (T4)','THYROID STIMULATING HORMONE (TSH)','FREE THYROID 3','FREE THYROID 4','Human Chorionic Gonadotropin (hCG)','BILRUBIN'])]
  
df_s3=df_s2.set_index(['ID1','ID', 'Name', 'Age', 'Sex', 'CT', 'RT', 'Test', 'Test_Result', 'Units']).reset_index()
  
idx = pd.MultiIndex.from_arrays([df_s3['ID1'],[df_s3['ID'], df_s3['Name'], df_s3['Age'], df_s3['Sex'], df_s3['CT'],df_s3['RT'], df_s3['Units'], df_s3['Test'],  ])
#, df_s3['Unit of Measure']
df_s5 = df_s3.set_index(idx).Test_Result.unstack(fill_value='')
df_s5.columns.name = None
df_s6= df_s5.reset_index()
df_s6.head(100)

Sorry I just realised, new col ID1 is not added in the code. Please try now if possible

I think I understood the error

As the original issue is saying there are duplicates because there are no unique values , I created a unique col ID1

say the data is like this:
ID1 ID Test Test_Result
1 Re001 T3 0.3
2 Re001 T4 0.4
3 Re002 TSH 4

Now after transforming may be it is not able to determine which value of ID1 to pick in case on Re001 should it be 1 or 2 ? Iam not sure if this is the error but appears to be, Also not sure how to solve original error. is there any other technique that we can apply?

ID1 ID T3 T4 TSH
? Re001 0.3 0.4
2 Re002 4.0

@Larz60+Thank you very much for you help so far. appreciate the time looking into this.

**Larz60+** · Oct-14-2019, 08:13 PM

I have to go out to run an errand, will try when I get back.

Smiling29 · Oct-15-2019, 07:17 PM

Thank you very much for taking time into looking into this. I have resolved this issue , it was bothering me from last 2 weeks.
here is what I changed, instead of adding unique col ID1 at the beginning , I ve added after picking up necessary rows from Test column (df_s2). Resolving this issue.

df = pd.read_excel (r'<file path and filename.xlsx')
df_s = df.copy()[['ID1','ID', 'Age', 'Sex', 'CT', 'RT', 'Test', 'Test_Name', 'Units']]
df_s2 = df_s[df_s['Test'].isin(['TOTAL TRIIODOTHYRONINE (T3)','TOTAL THYROXINE (T4)','THYROID STIMULATING HORMONE (TSH)','FREE THYROID 3','FREE THYROID 4','Human Chorionic Gonadotropin (hCG)','BILRUBIN'])]
[df_s2['ID1'] = range(1, len(df_s2.index)+1)
   
df_s2.set_index(['ID1'])
   
idx = pd.MultiIndex.from_arrays([df_s2['ID1'],[df_s2['ID'], df_s2['Name'], df_s2['Age'], df_s2['Sex'], df_s2['CT'],df_s2['RT'], df_s2['Units'], df_s2['Test'],  ])

df_s5 = df_s2.set_index(idx).Test_Result.unstack(fill_value='')
df_s5.columns.name = None
df_s6= df_s5.reset_index()
df_s6.head(100)

**Larz60+** · Oct-15-2019, 07:24 PM

Glad to hear all is well

Possibly Related Threads…
Thread		Author	Replies	Views	Last Post
	pyscript index error while calling input from html form	pyscript_dude	2	973	May-21-2023, 08:17 AM Last Post: snippsat
	Index error help	MRsquared	1	762	May-15-2023, 03:28 PM Last Post: buran
	I'm getting a String index out of range error	debian77	7	2,316	Jun-26-2022, 09:50 AM Last Post: deanhystad
	Numpy reshape	mr_byte31	1	1,100	Apr-22-2022, 06:33 PM Last Post: deanhystad
	Reshape txt file into particular format using python	shantanu97	0	1,423	Dec-10-2021, 11:44 AM Last Post: shantanu97
	Python Error List Index Out of Range	abhi1vaishnav	3	2,299	Sep-03-2021, 08:40 PM Last Post: abhi1vaishnav
	Strange error ValueError: dimension mismatch	Anldra12	0	1,960	Aug-17-2021, 07:54 AM Last Post: Anldra12
	Index error - columns vs non-column	Vinny	3	4,913	Aug-09-2021, 04:46 PM Last Post: snippsat
	How to resolve Index Error in my code?	codify110	6	3,011	May-22-2021, 11:04 AM Last Post: supuflounder
	Why getting ValueError : Math domain error in trig. function, math.asin() ?	jahuja73	3	3,758	Feb-24-2021, 05:09 PM Last Post: bowlofred

ValueError: Index contains duplicate entries, cannot reshape” error when I try to use

User Panel Messages

Announcements