Thread: ValueError: Index contains duplicate entries, cannot reshape” error when I try to use (/thread-21762.html)
Python Forum (https://python-forum.io), General Coding Help
ValueError: Index contains duplicate entries, cannot reshape” error when I try to use - Smiling29 - Oct-13-2019

I have data that includes ID, gender, collected time, test name, test value and units of measurement. The Test column lists every test a patient has taken, and the value column (Test_Result) holds the corresponding result. I want to analyse only certain tests and retrieve their values from the value column, so I thought it would be a good idea to pivot on the test names and test values. However, when I add the TSH test I get the error in the title, while adding any other test name to the MultiIndex code does not throw an error.

Steps:

    df_s.head(30).dropna()

The data shows multiple tests taken for each requisition ID.

1# Keeping only the tests I want to analyse:

    df_s2 = df_s[df_s['Test'].isin(['TOTAL TRIIODOTHYRONINE (T3)', 'TOTAL THYROXINE (T4)', 'FREE THYROID 3', 'FREE THYROID 4', 'Human Chorionic Gonadotropin (hCG)', 'BILRUBIN'])]

2# Resetting the index:

    df_s3 = df_s2.set_index(['ID', 'Name', 'Age', 'Sex', 'CT', 'RT', 'Test', 'Test_Result', 'Units']).reset_index()

3# Applying the MultiIndex:

    idx = pd.MultiIndex.from_arrays([df_s3['ID'], df_s3['Name'], df_s3['Age'], df_s3['Sex'], df_s3['CT'], df_s3['RT'], df_s3['Units'], df_s3['Test']])  #, df_s3['Unit of Measure']
    df_s5 = df_s3.set_index(idx).Test_Result.unstack(fill_value='')
    df_s5.columns.name = None
    df_s6 = df_s5.reset_index()
    df_s6.head(100)

I get a result if I do not add TSH (from the Test column).

Code with the TSH test:

Retry 1# with TSH

    df_s2 = df_s[df_s['Test'].isin(['TOTAL TRIIODOTHYRONINE (T3)', 'TOTAL THYROXINE (T4)', 'THYROID STIMULATING HORMONE (TSH)', 'FREE THYROID 3', 'FREE THYROID 4', 'Human Chorionic Gonadotropin (hCG)', 'BILRUBIN'])]
    df_s3 = df_s2.set_index(['ID', 'Name', 'Age', 'Sex', 'CT', 'RT', 'Test', 'Test_Result', 'Units']).reset_index()
    idx = pd.MultiIndex.from_arrays([df_s3['ID'], df_s3['Name'], df_s3['Age'], df_s3['Sex'], df_s3['CT'], df_s3['RT'], df_s3['Units'], df_s3['Test']])  #, df_s3['Unit of Measure']
    df_s5 = df_s3.set_index(idx).Test_Result.unstack(fill_value='')
    df_s5.columns.name = None
    df_s6 = df_s5.reset_index()
    df_s6.head(100)

Question 1 (Retry 1# with TSH): Please help me with the correct approach. As I understand it, the error occurs because the reshape cannot find a unique index, but I am not sure how to resolve it.

Question 2: When I went ahead without TSH, after converting the Test column rows to columns, I get blank values in the respective test column (for example the T4 column) for two different reasons:

1) The person has taken the test but there is no value in the dataset. Python treats this as a null value that can be imputed or rejected, so no issue.
2) The patient has not taken this test but has taken at least one other test, maybe T3, hCG etc. This case comes out as the string ''.

I want to get rid of the rows from case 2 for my analysis. Is there an approach, while transforming the data, so that the result only has T4 and its value (numeric or null) and never a row where the person has not taken the test at all? Or is there a way to impute these values so that I will know the person has taken T4 and T3 but not hCG, bilirubin etc.? Please advise. It is a long question, but I hope it is explanatory.
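On Question 2, one possible way to separate "test not taken" from "test taken but no value recorded" is to keep the reshaped values numeric (no `fill_value=''`) and build a second table that only records whether a row existed for each ID and test. This is a minimal sketch on made-up data: the column names follow the thread, the values are invented, and `taken` and `t4_only` are names chosen here for illustration.

```python
import numpy as np
import pandas as pd

# Made-up rows; Re001 took T4 but has no recorded value, and never took TSH.
df = pd.DataFrame({
    "ID": ["Re001", "Re001", "Re002", "Re002"],
    "Test": ["T3", "T4", "T3", "TSH"],
    "Test_Result": [0.3, np.nan, 1.1, 4.0],
})

# Without fill_value every gap is NaN, so the test columns stay numeric.
wide = df.set_index(["ID", "Test"])["Test_Result"].unstack()

# crosstab counts rows per (ID, Test); > 0 means a row existed for that pair,
# whether or not it carried a value.
taken = pd.crosstab(df["ID"], df["Test"]) > 0

# Keep only the patients who actually took T4; their value may still be NaN.
t4_only = wide.loc[taken["T4"], ["T4"]]

print(wide)
print(taken)
print(t4_only)
```

In `wide` both kinds of gap look the same (NaN), which is why the separate `taken` table is used to tell them apart.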
RE: ValueError: Index contains duplicate entries, cannot reshape” error when I try to use - Larz60+ - Oct-13-2019

On lines 6 and 7, the only column that is different is THYROXINE and (T4). Thus a duplicate index.

RE: ValueError: Index contains duplicate entries, cannot reshape” error when I try to use - Smiling29 - Oct-14-2019

(Oct-13-2019, 06:21 PM)Larz60+ Wrote: On lines 6 and 7, the only column that is different is THYROXINE and (T4). Thus a duplicate index.

Thank you Larz60+! I think ID is not unique enough to rely on, so I added a new ID1 column with a unique value per row by adding this code to the initial dataframe (the first one after importing the data). The rest of the code is the same:

    df['ID1'] = range(1, len(df.index)+1)

Can you please help?
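The duplicate index mentioned above can be checked directly before reshaping. The sketch below uses made-up rows and a shortened key list (`ID` and `Test` only, an assumption for brevity; the thread's code uses more columns) to show how `DataFrame.duplicated` lists the offending rows and how the same data triggers the error on `unstack`.

```python
import pandas as pd

# Made-up rows: the T4 test appears twice for requisition Re001.
df = pd.DataFrame({
    "ID": ["Re001", "Re001", "Re001", "Re002"],
    "Test": ["T3", "T4", "T4", "TSH"],
    "Test_Result": [0.3, 0.4, 0.5, 4.0],
})

keys = ["ID", "Test"]

# keep=False flags every row of a repeated key, not only the later copies.
dupes = df[df.duplicated(subset=keys, keep=False)]
print(dupes)

# The same keys cannot be unstacked while they contain repeats.
try:
    df.set_index(keys)["Test_Result"].unstack()
    reshape_error = None
except ValueError as exc:
    reshape_error = str(exc)
print(reshape_error)
```

Running the `duplicated` check with the full list of index columns on the real data should show which TSH rows collide.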
RE: ValueError: Index contains duplicate entries, cannot reshape” error when I try to use - Larz60+ - Oct-14-2019

Need more information if you want detailed help.
RE: ValueError: Index contains duplicate entries, cannot reshape” error when I try to use - Smiling29 - Oct-14-2019

Here are a few records of the data (Data-Click here), which is loaded into the df_s dataframe. After that I create the new column:

    df_s['ID1'] = range(1, len(df_s.index)+1)
    df_s2 = df_s[df_s['Test'].isin(['TOTAL TRIIODOTHYRONINE (T3)', 'TOTAL THYROXINE (T4)', 'THYROID STIMULATING HORMONE (TSH)', 'FREE THYROID 3', 'FREE THYROID 4', 'Human Chorionic Gonadotropin (hCG)', 'BILRUBIN'])]
    df_s3 = df_s2.set_index(['ID', 'Name', 'Age', 'Sex', 'CT', 'RT', 'Test', 'Test_Result', 'Units']).reset_index()
    idx = pd.MultiIndex.from_arrays([df_s3['ID'], df_s3['Name'], df_s3['Age'], df_s3['Sex'], df_s3['CT'], df_s3['RT'], df_s3['Units'], df_s3['Test']])  #, df_s3['Unit of Measure']
    df_s5 = df_s3.set_index(idx).Test_Result.unstack(fill_value='')
    df_s5.columns.name = None
    df_s6 = df_s5.reset_index()
    df_s6.head(100)

Thank you for your response. I added the full code and the data used. Hope this helps. Please let me know if you need more details from me.

RE: ValueError: Index contains duplicate entries, cannot reshape” error when I try to use - Larz60+ - Oct-14-2019

Got the data. Code is not runnable as presented; I don't have time to figure it out unless it is.
RE: ValueError: Index contains duplicate entries, cannot reshape” error when I try to use - Smiling29 - Oct-14-2019

    import pandas as pd
    import numpy as np
    import seaborn as sns
    %matplotlib inline
    import matplotlib.pyplot as plt
    from matplotlib.ticker import StrMethodFormatter
    from IPython.display import display
    import re
    import datetime
    from _datetime import *

    df = pd.read_excel(r'<file path and filename.xlsx')
    # Keep the columns used further down (Name and Test_Result are needed by set_index)
    df_s = df[['ID', 'Name', 'Age', 'Sex', 'CT', 'RT', 'Test', 'Test_Result', 'Units']].copy()
    # Unique running number per row
    df_s['ID1'] = range(1, len(df_s.index)+1)
    df_s2 = df_s[df_s['Test'].isin(['TOTAL TRIIODOTHYRONINE (T3)', 'TOTAL THYROXINE (T4)', 'THYROID STIMULATING HORMONE (TSH)', 'FREE THYROID 3', 'FREE THYROID 4', 'Human Chorionic Gonadotropin (hCG)', 'BILRUBIN'])]
    df_s3 = df_s2.set_index(['ID1', 'ID', 'Name', 'Age', 'Sex', 'CT', 'RT', 'Test', 'Test_Result', 'Units']).reset_index()
    idx = pd.MultiIndex.from_arrays([df_s3['ID1'], df_s3['ID'], df_s3['Name'], df_s3['Age'], df_s3['Sex'], df_s3['CT'], df_s3['RT'], df_s3['Units'], df_s3['Test']])  #, df_s3['Unit of Measure']
    df_s5 = df_s3.set_index(idx).Test_Result.unstack(fill_value='')
    df_s5.columns.name = None
    df_s6 = df_s5.reset_index()
    df_s6.head(100)

Sorry, I just realised the new ID1 column was not added in the code. Please try now if possible.

I think I understood the error. As the original issue says, there are duplicates because there are no unique values, so I created a unique column ID1. Say the data is like this:

    ID1  ID     Test  Test_Result
    1    Re001  T3    0.3
    2    Re001  T4    0.4
    3    Re002  TSH   4

Now, after transforming, maybe it is not able to determine which value of ID1 to pick: in the case of Re001, should it be 1 or 2? I am not sure if this is the error, but it appears to be. I am also not sure how to solve the original error. Is there any other technique that we can apply?

    ID1  ID     T3   T4   TSH
    ?    Re001  0.3  0.4
    2    Re002            4.0

@Larz60+ Thank you very much for your help so far. I appreciate the time you spent looking into this.
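The "which ID1 should it pick?" question can be tried out on the three example rows from the post. As a rough sketch (only `ID1`, `ID` and `Test` are used as index columns here, which is a simplification of the thread's code), the reshape with `ID1` in the index does not pick one value: it keeps one output row per `ID1`, so Re001 is spread over two rows.

```python
import pandas as pd

# The three example rows given in the post above.
df = pd.DataFrame({
    "ID1": [1, 2, 3],
    "ID": ["Re001", "Re001", "Re002"],
    "Test": ["T3", "T4", "TSH"],
    "Test_Result": [0.3, 0.4, 4.0],
})

# With the row-unique ID1 in the index, every input row becomes its own output row.
with_id1 = df.set_index(["ID1", "ID", "Test"])["Test_Result"].unstack()

# Without ID1, rows that share an ID collapse into one line per requisition.
without_id1 = df.set_index(["ID", "Test"])["Test_Result"].unstack()

print(with_id1)
print(without_id1)
```

So a per-row counter avoids the duplicate-index error, but at the cost of the one-row-per-requisition layout shown in the second table of the post.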
RE: ValueError: Index contains duplicate entries, cannot reshape” error when I try to use - Larz60+ - Oct-14-2019

I have to go out to run an errand, will try when I get back.

RE: ValueError: Index contains duplicate entries, cannot reshape” error when I try to use - Smiling29 - Oct-15-2019

Thank you very much for taking the time to look into this. I have resolved this issue; it had been bothering me for the last 2 weeks. Here is what I changed: instead of adding the unique column ID1 at the beginning, I added it after picking the necessary rows from the Test column (df_s2). That resolved the issue.

    df = pd.read_excel(r'<file path and filename.xlsx')
    # Keep the columns used further down (Name and Test_Result are needed below)
    df_s = df[['ID', 'Name', 'Age', 'Sex', 'CT', 'RT', 'Test', 'Test_Result', 'Units']].copy()
    # .copy() so that adding ID1 below does not write into a slice of df_s
    df_s2 = df_s[df_s['Test'].isin(['TOTAL TRIIODOTHYRONINE (T3)', 'TOTAL THYROXINE (T4)', 'THYROID STIMULATING HORMONE (TSH)', 'FREE THYROID 3', 'FREE THYROID 4', 'Human Chorionic Gonadotropin (hCG)', 'BILRUBIN'])].copy()
    # Unique running number, added after filtering
    df_s2['ID1'] = range(1, len(df_s2.index)+1)
    df_s2.set_index(['ID1'])  # result is not assigned, so this line has no effect
    idx = pd.MultiIndex.from_arrays([df_s2['ID1'], df_s2['ID'], df_s2['Name'], df_s2['Age'], df_s2['Sex'], df_s2['CT'], df_s2['RT'], df_s2['Units'], df_s2['Test']])
    df_s5 = df_s2.set_index(idx).Test_Result.unstack(fill_value='')
    df_s5.columns.name = None
    df_s6 = df_s5.reset_index()
    df_s6.head(100)

RE: ValueError: Index contains duplicate entries, cannot reshape” error when I try to use - Larz60+ - Oct-15-2019

Glad to hear all is well
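As a possible alternative to a running number over all rows, a counter can be added only within each repeated key, so that requisitions without repeats still end up on a single row. This is a sketch under the assumption that repeated results for the same test on the same requisition are what break the reshape. The data is made up, the key list is shortened to `ID` and `Test`, and the `occurrence` column is a name invented here.

```python
import pandas as pd

# Made-up rows: T4 was recorded twice for requisition Re001.
df = pd.DataFrame({
    "ID": ["Re001", "Re001", "Re001", "Re002"],
    "Test": ["T3", "T4", "T4", "TSH"],
    "Test_Result": [0.3, 0.4, 0.5, 4.0],
})

keys = ["ID", "Test"]

# 0 for the first row of each (ID, Test) pair, 1 for the second, and so on.
df["occurrence"] = df.groupby(keys).cumcount()

# The counter makes the index unique, so unstack no longer raises.
wide = df.set_index(["ID", "occurrence", "Test"])["Test_Result"].unstack()
print(wide)
```

If repeated results should be collapsed into one value instead of kept on separate rows, `pivot_table` with an explicit `aggfunc` is another option to consider.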