Python Forum
Problems with converting to numeric
#11
By setting up the separators correctly you don't need this:
Quote: MIC_SIGN = a variable for getting rid of the '=' from MIC1 that is dropped from the dataframe.
There won't be any "=" in the dataframe.
#12
This is not my code, and it does work so I think it is applicable.
#13
I can't wrap my head around this BBCode stuff right now. These are the actual columns and some of the data. I'll put the full code below it.

SUBJID SAMPLEID BENCH WHO MIC1 BMD1 POS_CTRL1 NEG_CTRL3
MC_RUN_001 7305 Clinical Lefamulin LMU 0.125 . POS NEG
MC_RUN_001 7308 Clinical Lefamulin LMU 0.25 . POS NEG
MC_RUN_006 20490 Challenge Lefamulin LMU 0.125 . POS NEG
MC_RUN_006 20581 Challenge Lefamulin LMU <=0.015625 . POS NEG

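# Reading the file:
import numpy as np
import pandas as pd

folder = "exports"
filename = "RESULTS.xlsx"
fPath = project + "/" + folder + "/" + filename  # project is defined elsewhere
print(fPath)

# Assumed step (not shown in the post): load the workbook into bmd_df
bmd_df = pd.read_excel(fPath)

# Upper-case the column names, drop rows without a SAMPLEID, and store IDs and MICs as strings
bmd_df.columns = [str(x).upper() for x in bmd_df.columns]
bmd_df = bmd_df[bmd_df['SAMPLEID'].notna()]
bmd_df['SAMPLEID'] = bmd_df['SAMPLEID'].astype(str)
bmd_df['MIC1'] = bmd_df['MIC1'].astype(str)

bmd_df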

----------------------------------------------------------------------------------------------------------------------------------

# Create a MIC variable (floating format) for the MIC
bmd_df['MIC_FLOAT'] = np.where(bmd_df['WHO']=='Lefamulin LMU',bmd_df['MIC1'].str.split().str[1],np.nan)
bmd_df['MIC_FLOAT'] = pd.to_numeric(bmd_df['MIC_FLOAT'], errors='coerce')

# Create a variable with the sign before the MIC, then create another variable for the range comparisons (add 0.1 to the MIC when the sign is '>')
bmd_df['MIC_SIGN']=np.where(bmd_df['WHO']=='Lefamulin LMU',bmd_df['MIC1'].str[0],None)
bmd_df['MIC_FOR_RANGES']=np.where(bmd_df['MIC_SIGN']=='>',bmd_df['MIC_FLOAT']+0.1,bmd_df['MIC_FLOAT'])
bmd_df=bmd_df.drop(['MIC_SIGN'], axis=1)

-----------------------------------------------------------------------------------------------------
buran wrote Apr-14-2023, 06:56 AM:
Please use proper tags when posting code, traceback, output, etc.
See BBCode help for more info.
#14
This doesn't make sense. There appear to be 8 column headers, but 9 columns in the data:
SUBJID SAMPLEID BENCH WHO MIC1 BMD1 POS_CTRL1 NEG_CTRL3
MC_RUN_001 7308 Clinical Lefamulin LMU 0.25 . POS NEG
MC_RUN_006 20490 Challenge Lefamulin LMU 0.125 . POS NEG
MC_RUN_006 20581 Challenge Lefamulin LMU <=0.015625 . POS NEG
This gives a mapping of values to columns that is this:
SUBJID = MC_RUN_001
SAMPLEID = 7308
BENCH = Clinical
WHO = Lefamulin
MIC1 = LMU
BMD1 = 0.25
POS_CTRL1 = .    <-- What is this period doing here?
NEG_CTRL3 = POS
??? = NEG  <-- No columns left
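
The mismatch can be reproduced with a minimal sketch that splits the header line and one data line on whitespace, exactly as they were pasted in the post. It assumes nothing about how the real spreadsheet stores the values; the variable names are only for illustration.

```python
from itertools import zip_longest

# Header and one data row, copied as plain text from the post
header = "SUBJID SAMPLEID BENCH WHO MIC1 BMD1 POS_CTRL1 NEG_CTRL3"
row = "MC_RUN_001 7308 Clinical Lefamulin LMU 0.25 . POS NEG"

# Split both lines on whitespace
columns = header.split()
values = row.split()
print(len(columns), "headers,", len(values), "values")

# Pair headers with values in order; "???" marks a value with no header left
mapping = list(zip_longest(columns, values, fillvalue="???"))
for column, value in mapping:
    print(f"{column} = {value}")
```

Whitespace alone cannot tell whether "Lefamulin LMU" is one value or two, which is why the pasted text needs either a real delimiter or a description of the columns.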
And then you have this:
MC_RUN_006 20581 Challenge Lefamulin LMU <=0.015625 . POS NEG
What is "<=0.015625"? I know it is an inequality, but I don't think there is a type for that. How should it appear in the dataframe?

You need to describe the information you have coming in, and what you want it to look like after it is processed. Do not post any more code. Describe what is supposed to happen.
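
One possible answer to the question of how a value like "<=0.015625" could appear in a dataframe is to keep the comparison sign and the number as two separate fields. The following is only a sketch under that assumption: the helper name split_mic and the regular expression are hypothetical, and the thread does not confirm this is the layout the original poster wants.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern: an optional comparison sign followed by a plain decimal number
_MIC_PATTERN = re.compile(r"^\s*(<=|>=|<|>|=)?\s*(\d+(?:\.\d+)?)\s*$")


def split_mic(text: str) -> Tuple[Optional[str], Optional[float]]:
    """Return (sign, value); sign is None when absent, both are None if unparseable."""
    match = _MIC_PATTERN.match(text)
    if match is None:
        return None, None
    sign, number = match.groups()
    return sign, float(number)


# Sample MIC1 values from the post, plus the "." placeholder seen in the data
for raw in ["0.125", "0.25", "<=0.015625", "."]:
    print(raw, "->", split_mic(raw))
```

With pandas, a function like this could be applied to the MIC1 column (for example with `Series.map`) and the resulting pairs unpacked into a sign column and a float column.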