Problems with converting to numeric
+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/forum-7.html)
+--- Forum: Homework (https://python-forum.io/forum-9.html)
+--- Thread: Problems with converting to numeric (/thread-39774.html)
Problems with converting to numeric - EmBeck87 - Apr-12-2023

I believe this code for converting the values to numeric is also removing characters. Can someone advise how to alter it so that it keeps the characters? The first line of code, which removes the last '0', can stay as is. Thank you.

Here are the before and after values. Something in the code has caused most of the Mic/Result values to be dropped from the new MIC column:

    Mic/Result    MIC
    0.125         NaN
    0.125         NaN
    <=0.03125     0.03125
    0.125         NaN
    0.125         NaN

    import numpy as np
    import pandas as pd

    # Remove 0 at the end for MIC/RESULT.
    results_df['MIC/RESULT'] = np.where(results_df['WHO CODE'] == 'LMU', results_df['MIC/RESULT'].str.rstrip('.0'), results_df['MIC/RESULT'])
    # Convert MIC/RESULT to numeric
    results_df['MIC'] = np.where(results_df['WHO CODE'] == 'LMU', results_df['MIC/RESULT'].str.split(' ').str[1], results_df['MIC/RESULT'].replace(' ', ''))
    results_df['MIC'] = pd.to_numeric(results_df['MIC'])

RE: Problems with converting to numeric - deanhystad - Apr-13-2023

Could you provide some data to work with? It is difficult to diagnose when I don't know what kind of values are in MIC/RESULT. In particular, I don't know why you would want to do "str.split(' ').str[1]" or ".replace(' ', '')". I am only fairly confident that one or both are wrong, or you wouldn't be posting your question. They also don't look like the kind of processing I am used to seeing in a dataframe. How about showing examples of what kind of strings might appear in MIC/RESULT, and what you want to appear in "MIC" after processing?

From your example I tried this:

    import pandas as pd
    df = pd.DataFrame({"A": ["0.125"]})
    df["B"] = df["A"].str.split(' ').str[1]
    df["C"] = df["A"].replace(' ', '')
    print(df)

I would expect an error, not NaN in "B". df["A"].str.split(" ") returns a list for each row. You cannot perform string operations on a list, so str[1] should result in a type error or value error.
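A note on the NaN in "B": in pandas, `.str[1]` after `.str.split(' ')` indexes into each row's list and returns NaN when the list has no second element, rather than raising. The sketch below illustrates that, along with what `rstrip('.0')` does. The sample values are hypothetical (modelled on the before/after values in the first post, and assuming the sign is separated from the number by a space), so treat this as an illustration and not as a diagnosis of the real data.

```python
import pandas as pd

# Hypothetical sample values; the real MIC/RESULT column may differ.
s = pd.Series(["0.125", "<= 0.03125", "10.0"])

# .str.split(" ") gives one list per row; .str[1] then picks the second
# item of each list and yields NaN where there is no second item.
second_token = s.str.split(" ").str[1]
print(second_token.tolist())

# rstrip(".0") removes any trailing run of "." and "0" characters,
# not the literal suffix ".0", so "10.0" is reduced to "1".
stripped = s.str.rstrip(".0")
print(stripped.tolist())
```

If this matches the real data, rows without a space in them would end up as NaN in MIC, which is consistent with the before/after values in the first post.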
RE: Problems with converting to numeric - EmBeck87 - Apr-13-2023

Let me clarify. I think what I need to do is replace the NaN values in a column with the actual floating-point values in the dataframe. How can I convert the NaN values to show the actual values?

RE: Problems with converting to numeric - deanhystad - Apr-13-2023

Please show what kind of strings you expect to see in the MIC/RESULT column and what the corresponding values should be in the MIC column.

RE: Problems with converting to numeric - EmBeck87 - Apr-13-2023

This is from another project that I'm using as a base; the numbers are very similar.

    ID  MIC1      MIC_FLOAT  MIC_FOR_RANGES
    1   = 0.0625  0.0625     0.0625
    2   =0.5      0.5000     0.5000
    3   =0.0625   0.0625     0.0625
    4   =0.0625   0.0625     0.0625

MIC1 = raw data values
MIC_FLOAT = floating format for the MIC1 (see code pasted in first post)
MIC_SIGN = a variable for getting rid of the '=' from MIC1 that is dropped from the dataframe
MIC_FOR_RANGES = a variable created for the range comparison (see code pasted in first post)

RE: Problems with converting to numeric - EmBeck87 - Apr-13-2023

(Apr-13-2023, 02:07 PM) EmBeck87 Wrote: This is from another project that I'm using for a base the numbers are very similar.

RE: Problems with converting to numeric - deanhystad - Apr-13-2023

If these are representative values, why are you using split(" ")? I don't see where split() would ever be applicable.

RE: Problems with converting to numeric - EmBeck87 - Apr-13-2023

To take out spaces, I think.

RE: Problems with converting to numeric - deanhystad - Apr-13-2023

Split does not take out spaces. Split breaks a string into multiple substrings using a delimiter.

    string = '123-456-7890'
    substrings = string.split('-')
    print(substrings)

So we now know that this makes no sense and you should never do it:

    results_df['MIC/RESULT'].str.split(' ').str[1]

How about this? Does it ever make sense to do this?

    results_df['MIC/RESULT'].replace(' ','')

I made a dataframe that has strings containing digits and spaces, and I applied your code.

    import pandas as pd
    df = pd.DataFrame({"A": [" 1.23 ", "1 23"]})
    df["B"] = df['A'].replace(' ','')
    print(df)

As expected, the replace code does nothing because none of the rows in "A" are a single blank. Is that what you wanted to happen? Find rows in MIC/RESULT that are a single space and replace that with an empty string?

And what do you mean by this?

Quote: ID MIC1 MIC_FLOAT MIC_FOR_RANGES

Is this a file that you are reading into a dataframe? I copied the above to a file named test.txt and used pandas.read_csv to load it into a dataframe. This is a little tricky since there are two separators in this file: an equal sign that may be preceded or followed by a space, and a space. This requires a special separator that is a regular expression.

    import pandas as pd
    df = pd.read_csv("test.txt", sep=r"\s*=\s*|\s+", engine="python")
    print(df)
    print(df.dtypes)

The sep argument is a regular expression that says the separator might be an equal sign with leading and trailing spaces (\s*=\s*) or it might be a space (\s). Since this appears to be a poorly formatted file (not always using the same formatting), I decided to expand the second separator to one or more spaces (\s+). I had to set the parser engine to "python" so I could use an expression for the separator.

https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html

Notice that the datatypes are all numbers without having to do any processing of the dataframe columns.

RE: Problems with converting to numeric - EmBeck87 - Apr-13-2023

I will send you the actual file. That was just a mockup. I need the code to accomplish the objectives mentioned, not just generate the values: MIC_FLOAT with the floating format for the MIC1, and MIC_FOR_RANGES created for range comparison. I will add the full code also.
MIC_FLOAT = floating format for the MIC1 (see code pasted in first post)
MIC_SIGN = a variable for getting rid of the '=' from MIC1 that is dropped from the dataframe
MIC_FOR_RANGES = a variable created for the range comparison (see code pasted in first post)
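For readers following along, here is one possible way to get MIC_SIGN and MIC_FLOAT from strings like the MIC1 examples. It is only a sketch under assumptions: the sample values, the regular expression, and the names `cleaned`, `parts` and `NUM` are made up for illustration, and the real file may contain formats not covered here. MIC_FOR_RANGES is left out because the thread does not say how it is computed.

```python
import pandas as pd

# Hypothetical raw values modelled on the MIC1 examples in the thread.
df = pd.DataFrame({"MIC1": ["= 0.0625", "=0.5", "<=0.03125", "0.125"]})

# Remove every space inside each string. Series.str.replace works on
# substrings, whereas Series.replace only matches whole cell values.
cleaned = df["MIC1"].str.replace(" ", "", regex=False)

# Split each value into an optional comparison sign and the numeric part.
# The pattern is an assumption about what the data looks like.
parts = cleaned.str.extract(r"^(?P<MIC_SIGN><=|>=|<|>|=)?(?P<NUM>\d*\.?\d+)$")

df["MIC_SIGN"] = parts["MIC_SIGN"]
# errors="coerce" turns anything that still is not a number into NaN.
df["MIC_FLOAT"] = pd.to_numeric(parts["NUM"], errors="coerce")
print(df)
```

Keeping the sign in its own column means the "<=" information is not lost when the value is converted, and MIC_SIGN can be dropped later if it is not needed.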