Hello, I have a dumb question- can someone tell me in layperson terms what this code is doing? I'm worried it is excluding the values I want 'LMU' instead of including them. Thank you.
#Remove all the spaces for thing1 and thing2 in order to compare them
intermediate3_df['thing1']=np.where(intermediate3_df['CODE']=='LMU',intermediate3_df['thing1'].str.replace(" ",""),intermediate3_df['thing1'])
intermediate3_df['thing2']=np.where(intermediate3_df['CODE']=='LMU',intermediate3_df['thing2'].str.replace(" ",""),intermediate3_df['thing2'])
The documentation for numpy.where()
https://numpy.org/doc/stable/reference/g...where.html
Quote:
numpy.where
numpy.where(condition, [x, y, ]/)
Return elements chosen from x or y depending on condition.
Note
When only condition is provided, this function is a shorthand for np.asarray(condition).nonzero(). Using nonzero directly should be preferred, as it behaves correctly for subclasses. The rest of this documentation covers only the case where all three arguments are provided.
Parameters:
condition : array_like, bool
Where True, yield x, otherwise yield y.
x, y : array_like
Values from which to choose. x, y and condition need to be broadcastable to some shape.
Returns:
out : ndarray
An array with elements from x where condition is True, and elements from y elsewhere.
This is your condition
intermediate3_df['CODE']=='LMU'
This is "x"
intermediate3_df['thing1'].str.replace(" ","")
This is "y"
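intermediate3_df['thing1']
If your condition is True (intermediate3_df['CODE'] == 'LMU'), "where" will select x (with str.replace), otherwise it will select y (no str.replace). So the rows with CODE 'LMU' are the ones that get their spaces removed; every other row keeps its original value.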
A simple example:
import pandas as pd
import numpy as np
df = pd.DataFrame({"Numbers": range(1, 11)})
df["Even"] = np.where(df["Numbers"] % 2, None, df["Numbers"])
df["Odd"] = np.where(df["Numbers"] % 2, df["Numbers"], None)
print(df)
Output:
   Numbers  Even   Odd
0        1  None     1
1        2     2  None
2        3  None     3
3        4     4  None
4        5  None     5
5        6     6  None
6        7  None     7
7        8     8  None
8        9  None     9
9       10    10  None
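Here is the same idea on a frame shaped more like yours. This is only a sketch: the sample rows and the short name df are made up for illustration, and only the column names CODE and thing1 come from your code.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "CODE": ["LMU", "ABC", "LMU"],
    "thing1": ["a b c", "d e f", "g h"],
})

# Where CODE is 'LMU', take the space-stripped value (x);
# everywhere else keep the original value (y).
df["thing1"] = np.where(
    df["CODE"] == "LMU",
    df["thing1"].str.replace(" ", "", regex=False),
    df["thing1"],
)

print(df)
```

The frame still has all three rows afterwards. The two LMU rows become "abc" and "gh", and the ABC row stays "d e f".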
Does this exclude LMU? All I really need to know, thanks.
No, nothing is excluded. It does nothing to 'LMU' itself other than use it in the comparison. These are the lines that make changes, and they only touch thing1 and thing2 in rows where CODE is 'LMU':
intermediate3_df['thing1']=np.where(intermediate3_df['CODE']=='LMU',intermediate3_df['thing1'].str.replace(" ",""),intermediate3_df['thing1'])
intermediate3_df['thing2']=np.where(intermediate3_df['CODE']=='LMU',intermediate3_df['thing2'].str.replace(" ",""),intermediate3_df['thing2'])
The only change it can make is this:
str.replace(" ","")
It replaces each space with an empty string, i.e. it removes the spaces.
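If np.where reads awkwardly, a boolean mask with .loc should do the same job and may make it more obvious which rows are touched. This is a sketch under assumptions, not your actual code: the sample data and the names df and is_lmu are invented for the example.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "CODE": ["LMU", "XYZ"],
    "thing1": ["1 2 3", "4 5 6"],
    "thing2": [" x y", " z "],
})

# True for the rows that should have their spaces removed.
is_lmu = df["CODE"] == "LMU"

# Overwrite only the LMU rows; all other rows are left as they were.
for col in ["thing1", "thing2"]:
    df.loc[is_lmu, col] = df.loc[is_lmu, col].str.replace(" ", "", regex=False)

print(df)
```

No rows are dropped here either. The LMU row ends up as "123" and "xy", while the XYZ row keeps its spaces.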