Aug-13-2019, 08:58 AM
(This post was last modified: Aug-13-2019, 08:59 AM by ashishstats.)
Hi Yoriz,
Here is a small update from my side on what I have tried so far.
**********************
import pandas as pd
import numpy as np
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)
csv1="multiple_responses.csv"
df1 = pd.read_csv(csv1, index_col='id' , na_values = [' '] , low_memory=False)
method_names = ['female_condoms', 'emergency', 'male_condoms', 'pill', 'injectables', 'iud', 'male_sterilization', 'female_sterilization']
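# Print the method names as a quick sanity check
for method in method_names:
    print(method)
# Add one boolean column per method: True where the name occurs in methods_discussed
for method in method_names:
    df1[method] = df1["methods_discussed"].str.contains(pat=method)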
df1.head(10)
output
id | methods_discussed | female_condoms | emergency | male_condoms | pill | injectables | iud | male_sterilization | female_sterilization
1 | emergency | FALSE | TRUE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE
2 | female_sterilization | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | TRUE | TRUE
3 | male_sterilization | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | TRUE | FALSE
4 | iud | FALSE | FALSE | FALSE | FALSE | FALSE | TRUE | FALSE | FALSE
5 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN
6 | injectables male_condoms | FALSE | FALSE | TRUE | FALSE | TRUE | FALSE | FALSE | FALSE
7 | male_condoms | FALSE | FALSE | TRUE | FALSE | FALSE | FALSE | FALSE | FALSE
8 | female_sterilization male_sterilization | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | TRUE | TRUE
9 | injectables | FALSE | FALSE | FALSE | FALSE | TRUE | FALSE | FALSE | FALSE
10 | iud male_condoms | FALSE | FALSE | TRUE | FALSE | FALSE | TRUE | FALSE | FALSE
Problem description
I used a CSV file (link to the CSV file: https://github.com/pandas-dev/pandas/fil...ponses.zip)
which contains two columns, "id" and "methods_discussed". After running the code above, the output is wrong: at index 2, the male_sterilization column shows TRUE. It should be FALSE, because "methods_discussed" contains only female_sterilization for that row.
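A minimal reproduction of the problem, as a sketch rather than a confirmed fix: the three-row sample below is hypothetical and stands in for the real CSV. It suggests that the TRUE comes from plain substring matching, since the text "male_sterilization" occurs inside "female_sterilization". Requiring whitespace or the string boundary on both sides of the name (one possible approach, assuming the responses are space-separated as in the output above) gives the expected result.

```python
import re
import pandas as pd

# Hypothetical sample standing in for the "methods_discussed" column.
sample = pd.Series(
    ["female_sterilization", "male_sterilization", "iud male_condoms"],
    name="methods_discussed",
)

# Plain substring search: "male_sterilization" is found inside
# "female_sterilization", which reproduces the unwanted TRUE.
substring_hits = sample.str.contains("male_sterilization", regex=False)

# Whole-token search: the name must not be preceded or followed by a
# non-whitespace character, so it only matches as a separate response.
pattern = r"(?<!\S)" + re.escape("male_sterilization") + r"(?!\S)"
token_hits = sample.str.contains(pattern, regex=True)

print(substring_hits.tolist())  # [True, True, False]
print(token_hits.tolist())      # [False, True, False]
```

The same pattern could be built inside the loop over method_names, one pattern per method.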
Expected Output
id | methods_discussed | female_condoms | emergency | male_condoms | pill | injectables | iud | male_sterilization | female_sterilization
1 | emergency | FALSE | TRUE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE
2 | female_sterilization | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | TRUE
3 | male_sterilization | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | TRUE | FALSE
4 | iud | FALSE | FALSE | FALSE | FALSE | FALSE | TRUE | FALSE | FALSE
5 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN
6 | injectables male_condoms | FALSE | FALSE | TRUE | FALSE | TRUE | FALSE | FALSE | FALSE
7 | male_condoms | FALSE | FALSE | TRUE | FALSE | FALSE | FALSE | FALSE | FALSE
8 | female_sterilization male_sterilization | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | TRUE | TRUE
9 | injectables | FALSE | FALSE | FALSE | FALSE | TRUE | FALSE | FALSE | FALSE
10 | iud male_condoms | FALSE | FALSE | TRUE | FALSE | FALSE | TRUE | FALSE | FALSE
I have also tried str.match, but it did not work for me.
Also, is there a way to avoid generating values when methods_discussed is NaN?
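For the NaN question, two options that might apply, shown as a sketch on a hypothetical three-row frame (the data and the variable names nan_as_false and answered are made up for illustration). In the output above the NaN row already stays NaN; the na parameter of str.contains can replace that with a fixed value, and dropna can remove the unanswered rows before any columns are generated.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing response (id 2).
df = pd.DataFrame(
    {"methods_discussed": ["emergency", np.nan, "iud male_condoms"]},
    index=pd.Index([1, 2, 3], name="id"),
)

# Option 1: report missing responses as False instead of NaN.
nan_as_false = df["methods_discussed"].str.contains("iud", regex=False, na=False)

# Option 2: drop rows without a response before building indicator columns.
answered = df.dropna(subset=["methods_discussed"])

print(nan_as_false.tolist())    # [False, False, True]
print(answered.index.tolist())  # [1, 3]
```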
Thanks
Ashish