Help on String variable

ashishstats · (This post was last modified: Aug-04-2019, 11:53 AM by ashishstats.)

Hi everyone

I am new to Python 3.6 but have four years of experience with data management and data analysis in Stata. While porting code that I had already developed in Stata to Python, I got stuck. Details are as follows:

In Stata 15, I have a local macro called methods that contains 8 family planning method names separated by spaces: local methods "female_condoms emergency male_condoms pill injectables iud male_sterilization female_sterilization". I also have a string variable called method_discussed, which may be blank (no method) or may contain 1 to 8 method names from the macro above, separated by spaces, depending on the answers respondents gave in a survey; that is, method_discussed is a multiple-choice question. A sample of 5 observations follows, where index 3 is blank (assume the respondent did not name any method):

index method_discussed
1 iud male_condoms pill
2 male_condoms
3
4 female_sterilization male_sterilization
5 male_sterilization iud injectables
.
.
.
.
so on.

Moving from Stata to Python, I made a list, say, method_name = ['female_condoms', 'emergency', 'male_condoms', 'pill', 'injectables', 'iud', 'male_sterilization', 'female_sterilization']. I want to generate 8 variables named after the items in the list (the method names), each holding 1 (yes) or 0 (no) depending on whether that item is present in the variable method_discussed. For example, the expected output should look like this:

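Data input (index, method_discussed) and expected output (one 0/1 column per method):

| index | method_discussed | female_condoms | emergency | male_condoms | pill | injectables | iud | male_sterilization | female_sterilization |
|---|---|---|---|---|---|---|---|---|---|
| 1 | iud male_condoms pill | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 0 |
| 2 | male_condoms | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 3 | | | | | | | | | |
| 4 | female_sterilization male_sterilization | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |
| 5 | male_sterilization iud injectables | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 |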
.
.
.
.
so on.

I am not able to understand how to proceed.

Anticipating help from your side

Ashish

**Yoriz** · (This post was last modified: Aug-04-2019, 12:45 PM by Yoriz.)

If I understand correctly, the following might be along the lines of what you are looking for.

method_names = ['female_condoms', 'emergency', 'male_condoms', 'pill',
                'injectables', 'iud', 'male_sterilization', 'female_sterilization']

methods_discussed = [['iud', 'male_condoms', 'pill'],
                     ['male_condoms'],
                     [],
                     ['female_sterilization', 'male_sterilization'],
                     ['male_sterilization', 'iud', 'injectables']]

data_points = []

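# Build one row of 0/1 flags per respondent, in method_names order.
for method_discussed in methods_discussed:
    points = []
    for method_name in method_names:
        # True/False becomes 1/0 depending on whether the method was mentioned.
        points.append(int(method_name in method_discussed))
    data_points.append(points)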

print(data_points)

Output:
[[0, 0, 1, 1, 0, 1, 0, 0], [0, 0, 1, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 1, 1], [0, 0, 0, 0, 1, 1, 1, 0]]

With the index and the methods discussed included at the start of each row:

import pprint

method_names = ['female_condoms', 'emergency', 'male_condoms', 'pill',
                'injectables', 'iud', 'male_sterilization', 'female_sterilization']

methods_discussed = [['iud', 'male_condoms', 'pill'],
                     ['male_condoms'],
                     [],
                     ['female_sterilization', 'male_sterilization'],
                     ['male_sterilization', 'iud', 'injectables']]

data_points = []

# Each row starts with a 1-based index and the original answer, followed by the flags.
for index, method_discussed in enumerate(methods_discussed):
    points = [index + 1, method_discussed]
    for method_name in method_names:
        points.append(int(method_name in method_discussed))
    data_points.append(points)


pprint.pprint(data_points)

Output:
[[1, ['iud', 'male_condoms', 'pill'], 0, 0, 1, 1, 0, 1, 0, 0],
 [2, ['male_condoms'], 0, 0, 1, 0, 0, 0, 0, 0],
 [3, [], 0, 0, 0, 0, 0, 0, 0, 0],
 [4, ['female_sterilization', 'male_sterilization'], 0, 0, 0, 0, 0, 0, 1, 1],
 [5, ['male_sterilization', 'iud', 'injectables'], 0, 0, 0, 0, 1, 1, 1, 0]]
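
The lists above are typed in by hand, while the survey variable holds space-separated strings. A minimal sketch of bridging the two, assuming each answer is a plain string with names separated by whitespace (the name raw_answers is made up for this illustration): split each answer into a list of names first, then test membership. Splitting matters because a substring test on the raw string would give false positives, since 'male_sterilization' is contained in 'female_sterilization'.

```python
method_names = ['female_condoms', 'emergency', 'male_condoms', 'pill',
                'injectables', 'iud', 'male_sterilization', 'female_sterilization']

# Hypothetical raw answers, as they would appear in the string variable.
raw_answers = ['iud male_condoms pill',
               'male_condoms',
               '',
               'female_sterilization male_sterilization',
               'male_sterilization iud injectables']

# str.split() with no argument splits on whitespace; a blank answer gives [].
methods_discussed = [answer.split() for answer in raw_answers]

# One row of 0/1 flags per answer, in method_names order.
data_points = [[int(name in answer) for name in method_names]
               for answer in methods_discussed]

# Why splitting is needed: this substring test is True even though
# only female_sterilization was mentioned.
substring_hit = 'male_sterilization' in 'female_sterilization'

print(data_points)
```

The membership test on a list compares whole names, so 'male_condoms' is only counted when it appears as its own entry.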

ashishstats · Aug-05-2019, 09:39 AM

Thanks Yoriz for prompt help.

Actually, the data is in a CSV file that came from more than 5000 respondents. One of the variables is method_discussed, which has more than 5000 data points, and these data points may be any combination of items from the list
method_names = ['female_condoms', 'emergency', 'male_condoms', 'pill', 'injectables', 'iud', 'male_sterilization', 'female_sterilization'].
For example:

respondent method_discussed
respondent1 female_condoms injectables
respondent2 male_sterilization pill
respondent3 blank (no method)
.
.
.
so on
respondent5000 male_sterilization female_sterilization
.
.

I imported pandas as pd, read the CSV file, and made a dictionary of these 8 methods. I want to generate 8 variables named after these 8 items, whose data points are 0 (the item is absent from method_discussed) or 1 (the item is present in method_discussed), as you have done, but written to the same CSV file and saved rather than only held in memory.

I don't want these results only in memory, as in your example, but in a DataFrame. Second, I want to point out that I don't want to assign methods_discussed by hand, as you have done for only 5 cases:

methods_discussed = [['iud', 'male_condoms', 'pill'],
['male_condoms'],
[],
['female_sterilization', 'male_sterilization'],
['male_sterilization', 'iud', 'injectables']]

As I said, it has more than 5000 cases (data points); in other words, method_discussed can take any combination of items from the dictionary above.

If you need it, I can send the CSV file with the expected outcome in Excel.

Thanks

Ashish
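
One possible way to do this with pandas is sketched below. It is only an illustration under stated assumptions: the column names respondent and method_discussed follow the example above, and the in-memory CSV text stands in for the real file, whose name and layout are not known here. Each answer is split on whitespace and one 0/1 column is added per method name.

```python
import io

import pandas as pd

method_names = ['female_condoms', 'emergency', 'male_condoms', 'pill',
                'injectables', 'iud', 'male_sterilization', 'female_sterilization']

# Stand-in for the real CSV file (assumed layout); with a real file,
# pass its path to pd.read_csv instead of the StringIO object.
csv_text = """respondent,method_discussed
respondent1,female_condoms injectables
respondent2,male_sterilization pill
respondent3,
respondent4,male_sterilization female_sterilization
"""

df = pd.read_csv(io.StringIO(csv_text))

# Blank answers are read as missing values; replace them with empty strings,
# then split every answer on whitespace into a list of method names.
tokens = df['method_discussed'].fillna('').str.split()

# Add one 0/1 column per method, for every row in the file.
for name in method_names:
    df[name] = tokens.apply(lambda answer: int(name in answer))

# With no path given, to_csv returns the CSV text; pass a file path as the
# first argument to write the result to disk instead.
csv_out = df.to_csv(index=False)
print(csv_out)
```

The loop works on the whole column at once, so it does not depend on the number of respondents, and a blank answer produces 0 in all 8 columns.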

ashishstats · (This post was last modified: Aug-13-2019, 08:59 AM by ashishstats.)
