Dec-19-2019, 05:43 PM
(This post was last modified: Dec-19-2019, 05:43 PM by pythonidae.)
Unfortunately the text contains other unrelated numbers, such as
25 items, 2" long, 4 inches deep
so I only want the values when they match the regex I provided. I also seem to have a common use case for "OR" regex group matching when extracting other data (e.g. extracting an ID from a text field when it takes one or another discrete pattern). The other way I see to achieve it is to run str.extract for each group, creating as many new columns as there are match groups, and then combine these afterwards. This just seemed inefficient, but perhaps this is the only way possible with str.extract.

# combined pattern I would like to use in a single call
regex = r'(\d)\"\s*deep|(\d)\"\s*depth|(\d)\sinches\sdeep'

# current workaround: one extract per alternative, then coalesce the columns
df['depth1'] = df['text'].str.extract(r'(\d)\"\s*deep')
df['depth2'] = df['text'].str.extract(r'(\d)\"\s*depth')
df['depth3'] = df['text'].str.extract(r'(\d)\sinches\sdeep')
df['depth_final'] = df['depth1'].where(df['depth1'].notnull(), df['depth2'])
df['depth_final'] = df['depth_final'].where(df['depth_final'].notnull(), df['depth3'])
df = df.drop(['depth1', 'depth2', 'depth3'], axis=1)
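For what it's worth, here is a minimal sketch of two ways the combined pattern might be used in a single str.extract call. The sample DataFrame and the column names depth_final and depth_single are made up for illustration. The first variant keeps the three capture groups and back-fills across the resulting columns; the second rewrites the pattern (my own rewording, not the original regex) so that only one capture group remains and the alternatives sit in a non-capturing group. Both keep the original single-digit (\d) capture.

```python
import pandas as pd

# Hypothetical sample data, only for illustration
df = pd.DataFrame({"text": [
    '25 items, 2" long, 4 inches deep',
    'box 3" deep',
    'tray 5" depth',
    'no measurement here',
]})

# Variant 1: keep the three capture groups. str.extract returns one
# column per group; at most one of them is non-null per row, so a
# back-fill along the columns pulls that value into the first column.
regex = r'(\d)"\s*deep|(\d)"\s*depth|(\d)\sinches\sdeep'
groups = df["text"].str.extract(regex)
df["depth_final"] = groups.bfill(axis=1).iloc[:, 0]

# Variant 2: a single capture group followed by a non-capturing
# alternation, so str.extract yields one Series directly.
regex_single = r'(\d)(?:"\s*(?:deep|depth)|\sinches\sdeep)'
df["depth_single"] = df["text"].str.extract(regex_single, expand=False)

print(df[["depth_final", "depth_single"]])
```

The second variant only works when the number sits in the same position relative to each alternative (here, always first); when the patterns differ more, the back-fill approach is the more general of the two.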