Python Forum
pandas str.extract multiple regex groups with OR
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
pandas str.extract multiple regex groups with OR
#1
Hi. I have a column of text with inputs like, 3" deep, 4 inches deep, 5" depth. I wanted to extract just the value and put this in a new column.

From regex101.com, I confirm the following captures any of these for all my examples:
r'(\d)\"\s*deep|(\d)\"\s*depth|(\d)\sinches\sdeep'
However, when I try to use this regex pattern with str.extract as follows
df['depth']=df['text'].str.extract(r'(\d)\"\s*deep|(\d)\"\s*depth|(\d)\sinches\sdeep')
Error:
ValueError: Wrong number of items passed 3, placement implies 1
The docs say it should just return the first match. I tried str.extractall just in case, but this produces
Error:
TypeError: incompatible index of inserted column with frame index
How do I pass regex OR patterns to str.extract? Or, how should I match each of these cases to create a new column?

Thanks!
Reply
#2
With sample dataset one can achieve desired result as follows:

>>> df = pd.DataFrame('3" deep, 4 inches deep, 5" depth'.split(','), columns=['Depth'])
>>> df
          Depth
0         3" deep
1   4 inches deep
2        5" depth
>>> df.Depth.str.extract('(\d+)')                                     
   0
0  3
1  4
2  5
EDIT: if numeric value is needed then it probably should be int (or float) datatype. So one can do:

>>> df['Depth_number'] = df.Depth.str.extract('(\d+)').astype(int)
>>> df
            Depth  Depth_number
0         3" deep             3
1   4 inches deep             4
2        5" depth             5
>>> df.dtypes
Depth           object
Depth_number     int64
dtype: object
I'm not 'in'-sane. Indeed, I am so far 'out' of sane that you appear a tiny blip on the distant coast of sanity. Bucky Katt, Get Fuzzy

Da Bishop: There's a dead bishop on the landing. I don't know who keeps bringing them in here. ....but society is to blame.
Reply
#3
Unfortunately the text contains other unrelated numbers, such as 25 items, 2" long, 4 inches deep so I only want the values when they match the regex I provided. I also seem to have a common use case for "OR" regex group matching for extracting other data (e.g. extracting an ID from a text field when it takes one or another discreet pattern). The other way I see to achieve it is to run str.extract for each group creating as many new columns as match groups, and then combine these afterwards. This just seemed inefficient, but perhaps this is the only way possible with str.extract.

regex = r'(\d)\"\s*deep|(\d)\"\s*depth|(\d)\sinches\sdeep')
df['depth1']=df['text'].str.extract(r'(\d)\"\s*deep')
df['depth2']=df['text'].str.extract(r'(\d)\"\s*depth')
df['depth3']=df['text'].str.extract(r'(\d)\sinches\sdeep')

df['depth_final'] = df['depth1'].where(df['depth1'].notnull(), df['depth2'])
df['depth_final'] = df['depth_final'].where(df['depth_final'].notnull(), df['depth3'])
df = df.drop(['depth1','depth2', 'depth3'],axis=1)
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  Import multiple CSV files into pandas Krayna 0 1,717 May-20-2021, 04:56 PM
Last Post: Krayna
  Weighted average with multiple weights and groups amyd 0 2,110 Oct-11-2019, 10:30 AM
Last Post: amyd
  Reading Multiple Sheets using Pandas dhiliptcs 1 4,045 Sep-30-2019, 11:26 PM
Last Post: scidam
  Handling multiple errors when using datafiles in Pandas alphanov 1 1,845 Jul-16-2019, 03:17 AM
Last Post: scidam
  How to extract different data groups from multiple CSV files using python Rafiz 3 3,242 Jun-04-2019, 05:20 PM
Last Post: jefsummers
  extract specific content in a pandas dataframe with a regex? steve1040 0 13,538 Oct-05-2017, 03:17 AM
Last Post: steve1040

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020