extract specific content in a pandas dataframe with a regex?

steve1040 · (This post was last modified: Oct-05-2017, 03:57 AM by steve1040.)

I'm trying to extract a few words from a large text field and place the result in a new column.
After creating the new column, I'll run another expression that looks for a numerical value between 1 and 29 on either side of the word m_m_s_e. This is a score that I need to capture in another column.

I'm stuck at stage 1. Please help.

The following code creates a new column, but it removes the target text I want instead of keeping it:

MMSE_df['targettext'] = MMSE_df['cleannotetext'].str.replace(r"(?:[a-zA-Z'-]+[^a-zA-Z'-]+){0,5}m_m_s_e(?:[^a-zA-Z'-]+[a-zA-Z'-]+){0,5}", "", regex=True)

So I tried the following, which throws an error:

MMSE_df['targettext'] = MMSE_df['cleannotetext'].str.extract(r"(?:[a-zA-Z'-]+[^a-zA-Z'-]+){0,5}m_m_s_e(?:[^a-zA-Z'-]+[a-zA-Z'-]+){0,5}", expand=True)

Error:

ValueError                                Traceback (most recent call last)
<ipython-input-98-93cd99ac572d> in <module>()
      1 #MMSE_df['targettext'] = MMSE_df['cleannotetext'].str.replace(r"(?:[a-zA-Z'-]+[^a-zA-Z'-]+){0,5}m_m_s_e(?:[^a-zA-Z'-]+[a-zA-Z'-]+){0,5}", "")
      2 
----> 3 MMSE_df['targettext'] = MMSE_df['cleannotetext'].str.extract(r"(?:[a-zA-Z'-]+[^a-zA-Z'-]+){0,5}m_m_s_e(?:[^a-zA-Z'-]+[a-zA-Z'-]+){0,5}", expand=True)

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\strings.py in extract(self, pat, flags, expand)
   1706     @copy(str_extract)
   1707     def extract(self, pat, flags=0, expand=None):
-> 1708         return str_extract(self, pat, flags=flags, expand=expand)
   1709 
   1710     @copy(str_extractall)

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\strings.py in str_extract(arr, pat, flags, expand)
    689         raise ValueError("expand must be True or False")
    690     if expand:
--> 691         return _str_extract_frame(arr._orig, pat, flags=flags)
    692     else:
    693         result, name = _str_extract_noexpand(arr._data, pat, flags=flags)

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\strings.py in _str_extract_frame(arr, pat, flags)
    581 
    582     regex = re.compile(pat, flags=flags)
--> 583     groups_or_na = _groups_or_na_fun(regex)
    584     names = dict(zip(regex.groupindex.values(), regex.groupindex.keys()))
    585     columns = [names.get(1 + i, i) for i in range(regex.groups)]

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\strings.py in _groups_or_na_fun(regex)
    524     """Used in both extract_noexpand and extract_frame"""
    525     if regex.groups == 0:
--> 526         raise ValueError("pattern contains no capture groups")
    527     empty_row = [np.nan] * regex.groups
    528 

ValueError: pattern contains no capture groups
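The error message points at the cause: `str.extract` returns the contents of capture groups, and the pattern above only uses non-capturing `(?:...)` groups, so there is nothing to return. A minimal sketch of the usual fix, assuming the same pattern, is to wrap the whole expression in one outer capturing group. The sample rows below are hypothetical stand-ins for the real `cleannotetext` column.

```python
import pandas as pd

# Hypothetical sample notes standing in for the 'cleannotetext' column.
MMSE_df = pd.DataFrame({
    "cleannotetext": [
        "the patient was seen in clinic today and scored 24 on the m_m_s_e "
        "which is stable from last visit",
        "no cognitive testing was done",
    ]
})

# Same pattern as in the post, wrapped in one outer capture group (...) so
# that str.extract has a group to return.
pattern = r"((?:[a-zA-Z'-]+[^a-zA-Z'-]+){0,5}m_m_s_e(?:[^a-zA-Z'-]+[a-zA-Z'-]+){0,5})"

# With a single group, expand=False returns a Series that fits in one column;
# rows without a match get NaN.
MMSE_df["targettext"] = MMSE_df["cleannotetext"].str.extract(pattern, expand=False)
print(MMSE_df["targettext"].tolist())
```

For the first sample row this keeps up to five words on either side of `m_m_s_e` (digits such as "24" count as separators under this pattern, not as words); the second row gets NaN because it has no match.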

I got it working using the following function:

# A function to get the target text from provider notes.
import re

def get_targettext(notes):
    # Up to 5 words on either side of m_m_s_e.
    note_search = re.search(r"(?:[a-zA-Z'-]+[^a-zA-Z'-]+){0,5}m_m_s_e(?:[^a-zA-Z'-]+[a-zA-Z'-]+){0,5}", notes)

    # If the pattern is found, extract and return the matched text.
    if note_search:
        return note_search.group()
    return ""
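A function like `get_targettext` can be run over the whole column with `Series.apply`. The sketch below does that and then tries the second stage described at the top: pulling a whole number from 1 to 29 that sits just before or just after `m_m_s_e`. The sample rows, the `mmse_score` column name, and the score pattern (two named groups, `before` and `after`, allowing up to 20 non-digit characters between the number and the word) are hypothetical choices, not code from this thread.

```python
import re

import pandas as pd


def get_targettext(notes):
    """Return up to 5 words either side of 'm_m_s_e', or '' if absent."""
    note_search = re.search(
        r"(?:[a-zA-Z'-]+[^a-zA-Z'-]+){0,5}m_m_s_e(?:[^a-zA-Z'-]+[a-zA-Z'-]+){0,5}",
        notes,
    )
    if note_search:
        return note_search.group()
    return ""


# Hypothetical sample data standing in for the real notes.
MMSE_df = pd.DataFrame({
    "cleannotetext": [
        "today and scored 24 on the m_m_s_e which is stable",
        "m_m_s_e 18 of 30 noted",
        "no cognitive testing was done",
    ]
})

# Stage 1: apply the function to every row.
MMSE_df["targettext"] = MMSE_df["cleannotetext"].apply(get_targettext)

# Stage 2 (hypothetical pattern): a whole number 1-29 separated from
# 'm_m_s_e' only by non-digit characters, on either side. Each side gets its
# own named group because a group name cannot be reused in one pattern.
score_pattern = (
    r"\b(?P<before>[1-9]|1[0-9]|2[0-9])\b\D{0,20}m_m_s_e"
    r"|m_m_s_e\D{0,20}\b(?P<after>[1-9]|1[0-9]|2[0-9])\b"
)

# Named groups become columns 'before' and 'after'; take whichever matched
# and convert the text to a number (NaN where neither matched).
scores = MMSE_df["targettext"].str.extract(score_pattern)
MMSE_df["mmse_score"] = pd.to_numeric(scores["before"].combine_first(scores["after"]))
print(MMSE_df[["targettext", "mmse_score"]])
```

Because `\D{0,20}` does not allow digits between the number and `m_m_s_e`, a phrasing like "24 out of 30 on the m_m_s_e" would not match; the pattern would need tuning against real notes.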
