I'm trying to extract a few words from a large text field and place the result in a new column.
After creating the new column, I'll run another expression looking for a numerical value between 1 and 29 on either side of the word m_m_s_e. This is a score that I need to capture in another column.
I'm stuck at stage 1. Please help.
The following code creates a new column, but it removes the target text I want instead of keeping it:
MMSE_df['targettext'] = MMSE_df['cleannotetext'].str.replace(r"(?:[a-zA-Z'-]+[^a-zA-Z'-]+){0,5}m_m_s_e(?:[^a-zA-Z'-]+[a-zA-Z'-]+){0,5}", "")
So I tried the following, which throws an error:
MMSE_df['targettext'] = MMSE_df['cleannotetext'].str.extract(r"(?:[a-zA-Z'-]+[^a-zA-Z'-]+){0,5}m_m_s_e(?:[^a-zA-Z'-]+[a-zA-Z'-]+){0,5}", expand=True)
Error:
ValueError Traceback (most recent call last)
<ipython-input-98-93cd99ac572d> in <module>()
1 #MMSE_df['targettext'] = MMSE_df['cleannotetext'].str.replace(r"(?:[a-zA-Z'-]+[^a-zA-Z'-]+){0,5}m_m_s_e(?:[^a-zA-Z'-]+[a-zA-Z'-]+){0,5}", "")
2
----> 3 MMSE_df['targettext'] = MMSE_df['cleannotetext'].str.extract(r"(?:[a-zA-Z'-]+[^a-zA-Z'-]+){0,5}m_m_s_e(?:[^a-zA-Z'-]+[a-zA-Z'-]+){0,5}", expand=True)
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\strings.py in extract(self, pat, flags, expand)
1706 @copy(str_extract)
1707 def extract(self, pat, flags=0, expand=None):
-> 1708 return str_extract(self, pat, flags=flags, expand=expand)
1709
1710 @copy(str_extractall)
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\strings.py in str_extract(arr, pat, flags, expand)
689 raise ValueError("expand must be True or False")
690 if expand:
--> 691 return _str_extract_frame(arr._orig, pat, flags=flags)
692 else:
693 result, name = _str_extract_noexpand(arr._data, pat, flags=flags)
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\strings.py in _str_extract_frame(arr, pat, flags)
581
582 regex = re.compile(pat, flags=flags)
--> 583 groups_or_na = _groups_or_na_fun(regex)
584 names = dict(zip(regex.groupindex.values(), regex.groupindex.keys()))
585 columns = [names.get(1 + i, i) for i in range(regex.groups)]
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\strings.py in _groups_or_na_fun(regex)
524 """Used in both extract_noexpand and extract_frame"""
525 if regex.groups == 0:
--> 526 raise ValueError("pattern contains no capture groups")
527 empty_row = [np.nan] * regex.groups
528
ValueError: pattern contains no capture groups
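The message points at the likely cause: `str.extract` only returns what capture groups match, and every group in the pattern is non-capturing (`(?:...)`). A minimal sketch of one possible fix is to wrap the whole pattern in a single pair of capturing parentheses. The sample notes below are made up for illustration and only stand in for the real `cleannotetext` column.

```python
import pandas as pd

# Hypothetical sample data standing in for MMSE_df['cleannotetext'].
MMSE_df = pd.DataFrame({
    "cleannotetext": [
        "scored 24 on m_m_s_e today",
        "no cognitive testing was done at this visit",
    ]
})

# Same pattern as in the question, wrapped in one outer capturing group.
pattern = r"((?:[a-zA-Z'-]+[^a-zA-Z'-]+){0,5}m_m_s_e(?:[^a-zA-Z'-]+[a-zA-Z'-]+){0,5})"

# With exactly one capture group, expand=False returns a Series
# that can be assigned straight to a new column.
MMSE_df["targettext"] = MMSE_df["cleannotetext"].str.extract(pattern, expand=False)

print(MMSE_df["targettext"])
```

Rows where m_m_s_e does not occur come back as NaN rather than an empty string, so a `.fillna("")` may be wanted afterwards.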
I got it working using the following:
# A function to get the target text from a provider's notes.
import re

def get_targettext(notes):
    # note_search = re.search(' ([A-Za-z]+)\.', name)
    note_search = re.search(r"(?:[a-zA-Z'-]+[^a-zA-Z'-]+){0,5}m_m_s_e(?:[^a-zA-Z'-]+[a-zA-Z'-]+){0,5}", notes)
    # If a match exists, extract and return it.
    if note_search:
        return note_search.group()
    return ""
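For stage 2 (capturing the score into its own column), here is a rough sketch. It assumes the score is written as a standalone whole number from 1 to 29 somewhere in the extracted window, and that the first such number is the one wanted. The names `SCORE_RE`, `get_score` and `mmse_score`, and the sample rows, are hypothetical.

```python
import re
import pandas as pd

# Hypothetical sample of already-extracted context windows.
MMSE_df = pd.DataFrame({
    "targettext": [
        "scored 24 on m_m_s_e today",
        "m_m_s_e was 7 out of 30",
        "m_m_s_e not attempted",
        "",
    ]
})

# A standalone integer from 1 to 29: the lookarounds reject digits that
# are part of a longer number such as 30 or 129.
SCORE_RE = re.compile(r"(?<!\d)(2[0-9]|1[0-9]|[1-9])(?!\d)")

def get_score(text):
    """Return the first 1-29 integer found in text, or None if there is none."""
    match = SCORE_RE.search(text)
    return int(match.group(1)) if match else None

# Rows without a usable number end up as NaN in the new column.
MMSE_df["mmse_score"] = MMSE_df["targettext"].apply(get_score)
```

The `targettext` column itself can be filled the same way, with `MMSE_df['cleannotetext'].apply(get_targettext)`. Taking the first number in the window is a simplification: a date or a dose that appears before the score would be picked up instead, so the results are worth spot-checking against the notes.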