Python Forum
Filter rows by multiple text conditions in another data frame i.e contains strings an - Printable Version

+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/forum-7.html)
+--- Forum: Data Science (https://python-forum.io/forum-44.html)
+--- Thread: Filter rows by multiple text conditions in another data frame i.e contains strings an (/thread-27505.html)



Filter rows by multiple text conditions in another data frame i.e contains strings an - Pan - Jun-09-2020

I have a scheme of conditions in a data frame Condition which contains more than 100 rows and 100 columns, something like below.

Contain Exclude_1 Exclude_2
0 mansard roof Demolition mansard roof above rear projection
1 mansard style roof Demolition external alterations to existing
2 extension at roof level Demolition

I would like to filter the rows in a column 'Development Description' in another data frame LDD if the rows contain the strings and do not contain any strings in the exclude columns in the condition data frame. But both conditions should be met i.e. stated in the same row of the Condition

What I have done

1. Turned the conditions into two lists.
2. Filtered the LDD dataset with the keywords in the list contain_list
ldd_contain = ldd_2009[ldd_2009.apply(lambda r: any([kw in r['Development Description'] for kw in contain_list]), axis=1)]
3. Filtered the ldd_contain data frame with the list exclude_list
ldd_exclude = ldd_contain[ldd_contain.apply(lambda r: any([kw in r['Development Description'] for kw in exclude_list]), axis=1)]
Merged the above generated two data frames and dropped those rows only in the ldd_exclude data frame
df_all = ldd_contain.merge(ldd_exclude.drop_duplicates(), 
                           on=['Borough Reference','Development Description'], 
                           how='left', indicator=True)
df_all_adjusted = df_all[df_all['_merge'] == 'left_only']
df_all_adjusted_removed = df_all_adjusted.drop(['_merge'], axis=1)
Problems

But I realised my method would essentially exclude all the rows whenever it contains the strings in the exclude column without considering the strings in the contains column at the same time.

For instance, the below row is not selected even though it contains 'extension at roof level' (row 3 in the condition data frame) because it contains 'external alterations to the existing'(which is on the 'exclude' column but row 2).

But a row should only be excluded if it contains both 'mansard style roof' and 'external alterations to existing' or 'demolition' i.e the same row in the conditiondata frame.

[inline]Proposed residential development comprising the change of use of the existing offices on the second to eighth floor of New Malden House and its part four, part two story extension at roof level for 93 residential units with associated external alterations to the existing building
[/inline]
I hope I have explained what I want to achieve clearly. I don't have any idea how to fix this problem so any help would be appreciated.