Python Forum
Filter rows by multiple text conditions in another data frame i.e contains strings an
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Filter rows by multiple text conditions in another data frame i.e contains strings an
I have a scheme of conditions in a data frame Condition which contains more than 100 rows and 100 columns, something like below.

Contain Exclude_1 Exclude_2
0 mansard roof Demolition mansard roof above rear projection
1 mansard style roof Demolition external alterations to existing
2 extension at roof level Demolition

I would like to filter the rows in a column 'Development Description' in another data frame LDD if the rows contain the strings and do not contain any strings in the exclude columns in the condition data frame. But both conditions should be met i.e. stated in the same row of the Condition

What I have done

1. Turned the conditions into two lists.
2. Filtered the LDD dataset with the keywords in the list contain_list
ldd_contain = ldd_2009[ldd_2009.apply(lambda r: any([kw in r['Development Description'] for kw in contain_list]), axis=1)]
3. Filtered the ldd_contain data frame with the list exclude_list
ldd_exclude = ldd_contain[ldd_contain.apply(lambda r: any([kw in r['Development Description'] for kw in exclude_list]), axis=1)]
Merged the above generated two data frames and dropped those rows only in the ldd_exclude data frame
df_all = ldd_contain.merge(ldd_exclude.drop_duplicates(), 
                           on=['Borough Reference','Development Description'], 
                           how='left', indicator=True)
df_all_adjusted = df_all[df_all['_merge'] == 'left_only']
df_all_adjusted_removed = df_all_adjusted.drop(['_merge'], axis=1)

But I realised my method would essentially exclude all the rows whenever it contains the strings in the exclude column without considering the strings in the contains column at the same time.

For instance, the below row is not selected even though it contains 'extension at roof level' (row 3 in the condition data frame) because it contains 'external alterations to the existing'(which is on the 'exclude' column but row 2).

But a row should only be excluded if it contains both 'mansard style roof' and 'external alterations to existing' or 'demolition' i.e the same row in the conditiondata frame.

[inline]Proposed residential development comprising the change of use of the existing offices on the second to eighth floor of New Malden House and its part four, part two story extension at roof level for 93 residential units with associated external alterations to the existing building
I hope I have explained what I want to achieve clearly. I don't have any idea how to fix this problem so any help would be appreciated.

Possibly Related Threads…
Thread Author Replies Views Last Post
  how to filter data frame dynamically with the columns psahay 0 168 Aug-24-2020, 01:10 PM
Last Post: psahay
  Dropping Rows From A Data Frame Based On A Variable JoeDainton123 1 269 Aug-03-2020, 02:05 AM
Last Post: scidam
  How to shift data frame rows of specified column Mekala 0 434 Jul-21-2020, 02:42 PM
Last Post: Mekala
  Filter data based on a value from another dataframe column and create a file using lo pawanmtm 1 291 Jul-15-2020, 06:20 PM
Last Post: pawanmtm
  HELP- DATA FRAME INTO TIME SERIES- BASIC bntayfur 0 235 Jul-11-2020, 09:04 PM
Last Post: bntayfur
  Pandas data frame creation from Kafka Topic vboppa 0 208 Jul-01-2020, 04:23 PM
Last Post: vboppa
  Least-squares fit multiple data sets multiverse22 1 447 Jun-06-2020, 01:38 AM
Last Post: Larz60+
  How can I convert time-series data in rows into column srvmig 0 335 Apr-11-2020, 05:40 AM
Last Post: srvmig
  Displaying Result from Data Frame from Function eagle 1 440 Apr-08-2020, 11:58 PM
Last Post: eagle
  add formatted column to pandas data frame alkaline3 0 344 Mar-22-2020, 06:44 PM
Last Post: alkaline3

Forum Jump:

User Panel Messages

Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020