Python Forum
Filter rows by multiple text conditions in another data frame i.e contains strings an
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Filter rows by multiple text conditions in another data frame i.e contains strings an
#1
I have a scheme of conditions in a data frame Condition which contains more than 100 rows and 100 columns, something like below.

Contain Exclude_1 Exclude_2
0 mansard roof Demolition mansard roof above rear projection
1 mansard style roof Demolition external alterations to existing
2 extension at roof level Demolition

I would like to filter the rows in a column 'Development Description' in another data frame LDD if the rows contain the strings and do not contain any strings in the exclude columns in the condition data frame. But both conditions should be met i.e. stated in the same row of the Condition

What I have done

1. Turned the conditions into two lists.
2. Filtered the LDD dataset with the keywords in the list contain_list
ldd_contain = ldd_2009[ldd_2009.apply(lambda r: any([kw in r['Development Description'] for kw in contain_list]), axis=1)]
3. Filtered the ldd_contain data frame with the list exclude_list
ldd_exclude = ldd_contain[ldd_contain.apply(lambda r: any([kw in r['Development Description'] for kw in exclude_list]), axis=1)]
Merged the above generated two data frames and dropped those rows only in the ldd_exclude data frame
df_all = ldd_contain.merge(ldd_exclude.drop_duplicates(), 
                           on=['Borough Reference','Development Description'], 
                           how='left', indicator=True)
df_all_adjusted = df_all[df_all['_merge'] == 'left_only']
df_all_adjusted_removed = df_all_adjusted.drop(['_merge'], axis=1)
Problems

But I realised my method would essentially exclude all the rows whenever it contains the strings in the exclude column without considering the strings in the contains column at the same time.

For instance, the below row is not selected even though it contains 'extension at roof level' (row 3 in the condition data frame) because it contains 'external alterations to the existing'(which is on the 'exclude' column but row 2).

But a row should only be excluded if it contains both 'mansard style roof' and 'external alterations to existing' or 'demolition' i.e the same row in the conditiondata frame.

[inline]Proposed residential development comprising the change of use of the existing offices on the second to eighth floor of New Malden House and its part four, part two story extension at roof level for 93 residential units with associated external alterations to the existing building
[/inline]
I hope I have explained what I want to achieve clearly. I don't have any idea how to fix this problem so any help would be appreciated.
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  Grouping in pandas/multi-index data frame Aleqsie 3 607 Jan-06-2024, 03:55 PM
Last Post: deanhystad
  Filtering Data Frame, with another value NewBiee 9 1,331 Aug-21-2023, 10:53 AM
Last Post: NewBiee
  Exporting data frame to excel dyerlee91 0 1,606 Oct-05-2021, 11:34 AM
Last Post: dyerlee91
  Pandas Data frame column condition check based on length of the value aditi06 1 2,655 Jul-28-2021, 11:08 AM
Last Post: jefsummers
  Adding a new column to a Panda Data Frame rsherry8 2 2,083 Jun-06-2021, 06:49 PM
Last Post: jefsummers
  grouped data frame glitter 0 1,577 Feb-02-2021, 11:22 AM
Last Post: glitter
  Matching multiple conditions and mapping multiple results, between two dataframes daveyg33 2 1,761 Jan-02-2021, 10:32 AM
Last Post: daveyg33
  how to filter data frame dynamically with the columns psahay 0 2,379 Aug-24-2020, 01:10 PM
Last Post: psahay
  Dropping Rows From A Data Frame Based On A Variable JoeDainton123 1 2,188 Aug-03-2020, 02:05 AM
Last Post: scidam
  How to shift data frame rows of specified column Mekala 0 1,861 Jul-21-2020, 02:42 PM
Last Post: Mekala

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020