Remove an item from a list contained in another item in python

CompleteNewb · Nov-06-2021, 08:17 PM

I have this randomly generated list in python:

l = [[2,3,4,5],[2,3],[7,8],[2],[4,5,6]]

and i want to remove the items [2,3] and [2] because they are completely contained in the first item [2,3,4,5].

newl = [[2,3,4,5],[7,8],[4,5,6]]

How do I do that?

supuflounder · Nov-06-2021, 09:15 PM

(Nov-06-2021, 08:17 PM)CompleteNewb Wrote: I have this randomly generated list in python:

l = [[2,3,4,5],[2,3],[7,8],[2],[4,5,6]]

and i want to remove the items [2,3] and [2] because they are completely contained in the first item [2,3,4,5].

newl = [[2,3,4,5],[7,8],[4,5,6]]

How do I do that?

What about [4,3], [2,5], [6,4]. [4,2,5]?

It is looking like O(n^2); it may be worse. Knowing the actual rejection criteria is important, because that determines the complexity. It is not possible to infer the desired result from the examples given.

CompleteNewb · Nov-06-2021, 09:37 PM

(Nov-06-2021, 09:15 PM)supuflounder Wrote:
(Nov-06-2021, 08:17 PM)CompleteNewb Wrote: I have this randomly generated list in python:

l = [[2,3,4,5],[2,3],[7,8],[2],[4,5,6]]

and i want to remove the items [2,3] and [2] because they are completely contained in the first item [2,3,4,5].

newl = [[2,3,4,5],[7,8],[4,5,6]]

How do I do that?

What about [4,3], [2,5], [6,4]. [4,2,5]?

It is looking like O(n^2); it may be worse. Knowing the actual rejection criteria is important, because that determines the complexity. It is not possible to infer the desired result from the examples given.

Well yes you can because i said i want to remove items that are completly contain in another item, which means [4,3] and [2,5] is contained in [2,3,4,5]. [6,4] is an item you invented which would be removed , because it is contained in [4,5,6] and [2,4,5] would be removed because it is contained in [2,3,4,5]. And in my example the item [4,5,6] is not completely contained in a previous item of "l" because the 6 is not an element of [2,3,4,5]. And I don't understand what you mean by O(n^2). You have to think about it like Venn diagrams. If an an ensemble is within a bigger ensemble, remove the ensemble within. Imagine another list c = ["Cat", "Catnip"] remove 'Cat' because it is contained in "Catnip" so the result would be c = ["Catnip"]

supuflounder · Nov-06-2021, 09:58 PM

(Nov-06-2021, 09:37 PM)CompleteNewb Wrote:
(Nov-06-2021, 09:15 PM)supuflounder Wrote: What about [4,3], [2,5], [6,4]. [4,2,5]?

It is looking like O(n^2); it may be worse. Knowing the actual rejection criteria is important, because that determines the complexity. It is not possible to infer the desired result from the examples given.

Well yes you can because i said i want to remove items that are completly contain in another item, which means [4,3] and [2,5] is contained in [2,3,4,5]. [6,4] is an item you invented which would be removed , because it is contained in [4,5,6] and [2,4,5] would be removed because it is contained in [2,3,4,5]. And in my example the item [4,5,6] is not completely contained in a previous item of "l" because the 6 is not an element of [2,3,4,5]. And I don't understand what you mean by O(n^2). You have to think about it like Venn diagrams. If an an ensemble is within a bigger ensemble, remove the ensemble within. Imagine another list c = ["Cat", "Catnip"] remove 'Cat' because it is contained in "Catnip" so the result would be c = ["Catnip"]

But removing "Cat" when "Catnip" is present is not the same as removing "tin", because "tin" is fully contained in "catnip".

The naïve algorithm works like this:

Take the first element of the list. This is the "current element"

Now, for each remaining element ("selected element") of the list, extract each value, and ask if the value is in the current element. If, for all values in the selected element, they are in the current element, eliminate the selected element.

This could be done by taking each value in the selected element, and if it appears in the current element, remove it from the (copy of) the selected element. If, at the end of the elements of the selected element, the (copy of) the selected element list is empty, then remove the selected element from the main list.

Now, take an element from the remaining list, call it the current element, and repeat for all other elements from the list. You have to deal with the fact that removing the selected element from the list changes your indexing. So, essentially, each time you remove an element from the list, you have to start at the beginning. There are some optimizations that are possible, but this begins to look like O(n^2) for the basic comparisons, and then O(n) for having to start over at the beginning each time, so I get O(n^3) where n is the total number of values contained in the list.

It ain't pretty, or fast, but it works. Depending on the sizes of your lists, this may run in acceptable time bounds.

Key here is to write out the algorithm in terms of little boxes with pieces of paper in them with numbers written on the pieces of paper. Or letters. It doesn't matter. Explain to a first-grader how to eliminate any box whose values are all contained in another box. Tell them that they cannot just lay out all the pieces of paper, but have to reach in, extract a piece of paper, then, one at a time, pull from the "current box" in front of them each value, in turn. If the values match, throw away the paper from the selected box; if they don't match, put the paper down in front of the selected box. If, at the end, the selected box has no pieces of paper left in front of it, throw the box away. Write this out so a reasonably literate first-grader can read it. Once you have this debugged (easier if you have access to a first-grader), just translate it into Python.

If you can write that down, you can program it in Python, because that is just syntax.

bowlofred · Nov-06-2021, 10:31 PM

Does order matter? Can there be duplicates?

If order doesn't matter and there are no duplicates, make each one a set and use issubset() on each one to see if it's contained.

**Gribouillis** · Nov-06-2021, 11:06 PM

>>> l = [[2,3,4,5],[2,3],[7,8],[2],[4,5,6]]
>>> S = sorted((set(x) for x in l), key=len, reverse=True)
>>> S
[{2, 3, 4, 5}, {4, 5, 6}, {2, 3}, {8, 7}, {2}]
>>> roots = []
>>> for s in S:
...     if not any(s.issubset(r) for r in roots):
...         roots.append(s)
... 
>>> roots
[{2, 3, 4, 5}, {4, 5, 6}, {8, 7}]

As supuflounder said, this is a O(n^2) algorithm, which means here that if the initial list has length N, we way call issubset() about N*(N-1)/2 times.

CompleteNewb · Nov-09-2021, 12:43 AM

(Nov-06-2021, 11:06 PM)Gribouillis Wrote:
>>> l = [[2,3,4,5],[2,3],[7,8],[2],[4,5,6]]
>>> S = sorted((set(x) for x in l), key=len, reverse=True)
>>> S
[{2, 3, 4, 5}, {4, 5, 6}, {2, 3}, {8, 7}, {2}]
>>> roots = []
>>> for s in S:
...     if not any(s.issubset(r) for r in roots):
...         roots.append(s)
... 
>>> roots
[{2, 3, 4, 5}, {4, 5, 6}, {8, 7}]
As supuflounder said, this is a O(n^2) algorithm, which means here that if the initial list has length N, we way call issubset() about N*(N-1)/2 times.

I'm just a beginner in python. Is ther e any other way to write that : if not any(s.issubset® for r in roots)? I don't understand how that for loop works... Or even how you call that technique

**deanhystad** · Nov-09-2021, 06:12 AM

How about learning what sets are and how any works? Sounds like they might come in useful.

CompleteNewb · (This post was last modified: Nov-09-2021, 06:47 PM by CompleteNewb.)

(Nov-09-2021, 06:12 AM)deanhystad Wrote: How about learning what sets are and how any works? Sounds like they might come in useful.

I know what sets are and I did my research on any()... I'm talking about the "for r in roots". Why is it on the same line and how can you have a loop in an empty lists?

**deanhystad** · Nov-09-2021, 07:17 PM

Ahh, I understand your confusion now. It is clever, and I guess potentially confusing.

You are correct that roots starts out as an empty list the first time through the "for s in S:" loop.

roots = [], s = {2, 3, 4, 5}

Since roots is empty, s is not going to be a subset of any of the sets in roots and is appended to roots.

roots = [{2, 3, 4, 5}], s = {4, 5, 6}

Second time through the outer loop roots contains {2, 3, 4, 5}. s is not a subset of {2, 3, 4, 5}, so it is appended to roots.

roots = [{2, 3, 4, 5}, {4, 5, 6}] s = {2, 3}

Third time through the outer loop {2, 3} is a subset of {2, 3, 4, 5}, so it is not appended to roots.

Do follow how this goes now? Each time through the outer loop we test if s is a subset of any of the sets in root. If s is a subset of one of the sets in root, it is not appended to root. If s is not a subset of any set in root, s is appended to root.

Possibly Related Threads…
Thread		Author	Replies	Views	Last Post
	Accessing item in the system tray on Windows 11	mariox	0	603	Aug-22-2024, 05:11 PM Last Post: mariox
	remove duplicates from dicts with list values	wardancer84	27	5,347	May-27-2024, 04:54 PM Last Post: wardancer84
	unable to remove all elements from list based on a condition	sg_python	3	1,623	Jan-27-2024, 04:03 PM Last Post: deanhystad
	return next item each time a function is executed	User3000	19	5,384	Aug-06-2023, 02:29 PM Last Post: deanhystad
	Copy item from one dict to another	Pavel_47	3	1,807	Dec-23-2022, 11:19 AM Last Post: Pavel_47
	Remove numbers from a list	menator01	4	3,536	Nov-13-2022, 01:27 AM Last Post: menator01
	code to send attachments contained on the drive.	stefanoste78	1	1,528	Oct-12-2022, 02:16 AM Last Post: Larz60+
	How to sort .csv file test log which item first fail and paint color	SamLiu	24	8,740	Sep-03-2022, 07:32 AM Last Post: Pedroski55
	Finding string in list item	jesse68	8	3,031	Jun-30-2022, 08:27 AM Last Post: Gribouillis
	how to mouse click a specific item in pygame?	Frankduc	5	2,775	May-03-2022, 06:22 PM Last Post: Frankduc

Remove an item from a list contained in another item in python

User Panel Messages

Announcements