Python Forum
Remove an item from a list contained in another item in python
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Remove an item from a list contained in another item in python
#1
I have this randomly generated list in python:

l = [[2,3,4,5],[2,3],[7,8],[2],[4,5,6]]

and i want to remove the items [2,3] and [2] because they are completely contained in the first item [2,3,4,5].

newl = [[2,3,4,5],[7,8],[4,5,6]]

How do I do that?
Reply
#2
(Nov-06-2021, 08:17 PM)CompleteNewb Wrote: I have this randomly generated list in python:

l = [[2,3,4,5],[2,3],[7,8],[2],[4,5,6]]

and i want to remove the items [2,3] and [2] because they are completely contained in the first item [2,3,4,5].

newl = [[2,3,4,5],[7,8],[4,5,6]]

How do I do that?

What about [4,3], [2,5], [6,4]. [4,2,5]?

It is looking like O(n^2); it may be worse. Knowing the actual rejection criteria is important, because that determines the complexity. It is not possible to infer the desired result from the examples given.
Gribouillis likes this post
Reply
#3
(Nov-06-2021, 09:15 PM)supuflounder Wrote:
(Nov-06-2021, 08:17 PM)CompleteNewb Wrote: I have this randomly generated list in python:

l = [[2,3,4,5],[2,3],[7,8],[2],[4,5,6]]

and i want to remove the items [2,3] and [2] because they are completely contained in the first item [2,3,4,5].

newl = [[2,3,4,5],[7,8],[4,5,6]]

How do I do that?

What about [4,3], [2,5], [6,4]. [4,2,5]?

It is looking like O(n^2); it may be worse. Knowing the actual rejection criteria is important, because that determines the complexity. It is not possible to infer the desired result from the examples given.

Well yes you can because i said i want to remove items that are completly contain in another item, which means [4,3] and [2,5] is contained in [2,3,4,5]. [6,4] is an item you invented which would be removed , because it is contained in [4,5,6] and [2,4,5] would be removed because it is contained in [2,3,4,5]. And in my example the item [4,5,6] is not completely contained in a previous item of "l" because the 6 is not an element of [2,3,4,5]. And I don't understand what you mean by O(n^2). You have to think about it like Venn diagrams. If an an ensemble is within a bigger ensemble, remove the ensemble within. Imagine another list c = ["Cat", "Catnip"] remove 'Cat' because it is contained in "Catnip" so the result would be c = ["Catnip"]
Reply
#4
(Nov-06-2021, 09:37 PM)CompleteNewb Wrote:
(Nov-06-2021, 09:15 PM)supuflounder Wrote: What about [4,3], [2,5], [6,4]. [4,2,5]?

It is looking like O(n^2); it may be worse. Knowing the actual rejection criteria is important, because that determines the complexity. It is not possible to infer the desired result from the examples given.

Well yes you can because i said i want to remove items that are completly contain in another item, which means [4,3] and [2,5] is contained in [2,3,4,5]. [6,4] is an item you invented which would be removed , because it is contained in [4,5,6] and [2,4,5] would be removed because it is contained in [2,3,4,5]. And in my example the item [4,5,6] is not completely contained in a previous item of "l" because the 6 is not an element of [2,3,4,5]. And I don't understand what you mean by O(n^2). You have to think about it like Venn diagrams. If an an ensemble is within a bigger ensemble, remove the ensemble within. Imagine another list c = ["Cat", "Catnip"] remove 'Cat' because it is contained in "Catnip" so the result would be c = ["Catnip"]

But removing "Cat" when "Catnip" is present is not the same as removing "tin", because "tin" is fully contained in "catnip".

The naïve algorithm works like this:

Take the first element of the list. This is the "current element"

Now, for each remaining element ("selected element") of the list, extract each value, and ask if the value is in the current element. If, for all values in the selected element, they are in the current element, eliminate the selected element.

This could be done by taking each value in the selected element, and if it appears in the current element, remove it from the (copy of) the selected element. If, at the end of the elements of the selected element, the (copy of) the selected element list is empty, then remove the selected element from the main list.

Now, take an element from the remaining list, call it the current element, and repeat for all other elements from the list. You have to deal with the fact that removing the selected element from the list changes your indexing. So, essentially, each time you remove an element from the list, you have to start at the beginning. There are some optimizations that are possible, but this begins to look like O(n^2) for the basic comparisons, and then O(n) for having to start over at the beginning each time, so I get O(n^3) where n is the total number of values contained in the list.

It ain't pretty, or fast, but it works. Depending on the sizes of your lists, this may run in acceptable time bounds.

Key here is to write out the algorithm in terms of little boxes with pieces of paper in them with numbers written on the pieces of paper. Or letters. It doesn't matter. Explain to a first-grader how to eliminate any box whose values are all contained in another box. Tell them that they cannot just lay out all the pieces of paper, but have to reach in, extract a piece of paper, then, one at a time, pull from the "current box" in front of them each value, in turn. If the values match, throw away the paper from the selected box; if they don't match, put the paper down in front of the selected box. If, at the end, the selected box has no pieces of paper left in front of it, throw the box away. Write this out so a reasonably literate first-grader can read it. Once you have this debugged (easier if you have access to a first-grader), just translate it into Python.

If you can write that down, you can program it in Python, because that is just syntax.
Reply
#5
Does order matter? Can there be duplicates?

If order doesn't matter and there are no duplicates, make each one a set and use issubset() on each one to see if it's contained.
Reply
#6
>>> l = [[2,3,4,5],[2,3],[7,8],[2],[4,5,6]]
>>> S = sorted((set(x) for x in l), key=len, reverse=True)
>>> S
[{2, 3, 4, 5}, {4, 5, 6}, {2, 3}, {8, 7}, {2}]
>>> roots = []
>>> for s in S:
...     if not any(s.issubset(r) for r in roots):
...         roots.append(s)
... 
>>> roots
[{2, 3, 4, 5}, {4, 5, 6}, {8, 7}]
As supuflounder said, this is a O(n^2) algorithm, which means here that if the initial list has length N, we way call issubset() about N*(N-1)/2 times.
Reply
#7
(Nov-06-2021, 11:06 PM)Gribouillis Wrote:
>>> l = [[2,3,4,5],[2,3],[7,8],[2],[4,5,6]]
>>> S = sorted((set(x) for x in l), key=len, reverse=True)
>>> S
[{2, 3, 4, 5}, {4, 5, 6}, {2, 3}, {8, 7}, {2}]
>>> roots = []
>>> for s in S:
...     if not any(s.issubset(r) for r in roots):
...         roots.append(s)
... 
>>> roots
[{2, 3, 4, 5}, {4, 5, 6}, {8, 7}]
As supuflounder said, this is a O(n^2) algorithm, which means here that if the initial list has length N, we way call issubset() about N*(N-1)/2 times.

I'm just a beginner in python. Is ther e any other way to write that : if not any(s.issubset® for r in roots)? I don't understand how that for loop works... Or even how you call that technique
Reply
#8
How about learning what sets are and how any works? Sounds like they might come in useful.
ndc85430 likes this post
Reply
#9
(Nov-09-2021, 06:12 AM)deanhystad Wrote: How about learning what sets are and how any works? Sounds like they might come in useful.


I know what sets are and I did my research on any()... I'm talking about the "for r in roots". Why is it on the same line and how can you have a loop in an empty lists?
Reply
#10
Ahh, I understand your confusion now. It is clever, and I guess potentially confusing.

You are correct that roots starts out as an empty list the first time through the "for s in S:" loop.

roots = [], s = {2, 3, 4, 5}

Since roots is empty, s is not going to be a subset of any of the sets in roots and is appended to roots.

roots = [{2, 3, 4, 5}], s = {4, 5, 6}

Second time through the outer loop roots contains {2, 3, 4, 5}. s is not a subset of {2, 3, 4, 5}, so it is appended to roots.

roots = [{2, 3, 4, 5}, {4, 5, 6}] s = {2, 3}

Third time through the outer loop {2, 3} is a subset of {2, 3, 4, 5}, so it is not appended to roots.

Do follow how this goes now? Each time through the outer loop we test if s is a subset of any of the sets in root. If s is a subset of one of the sets in root, it is not appended to root. If s is not a subset of any set in root, s is appended to root.
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  Remove empty keys in a python list python_student 7 260 Jan-12-2022, 10:23 PM
Last Post: python_student
  how to easily create a list of already existing item CompleteNewb 15 534 Jan-06-2022, 12:48 AM
Last Post: CompleteNewb
  TypeError: sequence item 0: expected str instance, float found Error Query eddywinch82 1 1,272 Sep-04-2021, 09:16 PM
Last Post: eddywinch82
  count item in list korenron 8 1,171 Aug-18-2021, 06:40 AM
Last Post: naughtyCat
  checking for last item in for loop Skaperen 11 1,387 Jul-09-2021, 11:40 PM
Last Post: Skaperen
  Time.sleep: stop appending item to the list if time is early quest 0 793 Apr-13-2021, 11:44 AM
Last Post: quest
  .remove() from a list - request for explanation InputOutput007 3 1,027 Jan-28-2021, 04:21 PM
Last Post: InputOutput007
  How to run a pytest test for each item in a list arielma 0 956 Jan-06-2021, 10:40 PM
Last Post: arielma
  How to modify item in dictionary? Winfried 7 1,265 Nov-21-2020, 07:12 PM
Last Post: bowlofred
  Remove double quotes from the list ? PythonDev 22 3,725 Nov-05-2020, 04:53 PM
Last Post: snippsat

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020