Posts: 54
Threads: 14
Joined: Dec 2020
I have this randomly generated list in python:
l = [[2,3,4,5],[2,3],[7,8],[2],[4,5,6]]
and i want to remove the items [2,3] and [2] because they are completely contained in the first item [2,3,4,5].
newl = [[2,3,4,5],[7,8],[4,5,6]]
How do I do that?
Posts: 50
Threads: 2
Joined: Jan 2021
(Nov062021, 08:17 PM)CompleteNewb Wrote: I have this randomly generated list in python:
l = [[2,3,4,5],[2,3],[7,8],[2],[4,5,6]]
and i want to remove the items [2,3] and [2] because they are completely contained in the first item [2,3,4,5].
newl = [[2,3,4,5],[7,8],[4,5,6]]
How do I do that?
What about [4,3], [2,5], [6,4]. [4,2,5]?
It is looking like O(n^2); it may be worse. Knowing the actual rejection criteria is important, because that determines the complexity. It is not possible to infer the desired result from the examples given.
Gribouillis likes this post
Posts: 54
Threads: 14
Joined: Dec 2020
(Nov062021, 09:15 PM)supuflounder Wrote: (Nov062021, 08:17 PM)CompleteNewb Wrote: I have this randomly generated list in python:
l = [[2,3,4,5],[2,3],[7,8],[2],[4,5,6]]
and i want to remove the items [2,3] and [2] because they are completely contained in the first item [2,3,4,5].
newl = [[2,3,4,5],[7,8],[4,5,6]]
How do I do that?
What about [4,3], [2,5], [6,4]. [4,2,5]?
It is looking like O(n^2); it may be worse. Knowing the actual rejection criteria is important, because that determines the complexity. It is not possible to infer the desired result from the examples given.
Well yes you can because i said i want to remove items that are completly contain in another item, which means [4,3] and [2,5] is contained in [2,3,4,5]. [6,4] is an item you invented which would be removed , because it is contained in [4,5,6] and [2,4,5] would be removed because it is contained in [2,3,4,5]. And in my example the item [4,5,6] is not completely contained in a previous item of "l" because the 6 is not an element of [2,3,4,5]. And I don't understand what you mean by O(n^2). You have to think about it like Venn diagrams. If an an ensemble is within a bigger ensemble, remove the ensemble within. Imagine another list c = ["Cat", "Catnip"] remove 'Cat' because it is contained in "Catnip" so the result would be c = ["Catnip"]
Posts: 50
Threads: 2
Joined: Jan 2021
(Nov062021, 09:37 PM)CompleteNewb Wrote: (Nov062021, 09:15 PM)supuflounder Wrote: What about [4,3], [2,5], [6,4]. [4,2,5]?
It is looking like O(n^2); it may be worse. Knowing the actual rejection criteria is important, because that determines the complexity. It is not possible to infer the desired result from the examples given.
Well yes you can because i said i want to remove items that are completly contain in another item, which means [4,3] and [2,5] is contained in [2,3,4,5]. [6,4] is an item you invented which would be removed , because it is contained in [4,5,6] and [2,4,5] would be removed because it is contained in [2,3,4,5]. And in my example the item [4,5,6] is not completely contained in a previous item of "l" because the 6 is not an element of [2,3,4,5]. And I don't understand what you mean by O(n^2). You have to think about it like Venn diagrams. If an an ensemble is within a bigger ensemble, remove the ensemble within. Imagine another list c = ["Cat", "Catnip"] remove 'Cat' because it is contained in "Catnip" so the result would be c = ["Catnip"]
But removing "Cat" when "Catnip" is present is not the same as removing "tin", because "tin" is fully contained in "catnip".
The naïve algorithm works like this:
Take the first element of the list. This is the "current element"
Now, for each remaining element ("selected element") of the list, extract each value, and ask if the value is in the current element. If, for all values in the selected element, they are in the current element, eliminate the selected element.
This could be done by taking each value in the selected element, and if it appears in the current element, remove it from the (copy of) the selected element. If, at the end of the elements of the selected element, the (copy of) the selected element list is empty, then remove the selected element from the main list.
Now, take an element from the remaining list, call it the current element, and repeat for all other elements from the list. You have to deal with the fact that removing the selected element from the list changes your indexing. So, essentially, each time you remove an element from the list, you have to start at the beginning. There are some optimizations that are possible, but this begins to look like O(n^2) for the basic comparisons, and then O(n) for having to start over at the beginning each time, so I get O(n^3) where n is the total number of values contained in the list.
It ain't pretty, or fast, but it works. Depending on the sizes of your lists, this may run in acceptable time bounds.
Key here is to write out the algorithm in terms of little boxes with pieces of paper in them with numbers written on the pieces of paper. Or letters. It doesn't matter. Explain to a firstgrader how to eliminate any box whose values are all contained in another box. Tell them that they cannot just lay out all the pieces of paper, but have to reach in, extract a piece of paper, then, one at a time, pull from the "current box" in front of them each value, in turn. If the values match, throw away the paper from the selected box; if they don't match, put the paper down in front of the selected box. If, at the end, the selected box has no pieces of paper left in front of it, throw the box away. Write this out so a reasonably literate firstgrader can read it. Once you have this debugged (easier if you have access to a firstgrader), just translate it into Python.
If you can write that down, you can program it in Python, because that is just syntax.
Posts: 1,381
Threads: 3
Joined: Mar 2020
Does order matter? Can there be duplicates?
If order doesn't matter and there are no duplicates, make each one a set and use issubset() on each one to see if it's contained.
Posts: 3,291
Threads: 45
Joined: Jan 2018
>>> l = [[2,3,4,5],[2,3],[7,8],[2],[4,5,6]]
>>> S = sorted((set(x) for x in l), key=len, reverse=True)
>>> S
[{2, 3, 4, 5}, {4, 5, 6}, {2, 3}, {8, 7}, {2}]
>>> roots = []
>>> for s in S:
... if not any(s.issubset(r) for r in roots):
... roots.append(s)
...
>>> roots
[{2, 3, 4, 5}, {4, 5, 6}, {8, 7}] As supuflounder said, this is a O(n^2) algorithm, which means here that if the initial list has length N, we way call issubset() about N*(N1)/2 times.
Posts: 54
Threads: 14
Joined: Dec 2020
(Nov062021, 11:06 PM)Gribouillis Wrote: >>> l = [[2,3,4,5],[2,3],[7,8],[2],[4,5,6]]
>>> S = sorted((set(x) for x in l), key=len, reverse=True)
>>> S
[{2, 3, 4, 5}, {4, 5, 6}, {2, 3}, {8, 7}, {2}]
>>> roots = []
>>> for s in S:
... if not any(s.issubset(r) for r in roots):
... roots.append(s)
...
>>> roots
[{2, 3, 4, 5}, {4, 5, 6}, {8, 7}] As supuflounder said, this is a O(n^2) algorithm, which means here that if the initial list has length N, we way call issubset() about N*(N1)/2 times.
I'm just a beginner in python. Is ther e any other way to write that : if not any(s.issubset® for r in roots)? I don't understand how that for loop works... Or even how you call that technique
Posts: 2,690
Threads: 12
Joined: Feb 2020
How about learning what sets are and how any works? Sounds like they might come in useful.
Posts: 54
Threads: 14
Joined: Dec 2020
Nov092021, 06:47 PM
(This post was last modified: Nov092021, 06:47 PM by CompleteNewb.)
(Nov092021, 06:12 AM)deanhystad Wrote: How about learning what sets are and how any works? Sounds like they might come in useful.
I know what sets are and I did my research on any()... I'm talking about the "for r in roots". Why is it on the same line and how can you have a loop in an empty lists?
Posts: 2,690
Threads: 12
Joined: Feb 2020
Ahh, I understand your confusion now. It is clever, and I guess potentially confusing.
You are correct that roots starts out as an empty list the first time through the "for s in S:" loop.
roots = [], s = {2, 3, 4, 5}
Since roots is empty, s is not going to be a subset of any of the sets in roots and is appended to roots.
roots = [{2, 3, 4, 5}], s = {4, 5, 6}
Second time through the outer loop roots contains {2, 3, 4, 5}. s is not a subset of {2, 3, 4, 5}, so it is appended to roots.
roots = [{2, 3, 4, 5}, {4, 5, 6}] s = {2, 3}
Third time through the outer loop {2, 3} is a subset of {2, 3, 4, 5}, so it is not appended to roots.
Do follow how this goes now? Each time through the outer loop we test if s is a subset of any of the sets in root. If s is a subset of one of the sets in root, it is not appended to root. If s is not a subset of any set in root, s is appended to root.
