Posts: 50
Threads: 10
Joined: Dec 2017
Doing some, to me tricky, data processing. And I'm wondering if it might be related to some general (math, logic, science, ...) problem.
And if anyone might know some helpfull search words, or even potential related forum topics on this.
Potential related external links are appreciated too.
The data to be processed are little chunks/records/lists like:
['aaa','bbb,'eee']
['bbb','ccc','eee']
['ccc','ddd,'eee']
Where the record elements are ordered based on there relative ( source)positions.
The target is to try to find the final solution/( source).
Which in this case would be ['aaa','bbb,'ccc','ddd','eee']
Posts: 12,022
Threads: 484
Joined: Sep 2016
This is one way to get these results:
- convert the lists to sets,
- take the union of all three sets
- convert union to set
- then convert that set to a list
- sort the list
Example:
taba = ['aligators','bats','eagles']
tabb = ['bats','cats','eagles']
tabc = ['cats','dogs','eagles']
tlist = list(set(set(taba).union((set(tabb).union(set(tabc))))))
tlist.sort()
print(f"tlist: {tlist}") Output: tlist: ['aligators', 'bats', 'cats', 'dogs', 'eagles']
Posts: 50
Threads: 10
Joined: Dec 2017
Apr-28-2021, 03:39 PM
(This post was last modified: Apr-28-2021, 03:43 PM by MvGulik.)
But that's not what I'm trying to do.
In this case:
['bats','eagles','aligators'] ( moved 'aligators' from begin to end)
['bats','cats','eagles']
['cats','dogs','eagles']
It should return:
['bats', 'cats', 'dogs', 'eagles', 'aligators']
The output result is/should-be based on the element positions relative to each other ( inside the sub-sets).
Posts: 12,022
Threads: 484
Joined: Sep 2016
sure looks like what's stated in post #1?
No?
replace the beasts with letters, what do you see?
Posts: 1,583
Threads: 3
Joined: Mar 2020
Looks like you're looking for topological sorting. If you search for that term, you can find lots of info on it, including algorithms and modules.
import toposort
from collections import defaultdict
data = [
["bats", "eagles", "aligators"],
["bats", "cats", "eagles"],
["cats", "dogs", "eagles"],
]
ordering = defaultdict(set)
for chunk in data:
for index in range(len(chunk) - 1):
ordering[chunk[index + 1]].add(chunk[index])
print(toposort.toposort_flatten(ordering)) Output: ['bats', 'cats', 'dogs', 'eagles', 'aligators']
Posts: 50
Threads: 10
Joined: Dec 2017
Apr-28-2021, 05:52 PM
(This post was last modified: Apr-28-2021, 05:53 PM by MvGulik.)
(Apr-28-2021, 04:37 PM)bowlofred Wrote: Looks like you're looking for topological sorting. As first glance that seems to be related yes.
Although I'm not well versed in that area, I'll give it my best shot.
Definitely going to take some serious reading time. :-)
Will play around with your included code later, ... after a good night sleep.
Thanks.
Posts: 50
Threads: 10
Joined: Dec 2017
After some reading, thinking and coding. I managed to reduce my source records by 90% with the help of 'toposort'.
Thanks Again.
Posts: 1,583
Threads: 3
Joined: Mar 2020
(Apr-29-2021, 06:27 PM)MvGulik Wrote: After some reading, thinking and coding. I managed to reduce my source records by 90% with the help of 'toposort'.
Thanks Again.
Neato. I remember being introduced to topological sorting as something useful to implement parallel "make" in CS. You can get all the independent items and give them a thread. Or if you only have one thread, you don't have to later process dependencies since they're already taken care of. A cool thing to have in the bag of tricks.
Posts: 3
Threads: 0
Joined: Apr 2021
Looks like you're looking for topological sorting.
Posts: 50
Threads: 10
Joined: Dec 2017
(Apr-29-2021, 06:50 PM)MvGulik Wrote: ... with the help of 'toposort'. toposort => " Implements a topological sort algorithm"
|