Python Forum

Full Version: Looking for data/info on a perticular data-proccesing problem.
You're currently viewing a stripped down version of our content. View the full version with proper formatting.
Doing some, to me tricky, data processing. And I'm wondering if it might be related to some general (math, logic, science, ...) problem.
And if anyone might know some helpfull search words, or even potential related forum topics on this.
Potential related external links are appreciated too.

The data to be processed are little chunks/records/lists like:
['aaa','bbb,'eee']
['bbb','ccc','eee']
['ccc','ddd,'eee']

Where the record elements are ordered based on there relative (source)positions.
The target is to try to find the final solution/(source).
Which in this case would be ['aaa','bbb,'ccc','ddd','eee']
This is one way to get these results:

  1. convert the lists to sets,
  2. take the union of all three sets
  3. convert union to set
  4. then convert that set to a list
  5. sort the list

Example:
taba = ['aligators','bats','eagles']
tabb = ['bats','cats','eagles']
tabc = ['cats','dogs','eagles']

tlist = list(set(set(taba).union((set(tabb).union(set(tabc))))))
tlist.sort()
print(f"tlist: {tlist}")
Output:
tlist: ['aligators', 'bats', 'cats', 'dogs', 'eagles']
But that's not what I'm trying to do.

In this case:
['bats','eagles','aligators'] (moved 'aligators' from begin to end)
['bats','cats','eagles']
['cats','dogs','eagles']

It should return:
['bats', 'cats', 'dogs', 'eagles', 'aligators']

The output result is/should-be based on the element positions relative to each other (inside the sub-sets).
sure looks like what's stated in post #1?
No?

replace the beasts with letters, what do you see?
Looks like you're looking for topological sorting. If you search for that term, you can find lots of info on it, including algorithms and modules.

import toposort
from collections import defaultdict

data = [
    ["bats", "eagles", "aligators"],
    ["bats", "cats", "eagles"],
    ["cats", "dogs", "eagles"],
]

ordering = defaultdict(set)
for chunk in data:
    for index in range(len(chunk) - 1):
        ordering[chunk[index + 1]].add(chunk[index])

print(toposort.toposort_flatten(ordering))
Output:
['bats', 'cats', 'dogs', 'eagles', 'aligators']
(Apr-28-2021, 04:37 PM)bowlofred Wrote: [ -> ]Looks like you're looking for topological sorting.
As first glance that seems to be related yes.
Although I'm not well versed in that area, I'll give it my best shot.
Definitely going to take some serious reading time. :-)

Will play around with your included code later, ... after a good night sleep.

Thanks.
After some reading, thinking and coding. I managed to reduce my source records by 90% with the help of 'toposort'. Smile

Thanks Again.
(Apr-29-2021, 06:27 PM)MvGulik Wrote: [ -> ]After some reading, thinking and coding. I managed to reduce my source records by 90% with the help of 'toposort'. Smile

Thanks Again.

Neato. I remember being introduced to topological sorting as something useful to implement parallel "make" in CS. You can get all the independent items and give them a thread. Or if you only have one thread, you don't have to later process dependencies since they're already taken care of. A cool thing to have in the bag of tricks.
Looks like you're looking for topological sorting.
(Apr-29-2021, 06:50 PM)MvGulik Wrote: [ -> ]... with the help of 'toposort'.
toposort => "Implements a topological sort algorithm"