Python Forum
Looking for data/info on a perticular data-proccesing problem. - Printable Version

+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/forum-7.html)
+--- Forum: General Coding Help (https://python-forum.io/forum-8.html)
+--- Thread: Looking for data/info on a perticular data-proccesing problem. (/thread-33476.html)



Looking for data/info on a perticular data-proccesing problem. - MvGulik - Apr-28-2021

Doing some, to me tricky, data processing. And I'm wondering if it might be related to some general (math, logic, science, ...) problem.
And if anyone might know some helpfull search words, or even potential related forum topics on this.
Potential related external links are appreciated too.

The data to be processed are little chunks/records/lists like:
['aaa','bbb,'eee']
['bbb','ccc','eee']
['ccc','ddd,'eee']

Where the record elements are ordered based on there relative (source)positions.
The target is to try to find the final solution/(source).
Which in this case would be ['aaa','bbb,'ccc','ddd','eee']


RE: Looking for data/info on a perticular data-proccesing problem. - Larz60+ - Apr-28-2021

This is one way to get these results:

  1. convert the lists to sets,
  2. take the union of all three sets
  3. convert union to set
  4. then convert that set to a list
  5. sort the list

Example:
taba = ['aligators','bats','eagles']
tabb = ['bats','cats','eagles']
tabc = ['cats','dogs','eagles']

tlist = list(set(set(taba).union((set(tabb).union(set(tabc))))))
tlist.sort()
print(f"tlist: {tlist}")
Output:
tlist: ['aligators', 'bats', 'cats', 'dogs', 'eagles']



RE: Looking for data/info on a perticular data-proccesing problem. - MvGulik - Apr-28-2021

But that's not what I'm trying to do.

In this case:
['bats','eagles','aligators'] (moved 'aligators' from begin to end)
['bats','cats','eagles']
['cats','dogs','eagles']

It should return:
['bats', 'cats', 'dogs', 'eagles', 'aligators']

The output result is/should-be based on the element positions relative to each other (inside the sub-sets).


RE: Looking for data/info on a perticular data-proccesing problem. - Larz60+ - Apr-28-2021

sure looks like what's stated in post #1?
No?

replace the beasts with letters, what do you see?


RE: Looking for data/info on a perticular data-proccesing problem. - bowlofred - Apr-28-2021

Looks like you're looking for topological sorting. If you search for that term, you can find lots of info on it, including algorithms and modules.

import toposort
from collections import defaultdict

data = [
    ["bats", "eagles", "aligators"],
    ["bats", "cats", "eagles"],
    ["cats", "dogs", "eagles"],
]

ordering = defaultdict(set)
for chunk in data:
    for index in range(len(chunk) - 1):
        ordering[chunk[index + 1]].add(chunk[index])

print(toposort.toposort_flatten(ordering))
Output:
['bats', 'cats', 'dogs', 'eagles', 'aligators']



RE: Looking for data/info on a perticular data-proccesing problem. - MvGulik - Apr-28-2021

(Apr-28-2021, 04:37 PM)bowlofred Wrote: Looks like you're looking for topological sorting.
As first glance that seems to be related yes.
Although I'm not well versed in that area, I'll give it my best shot.
Definitely going to take some serious reading time. :-)

Will play around with your included code later, ... after a good night sleep.

Thanks.


RE: Looking for data/info on a perticular data-proccesing problem. - MvGulik - Apr-29-2021

After some reading, thinking and coding. I managed to reduce my source records by 90% with the help of 'toposort'. Smile

Thanks Again.


RE: Looking for data/info on a perticular data-proccesing problem. - bowlofred - Apr-29-2021

(Apr-29-2021, 06:27 PM)MvGulik Wrote: After some reading, thinking and coding. I managed to reduce my source records by 90% with the help of 'toposort'. Smile

Thanks Again.

Neato. I remember being introduced to topological sorting as something useful to implement parallel "make" in CS. You can get all the independent items and give them a thread. Or if you only have one thread, you don't have to later process dependencies since they're already taken care of. A cool thing to have in the bag of tricks.


RE: Looking for data/info on a perticular data-proccesing problem. - bernardand - Apr-30-2021

Looks like you're looking for topological sorting.


RE: Looking for data/info on a perticular data-proccesing problem. - MvGulik - May-01-2021

(Apr-29-2021, 06:50 PM)MvGulik Wrote: ... with the help of 'toposort'.
toposort => "Implements a topological sort algorithm"