Python Forum
Looking for data/info on a particular data-processing problem.
#1
I'm doing some data processing that I find tricky, and I'm wondering whether it is related to some general (math, logic, science, ...) problem.
If anyone knows some helpful search terms, or even related forum topics, please let me know.
Links to related external resources are appreciated too.

The data to be processed are small chunks/records/lists like:
['aaa','bbb','eee']
['bbb','ccc','eee']
['ccc','ddd','eee']

The elements in each record are ordered by their relative positions in the source.
The goal is to reconstruct the final solution (the source),
which in this case would be ['aaa','bbb','ccc','ddd','eee'].
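Put another way, a candidate list is a valid answer when every record's elements appear in it in the same relative order. A small sketch of that check (the helper name `respects_order` is hypothetical, for illustration only):

```python
def respects_order(candidate, records):
    """Return True if every record's elements appear in `candidate`
    in the same relative order (hypothetical helper, for illustration)."""
    position = {item: i for i, item in enumerate(candidate)}
    for record in records:
        # Every item in the record must be present in the candidate...
        if any(item not in position for item in record):
            return False
        # ...and their positions must be strictly increasing.
        indices = [position[item] for item in record]
        if any(a >= b for a, b in zip(indices, indices[1:])):
            return False
    return True


records = [
    ['aaa', 'bbb', 'eee'],
    ['bbb', 'ccc', 'eee'],
    ['ccc', 'ddd', 'eee'],
]
print(respects_order(['aaa', 'bbb', 'ccc', 'ddd', 'eee'], records))  # True
print(respects_order(['bbb', 'aaa', 'ccc', 'ddd', 'eee'], records))  # False
```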
#2
This is one way to get these results:

  1. convert the lists to sets,
  2. take the union of all three sets,
  3. convert the union to a list,
  4. sort the list.

Example:
taba = ['aligators','bats','eagles']
tabb = ['bats','cats','eagles']
tabc = ['cats','dogs','eagles']

# Union of the three sets, then an alphabetical sort into a new list.
tlist = sorted(set(taba) | set(tabb) | set(tabc))
print(f"tlist: {tlist}")
Output:
tlist: ['aligators', 'bats', 'cats', 'dogs', 'eagles']
#3
But that's not what I'm trying to do.

In this case:
['bats','eagles','aligators'] (moved 'aligators' from the beginning to the end)
['bats','cats','eagles']
['cats','dogs','eagles']

It should return:
['bats', 'cats', 'dogs', 'eagles', 'aligators']

The result should be based on the positions of the elements relative to each other within each record.
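Running the union-and-sort approach from post #2 on this data shows the difference: `sorted()` orders the items alphabetically, which happened to match the intended order in the first example but ignores the positions inside each record.

```python
records = [
    ['bats', 'eagles', 'aligators'],
    ['bats', 'cats', 'eagles'],
    ['cats', 'dogs', 'eagles'],
]

# Union of all records, then an alphabetical sort (the approach from post #2).
alphabetical = sorted(set().union(*records))
print(alphabetical)
# ['aligators', 'bats', 'cats', 'dogs', 'eagles']
# 'aligators' comes first, although the first record places it after 'eagles'.
```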
#4
That sure looks like what's stated in post #1, no?

Replace the beasts with letters; what do you see?
#5
Looks like you're looking for topological sorting. If you search for that term, you can find lots of info on it, including algorithms and modules.

import toposort  # third-party package: pip install toposort
from collections import defaultdict

data = [
    ["bats", "eagles", "aligators"],
    ["bats", "cats", "eagles"],
    ["cats", "dogs", "eagles"],
]

# Map each item to the set of items that must come directly before it.
ordering = defaultdict(set)
for chunk in data:
    for before, after in zip(chunk, chunk[1:]):
        ordering[after].add(before)

print(toposort.toposort_flatten(ordering))
Output:
['bats', 'cats', 'dogs', 'eagles', 'aligators']
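If installing the third-party `toposort` package isn't an option, Python 3.9+ ships `graphlib.TopologicalSorter` in the standard library. A minimal sketch with the same data; the graph maps each item to the set of items that must come before it, just like `ordering` above:

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

data = [
    ["bats", "eagles", "aligators"],
    ["bats", "cats", "eagles"],
    ["cats", "dogs", "eagles"],
]

# Map each item to the set of items that precede it in some record.
graph = {}
for chunk in data:
    for before, after in zip(chunk, chunk[1:]):
        graph.setdefault(after, set()).add(before)

result = list(TopologicalSorter(graph).static_order())
print(result)  # ['bats', 'cats', 'dogs', 'eagles', 'aligators']
```

Here every neighbouring pair in the answer is constrained by some record, so the order is unique. When some items are unconstrained relative to each other, `static_order()` may return any valid order, and contradictory records raise `graphlib.CycleError`.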
#6
(Apr-28-2021, 04:37 PM)bowlofred Wrote: Looks like you're looking for topological sorting.
At first glance that does seem related, yes.
Although I'm not well versed in that area, I'll give it my best shot.
It's definitely going to take some serious reading time. :-)

I'll play around with your code later, after a good night's sleep.

Thanks.
#7
After some reading, thinking, and coding, I managed to reduce my source records by 90% with the help of 'toposort'. :-)

Thanks again.
#8
(Apr-29-2021, 06:27 PM)MvGulik Wrote: After some reading, thinking, and coding, I managed to reduce my source records by 90% with the help of 'toposort'. :-)

Thanks again.

Neato. I remember being introduced to topological sorting in CS as a way to implement a parallel "make": you can take all the items that don't depend on each other and give each one its own thread. Or, if you only have one thread, processing items in topological order means every item's dependencies have already been handled by the time you reach it. A cool thing to have in the bag of tricks.
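The parallel-"make" idea can be sketched with the standard library's `graphlib.TopologicalSorter` (Python 3.9+), whose `get_ready()`/`done()` pair hands out batches of items whose dependencies are all finished. The build targets below are hypothetical, and the loop just records each batch instead of actually running it on threads:

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical build graph: each target maps to the targets it depends on.
deps = {
    "app": {"lib_a", "lib_b"},
    "lib_a": {"common"},
    "lib_b": {"common"},
    "tests": {"app"},
}

ts = TopologicalSorter(deps)
ts.prepare()
batches = []
while ts.is_active():
    ready = sorted(ts.get_ready())  # all targets whose dependencies are done
    batches.append(ready)           # items in one batch could run in parallel
    ts.done(*ready)                 # mark them as finished
print(batches)
# [['common'], ['lib_a', 'lib_b'], ['app'], ['tests']]
```

With a single worker, simply running the batches one after another gives the same guarantee that each target's dependencies are already built.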
#9
Looks like you're looking for topological sorting.
#10
(Apr-29-2021, 06:50 PM)MvGulik Wrote: ... with the help of 'toposort'.
toposort => "Implements a topological sort algorithm"
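For reference, one classic topological sort algorithm is Kahn's algorithm: repeatedly emit an item that has no remaining predecessors, then remove its outgoing edges. The sketch below is a from-scratch illustration, not the `toposort` package's own code; the function name `kahn_toposort` is hypothetical. It breaks ties alphabetically with a heap so the output is deterministic, and raises `ValueError` if the records contradict each other:

```python
import heapq


def kahn_toposort(records):
    """Order items so that every record's relative order is respected.
    Raises ValueError if the records contain contradictory orderings."""
    successors = {}  # item -> set of items that must come after it
    indegree = {}    # item -> number of distinct predecessors
    for record in records:
        for item in record:
            successors.setdefault(item, set())
            indegree.setdefault(item, 0)
        for before, after in zip(record, record[1:]):
            if after not in successors[before]:
                successors[before].add(after)
                indegree[after] += 1

    # Start with every item that has no predecessors (smallest name first).
    ready = [item for item, n in indegree.items() if n == 0]
    heapq.heapify(ready)
    order = []
    while ready:
        item = heapq.heappop(ready)
        order.append(item)
        for nxt in successors[item]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                heapq.heappush(ready, nxt)

    # Items never reaching indegree 0 are stuck in a cycle.
    if len(order) != len(indegree):
        raise ValueError("records contain a cycle; no consistent order exists")
    return order


print(kahn_toposort([
    ["bats", "eagles", "aligators"],
    ["bats", "cats", "eagles"],
    ["cats", "dogs", "eagles"],
]))
# ['bats', 'cats', 'dogs', 'eagles', 'aligators']
```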