Python Forum
From flat to nested structure
#1
Most Venerable Coders,

I've got a seemingly innocent problem that I cannot crack properly (I'm not proficient in Python, though). I've got a flat data structure in the form of (kind of) key-value pairs:

[inline]a : one
b : two
c : three
d : four
a : five
b : six
c : seven
d : eight
e : nine[/inline]

So it's a recurring group of the same keys associated with different values (the number of key-value pairs can differ from record to record, though). I'm trying to build a corresponding dictionary out of it. I presume it has to be some kind of nested dictionary of the form:

[inline]{ 'OUTER KEY 1': {'a':'one', 'b':'two', 'c':'three', 'd':'four'},
'OUTER KEY 2': {'a':'five', 'b':'six', 'c':'seven', 'd':'eight', 'e':'nine'},
...
}[/inline]

where the outer keys can simply be the values of an incremental counter. I imagine the logic for looping over the original structure would be something like:

[inline]If item == 'a':
Create outer key
Populate inner value with keys and values until next 'a'[/inline]

I was able to process the text file and extract the key-value pairs as a list or tuple, but I cannot properly build a dictionary out of them. For example, I've got a list of tuples like this:

[('a', 'one'), ('b', 'two'), ('c', 'three'), ('d', 'four')]

Associating keys with values using zip won't help, as the keys repeat themselves. I've tried defaultdict and grouping by key, but that gives all the values associated with one key: clever, but not exactly what I need. Could anyone point me in the right direction?
#2
What about a dict, with the value being a list of every occurrence?

>>> data = '''a : one
...  b : two
...  c : three
...  d : four
...  a : five
...  b : six
...  c : seven
...  d : eight
...  e : nine'''.split('\n')
>>> data
['a : one', ' b : two', ' c : three', ' d : four', ' a : five', ' b : six', ' c : seven', ' d : eight', ' e : nine']
>>> parsed = {}
>>> for item in data:
...   key, value = item.split(':')
...   key = key.strip()
...   value = value.strip()
...   if key not in parsed:
...     parsed[key] = []
...   parsed[key].append(value)
...
>>> parsed
{'c': ['three', 'seven'], 'd': ['four', 'eight'], 'b': ['two', 'six'], 'a': ['one', 'five'], 'e': ['nine']}
#3
Hey nilamo,

Thanks for that - I was able to achieve a similar result by using defaultdict. After splitting the key-value pairs on the colon, I created and zipped two lists with the keys and the corresponding values, and then:

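from collections import defaultdict

# record is the zipped list of (key, value) tuples described above
grouped = defaultdict(list)
for key, val in record:
    grouped[key].append(val)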
But if I associate each recurring key with all of its values, I cannot see a way of differentiating single records, say one from 'a' to 'd' and a second from 'a' to 'e'. How do you know whether 'e' belongs to the first or the second record? All values will be clustered around single keys. My aim here is to construct JSON out of the input data, and I cannot see an easy way of producing that from values grouped around single keys.
#4
The json would look exactly like your grouped dict, right? If it actually matters what order the elements are found in, then... mind sharing what your expected json output would be?
#5
Hey nilamo,

Yes, the JSON would be exactly the same as the dictionary. Once I've got a properly structured dict, I would just convert it into JSON using the standard library, and voilà. Obviously, any collection that does the same job would be fine as well. What matters is the endpoint: a JSON string. I assume the simplest way of getting there is to use a dict as the intermediate Python object, but that may not be the case.

The expected final JSON would be pretty much the same as the dict:

[inline]{
"OUTER KEY 1": {
"a": "one",
"b": "two",
"c": "three",
"d": "four"
},
"OUTER KEY 2": {
"a": "five",
"b": "six",
"c": "seven",
"d": "eight",
"e": "nine"
}
}[/inline]
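
For reference, a minimal sketch of that last conversion step, assuming the nested dict already exists (here it is simply typed in by hand, and the names nested and text are only illustrative). It uses json.dumps from the standard library with indent=4 to get readable output:

```python
import json

# Hand-written nested dict standing in for the structure to be built.
nested = {
    "OUTER KEY 1": {"a": "one", "b": "two", "c": "three", "d": "four"},
    "OUTER KEY 2": {"a": "five", "b": "six", "c": "seven", "d": "eight", "e": "nine"},
}

# Serialize to a JSON string; indent=4 pretty-prints the nesting.
text = json.dumps(nested, indent=4)
print(text)
```

Parsing the string back with json.loads should give a dict equal to the original, which is a cheap sanity check.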
#6
That looks super wacky, but sure whatever lol.

I guess a list of dicts, each representing one sequence of key-value pairs, might work. As you go through the data, you only ever modify the newest sequence (index -1 is the last index). Then, once you find a key that already exists in that dict, you start a new sequence.

Something like this?
>>> data = '''a : one
...  b : two
...  c : three
...  d : four
...  a : five
...  b : six
...  c : seven
...  d : eight
...  e : nine'''.split('\n')
>>> data
['a : one', ' b : two', ' c : three', ' d : four', ' a : five', ' b : six', ' c : seven', ' d : eight', ' e : nine']
>>> sequences = [{}]
>>> for item in data:
...   pair = item.split(':')
...   key = pair[0].strip()
...   value = pair[1].strip()
...   if key in sequences[-1]:
...     # this key has already been seen, so we're starting a new sequence
...     sequences.append({})
...   sequences[-1][key] = value
...
>>> sequences
[{'d': 'four', 'b': 'two', 'a': 'one', 'c': 'three'}, {'d': 'eight', 'b': 'six', 'e': 'nine', 'a': 'five', 'c': 'seven'}]
>>> import pprint
>>> pprint.pprint(sequences)
[{'a': 'one', 'b': 'two', 'c': 'three', 'd': 'four'},
 {'a': 'five', 'b': 'six', 'c': 'seven', 'd': 'eight', 'e': 'nine'}]
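
To get from that list of dicts to the keyed layout asked for earlier in the thread, one option is to number the records with enumerate. This is only a sketch: the function name group_records and the "OUTER KEY n" labels are made up for illustration, and it assumes the input is already a list of (key, value) tuples and that a repeated key always marks the start of a new record.

```python
def group_records(pairs):
    """Split (key, value) pairs into records; a repeated key starts a new record."""
    records = []
    for key, value in pairs:
        # Open a new record at the start, or when the key was already seen
        # in the record currently being filled.
        if not records or key in records[-1]:
            records.append({})
        records[-1][key] = value
    # Label each record with an incremental counter, starting at 1.
    return {f"OUTER KEY {n}": rec for n, rec in enumerate(records, start=1)}


pairs = [
    ("a", "one"), ("b", "two"), ("c", "three"), ("d", "four"),
    ("a", "five"), ("b", "six"), ("c", "seven"), ("d", "eight"), ("e", "nine"),
]
nested = group_records(pairs)
```

If a record can legitimately contain the same key twice, this rule would split it in the wrong place, so the boundary test would need to change (for example, to a known first key such as 'a').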