Mar-20-2020, 07:35 PM
Hello to all,
Maybe someone could help me with this:
I have this file whose values I want to tabulate. Keys a to c begin a new sequence of values (a block), and these three keys are always present. After keys a, b and c, any of the keys d to g can follow, some of them more than once.
My goal is to tabulate it as in the image below, using the list structure pandas needs:
![Desired table](https://www.dropbox.com/s/iyqxwx7wdejufmi/table.jpg?raw=1)
I'm currently able to store the file content in a list (`lst`) and then group that list by key into `m2`, but that gives each key a list of a different length (3 values for `a`, 7 for `d`, 2 for `f`, and so on).
My issue is that the correct input (`m2`) for a pandas DataFrame needs exactly one entry per table row in every list. That requires a kind of fill-down for keys a, b and c, and filling with blanks for keys d to g wherever a row has no value for them.
This is a sample of the input file (`file.txt`):

```
SOME TEXT
SOME TEXT
SOME TEXT
SOME TEXT
SOME TEXT
SOME TEXT
SOME TEXT
SOME TEXT
  a = 1
  b = 5
  c = 3
  d = 0
  e = 0
  d = 4
  e = 1
  g = 1
blah blah blah blah
/ / /
FINISH
  a = 3
  b = 2
  c = 8
  d = 6
  e = 9
  f = 3
blah blah blah blah
/ / /
FINISH
  a = 7
  b = 2
  c = 2
  d = 9
  e = 0
  d = 1
  e = 4
  d = 7
  e = 0
  f = 1
  d = 1
  g = 8
blah blah blah blah
/ / /
FINISH
```
My current code:

```python
import re
import pprint
from collections import defaultdict

file = 'file.txt'
with open(file) as fh:
    f = fh.read().splitlines()

# Keep only the indented "key = value" lines, as [key, value] pairs
lst = []
for line in f:
    if re.match(r'[ \t]', line):
        lst.append(line.strip().split(' = '))
print(lst)

# Group the values by key: one list per key, so the row structure is lost
m2 = defaultdict(list)
for k, v in lst:
    m2[k].append(int(v))

pprint.pprint(m2)
# defaultdict(<class 'list'>,
#             {'a': [1, 3, 7],
#              'b': [5, 2, 2],
#              'c': [3, 8, 2],
#              'd': [0, 4, 6, 9, 1, 7, 1],
#              'e': [0, 1, 9, 0, 4, 0],
#              'f': [3, 1],
#              'g': [1, 8]})
```
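Instead of grouping by key, one option is to walk `lst` in file order and build one dict per table row. This is a minimal sketch under assumed rules that the post does not spell out: a block starts at every `a`; `a`, `b` and `c` are copied down to every row of their block; and a new row starts whenever one of `d` to `g` shows up a second time in the row being built. `pairs_to_rows` is a hypothetical helper name. With this rule, `g = 1` from the first block ends up in that block's second row (next to `d = 4` and `e = 1`), while the hand-written `m2` below puts it in the first row; if that placement is intended, the rule needs adjusting.

```python
from typing import Dict, Iterable, List, Sequence

BLOCK_KEYS = ('a', 'b', 'c')  # always present; they open a block and are filled down


def pairs_to_rows(pairs: Iterable[Sequence]) -> List[Dict[str, object]]:
    """Turn (key, value) pairs in file order into one dict per table row.

    Assumed rule (not stated in the post): a repeated d..g key starts a new row.
    """
    rows: List[Dict[str, object]] = []
    current: Dict[str, object] = {}  # the row being built, including a, b, c
    for key, value in pairs:
        if key == 'a':
            # 'a' opens a new block: close the row being built, if any
            if current:
                rows.append(current)
            current = {'a': value}
        elif key in BLOCK_KEYS:
            current[key] = value  # 'b' and 'c' belong to the current block
        elif key in current:
            # This d..g key is already in the row: close the row and start a
            # new one, copying a, b, c down from the row just closed
            rows.append(current)
            current = {k: current[k] for k in BLOCK_KEYS if k in current}
            current[key] = value
        else:
            current[key] = value
    if current:
        rows.append(current)
    return rows


# The pairs from the sample file, in file order (values already converted to int)
lst = [('a', 1), ('b', 5), ('c', 3), ('d', 0), ('e', 0), ('d', 4), ('e', 1), ('g', 1),
       ('a', 3), ('b', 2), ('c', 8), ('d', 6), ('e', 9), ('f', 3),
       ('a', 7), ('b', 2), ('c', 2), ('d', 9), ('e', 0), ('d', 1), ('e', 4),
       ('d', 7), ('e', 0), ('f', 1), ('d', 1), ('g', 8)]

rows = pairs_to_rows(lst)
for row in rows:
    print(row)
```

`lst` as built by the code above holds `[key, value]` lists with string values; `pairs_to_rows` unpacks those the same way and simply keeps the values as strings. A block that has no d to g keys at all still produces one row holding only `a`, `b` and `c`.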
And this is the `m2` I need to feed into pandas:

```python
m2 = {'a': [1, 1, 3, 7, 7, 7, 7],
      'b': [5, 5, 2, 2, 2, 2, 2],
      'c': [3, 3, 8, 2, 2, 2, 2],
      'd': [0, 4, 6, 9, 1, 7, 1],
      'e': [0, 1, 9, 0, 4, 0, ''],
      'f': ['', '', 3, '', '', 1, ''],
      'g': [1, '', '', '', '', '', 8]}
```
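Once each table row is available as a dict of key/value pairs (with `a`, `b` and `c` already filled down), the `m2` shape above follows by reading every key from every row and using a blank where a row has no value. A minimal sketch: `rows_to_columns` and its `keys`/`blank` parameters are hypothetical names, and the seven rows are read off the hand-written `m2` above, using `''` as the blank.

```python
from typing import Dict, List, Optional, Sequence


def rows_to_columns(rows: List[Dict[str, object]],
                    keys: Optional[Sequence[str]] = None,
                    blank: object = '') -> Dict[str, List[object]]:
    """Build a dict of equal-length lists (one entry per row) from row dicts."""
    if keys is None:
        # Every key that appears in any row, in first-seen order
        keys = list(dict.fromkeys(k for row in rows for k in row))
    # row.get(...) supplies the blank for keys missing from a row
    return {key: [row.get(key, blank) for row in rows] for key in keys}


# One dict per table row, read off the hand-written m2 (a, b, c filled down)
rows = [
    {'a': 1, 'b': 5, 'c': 3, 'd': 0, 'e': 0, 'g': 1},
    {'a': 1, 'b': 5, 'c': 3, 'd': 4, 'e': 1},
    {'a': 3, 'b': 2, 'c': 8, 'd': 6, 'e': 9, 'f': 3},
    {'a': 7, 'b': 2, 'c': 2, 'd': 9, 'e': 0},
    {'a': 7, 'b': 2, 'c': 2, 'd': 1, 'e': 4},
    {'a': 7, 'b': 2, 'c': 2, 'd': 7, 'e': 0, 'f': 1},
    {'a': 7, 'b': 2, 'c': 2, 'd': 1, 'g': 8},
]
m2 = rows_to_columns(rows, keys=list('abcdefg'))
print(m2)
```

Because every list now has the same length, `pandas.DataFrame(m2)` can build the table from it. pandas can also work from the row dicts directly: `pandas.DataFrame(rows)` puts `NaN` in missing cells, and `.fillna('')` turns those into blanks.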
I already asked on Stack Overflow but got no answers.