Python Forum

Full Version: Duplicate information
Hi everyone, I am new to this and maybe someone can help me with this super simple question.

I have a table with this information

_c0 _c4
0 b
0 g
0 f
1 a
1 f
1 c
2 f
2 e
2 a
2 c

and I need to transform it like this. I mean, how can I group this duplicate information into a list?

_c0 lista
0 b,f,g
1 a,c,f
2 a,c,e,f

Thanks for your support!

Jalena.
Please show how the data is arranged in python.
for example: co = ['0 b', '0 g', '0 f', '1 a', '1 f', '1 c', '2 f', '2 e', '2 a', '2 c']
or some other format.
Hi,

It is a DataFrame, and I am importing it with pandas. Look:

%%writefile tbl1.tsv
_c0	_c4
0	b
0	g
0	f
1	a
1	f
1	c
2	f
2	e
2	a
2	c

import pandas as pd
tbl1 = pd.read_csv('tbl1.tsv', sep='\t')
tbl1.dtypes
tbl1
Thanks!
I don't have a copy of tbl1.tsv.
Please help me to help you.
I have attached the table.

thanks!
Something like this.
import pandas as pd
from io import StringIO

data = StringIO("""\
_c0,_c4
0,b
0,g
0,f
1,a
1,f
1,c
2,f
2,e
2,a
2,c
""")

df = pd.read_csv(data, sep=",")
>>> df = df.groupby('_c0')
>>> grouped_list = df["_c4"].apply(list)
>>> grouped_list
_c0
0       [b, g, f]
1       [a, f, c]
2    [f, e, a, c]
Name: _c4, dtype: object
>>> 
>>> df = grouped_list.reset_index()
>>> df
   _c0           _c4
0    0     [b, g, f]
1    1     [a, f, c]
2    2  [f, e, a, c]
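Note that the result above keeps each group in its original row order and stores the values as Python lists. If the goal is the exact layout from the first post (values sorted, joined with commas, in a column named lista), a small variation might do it. This is only a sketch: the lambda and the variable name `result` are illustrative choices, not part of the answer above.

```python
import pandas as pd
from io import StringIO

# Same sample data as in the thread, inlined so the example is self-contained.
data = StringIO("""\
_c0,_c4
0,b
0,g
0,f
1,a
1,f
1,c
2,f
2,e
2,a
2,c
""")

df = pd.read_csv(data)

# For each _c0 value, sort the _c4 values and join them into one string.
result = (
    df.groupby("_c0")["_c4"]
    .apply(lambda values: ",".join(sorted(values)))
    .reset_index(name="lista")  # name the new column as in the question
)
print(result)
```

Dropping `sorted` would keep the values in the order they appear in the file instead.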
Here's one way to go about it:
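# Start with the header row of the output file.
array = ['_c0 lista']

with open('tbl1.csv', 'r') as in_file:
    in_file.readline()  # skip the header row of the input file
    for line in in_file:
        # Look for an existing row whose first character matches this key.
        # This relies on single-character keys and values, as in the sample.
        for index in range(len(array)):
            if line[0] == array[index][0]:
                array[index] = array[index] + f',{line[2]}'
                break
        else:
            # No row for this key yet, so start a new one.
            array.append(line.strip())

with open('tbl2.csv', 'w') as out_file:
    for line in array:
        out_file.write(f'{line}\n')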
Output:
@LapTop:~$ cat tbl2.csv
_c0 lista
0,b,g,f
1,a,f,c
2,f,e,a,c
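If the keys or values can be longer than one character, indexing into the line by position would no longer work. A dictionary-based variant of the same idea might look like the sketch below. It assumes a comma-separated input like tbl1.csv; the `StringIO` object stands in for the real file, and the names `source`, `groups` and `lines` are made up for illustration. It also sorts the values and separates key and list with a space, to match the layout asked for in the first post.

```python
import csv
from collections import defaultdict
from io import StringIO

# Stand-in for open('tbl1.csv'); same rows as the sample in the thread.
source = StringIO("_c0,_c4\n0,b\n0,g\n0,f\n1,a\n1,f\n1,c\n2,f\n2,e\n2,a\n2,c\n")

groups = defaultdict(list)  # key -> list of values seen for that key
reader = csv.reader(source)
next(reader)  # skip the header row
for key, value in reader:
    groups[key].append(value)

# Build the output rows: header first, then one row per key with sorted values.
lines = ["_c0 lista"]
for key, values in groups.items():
    lines.append(f"{key} {','.join(sorted(values))}")

print("\n".join(lines))
```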
Thanks, snippsat, for helping me!