Python Forum
Two text files, want to add a column value - Printable Version

+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/forum-7.html)
+--- Forum: General Coding Help (https://python-forum.io/forum-8.html)
+--- Thread: Two text files, want to add a column value (/thread-37496.html)



Two text files, want to add a column value - zxcv101 - Jun-19-2022

Hi guys, a basic question. I have two text files which contain data like the following. There will always be only two columns, the first being date/time and the second being a number. Both files will have the same number of rows and the first column (date/time) will always be the same for both. The only difference is the value in the second column. I want to produce a new file which contains the date/time and a sum of the second column.
The screenshot below makes it clear.

Noobie question - what is the best way to do this? All advice greatly appreciated.
Many thanks in advance.


RE: Two text files, want to add a column value - XavierPlatinum - Jun-19-2022

I'm rusty at this, but it looks like string.split() function is the what you are looking for.

text = "bunch of stuff, other stuff"

x = text.split(", ")

print(x) 
output:
['bunch of stuff', 'other stuff']
You get a list. You can refer to each part by index text[0], text[1].

Is that helpful?


RE: Two text files, want to add a column value - zxcv101 - Jun-19-2022

Thanks for the reply, but this won't do it. I have two files and not a string made up of two different parts.
Your solution would be great if I had a string of text and wanted to split this up.


RE: Two text files, want to add a column value - deanhystad - Jun-19-2022

If it is guaranteed that the two files will contain the same timestamps you can do this:
import pandas as pd
import io

f1 = io.StringIO("""
20220617 14:30:00, 1.1
20220617 14:40:00, 1.3
20220617 14:50:00, 1.7
20220617 15:00:00, 1.4
""")

f2 = io.StringIO("""
20220617 14:30:00, 2.1
20220617 14:40:00, 2.3
20220617 14:50:00, 2.7
20220617 15:00:00, 2.4
""")

df1 = pd.read_csv(f1, names=("Date", "Value"))
df2 = pd.read_csv(f2, names=("Date", "Value"))

print(df1)
print(df2)
print(df1 + df2)
Output:
Date Value 0 20220617 14:30:00 1.1 1 20220617 14:40:00 1.3 2 20220617 14:50:00 1.7 3 20220617 15:00:00 1.4 Date Value 0 20220617 14:30:00 2.1 1 20220617 14:40:00 2.3 2 20220617 14:50:00 2.7 3 20220617 15:00:00 2.4 Date Value 0 20220617 14:30:0020220617 14:30:00 3.2 1 20220617 14:40:0020220617 14:40:00 3.6 2 20220617 14:50:0020220617 14:50:00 4.4 3 20220617 15:00:0020220617 15:00:00 3.8
Instead of printing you would (df1 + df2).write_csv(file_path)


RE: Two text files, want to add a column value - zxcv101 - Jun-19-2022

Thanks, this looks more like what I need.

Question - how do I remove the hardcoded values in this bit :

f1 = io.StringIO("""
20220617 14:30:00, 1.1
20220617 14:40:00, 1.3
20220617 14:50:00, 1.7
20220617 15:00:00, 1.4
""")

...and instead, specify a filename on my directory. For example, I'd love to say
f1 = io.StringIO(filename)

but of course this does not work. I've read up on the online documentation for StringIO and couldn't find an example
of reading an actual file.


RE: Two text files, want to add a column value - snippsat - Jun-19-2022

(Jun-19-2022, 09:05 PM)zxcv101 Wrote: but of course this does not work. I've read up on the online documentation for StringIO and couldn't find an example
of reading an actual file.
You shall not use StringIO when read a files,StringIO is an in-memory file-like object used to show visible(an working) example how the files could look.
It like this with files.
import pandas as pd

df1 = pd.read_csv('one.txt', names=("Date", "Value"))
df2 = pd.read_csv('two.txt', names=("Date", "Value"))

print(df1)
print(df2)
print(df1 + df2)
Output:
Date Value 0 date 1 1 date 2 2 date 3 Date Value 0 date 3 1 date 4 2 date 5 Date Value 0 datedate 4 1 datedate 6 2 datedate 8



RE: Two text files, want to add a column value - deanhystad - Jun-20-2022

You read the pandas documentation that describes how to use the read_csv and write_csv functions. When I read them again I realized the command to write a dataframe to a csv file is df.to_csv(), not df.write_csv().

When correcting file writing I noticed that the datetime column is messed up when adding the two dataframes. And I don't like that there is a requirement that the files have identical datetimes. The code below merges the two dataframes based on their date so you end up with a new dataframe that has three columns. Then I add the two "Value" columns to get a third. First I merge the dataframes using the "Date" column to decide how to merge the frames. If a data is only in one frame I want it included, so I set "how" to "outer".
import pandas as pd
import io

f1 = io.StringIO("""
20220617 14:30:00, 1.1
20220617 14:40:00, 1.3
20220617 14:50:00, 1.7
20220617 15:05:00, 1.4
""")

f2 = io.StringIO("""
20220617 14:30:00, 2.1
20220617 14:40:00, 2.3
20220617 14:50:00, 2.7
20220617 15:00:00, 2.4
""")

df1 = pd.read_csv(f1, names=("Date", "Value"))  # can replace StringIO with a file path
df2 = pd.read_csv(f2, names=("Date", "Value"))
df3 = df1.merge(df2, on="Date", how="outer")
print(df3)
Output:
0 20220617 14:30:00 1.1 2.1 1 20220617 14:40:00 1.3 2.3 2 20220617 14:50:00 1.7 2.7 3 20220617 15:05:00 1.4 NaN 4 20220617 15:00:00 NaN 2.4
Notice that Python fills empty cells with Nan (Not a Number). I cannot add numbers and NaN's, so I convert the NaN's to 0.0 before I sum the value columns to make a new column named "Total".
df3 = df3.fillna(0.0)
df3["Total"] = df3["Value_x"] + df3["Value_y"]
print(df3)
Output:
Date Value_x Value_y Total 0 20220617 14:30:00 1.1 2.1 3.2 1 20220617 14:40:00 1.3 2.3 3.6 2 20220617 14:50:00 1.7 2.7 4.4 3 20220617 15:05:00 1.4 0.0 1.4 4 20220617 15:00:00 0.0 2.4 2.4
Now it is time to write the combined frames to a csv file. You only want to write the "Date" and "Total" columns, and you want to remove the header and index column. Luckily you can specify all this in the to_csv() call.
df3.to_csv("data.csv", columns=("Date", "Total"), header=False, index=False)
20220617 14:30:00,3.2
20220617 14:40:00,3.5999999999999996
20220617 14:50:00,4.4
20220617 15:05:00,1.4
20220617 15:00:00,2.4
Reading the documentation not only fixed the write error, but resulted in a better solution. Now you can sum files with arbitrary timestamps and files of differing lengths.


RE: Two text files, want to add a column value - zxcv101 - Jun-20-2022

@deanhystad, thank you so much for the replies and helping me to do this. Your solution worked. It is as easy as doing this :

import pandas as pd

df1 = pd.read_csv('file1.txt', names=("Date", "Value"))
df2 = pd.read_csv('file2.txt', names=("Date", "Value"))

print(df1)
print(df2)
print(df1 + df2)

df3 = df1.merge(df2, on="Date", how="outer")
df3['Total'] = df3['Value_x']+df3['Value_y']

print(df3)

and this produces the following table which is exactly what I wanted :

Date Value_x Value_y Total
0 20220617 17:00:00 1.1 2.1 3.2
1 20220617 17:10:00 1.4 2.4 3.8
2 20220617 17:20:00 1.7 2.7 4.4
3 20220617 17:30:00 1.3 2.3 3.6


RE: Two text files, want to add a column value - deanhystad - Jun-20-2022

Please wrap code in Python tags and output in Output tags. There are buttons for generating these tags in the editor.