Python Forum
Two text files, want to add a column value
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Two text files, want to add a column value
#1
Hi guys, a basic question. I have two text files which contain data like the following. There will always be only two columns, the first being date/time and the second being a number. Both files will have the same number of rows and the first column (date/time) will always be the same for both. The only difference is the value in the second column. I want to produce a new file which contains the date/time and a sum of the second column.
The screenshot below makes it clear.

Noobie question - what is the best way to do this? All advice greatly appreciated.
Many thanks in advance.

Attached Files

Thumbnail(s)
   
Reply
#2
I'm rusty at this, but it looks like string.split() function is the what you are looking for.

text = "bunch of stuff, other stuff"

x = text.split(", ")

print(x) 
output:
['bunch of stuff', 'other stuff']
You get a list. You can refer to each part by index text[0], text[1].

Is that helpful?
Reply
#3
Thanks for the reply, but this won't do it. I have two files and not a string made up of two different parts.
Your solution would be great if I had a string of text and wanted to split this up.
Reply
#4
If it is guaranteed that the two files will contain the same timestamps you can do this:
import pandas as pd
import io

f1 = io.StringIO("""
20220617 14:30:00, 1.1
20220617 14:40:00, 1.3
20220617 14:50:00, 1.7
20220617 15:00:00, 1.4
""")

f2 = io.StringIO("""
20220617 14:30:00, 2.1
20220617 14:40:00, 2.3
20220617 14:50:00, 2.7
20220617 15:00:00, 2.4
""")

df1 = pd.read_csv(f1, names=("Date", "Value"))
df2 = pd.read_csv(f2, names=("Date", "Value"))

print(df1)
print(df2)
print(df1 + df2)
Output:
Date Value 0 20220617 14:30:00 1.1 1 20220617 14:40:00 1.3 2 20220617 14:50:00 1.7 3 20220617 15:00:00 1.4 Date Value 0 20220617 14:30:00 2.1 1 20220617 14:40:00 2.3 2 20220617 14:50:00 2.7 3 20220617 15:00:00 2.4 Date Value 0 20220617 14:30:0020220617 14:30:00 3.2 1 20220617 14:40:0020220617 14:40:00 3.6 2 20220617 14:50:0020220617 14:50:00 4.4 3 20220617 15:00:0020220617 15:00:00 3.8
Instead of printing you would (df1 + df2).write_csv(file_path)
zxcv101 and Larz60+ like this post
Reply
#5
Thanks, this looks more like what I need.

Question - how do I remove the hardcoded values in this bit :

f1 = io.StringIO("""
20220617 14:30:00, 1.1
20220617 14:40:00, 1.3
20220617 14:50:00, 1.7
20220617 15:00:00, 1.4
""")

...and instead, specify a filename on my directory. For example, I'd love to say
f1 = io.StringIO(filename)

but of course this does not work. I've read up on the online documentation for StringIO and couldn't find an example
of reading an actual file.
Reply
#6
(Jun-19-2022, 09:05 PM)zxcv101 Wrote: but of course this does not work. I've read up on the online documentation for StringIO and couldn't find an example
of reading an actual file.
You shall not use StringIO when read a files,StringIO is an in-memory file-like object used to show visible(an working) example how the files could look.
It like this with files.
import pandas as pd

df1 = pd.read_csv('one.txt', names=("Date", "Value"))
df2 = pd.read_csv('two.txt', names=("Date", "Value"))

print(df1)
print(df2)
print(df1 + df2)
Output:
Date Value 0 date 1 1 date 2 2 date 3 Date Value 0 date 3 1 date 4 2 date 5 Date Value 0 datedate 4 1 datedate 6 2 datedate 8
Larz60+ and zxcv101 like this post
Reply
#7
You read the pandas documentation that describes how to use the read_csv and write_csv functions. When I read them again I realized the command to write a dataframe to a csv file is df.to_csv(), not df.write_csv().

When correcting file writing I noticed that the datetime column is messed up when adding the two dataframes. And I don't like that there is a requirement that the files have identical datetimes. The code below merges the two dataframes based on their date so you end up with a new dataframe that has three columns. Then I add the two "Value" columns to get a third. First I merge the dataframes using the "Date" column to decide how to merge the frames. If a data is only in one frame I want it included, so I set "how" to "outer".
import pandas as pd
import io

f1 = io.StringIO("""
20220617 14:30:00, 1.1
20220617 14:40:00, 1.3
20220617 14:50:00, 1.7
20220617 15:05:00, 1.4
""")

f2 = io.StringIO("""
20220617 14:30:00, 2.1
20220617 14:40:00, 2.3
20220617 14:50:00, 2.7
20220617 15:00:00, 2.4
""")

df1 = pd.read_csv(f1, names=("Date", "Value"))  # can replace StringIO with a file path
df2 = pd.read_csv(f2, names=("Date", "Value"))
df3 = df1.merge(df2, on="Date", how="outer")
print(df3)
Output:
0 20220617 14:30:00 1.1 2.1 1 20220617 14:40:00 1.3 2.3 2 20220617 14:50:00 1.7 2.7 3 20220617 15:05:00 1.4 NaN 4 20220617 15:00:00 NaN 2.4
Notice that Python fills empty cells with Nan (Not a Number). I cannot add numbers and NaN's, so I convert the NaN's to 0.0 before I sum the value columns to make a new column named "Total".
df3 = df3.fillna(0.0)
df3["Total"] = df3["Value_x"] + df3["Value_y"]
print(df3)
Output:
Date Value_x Value_y Total 0 20220617 14:30:00 1.1 2.1 3.2 1 20220617 14:40:00 1.3 2.3 3.6 2 20220617 14:50:00 1.7 2.7 4.4 3 20220617 15:05:00 1.4 0.0 1.4 4 20220617 15:00:00 0.0 2.4 2.4
Now it is time to write the combined frames to a csv file. You only want to write the "Date" and "Total" columns, and you want to remove the header and index column. Luckily you can specify all this in the to_csv() call.
df3.to_csv("data.csv", columns=("Date", "Total"), header=False, index=False)
20220617 14:30:00,3.2
20220617 14:40:00,3.5999999999999996
20220617 14:50:00,4.4
20220617 15:05:00,1.4
20220617 15:00:00,2.4
Reading the documentation not only fixed the write error, but resulted in a better solution. Now you can sum files with arbitrary timestamps and files of differing lengths.
ndc85430 and zxcv101 like this post
Reply
#8
@deanhystad, thank you so much for the replies and helping me to do this. Your solution worked. It is as easy as doing this :

import pandas as pd

df1 = pd.read_csv('file1.txt', names=("Date", "Value"))
df2 = pd.read_csv('file2.txt', names=("Date", "Value"))

print(df1)
print(df2)
print(df1 + df2)

df3 = df1.merge(df2, on="Date", how="outer")
df3['Total'] = df3['Value_x']+df3['Value_y']

print(df3)

and this produces the following table which is exactly what I wanted :

Date Value_x Value_y Total
0 20220617 17:00:00 1.1 2.1 3.2
1 20220617 17:10:00 1.4 2.4 3.8
2 20220617 17:20:00 1.7 2.7 4.4
3 20220617 17:30:00 1.3 2.3 3.6
Reply
#9
Please wrap code in Python tags and output in Output tags. There are buttons for generating these tags in the editor.
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  azure TTS from text files to mp3s mutantGOD 2 2,239 Jan-17-2023, 03:20 AM
Last Post: mutantGOD
  Writing into 2 text files from the same function paul18fr 4 2,097 Jul-28-2022, 04:34 AM
Last Post: ndc85430
  Delete empty text files [SOLVED] AlphaInc 5 2,053 Jul-09-2022, 02:15 PM
Last Post: DeaD_EyE
  select files such as text file RolanRoll 2 1,551 Jun-25-2022, 08:07 PM
Last Post: RolanRoll
  select Eof extension files based on text list of filenames with if condition RolanRoll 1 1,848 Apr-04-2022, 09:29 PM
Last Post: Larz60+
  Append files and add column with last part of each filename NiKirk 0 3,081 Feb-04-2022, 07:35 AM
Last Post: NiKirk
  Separate text files and convert into csv marfer 6 3,643 Dec-10-2021, 12:09 PM
Last Post: marfer
  Sorting and Merging text-files [SOLVED] AlphaInc 10 6,038 Aug-20-2021, 05:42 PM
Last Post: snippsat
  Replace String in multiple text-files [SOLVED] AlphaInc 5 9,532 Aug-08-2021, 04:59 PM
Last Post: Axel_Erfurt
  Several pdf files to text mfernandes 10 6,922 Jul-07-2021, 11:39 PM
Last Post: Pedroski55

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020