How to do a mass replace within a CSV file?

How to do a mass replace within a CSV file? - Printable Version

+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/forum-7.html)
+--- Forum: General Coding Help (https://python-forum.io/forum-8.html)
+--- Thread: How to do a mass replace within a CSV file? (/thread-37162.html)

How to do a mass replace within a CSV file? - cubangt - May-06-2022

I have the following code that im using to open a CSV and then create a new filtered csv for importing into excel.

# THIS OPENS THE NEWLY CLEAN DATA IN ORDER TO REMOVE OLD RECORDS
df = pd.read_csv("Dates.txt", usecols=range(3), header=None, names=["Date", "Time", "Comment"])
# THIS CONVERTS THE DATE COLUMN INTO A DATETIME FORMAT
df['DT'] = pd.to_datetime(df['Date'] + ' ' + df['Time'])
# HERE YOU NEED TO PROVIDE THE DATE YOU WANT TO KEEP GOING FORWARD
mask = (df['DT'] > '2022-04-28  7:06 AM')
# THIS RETURNS ALL ROWS GREATER THAN THE DATE PROVIDED ABOVE
ndf = df.loc[mask]

The above works great and does what is needed, NOW we are trying to do a mass replace for a key string value so that it can update the pivot table we have in excel when the data is imported.

In the 3rd column of the file ("Comment") there is a value for each record that has 2 different values based on the row, so just need to replace this string:

Someone Else:    

with

Completed:

And replace this string:

Davids:

with

Dave:

Is it possible to do a find and replace or do we have to do it per line?

Currently we open the file in NotePad++ and just do a find and replace, but would love to have that built into the creation of the new file so we dont have to remember to do that step.

RE: How to do a mass replace within a CSV file? - deanhystad - May-06-2022

Which OS? There are tools for doing this kind of thing. You don't have to write any code.

RE: How to do a mass replace within a CSV file? - cubangt - May-06-2022

im in using windows, the point of doing it within the code is so that no one has to touch the file multiple times..

What tools are you referring to?

RE: How to do a mass replace within a CSV file? - deanhystad - May-06-2022

On linux you could use tools like sed or awk to process one file at a time, all files in a folder, or all files in a folder and any subdirectories. With Windows I am at a loss.

RE: How to do a mass replace within a CSV file? - ndc85430 - May-08-2022

On Windows, you can look at Windows Subsystem for Linux to run those kinds of tools.

I think it's well worth learning these things because they're made for these tasks. Those tools are present on any typical UNIX system (Mac OS as well as Linux, for example).

RE: How to do a mass replace within a CSV file? - snippsat - May-08-2022

When you already use Pandas,then do of course Pandas have method that dos this task.
Test.

import pandas as pd

df = pd.read_csv("data_test.csv")

>>> df
                  Date       Comment
0  2022-04-28  7:06 AM            NO
1  2022-04-28  7:06 AM  Someone Else
2  2022-04-28  7:06 AM        Davids
3  2022-04-28  7:06 AM            NO
4  2022-04-28  7:06 AM  Someone Else
5  2022-04-28  7:06 AM        Davids
6  2022-04-28  7:06 AM  Someone Else
>>> 
>>> replace_values = {'Someone Else': 'Completed', 'Davids': 'David'} 
>>> df = df.replace({"Comment": replace_values})
>>> df
                  Date    Comment
0  2022-04-28  7:06 AM         NO
1  2022-04-28  7:06 AM  Completed
2  2022-04-28  7:06 AM      David
3  2022-04-28  7:06 AM         NO
4  2022-04-28  7:06 AM  Completed
5  2022-04-28  7:06 AM      David
6  2022-04-28  7:06 AM  Completed

Read more about this in link under,a lot of methods accept Regex so can do all kind more advance task than this.
How to use Regex in Pandas
It's also task that can do plain Python,a ok training task to write some lines that fix this.

RE: How to do a mass replace within a CSV file? - cubangt - May-09-2022

I tried implementing the above in my existing code and nothing gets replaced.

# THIS OPENS THE NEWLY CLEAN DATA IN ORDER TO REMOVE OLD RECORDS
df = pd.read_csv("Dates.txt", usecols=range(3), header=None, names=["Date", "Time", "Comment"])


replace_values = {'Someone Else': 'Dan', 'This one':'Ros'}
df = df.replace({"Comment": replace_values})


# THIS CONVERTS THE DATE COLUMN INTO A DATETIME FORMAT
df['DT'] = pd.to_datetime(df['Date'] + ' ' + df['Time'])

# HERE YOU NEED TO PROVIDE THE DATE YOU WANT TO KEEP GOING FORWARD
mask = (df['DT'] > '2022-05-02  7:06 AM')

# THIS RETURNS ALL ROWS GREATER THAN THE DATE PROVIDED ABOVE
df = df.loc[mask]

# DROP TEMP COLUMN BEFORE WRITING CSV
df['DT'].drop

Everything in this code works except the replace lines. Did i miss something? Im replacing after reading in the CSV..

RE: How to do a mass replace within a CSV file? - snippsat - May-09-2022

(May-09-2022, 01:56 PM)cubangt Wrote: I tried implementing the above in my existing code and nothing gets replaced.

How dos the DataFrame look?
Use print(df.head()) is Dafaframe is big.
In my Example so do Comment column only contained the word that need to replaced,
if it's a word in i middle of sentence then need to write Regex for that.

RE: How to do a mass replace within a CSV file? - cubangt - May-09-2022

So yes in my "comment" column its a free form text field, so it can be anything from 1 word to a whole paragraph

I ran the above suggestions and this is what is returned:

Date Time Comment
0 1/18/22 2:22 PM Someone Else: so for a Christmas back then
1 1/18/22 2:25 PM Someone Else: <Media omitted>
2 1/18/22 2:32 PM Davids: i dont care about the gifts

So im guessing Regex is my only option...??

RE: How to do a mass replace within a CSV file? - snippsat - May-09-2022

(May-09-2022, 05:50 PM)cubangt Wrote: So im guessing Regex is my only option...??

Yes it's the easiest option for this can this and df.replace take Regex with parameter regex=True.
So with Regex is word boundary character \b used when want to match a exact word only.
To make a example.

>>> df
                  Date                                        Comment
0  2022-04-28  7:06 AM                                             NO
1  2022-04-28  7:06 AM  PM Someone Else: so for a Christmas back then
2  2022-04-28  7:06 AM                                Davids are cool
3  2022-04-28  7:06 AM                                             NO
4  2022-04-28  7:06 AM                         hello Someone Else 123
5  2022-04-28  7:06 AM                            Welcome Davids 9999
6  2022-04-28  7:06 AM                                   Someone Else
>>> 
>>> replace_values = {'Someone Else': 'Completed', 'Davids': 'David'}
>>> replace_values = {rf'\b{k}\b': v for k, v in replace_values.items()}
>>> df.replace({"Comment": replace_values}, regex=True)
                  Date                                     Comment
0  2022-04-28  7:06 AM                                          NO
1  2022-04-28  7:06 AM  PM Completed: so for a Christmas back then
2  2022-04-28  7:06 AM                              David are cool
3  2022-04-28  7:06 AM                                          NO
4  2022-04-28  7:06 AM                         hello Completed 123
5  2022-04-28  7:06 AM                          Welcome David 9999
6  2022-04-28  7:06 AM                                   Completed