How to do a mass replace within a CSV file?

cubangt · May-06-2022, 07:15 PM

I have the following code that im using to open a CSV and then create a new filtered csv for importing into excel.

# THIS OPENS THE NEWLY CLEAN DATA IN ORDER TO REMOVE OLD RECORDS
df = pd.read_csv("Dates.txt", usecols=range(3), header=None, names=["Date", "Time", "Comment"])
# THIS CONVERTS THE DATE COLUMN INTO A DATETIME FORMAT
df['DT'] = pd.to_datetime(df['Date'] + ' ' + df['Time'])
# HERE YOU NEED TO PROVIDE THE DATE YOU WANT TO KEEP GOING FORWARD
mask = (df['DT'] > '2022-04-28  7:06 AM')
# THIS RETURNS ALL ROWS GREATER THAN THE DATE PROVIDED ABOVE
ndf = df.loc[mask]

The above works great and does what is needed, NOW we are trying to do a mass replace for a key string value so that it can update the pivot table we have in excel when the data is imported.

In the 3rd column of the file ("Comment") there is a value for each record that has 2 different values based on the row, so just need to replace this string:

Someone Else:    

with

Completed:

And replace this string:

Davids:

with

Dave:

Is it possible to do a find and replace or do we have to do it per line?

Currently we open the file in NotePad++ and just do a find and replace, but would love to have that built into the creation of the new file so we dont have to remember to do that step.

**deanhystad** · May-06-2022, 08:50 PM

Which OS? There are tools for doing this kind of thing. You don't have to write any code.

cubangt · May-06-2022, 08:53 PM

im in using windows, the point of doing it within the code is so that no one has to touch the file multiple times..

What tools are you referring to?

**deanhystad** · May-06-2022, 09:31 PM

On linux you could use tools like sed or awk to process one file at a time, all files in a folder, or all files in a folder and any subdirectories. With Windows I am at a loss.

ndc85430 · May-08-2022, 05:18 AM

On Windows, you can look at Windows Subsystem for Linux to run those kinds of tools.

I think it's well worth learning these things because they're made for these tasks. Those tools are present on any typical UNIX system (Mac OS as well as Linux, for example).

***snippsat*** · (This post was last modified: May-08-2022, 08:48 AM by snippsat.)

When you already use Pandas,then do of course Pandas have method that dos this task.
Test.

import pandas as pd

df = pd.read_csv("data_test.csv")

>>> df
                  Date       Comment
0  2022-04-28  7:06 AM            NO
1  2022-04-28  7:06 AM  Someone Else
2  2022-04-28  7:06 AM        Davids
3  2022-04-28  7:06 AM            NO
4  2022-04-28  7:06 AM  Someone Else
5  2022-04-28  7:06 AM        Davids
6  2022-04-28  7:06 AM  Someone Else
>>> 
>>> replace_values = {'Someone Else': 'Completed', 'Davids': 'David'} 
>>> df = df.replace({"Comment": replace_values})
>>> df
                  Date    Comment
0  2022-04-28  7:06 AM         NO
1  2022-04-28  7:06 AM  Completed
2  2022-04-28  7:06 AM      David
3  2022-04-28  7:06 AM         NO
4  2022-04-28  7:06 AM  Completed
5  2022-04-28  7:06 AM      David
6  2022-04-28  7:06 AM  Completed

Read more about this in link under,a lot of methods accept Regex so can do all kind more advance task than this.
How to use Regex in Pandas
It's also task that can do plain Python,a ok training task to write some lines that fix this.

cubangt · May-09-2022, 01:56 PM

I tried implementing the above in my existing code and nothing gets replaced.

# THIS OPENS THE NEWLY CLEAN DATA IN ORDER TO REMOVE OLD RECORDS
df = pd.read_csv("Dates.txt", usecols=range(3), header=None, names=["Date", "Time", "Comment"])


replace_values = {'Someone Else': 'Dan', 'This one':'Ros'}
df = df.replace({"Comment": replace_values})


# THIS CONVERTS THE DATE COLUMN INTO A DATETIME FORMAT
df['DT'] = pd.to_datetime(df['Date'] + ' ' + df['Time'])

# HERE YOU NEED TO PROVIDE THE DATE YOU WANT TO KEEP GOING FORWARD
mask = (df['DT'] > '2022-05-02  7:06 AM')

# THIS RETURNS ALL ROWS GREATER THAN THE DATE PROVIDED ABOVE
df = df.loc[mask]

# DROP TEMP COLUMN BEFORE WRITING CSV
df['DT'].drop

Everything in this code works except the replace lines. Did i miss something? Im replacing after reading in the CSV..

***snippsat*** · May-09-2022, 04:20 PM

(May-09-2022, 01:56 PM)cubangt Wrote: I tried implementing the above in my existing code and nothing gets replaced.

How dos the DataFrame look?
Use print(df.head()) is Dafaframe is big.
In my Example so do Comment column only contained the word that need to replaced,
if it's a word in i middle of sentence then need to write Regex for that.

cubangt · May-09-2022, 05:50 PM

So yes in my "comment" column its a free form text field, so it can be anything from 1 word to a whole paragraph

I ran the above suggestions and this is what is returned:

Date Time Comment
0 1/18/22 2:22 PM Someone Else: so for a Christmas back then
1 1/18/22 2:25 PM Someone Else: <Media omitted>
2 1/18/22 2:32 PM Davids: i dont care about the gifts

So im guessing Regex is my only option...??

***snippsat*** · (This post was last modified: May-09-2022, 06:53 PM by snippsat.)

(May-09-2022, 05:50 PM)cubangt Wrote: So im guessing Regex is my only option...??

Yes it's the easiest option for this can this and df.replace take Regex with parameter regex=True.
So with Regex is word boundary character \b used when want to match a exact word only.
To make a example.

>>> df
                  Date                                        Comment
0  2022-04-28  7:06 AM                                             NO
1  2022-04-28  7:06 AM  PM Someone Else: so for a Christmas back then
2  2022-04-28  7:06 AM                                Davids are cool
3  2022-04-28  7:06 AM                                             NO
4  2022-04-28  7:06 AM                         hello Someone Else 123
5  2022-04-28  7:06 AM                            Welcome Davids 9999
6  2022-04-28  7:06 AM                                   Someone Else
>>> 
>>> replace_values = {'Someone Else': 'Completed', 'Davids': 'David'}
>>> replace_values = {rf'\b{k}\b': v for k, v in replace_values.items()}
>>> df.replace({"Comment": replace_values}, regex=True)
                  Date                                     Comment
0  2022-04-28  7:06 AM                                          NO
1  2022-04-28  7:06 AM  PM Completed: so for a Christmas back then
2  2022-04-28  7:06 AM                              David are cool
3  2022-04-28  7:06 AM                                          NO
4  2022-04-28  7:06 AM                         hello Completed 123
5  2022-04-28  7:06 AM                          Welcome David 9999
6  2022-04-28  7:06 AM                                   Completed

Possibly Related Threads…
Thread		Author	Replies	Views	Last Post
	[SOLVED] [Beautiful Soup] Replace tag.string from another file?	Winfried	2	584	May-01-2025, 03:43 PM Last Post: Winfried
	Replace values in Yaml file with value in dictionary	PelleH	1	2,379	Feb-11-2025, 09:51 AM Last Post: alexjordan
	Replace a text/word in docx file using Python	Devan	4	25,203	Oct-17-2023, 06:03 PM Last Post: Devan
	Need to replace a string with a file (HTML file)	tester_V	1	2,045	Aug-30-2023, 03:42 AM Last Post: Larz60+
	Replace columns indexes reading a XSLX file	Larry1888	2	1,843	Nov-18-2022, 10:16 PM Last Post: Pedroski55
	I get an FileNotFouerror while try to open(file,"rt"). My goal is to replace str	decoded	1	2,143	May-06-2022, 01:44 PM Last Post: Larz60+
	Cloning a directory and using a .CSV file as a reference to search and replace	bg25lam	2	2,969	May-31-2021, 07:00 AM Last Post: bowlofred
	So, a mass of errors trying to run a zipline-dependant program on 3.5 env	Evalias123	2	3,391	Jan-21-2021, 02:22 AM Last Post: Evalias123
	Iterate 2 large text files across lines and replace lines in second file	medatib531	13	9,452	Aug-10-2020, 11:01 PM Last Post: medatib531
	Replace XML tag and write to the same file	krish216	4	10,979	Mar-26-2020, 07:39 AM Last Post: snippsat

How to do a mass replace within a CSV file?

User Panel Messages

Announcements