Python Forum
How to do a mass replace within a CSV file?
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
How to do a mass replace within a CSV file?
#1
I have the following code that im using to open a CSV and then create a new filtered csv for importing into excel.

# THIS OPENS THE NEWLY CLEAN DATA IN ORDER TO REMOVE OLD RECORDS
df = pd.read_csv("Dates.txt", usecols=range(3), header=None, names=["Date", "Time", "Comment"])
# THIS CONVERTS THE DATE COLUMN INTO A DATETIME FORMAT
df['DT'] = pd.to_datetime(df['Date'] + ' ' + df['Time'])
# HERE YOU NEED TO PROVIDE THE DATE YOU WANT TO KEEP GOING FORWARD
mask = (df['DT'] > '2022-04-28  7:06 AM')
# THIS RETURNS ALL ROWS GREATER THAN THE DATE PROVIDED ABOVE
ndf = df.loc[mask]
The above works great and does what is needed, NOW we are trying to do a mass replace for a key string value so that it can update the pivot table we have in excel when the data is imported.

In the 3rd column of the file ("Comment") there is a value for each record that has 2 different values based on the row, so just need to replace this string:

Someone Else:    

with

Completed:
And replace this string:

Davids:

with

Dave:
Is it possible to do a find and replace or do we have to do it per line?

Currently we open the file in NotePad++ and just do a find and replace, but would love to have that built into the creation of the new file so we dont have to remember to do that step.
Reply
#2
Which OS? There are tools for doing this kind of thing. You don't have to write any code.
ndc85430 likes this post
Reply
#3
im in using windows, the point of doing it within the code is so that no one has to touch the file multiple times..

What tools are you referring to?
Reply
#4
On linux you could use tools like sed or awk to process one file at a time, all files in a folder, or all files in a folder and any subdirectories. With Windows I am at a loss.
ndc85430 likes this post
Reply
#5
On Windows, you can look at Windows Subsystem for Linux to run those kinds of tools.

I think it's well worth learning these things because they're made for these tasks. Those tools are present on any typical UNIX system (Mac OS as well as Linux, for example).
Reply
#6
When you already use Pandas,then do of course Pandas have method that dos this task.
Test.
import pandas as pd

df = pd.read_csv("data_test.csv")
>>> df
                  Date       Comment
0  2022-04-28  7:06 AM            NO
1  2022-04-28  7:06 AM  Someone Else
2  2022-04-28  7:06 AM        Davids
3  2022-04-28  7:06 AM            NO
4  2022-04-28  7:06 AM  Someone Else
5  2022-04-28  7:06 AM        Davids
6  2022-04-28  7:06 AM  Someone Else
>>> 
>>> replace_values = {'Someone Else': 'Completed', 'Davids': 'David'} 
>>> df = df.replace({"Comment": replace_values})
>>> df
                  Date    Comment
0  2022-04-28  7:06 AM         NO
1  2022-04-28  7:06 AM  Completed
2  2022-04-28  7:06 AM      David
3  2022-04-28  7:06 AM         NO
4  2022-04-28  7:06 AM  Completed
5  2022-04-28  7:06 AM      David
6  2022-04-28  7:06 AM  Completed
Read more about this in link under,a lot of methods accept Regex so can do all kind more advance task than this.
How to use Regex in Pandas
It's also task that can do plain Python,a ok training task to write some lines that fix this.
Gribouillis likes this post
Reply
#7
I tried implementing the above in my existing code and nothing gets replaced.

# THIS OPENS THE NEWLY CLEAN DATA IN ORDER TO REMOVE OLD RECORDS
df = pd.read_csv("Dates.txt", usecols=range(3), header=None, names=["Date", "Time", "Comment"])


replace_values = {'Someone Else': 'Dan', 'This one':'Ros'}
df = df.replace({"Comment": replace_values})


# THIS CONVERTS THE DATE COLUMN INTO A DATETIME FORMAT
df['DT'] = pd.to_datetime(df['Date'] + ' ' + df['Time'])

# HERE YOU NEED TO PROVIDE THE DATE YOU WANT TO KEEP GOING FORWARD
mask = (df['DT'] > '2022-05-02  7:06 AM')

# THIS RETURNS ALL ROWS GREATER THAN THE DATE PROVIDED ABOVE
df = df.loc[mask]

# DROP TEMP COLUMN BEFORE WRITING CSV
df['DT'].drop
Everything in this code works except the replace lines. Did i miss something? Im replacing after reading in the CSV..
Reply
#8
(May-09-2022, 01:56 PM)cubangt Wrote: I tried implementing the above in my existing code and nothing gets replaced.
How dos the DataFrame look?
Use print(df.head()) is Dafaframe is big.
In my Example so do Comment column only contained the word that need to replaced,
if it's a word in i middle of sentence then need to write Regex for that.
Reply
#9
So yes in my "comment" column its a free form text field, so it can be anything from 1 word to a whole paragraph

I ran the above suggestions and this is what is returned:

Date Time Comment
0 1/18/22 2:22 PM Someone Else: so for a Christmas back then
1 1/18/22 2:25 PM Someone Else: <Media omitted>
2 1/18/22 2:32 PM Davids: i dont care about the gifts

So im guessing Regex is my only option...??
Reply
#10
(May-09-2022, 05:50 PM)cubangt Wrote: So im guessing Regex is my only option...??
Yes it's the easiest option for this can this and df.replace take Regex with parameter regex=True.
So with Regex is word boundary character \b used when want to match a exact word only.
To make a example.
>>> df
                  Date                                        Comment
0  2022-04-28  7:06 AM                                             NO
1  2022-04-28  7:06 AM  PM Someone Else: so for a Christmas back then
2  2022-04-28  7:06 AM                                Davids are cool
3  2022-04-28  7:06 AM                                             NO
4  2022-04-28  7:06 AM                         hello Someone Else 123
5  2022-04-28  7:06 AM                            Welcome Davids 9999
6  2022-04-28  7:06 AM                                   Someone Else
>>> 
>>> replace_values = {'Someone Else': 'Completed', 'Davids': 'David'}
>>> replace_values = {rf'\b{k}\b': v for k, v in replace_values.items()}
>>> df.replace({"Comment": replace_values}, regex=True)
                  Date                                     Comment
0  2022-04-28  7:06 AM                                          NO
1  2022-04-28  7:06 AM  PM Completed: so for a Christmas back then
2  2022-04-28  7:06 AM                              David are cool
3  2022-04-28  7:06 AM                                          NO
4  2022-04-28  7:06 AM                         hello Completed 123
5  2022-04-28  7:06 AM                          Welcome David 9999
6  2022-04-28  7:06 AM                                   Completed
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  Replace a text/word in docx file using Python Devan 4 3,466 Oct-17-2023, 06:03 PM
Last Post: Devan
  Need to replace a string with a file (HTML file) tester_V 1 775 Aug-30-2023, 03:42 AM
Last Post: Larz60+
  Replace columns indexes reading a XSLX file Larry1888 2 996 Nov-18-2022, 10:16 PM
Last Post: Pedroski55
  I get an FileNotFouerror while try to open(file,"rt"). My goal is to replace str decoded 1 1,413 May-06-2022, 01:44 PM
Last Post: Larz60+
  Cloning a directory and using a .CSV file as a reference to search and replace bg25lam 2 2,148 May-31-2021, 07:00 AM
Last Post: bowlofred
  So, a mass of errors trying to run a zipline-dependant program on 3.5 env Evalias123 2 2,414 Jan-21-2021, 02:22 AM
Last Post: Evalias123
  Iterate 2 large text files across lines and replace lines in second file medatib531 13 5,878 Aug-10-2020, 11:01 PM
Last Post: medatib531
  Replace XML tag and write to the same file krish216 4 7,900 Mar-26-2020, 07:39 AM
Last Post: snippsat
  Replace words in a file bitwo 3 2,437 Jan-22-2020, 04:05 PM
Last Post: bitwo
  How do you replace a word after a match from a list of words in each line of a file? vijju56 1 3,477 Oct-17-2019, 03:04 PM
Last Post: baquerik

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020