Posts: 4
Threads: 1
Joined: Apr 2017
Apr-21-2017, 08:58 PM
(This post was last modified: Apr-21-2017, 09:00 PM by metulburr.)
I am trying to replace commas with a "^" in a string that is already quoted. Below you can see input and output. However my code does not replace the string.
Any help would be appreciated.
Input:
Output: Id,Category,Description,Date
1,Test,Red Cars,02/12/2017
2,Test,Blue Cars,03/01/2017
3,Test,"Green, big cars",01/05/2016
Output should be:
Output: Id,Category,Description,Date
1,Test,Red Cars,02/12/2017
2,Test,Blue Cars,03/01/2017
3,Test,"Green^ big cars",01/05/2016
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 |
import csv
ifile = open ( 'C:/Users/jpilon/Documents/test.csv' , 'r' )
reader = csv.reader(ifile,delimiter = ',' )
ofile = open ( 'C:/Users/jpilon/Documents/test_new.csv' , 'w' )
writer = csv.writer(ofile, delimiter = ',' )
findlist = [ '"*,*"' ]
replacelist = [ '"*^*"' ]
rep = dict ( zip (findlist, replacelist))
def findReplace(find, replace):
s = ifile.read()
s = s.replace(find, replace)
ofile.write(s)
for item in findlist:
findReplace(item, rep[item])
ifile.close()
ofile.close()
|
Posts: 1,298
Threads: 38
Joined: Sep 2016
Could just be me, but your input looks exactly like your output.
If it ain't broke, I just haven't gotten to it yet.
OS: Windows 10, openSuse 42.3, freeBSD 11, Raspian "Stretch"
Python 3.6.5, IDE: PyCharm 2018 Community Edition
Posts: 7,320
Threads: 123
Joined: Sep 2016
Apr-22-2017, 04:06 PM
(This post was last modified: Apr-22-2017, 04:06 PM by snippsat.)
You don't use reader that csv module makes.
In line 15 you read() all in as string.
So read in and make a nested list,then can replace values and keep csv structure.
1 2 3 4 5 |
import csv
with open ( 'in.csv' ) as f:
reader = csv.reader(f, delimiter = ',' )
cars_info = [i for i in reader]
|
Test:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
>>> cars_info
[[ 'Id' , 'Category' , 'Description' , 'Date' ],
[ '1' , 'Test' , 'Red Cars' , '02/12/2017' ],
[ '2' , 'Test' , 'Blue Cars' , '03/01/2017' ],
[ '3' , 'Test' , 'Green, big cars' , '01/05/2016' ]]
>>> cars_info[ 3 ][ 2 ]
'Green, big cars'
>>> cars_info[ 3 ][ 2 ] = "Green^ big cars"
>>> cars_info
[[ 'Id' , 'Category' , 'Description' , 'Date' ],
[ '1' , 'Test' , 'Red Cars' , '02/12/2017' ],
[ '2' , 'Test' , 'Blue Cars' , '03/01/2017' ],
[ '3' , 'Test' , 'Green^ big cars' , '01/05/2016' ]]
|
Posts: 4
Threads: 1
Joined: Apr 2017
Thanks snippsat. I am new to Python, so forgive my ignorance. I understand what you are doing, I guess where I am lost is how to apply it to a global find and replace in my code.
If I were certain that a column 3 could potentially have the double quotes, then how would I replace any that met that criteria?
Posts: 3,458
Threads: 101
Joined: Sep 2016
Using the csv module is probably the way to go, since it'll handle the quotes for you. But as usual, a regular expression also works:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
>>> text =
>>> import re
>>> regex = re. compile (r '("[^",]*),([^",]*")' )
>>> print (regex.sub(r '\1^\2' , text))
Id ,Category,Description,Date
1 ,Test,Red Cars, 02 / 12 / 2017
2 ,Test,Blue Cars, 03 / 01 / 2017
3 ,Test, "Green^ big cars" , 01 / 05 / 2016
|
Posts: 4
Threads: 1
Joined: Apr 2017
The regular expression seemed to work on my example data, but when I tried a larger file with more columns, it did not replace the comma within the quotes.
Was the regular expression someone pointing to column 2 only?
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
import re
with open ( 'file.csv' ) as f:
s = f.read() + '\n'
regex = re. compile (r '("[^",]*),([^",]*")' )
s1 = (regex.sub(r '\1^\2' , s))
print (s1)
f = open ( 'file.csv' , "w" )
f.write(s1)
f.close()
|
Posts: 2,953
Threads: 48
Joined: Sep 2016
Just use csv module
1 2 3 4 5 6 |
import csv
with open ( '/tmp/input.csv' , 'r' ) as in_file:
data = csv.reader(in_file, delimiter = ',' )
for row in data:
print ([col.replace( ',' , '^' ) for col in row])
|
Output: ['Id', 'Category', 'Description', 'Date']
['1', 'Test', 'Red Cars', '02/12/2017']
['2', 'Test', 'Blue Cars', '03/01/2017']
['3', 'Test', 'Green^ big cars', '01/05/2016']
Posts: 4
Threads: 1
Joined: Apr 2017
I did finally figure out how to read the csv and write new values to csv using my test file. Thanks for all your help!
Now when I try this on a 1 gig csv file, I run into memory error. I know there are ways to do this in chunks, but that should be a question in a new thread.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 |
import csv
new_rows_list = []
f1 = open ( 'in_file' , 'r' )
reader = csv.reader(f1, delimiter = ',' )
for row in reader:
new_row = ([col.replace( ',' , '^' ) for col in row])
new_rows_list.append(new_row)
f2 = open ( 'out_file' , 'w' )
writer = csv.writer(f2)
writer.writerows(new_rows_list)
f2.close()
f1.close()
|
Posts: 3,458
Threads: 101
Joined: Sep 2016
Apr-26-2017, 05:08 AM
(This post was last modified: Apr-26-2017, 05:09 AM by nilamo.)
(Apr-25-2017, 08:18 PM)jmpatx Wrote:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 |
import csv
new_rows_list = []
f1 = open ( 'in_file' , 'r' )
reader = csv.reader(f1, delimiter = ',' )
for row in reader:
new_row = ([col.replace( ',' , '^' ) for col in row])
new_rows_list.append(new_row)
f2 = open ( 'out_file' , 'w' )
writer = csv.writer(f2)
writer.writerows(new_rows_list)
f2.close()
f1.close()
|
Don't store the whole file in memory, just work on it line-by-line:
1 2 3 4 5 6 7 8 9 |
import csv
with open ( "in_file" , "r" , newline = "") as f1:
reader = csv.reader(f1, delimiter = "," )
with open ( "out_file" , "w" , newline = "") as f2:
writer = csv.writer(f2)
for row in reader:
new_row = [col.replace( "," , "^" ) for col in row]
writer.writerow(new_row)
|
Posts: 7,320
Threads: 123
Joined: Sep 2016
(Apr-26-2017, 05:08 AM)nilamo Wrote: Don't store the whole file in memory, just work on it line-by-line: Yepp better.
Can also write it like this,one with is enough.
1 2 3 4 5 6 7 8 |
import csv
with open ( "in.csv" ) as f1, open ( "out.csv" , "w" , newline = "") as f2:
reader = csv.reader(f1, delimiter = "," )
writer = csv.writer(f2)
for row in reader:
new_row = [col.replace( "," , "^" ) for col in row]
writer.writerow(new_row)
|
|