Hi,
I have a report that contains a lot of Mac machine names. Users seem to love apostrophes in the name.
I am getting this in my csv
administrator′s Mac
It should read
administrator’s Mac
This is my code. What can I do about that?
with open(full_report_path, 'w') as f:
    writer = csv.writer(f)
    writer.writerow(['High alerts found', len(list_of_high_alerts)])
    writer.writerow(['Medium alerts found', len(list_of_medium_alerts)])
    writer.writerow(report_column_names)
# Sets the column order
with open(full_report_path, 'a+', encoding='utf-8', newline='') as output_file:
    dict_writer = csv.DictWriter(output_file, report_column_order)
    dict_writer.writerows(computer_list)
In some cases users are using emoji for names. One machine is an ⚓️. I doubt there is much I can do about those ones!!
Don't use utf-8 encoding. It is too restrictive. You should probably specify a dialect.
Maybe regex the offending bits away?
import re
mystring = 'administrator′s Mac'
ns = re.sub('′', '\'', mystring)
Output:
>>> ns
"administrator's Mac"
>>>
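If more than one look-alike character turns up, a translation table may be easier to maintain than a growing regex. This is only a sketch: the function name normalise_apostrophes and the set of characters in APOSTROPHE_MAP are my own assumptions, so extend the table to match whatever actually appears in your data.

```python
# Map a few apostrophe look-alikes to a plain ASCII apostrophe.
# The characters listed here are an assumption; add others as needed.
APOSTROPHE_MAP = str.maketrans({
    '\u2032': "'",  # PRIME
    '\u2019': "'",  # RIGHT SINGLE QUOTATION MARK
    '\u2018': "'",  # LEFT SINGLE QUOTATION MARK
})


def normalise_apostrophes(name):
    """Return name with every mapped character replaced by "'"."""
    return name.translate(APOSTROPHE_MAP)


print(normalise_apostrophes('administrator\u2032s Mac'))
print(normalise_apostrophes('administrator\u2019s Mac'))
```

Note that this changes the data. If the names have to stay exactly as the users typed them, fixing the encoding is the better route.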
Use utf-8 both when reading and when writing the file. These days Python itself is usually not the problem; the trouble comes from things outside it, such as files (save them as utf-8), the OS, and editors.
Unicode handling was one of the biggest changes in the move from Python 2 to 3.
Example file saved as utf-8:
some.csv
Output:
administrator’s, Mac
Boat, ⚓️
Rain, ☂️
はるさんハウスはどこですか, jap
import csv
# newline='' on both files lets the csv module handle line endings itself
with open('some.csv', newline='', encoding='utf-8') as f, open('out.csv', 'w', newline='', encoding='utf-8') as f_out:
    reader = csv.reader(f)
    writer = csv.writer(f_out)
    for row in reader:
        print(row[0], row[1])
        writer.writerow(row)
Output:
administrator’s Mac
Boat ⚓️
Rain ☂️
はるさんハウスはどこですか jap
In out.csv:
Output:
administrator’s, Mac
Boat, ⚓️
Rain, ☂️
はるさんハウスはどこですか, jap
(Aug-17-2022, 09:18 PM) deanhystad wrote: Don't use utf-8 encoding. It is too restrictive. You should probably specify a dialect.
Thanks for the reply. I have partly fixed it. As the others pointed out, I was not specifying an encoding on the first write. I am now doing this:
with open(full_report_path, 'w', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(['High alerts found', len(list_of_high_alerts)])
    writer.writerow(['Medium alerts found', len(list_of_medium_alerts)])
    writer.writerow(report_column_names)
# Sets the column order
with open(full_report_path, 'a+', encoding='utf-8', newline='') as output_file:
    dict_writer = csv.DictWriter(output_file, report_column_order)
    dict_writer.writerows(computer_list)
This works fine for Numbers, but Excel is still an issue. Can you please explain a bit more about dialects?
When I read in one of the reports on a Mac it works fine, I get the correct character. When I read the same file on a PC I get a charmap error. Why would Python be different on a Mac or PC?
![[Image: forum.png]](https://reedcricketclub.co.uk/forum.png)
(Aug-22-2022, 07:55 AM) bazcurtis wrote: Why would Python be different on a Mac or PC?
I'm not a Windows user, but I recall that problems were reported with Python's output in the Windows console in Python versions older than 3.6. Make sure your Python is recent.
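If the error comes from print() rather than from reading the file, the console encoding is worth a look. A rough sketch, assuming Python 3.7 or later (where text streams have a reconfigure method); I have not tried this on Windows myself, so treat it as a starting point only.

```python
import sys

# The encoding print() uses for the console; some Windows setups may
# report something other than UTF-8 here.
print(sys.stdout.encoding)

# Switch the existing stream to UTF-8 where that is supported. The
# hasattr check covers environments where stdout has been replaced by
# an object without reconfigure().
if hasattr(sys.stdout, 'reconfigure'):
    sys.stdout.reconfigure(encoding='utf-8')

print('administrator\u2019s Mac \u2693')
```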
(Aug-22-2022, 07:55 AM) bazcurtis wrote: When I read the same file on a PC I get a charmap error. Why would Python be different on a Mac or PC?
Windows will sometimes choose the wrong encoding (which shows up as charmap in the error), so you must specify encoding='utf-8' yourself.
Then it will work. Also be aware that the file itself has to be saved as utf-8 (editors can mess this up).
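To see where the difference between platforms comes from, here is a small sketch. When open() is called without an encoding, Python 3.10 normally falls back to the locale's preferred encoding, which is often cp1252 on Windows and UTF-8 on a Mac. The helper can_encode is a name I made up for the illustration.

```python
import locale

# What open() falls back to when no encoding is given (platform dependent).
print(locale.getpreferredencoding(False))


def can_encode(text, encoding):
    """Return True if text can be written using the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError as err:
        # Report why the encoding could not represent the text.
        print(f'{encoding}: {err}')
        return False


name = 'Boat \u2693'         # the anchor symbol from the machine names
can_encode(name, 'utf-8')    # UTF-8 can represent any Unicode character
can_encode(name, 'cp1252')   # cp1252 has no anchor; the error names 'charmap'
```

That is why passing encoding='utf-8' explicitly makes the same script behave the same way on both machines.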
Newer Python versions also force UTF-8 on the OS side in more places, see PEP 538.
Quote: PEP 528 and PEP 529 were implemented to bypass the operating system supplied interfaces for binary data handling and force the use of UTF-8 instead.
Here is an example. You can also use chardet to check which encoding a file was saved with.
# Python version
E:\div_code
λ python -V
Python 3.10.5
# Test for what encoding used
E:\div_code
λ chardetect unicode.txt
unicode.txt: utf-8 with confidence 0.99
# Run file
E:\div_code
λ python uni.py
Crème and Spicy jalapeño ☂ ⛄日本語のキ
Code used.
unicode.txt
Output:
Crème and Spicy jalapeño ☂ ⛄日本語のキ
# uni.py
with open('unicode.txt', encoding='utf-8') as f:
    data = f.read()
    print(data)
The same applies when writing: always specify the encoding.
s = 'Crème and Spicy jalapeño ☂ ⛄日本語のキ'
with open('unicode.txt', 'w', encoding='utf-8') as f_out:
    f_out.write(s)
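On the Excel and dialect question from earlier in the thread: as far as I know, a csv dialect only controls things like the delimiter and quoting, not the encoding, and 'excel' is already the default dialect. What usually helps Excel is a byte order mark at the start of the file, which the 'utf-8-sig' codec writes for you. A minimal sketch, with made-up sample data and a made-up file name (report_for_excel.csv) standing in for the real report:

```python
import csv

# Hypothetical sample data standing in for the real report rows.
columns = ['Hostname', 'Sub Estate']
computers = [
    {'Hostname': 'administrator\u2019s Mac', 'Sub Estate': 'HQ'},
    {'Hostname': '\u2693', 'Sub Estate': 'Docks'},
]

# 'utf-8-sig' writes a byte order mark first, which Excel can use to
# recognise the file as UTF-8. newline='' lets the csv module control
# the line endings itself.
with open('report_for_excel.csv', 'w', encoding='utf-8-sig', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=columns, dialect='excel')
    writer.writeheader()
    writer.writerows(computers)

# Reading with 'utf-8-sig' strips the mark again, so the first header
# name does not pick up a stray '\ufeff'.
with open('report_for_excel.csv', encoding='utf-8-sig', newline='') as f:
    rows = list(csv.DictReader(f))

print(rows[0]['Hostname'])
```

If the report is written in two steps, as in the code posted above, only the first open() in 'w' mode should use 'utf-8-sig'; the later append can stay 'utf-8' so that no second mark lands in the middle of the file.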
Thanks for all the replies. The CSV write code is above. This is the read code. I am running Python 3.10.6
if os.path.exists(full_report_path):
    print(f'Reading report {full_report_path}')
    with open(full_report_path, 'r', encoding='utf-8') as data:
        for machine in csv.DictReader(data):
            print(f"Sub Estate - {machine['Sub Estate']}. Hostname - {machine['Hostname']}")
            if machine['Sub EstateID'] != '':
                list_of_machines_to_delete.append(machine)