Python Forum

Hi,

I have a report that contains a lot of Mac machine names. Users seem to love apostrophes in the name.

I am getting this in my csv

administrator‚Ä≤s Mac

It should read

administrator’s Mac

This is my code. What can I do about that?

    with open(full_report_path, 'w') as f:
        writer = csv.writer(f)
        writer.writerow(['High alerts found', len(list_of_high_alerts)])
        writer.writerow(['Medium alerts found', len(list_of_medium_alerts)])
        writer.writerow(report_column_names)
    # Sets the column order
    with open(full_report_path, 'a+', encoding='utf-8', newline='') as output_file:
        dict_writer = csv.DictWriter(output_file, report_column_order)
        dict_writer.writerows(computer_list)

In some cases users are using emoji for names. One machine is an ⚓️. I doubt there is much I can do about those ones!!

Don't use utf-8 encoding. It is too restrictive. You should probably specify a dialect.

Maybe regex the offending bits away?

import re
mystring = 'administrator‚Ä≤s Mac'
ns = re.sub('‚Ä≤', '\'', mystring)

Output:
>>> ns
"administrator's Mac"
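
A possible alternative to replacing one pattern at a time: if the garbling comes from UTF-8 bytes having been decoded as Mac Roman somewhere along the way (an assumption, but it fits, since ‚Ä≤ is what the UTF-8 bytes of ′, U+2032 PRIME, look like in Mac Roman), the round trip can be reversed in one go. A minimal sketch; `repair_mojibake` is a hypothetical helper name, not part of any library:

```python
def repair_mojibake(text, wrong_encoding='mac_roman'):
    """Undo UTF-8 text that was mistakenly decoded with wrong_encoding."""
    try:
        return text.encode(wrong_encoding).decode('utf-8')
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not this kind of garbling, so leave the text untouched.
        return text

# The garbled name from the report, written with escapes to avoid ambiguity.
garbled = 'administrator\u201a\u00c4\u2264s Mac'
fixed = repair_mojibake(garbled)
print(fixed)
```

This only repairs text that is already damaged. Writing the file with the right encoding in the first place is still the real fix.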

Use utf-8 both when reading and when writing the file. These days Python itself is usually not the problem;
it is the things around it, like files (save them as utf-8), the OS, editors...
Unicode handling was one of the biggest changes in the move from Python 2 to 3.

Example file saved as utf-8:
some.csv

Output:
administrator’s, Mac
Boat, ⚓️
Rain, ☂️
はるさんハウスはどこですか, jap

import csv

with open('some.csv', newline='', encoding='utf-8') as f, open('out.csv', 'w', newline='', encoding='utf-8') as f_out:
    reader = csv.reader(f)
    writer = csv.writer(f_out)
    for row in reader:
        print(row[0], row[1])
        writer.writerow(row)

Output:
administrator’s  Mac
Boat  ⚓️
Rain  ☂️
はるさんハウスはどこですか  jap

In out.csv:

Output:
administrator’s, Mac
Boat, ⚓️
Rain, ☂️
はるさんハウスはどこですか, jap

(Aug-17-2022, 09:18 PM) deanhystad Wrote: Don't use utf-8 encoding. It is too restrictive. You should probably specify a dialect.

Thanks for the reply. I have partly fixed it. As the others below pointed out, I was not specifying an encoding on the first write. I am now doing this.

    with open(full_report_path, 'w', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(['High alerts found', len(list_of_high_alerts)])
        writer.writerow(['Medium alerts found', len(list_of_medium_alerts)])
        writer.writerow(report_column_names)
    # Sets the column order
    with open(full_report_path, 'a+', encoding='utf-8', newline='') as output_file:
        dict_writer = csv.DictWriter(output_file, report_column_order)
        dict_writer.writerows(computer_list)

This works fine for Numbers, but Excel is still an issue. Can you please explain a bit more about dialects?
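
On the Excel side, one thing that may help: Excel usually only recognises a CSV as UTF-8 when the file starts with a byte order mark, and Python writes one if the file is opened with encoding='utf-8-sig'. A csv dialect is a separate matter. It controls the delimiter, quoting and line endings, not the character encoding, and 'excel' is already the default dialect. A rough sketch that also writes everything through a single open() call; the function name, sample rows and column names are made up for illustration:

```python
import csv
import os
import tempfile

def write_report(path, summary_rows, column_order, computers):
    # 'utf-8-sig' puts a BOM at the start so Excel can detect UTF-8.
    # newline='' lets the csv module control line endings itself.
    with open(path, 'w', encoding='utf-8-sig', newline='') as f:
        writer = csv.writer(f, dialect='excel')  # 'excel' is the default dialect
        writer.writerows(summary_rows)
        writer.writerow(column_order)
        dict_writer = csv.DictWriter(f, fieldnames=column_order, dialect='excel')
        dict_writer.writerows(computers)

# Hypothetical sample data, only here to make the sketch runnable.
report_path = os.path.join(tempfile.mkdtemp(), 'report.csv')
write_report(
    report_path,
    [['High alerts found', 1], ['Medium alerts found', 0]],
    ['Hostname', 'Sub Estate'],
    [{'Hostname': 'administrator\u2019s Mac', 'Sub Estate': 'London'}],
)

# Look at the first bytes of the file to confirm the BOM was written.
with open(report_path, 'rb') as f:
    raw = f.read()
print(raw[:3])
```

Numbers should still open such a file without trouble, but that is worth checking on a real report before switching over.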

When I read in one of the reports on a Mac it works fine, I get the correct character. When I read the same file on a PC I get a charmap error. Why would Python be different on a Mac or PC?

[Image: forum.png]

(Aug-22-2022, 07:55 AM) bazcurtis Wrote: Why would Python be different on a Mac or PC?

I'm not a Windows user, but I recall that problems were reported with Python's output in the Windows console in Python versions older than 3.6. Make sure your Python is recent.

(Aug-22-2022, 07:55 AM) bazcurtis Wrote: When I read the same file on a PC I get a charmap error. Why would Python be different on a Mac or PC?

Windows will sometimes pick the wrong default encoding (a legacy code page, which is where the charmap error comes from), so you must specify encoding='utf-8' yourself.
Then it will work. Also make sure the file itself is saved as utf-8 (editors can mess this up).
Newer Python versions also force utf-8 in more places regardless of the OS; see PEP 538.

Quote:PEP 528 and PEP 529 were implemented to bypass the operating system supplied interfaces
for binary data handling and force the use of UTF-8 instead.
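
To make the Mac/PC difference concrete: when no encoding is given, open() falls back to the locale's preferred encoding, which is typically UTF-8 on a Mac and a legacy code page such as cp1252 on many Windows installs. The sketch below assumes cp1252 is that code page (yours may differ) and decodes UTF-8 bytes with it on purpose to reproduce a charmap error:

```python
import locale

# Roughly what open() falls back to when no encoding is given (varies per machine).
print(locale.getpreferredencoding(False))

text = 'Boat, \u2693\ufe0f'   # anchor emoji followed by a variation selector
raw = text.encode('utf-8')    # the bytes as they sit in a utf-8 file

try:
    raw.decode('cp1252')      # what a cp1252 default would attempt
except UnicodeDecodeError as exc:
    error_message = str(exc)
    print(error_message)

# Decoding with the encoding the file was written in gets the text back.
print(raw.decode('utf-8') == text)
```

Not every character triggers the error. Some UTF-8 byte sequences decode "successfully" under cp1252 and just come out as the wrong characters, which is the other way this problem shows up.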

Here is an example. You can also use chardet to check which encoding a file uses.

# Python version
E:\div_code
λ python -V
Python 3.10.5

# Test for what encoding used 
E:\div_code
λ chardetect unicode.txt
unicode.txt: utf-8 with confidence 0.99

# Run file
E:\div_code
λ python uni.py
Crème and Spicy jalapeño ☂ ⛄日本語のキ

Code used.
unicode.txt

Output:
Crème and Spicy jalapeño ☂ ⛄日本語のキ

# uni.py
with open('unicode.txt', encoding='utf-8') as f:
    data = f.read()
    print(data)

The same applies when writing: always specify the encoding.

s = 'Crème and Spicy jalapeño ☂ ⛄日本語のキ'
with open('unicode.txt', 'w', encoding='utf-8') as f_out:
    f_out.write(s)

Thanks for all the replies. The CSV write code is above. This is the read code. I am running Python 3.10.6

    if os.path.exists(full_report_path):
        print(f'Reading report {full_report_path}')
        with open(full_report_path, 'r', encoding='utf-8') as data:
            for machine in csv.DictReader(data):
                print(f"Sub Estate - {machine['Sub Estate']}. Hostname - {machine['Hostname']}")
                if machine['Sub EstateID'] != '':
                    list_of_machines_to_delete.append(machine)
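
If the report might have been saved with a byte order mark (for example after a round trip through Excel, which is an assumption here), reading with encoding='utf-8-sig' copes with files both with and without the BOM. Plain 'utf-8' would leave the BOM glued to the first column name. A small sketch using a made-up two-column file, not the real report:

```python
import csv
import os
import tempfile

def read_rows(path):
    # 'utf-8-sig' strips a leading BOM if present and otherwise acts like utf-8.
    with open(path, 'r', encoding='utf-8-sig', newline='') as data:
        return list(csv.DictReader(data))

# Hypothetical sample file, written with a BOM on purpose.
sample_path = os.path.join(tempfile.mkdtemp(), 'sample.csv')
with open(sample_path, 'w', encoding='utf-8-sig', newline='') as f:
    f.write('Hostname,Sub EstateID\r\nadministrator\u2019s Mac,42\r\n')

rows = read_rows(sample_path)
machines_to_delete = [row for row in rows if row['Sub EstateID'] != '']
print(machines_to_delete)
```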

Posters, in thread order: bazcurtis, deanhystad, Pedroski55, snippsat, bazcurtis, bazcurtis, Gribouillis, snippsat, bazcurtis