Python Forum
Manipulating files Python 2.7
Fairly well explained here. In Python 2, file.read() returns a byte string: each byte becomes one single-byte character that may have no real meaning on its own. If you use characters that aren't in the ASCII set (ASCII codes go up to 127, which excludes accented characters), you have to use the 'unicode' type, which behaves like a string but can contain non-ASCII characters. To go from the string of single bytes to unicode, you decode it:
# read in the file contents
iso=open('iso-8859-15.txt').read()
utf=open('utf-8.txt').read()

# this is how they look, one <str> character for each byte in the source file
print 'ISO:', repr(iso)
print 'UTF:', repr(utf)

# transform them to unicode, specifying the appropriate encoding
unicodeISO=unicode(iso,encoding='iso-8859-15')
unicodeUTF=unicode(utf,encoding='UTF-8')

# Now, as unicode strings, they are identical
print repr(unicodeISO),unicodeISO
print repr(unicodeUTF),unicodeUTF
(the two data files attached)
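
If you'd rather not decode by hand, io.open (available since Python 2.6) can do the decoding while reading. This is only a sketch: read_text is a made-up helper name, and the sample text is my own stand-in written to temporary files, not necessarily what the attached files contain.

```python
import io
import os
import tempfile


def read_text(path, encoding):
    """Return the file contents as text, decoded with the given encoding."""
    with io.open(path, 'r', encoding=encoding) as handle:
        return handle.read()


# Hypothetical sample data: the same text stored in two different encodings.
sample = u'd\xe9j\xe0 vu'
tmpdir = tempfile.mkdtemp()
iso_path = os.path.join(tmpdir, 'iso-8859-15.txt')
utf_path = os.path.join(tmpdir, 'utf-8.txt')
with io.open(iso_path, 'wb') as handle:
    handle.write(sample.encode('iso-8859-15'))
with io.open(utf_path, 'wb') as handle:
    handle.write(sample.encode('utf-8'))

# Once decoded with the right encoding, both files give the same text.
from_iso = read_text(iso_path, 'iso-8859-15')
from_utf = read_text(utf_path, 'utf-8')
print(from_iso == from_utf)  # True
```

The result is a unicode object in Python 2 (a str in Python 3), so there is no separate unicode(...) step.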

You may wonder why the Unicode string looks like the ISO one. It's an optical illusion. Of course, the people who defined Unicode didn't completely reinvent the wheel and integrated as many existing encodings as feasible, so the numbers that encode characters in ISO-8859 and Unicode can be the same. However, the first is a one-byte 0xe9 and the second is really the Unicode code point U+00E9.
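
A small check of that point, using 'é' as the example character (the byte values are typed in by hand here, not read from the attached files):

```python
iso_bytes = b'\xe9'      # 'é' in ISO-8859-15: a single byte, 0xe9
utf_bytes = b'\xc3\xa9'  # 'é' in UTF-8: two bytes

# Decoding each byte string with its own encoding gives the same character.
text_from_iso = iso_bytes.decode('iso-8859-15')
text_from_utf = utf_bytes.decode('utf-8')

print(text_from_iso == text_from_utf)  # True
print(hex(ord(text_from_iso)))         # 0xe9, i.e. code point U+00E9
print(len(iso_bytes))                  # 1
print(len(utf_bytes))                  # 2
```

So the code point has the same number as the ISO byte, but the bytes stored on disk differ from one encoding to the other.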

Needless to say, this means that you have to know in advance the encoding used to encode the files... On the other hand, there aren't that many encodings for Romance languages, so it will likely be either UTF-8 or some variant of ISO-8859.
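
When you really don't know which of the two it is, a common trick is to try UTF-8 first and fall back to an ISO-8859 variant. Take this as a rough sketch and not a real detector: decode_guess is a name I made up, and the ISO-8859-15 fallback is an assumption. It works reasonably well because accented text in ISO-8859 is rarely valid UTF-8, while ISO-8859-15 accepts any byte, so the fallback never fails (it can still give you the wrong characters).

```python
def decode_guess(raw, fallback='iso-8859-15'):
    """Decode bytes as UTF-8 if they are valid UTF-8, else use the fallback.

    Returns a (text, encoding_used) pair. The fallback encoding is a guess.
    """
    try:
        return raw.decode('utf-8'), 'utf-8'
    except UnicodeDecodeError:
        return raw.decode(fallback), fallback


print(decode_guess(b'\xc3\xa9')[1])  # utf-8
print(decode_guess(b'\xe9')[1])      # iso-8859-15
```

Pure ASCII input decodes as UTF-8 here, which is harmless since ASCII bytes mean the same thing in both encodings.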

Attached Files

.zip   dejavu.zip (Size: 369 bytes / Downloads: 339)