Manipulating files Python 2.7

***Ofnuts*** · (This post was last modified: Oct-25-2016, 01:04 PM by Ofnuts.)

Fairly well explained here. In Python 2, file.read() returns a str in which each character is a single byte, and that byte may have no real meaning on its own. If you use characters that aren't in the ASCII set (ASCII codes up to 127, which excludes accented characters), you have to use the 'unicode' type, which behaves like a string but can contain non-ASCII characters. To go from a string of single bytes to unicode, you decode it:

# read in the raw file contents (binary mode, so the bytes are returned untouched)
with open('iso-8859-15.txt', 'rb') as f:
    iso = f.read()
with open('utf-8.txt', 'rb') as f:
    utf = f.read()

# this is how they look, one <str> character for each byte in the source file
print 'ISO:', repr(iso)
print 'UTF:', repr(utf)

# transform them to unicode, specifying the appropriate encoding
unicodeISO = unicode(iso, encoding='iso-8859-15')
unicodeUTF = unicode(utf, encoding='UTF-8')

# Now, as unicode strings, they are identical
print repr(unicodeISO), unicodeISO
print repr(unicodeUTF), unicodeUTF

(the two data files attached)
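If you don't have the attachments at hand, here is a minimal sketch of the same idea using in-memory byte strings instead of files. The sample word "café" and its two byte sequences are hypothetical data standing in for the attached files; the code only uses the built-in decode() method and should run under both Python 2.7 and Python 3.

```python
from __future__ import print_function

# The same word "cafe" with an acute e, as stored by two different encodings
# (hypothetical sample data standing in for the attached files).
iso_bytes = b'caf\xe9'        # ISO-8859-15: one byte (0xE9) for the accented e
utf_bytes = b'caf\xc3\xa9'    # UTF-8: two bytes (0xC3 0xA9) for the accented e

# Decoding turns the raw bytes into Unicode text; the encoding must match.
text_from_iso = iso_bytes.decode('iso-8859-15')
text_from_utf = utf_bytes.decode('utf-8')

print(len(iso_bytes), len(utf_bytes))   # byte counts differ: 4 and 5
print(text_from_iso == text_from_utf)   # the decoded texts are equal
```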

You may wonder why the repr() of the Unicode string looks like the ISO one. It's an optical illusion. Of course the people who defined Unicode didn't completely reinvent the wheel, and integrated as many existing encodings as feasible (the first 256 Unicode code points are identical to ISO-8859-1). So the numbers that encode characters in ISO-8859 and Unicode can be the same. However, the first is the single byte 0xE9 and the second is really the Unicode code point U+00E9.
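To make the byte-versus-code-point distinction concrete, the sketch below (assuming the sample character é, and written to run under Python 2.7 or 3) compares the single ISO-8859-15 byte with the Unicode code point, then encodes the code point as UTF-8. It also shows one place where the numbering is not shared: byte 0xA4 is the euro sign in ISO-8859-15 but the generic currency sign in ISO-8859-1.

```python
from __future__ import print_function

e_acute = u'\xe9'                 # the Unicode character U+00E9 (e with acute accent)

# The code point number happens to equal the ISO-8859-15 byte value...
print(hex(ord(e_acute)))          # 0xe9
iso = e_acute.encode('iso-8859-15')
print(repr(iso))                  # a single byte, 0xE9

# ...but once encoded as UTF-8, the same code point takes two bytes.
utf = e_acute.encode('utf-8')
print(len(iso), len(utf))         # 1 2

# The numbering is not shared everywhere: ISO-8859-15 reassigned byte 0xA4
# to the euro sign (U+20AC), while ISO-8859-1 (Latin-1) keeps U+00A4 there.
euro = b'\xa4'.decode('iso-8859-15')
currency = b'\xa4'.decode('latin-1')
print(hex(ord(euro)), hex(ord(currency)))   # 0x20ac 0xa4
```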

Needless to say, this means that you have to know in advance which encoding was used for the files... On the other hand, there aren't that many encodings for Romance languages, so it will likely be either UTF-8 or some variant of ISO-8859.
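Since you have to know the encoding in advance, one common heuristic (my assumption, not something Python does for you) is to try strict UTF-8 first and fall back to ISO-8859-15: ISO-8859 text with accented characters is usually not valid UTF-8, whereas ISO-8859-15 maps every byte value to a character and so never fails. The helper name decode_guess and the sample bytes are hypothetical. The order matters, because the reverse order would never fall through and would silently garble UTF-8 input.

```python
from __future__ import print_function


def decode_guess(raw):
    """Hypothetical helper: decode raw bytes, trying UTF-8 before ISO-8859-15.

    Returns a (text, encoding_used) tuple.
    """
    try:
        # Strict UTF-8 decoding raises on byte sequences that are not valid UTF-8.
        return raw.decode('utf-8'), 'utf-8'
    except UnicodeDecodeError:
        # ISO-8859-15 maps every byte value to a character, so this cannot fail;
        # it is only a guess and may be wrong for other single-byte encodings.
        return raw.decode('iso-8859-15'), 'iso-8859-15'


# Usage with hypothetical sample data: "cafe" with an acute e, in UTF-8 and in ISO-8859-15.
for raw in (b'caf\xc3\xa9', b'caf\xe9'):
    text, used = decode_guess(raw)
    print(repr(text), used)

# The order matters: decoding UTF-8 bytes as ISO-8859-15 "succeeds" but garbles them.
print(repr(b'caf\xc3\xa9'.decode('iso-8859-15')))   # 5 characters instead of 4
```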
