Hi,
After a lot of reading and searching to no avail, this is my problem:
1. I can read and write XML files using ElementTree with no problems, as long as I stick to "normal" characters (ASCII < 128).
2. As far as I know, UTF-8 is the default encoding; I checked that in IDLE with sys.getdefaultencoding(), which returns "utf-8".
3. But now I try to sneak in a French character, like so:
elementx.text = "test" works, but as soon as I do elementx.text = "testç", my Python program throws an error while building the tree:
xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 4, column 16
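One common way to get exactly this ParseError is sketched below, under the assumption that the file ends up containing ç in a legacy single-byte encoding (cp1252 here) while the parser reads it as UTF-8, the XML default when there is no declaration. Also note that in Python 3, sys.getdefaultencoding() only reports the default for str/bytes conversions; the encoding open() uses for text files is platform-dependent (e.g. cp1252 on many Western-locale Windows systems). The element names are placeholders.

```python
import xml.etree.ElementTree as ET

doc = "<root><leaf>testç</leaf></root>"

# In cp1252, "ç" is the single byte 0xE7, which is not a valid UTF-8 sequence here.
bad_bytes = doc.encode("cp1252")
# In UTF-8, "ç" is the two-byte sequence 0xC3 0xA7.
good_bytes = doc.encode("utf-8")

bad_error = None
try:
    ET.fromstring(bad_bytes)  # no XML declaration, so the parser assumes UTF-8
except ET.ParseError as exc:
    bad_error = exc
    print("cp1252 bytes:", exc)

good_root = ET.fromstring(good_bytes)
print("utf-8 bytes:", good_root.find("leaf").text)
```

The text itself is fine in both cases; only the bytes on disk differ, which is why the error shows up when the file is parsed.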
The thing is that while building the tree in code, I can find no place where I can specify one encoding or another.
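In ElementTree the tree in memory holds plain Python strings, so there is indeed nothing to set while building it; the encoding is chosen when the tree is serialized. A minimal sketch of a UTF-8 round trip, assuming the output goes to a file name (the tag names "root", "child" and "leaf" and the temporary path are placeholders):

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Build the tree; no encoding is involved at this stage.
root = ET.Element("root")
child = ET.SubElement(root, "child")
leaf = ET.SubElement(child, "leaf")
leaf.text = "testç"

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "out.xml")

    # The encoding is chosen here, and written into the XML declaration.
    ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)

    # ET.parse opens the file in binary mode and follows the declaration.
    reread_text = ET.parse(path).getroot().find("child/leaf").text

print(reread_text)
```

Passing a file name (or a file opened in binary mode, "wb") lets write() do the encoding itself rather than relying on a text-mode file's platform default.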
For completeness' sake, I build a tree like this:
    import xml.etree.ElementTree as ET

    yx = ET.Element(x)
    yxx = ET.SubElement(yx, xx)
    yxxx = ET.SubElement(yxx, xxx)
    yxxx.text = "testç"

UPDATE: a new search suggested that I should install "unidecode" (import unidecode).
Then I can do:

    dat = unidecode.unidecode("testç")
    yxxx.text = dat

Python no longer throws an error, but it changes the string into "testc". The c-cedilla is gone!
So this is something, but not everything!
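If the aim is ASCII-only output, ElementTree can keep the ç without transliterating it. When serializing to "us-ascii" (the default encoding of ET.tostring), characters outside ASCII are written as numeric character references, which any XML parser turns back into the original character. A short sketch, with "leaf" as a placeholder tag name:

```python
import xml.etree.ElementTree as ET

leaf = ET.Element("leaf")
leaf.text = "testç"

# Non-ASCII characters become character references instead of being dropped.
ascii_bytes = ET.tostring(leaf, encoding="us-ascii")
print(ascii_bytes)  # b'<leaf>test&#231;</leaf>'

# Parsing restores the cedilla.
restored = ET.fromstring(ascii_bytes).text
print(restored)  # testç
```

Unlike unidecode, this changes only how the character is spelled in the file, not the text itself.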
Any suggestions?
thx,
Paul