Unicode lists

broodmdh · (This post was last modified: Jun-26-2020, 05:44 PM by broodmdh.)

I have a series of text files that appear to contain a list of unicode values. For example, one file will have the following:

[u'01', u'13', u'20', u'30', u'32', u'43']

I'm new to python, but this looks like a list of unicode strings. Ultimately I want to convert them to a list of int, but I'm assuming I have to convert them to a list of regular strings first. Is this a poor assumption?

I've tried converting to UTF8, but it isn't giving me the result I'm expecting.

with open(filename, 'r') as txtFile:
  content = txtFile.readline()
  [x.encode('UTF8') for x in content]
  int_list = map(int, content)

This code throws the following error:

Error:
invalid literal for int() with base 10: '['

Can someone help me understand why, and what the correct way to go about this is?

I was getting this data through json, which apparently likes to return everything in unicode. I ended up converting the data immediately after retrieving it using a suggestion I found here. This gave me fewer problems further down the line.

**Gribouillis** · (This post was last modified: Jun-26-2020, 06:33 PM by Gribouillis.)

broodmdh Wrote:I'm new to python, but this looks like a list of unicode strings.

This is the way people wrote unicode literals in the great age of python 2. Today one simply writes '32' instead of u'32', although python still understands the latter. All the strings are unicode strings in today's python. You don't need to encode these strings before using them.

>>> int('32')
32
>>> int(u'32')
32
>>>
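As for why the original snippet fails: readline() returns one string, not a list, so looping over it (or mapping int over it) visits single characters, and the first one is '['. The result of the encode comprehension is also thrown away, so content never changes. A minimal sketch of the failure, assuming the file's first line is exactly the text shown in the question (the variable names here are only for illustration):

```python
# The line as it would come back from readline(): a single str, not a list.
line = "[u'01', u'13', u'20', u'30', u'32', u'43']"

# Iterating over a str yields one character at a time.
first_chars = list(line)[:4]

# int() is therefore handed '[' first, which is not a valid number.
try:
    int(line[0])
except ValueError as exc:
    message = str(exc)
```

Here message holds the same "invalid literal for int()" text reported in the question.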

>>> import ast
>>> line = "[u'01', u'13', u'20', u'30', u'32', u'43']"
>>> [int(x) for x in ast.literal_eval(line)]
[1, 13, 20, 30, 32, 43]
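Applied to a file, the same idea might look like the sketch below. It assumes the list sits on the first line of the file, as in the question; read_int_list is a hypothetical helper name, not something from the original code.

```python
import ast


def read_int_list(path):
    """Parse the first line of the file as a Python list literal, return ints.

    Assumes the first line looks like: [u'01', u'13', ...]
    """
    with open(path, "r") as txt_file:
        line = txt_file.readline().strip()
    # literal_eval only accepts Python literals, so it is safer than eval()
    # for text that comes from a file.
    return [int(x) for x in ast.literal_eval(line)]
```

Calling read_int_list(filename) on a file like the one in the question should give a plain list of int, with no encode step needed.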
