Python Forum
Unicode lists
#1
I have a series of text files that appear to contain a list of unicode values. For example, one file will have the following:

[u'01', u'13', u'20', u'30', u'32', u'43']

I'm new to python, but this looks like a list of unicode strings. Ultimately I want to convert them to a list of int, but I'm assuming I have to convert them to a list of regular strings first. Is this a poor assumption?

I've tried converting to UTF8, but it isn't giving me the result I'm expecting.
with open(filename, 'r') as txtFile:
  content = txtFile.readline()
  [x.encode('UTF8') for x in content]
  int_list = map(int, content)
This code throws the following error:

Error:
invalid literal for int() with base 10: '['
Can someone help me understand why, and what the correct way to go about this is?

I was getting this data through json, which apparently likes to return everything in unicode. I ended up converting the data immediately after retrieving it using a suggestion I found here. This gave me fewer problems further down the line.
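For data that arrives as JSON, converting right after parsing could look roughly like the sketch below. The sample payload and the helper name `parse_int_list` are made up for illustration and are not taken from the linked suggestion. In Python 3, `json.loads` already returns ordinary `str` objects, so only the `int` conversion is left to do.

```python
import json


def parse_int_list(payload):
    """Parse a JSON array of digit strings and return a list of ints.

    Hypothetical helper: assumes every element is a string of decimal digits.
    """
    values = json.loads(payload)      # e.g. ['01', '13', ...] as plain str
    return [int(v) for v in values]   # int() accepts leading zeros in strings


# Made-up payload that mirrors the values shown in the question
sample = '["01", "13", "20", "30", "32", "43"]'
numbers = parse_int_list(sample)
print(numbers)
```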
#2
broodmdh Wrote: I'm new to python, but this looks like a list of unicode strings.
This is the way people wrote unicode literals in the great age of Python 2. Today one simply writes '32' instead of u'32', although Python 3 still accepts the latter. All strings are unicode strings in Python 3, so you don't need to encode them before passing them to int().
>>> int('32')
32
>>> int(u'32')
32
>>>
>>> import ast
>>> line = "[u'01', u'13', u'20', u'30', u'32', u'43']"
>>> [int(x) for x in ast.literal_eval(line)]
[1, 13, 20, 30, 32, 43]
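As for why the original attempt failed: readline() returns the whole line as one string, so iterating over it yields single characters, and the first one, '[', is not a valid integer. The following is a minimal sketch of the same ast.literal_eval idea applied to a file. The helper name `read_int_list` and the temporary demo file are assumptions for illustration, and it assumes the list sits on the first line of the file.

```python
import ast
import os
import tempfile


def read_int_list(path):
    """Read a Python list literal from the first line of a file, return ints.

    Hypothetical helper: assumes the first line looks like [u'01', u'13', ...].
    """
    with open(path, "r") as txt_file:
        line = txt_file.readline()
    # literal_eval parses the text into a real list of strings without
    # executing arbitrary code; the u'' prefix is still valid in Python 3.
    return [int(x) for x in ast.literal_eval(line)]


# Demo: write a throwaway file with the same content as in the question
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("[u'01', u'13', u'20', u'30', u'32', u'43']\n")
    demo_path = tmp.name

result = read_int_list(demo_path)
os.remove(demo_path)  # clean up the throwaway file
print(result)
```

ast.literal_eval is used here rather than eval because it only accepts literals, which matters when the file contents are not fully trusted.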