Python Forum
Python 'charmap' codec can't decode byte X in position Y: character maps to <undefined>
#1
I'm experimenting with Python libraries for data analysis, and the problem I'm facing is this exception:

Quote:UnicodeDecodeError was unhandled by user code Message: 'charmap' codec can't decode byte 0x81 in position 165: character maps to <undefined>

I have looked at answers to similar issues, and in those the original poster was either reading text in a different encoding or printing it.

In my code the error shows up at the import statement, which is what confuses me.

I'm using 64-bit Python 3.3 in Visual Studio 2015, and geotext is the library where the error shows up.

Kindly point me to where I should look to deal with this error.
#2
Please post the real code (within code tags)
and the real error traceback.
Looking at geotext, it shows that you have to provide it
with the text you want it to analyze. Simply importing the
module executes nothing.
Example:
from geotext import GeoText

places = GeoText("London is a great city")
places.cities
#3
Have you tried running it from the command line, to make sure it isn't something weird with the python visual studio plugin?
#4
I tried to install and run this module.
I can't even import it without error

Error:
  File "C:\Python36\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 165: character maps to <undefined>
#5
Testing and fixing the code.
Here is the whole run, using a virtual environment.
C:\Python36
λ python -m venv geo_test

C:\Python36
λ cd geo_test

C:\Python36\geo_test
λ ls
Include/  Lib/  Scripts/  pyvenv.cfg

C:\Python36\geo_test
λ c:\python36\geo_test\Scripts\activate.bat
(geo_test) C:\Python36\geo_test
λ pip -V
pip 9.0.1 from c:\python36\geo_test\lib\site-packages (python 3.6)

(geo_test) C:\Python36\geo_test
λ pip install geotext-0.3.0-py2.py3-none-any.whl
Processing c:\python36\geo_test\geotext-0.3.0-py2.py3-none-any.whl
Installing collected packages: geotext
Successfully installed geotext-0.3.0

(geo_test) C:\Python36\geo_test
λ python
Python 3.6.0 (v3.6.0:41df79263a11, Dec 23 2016, 07:18:10) [MSC v.1900 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from geotext import GeoText
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
 File "C:\Python36\geo_test\lib\site-packages\geotext\__init__.py", line 7, in <module>
   from .geotext import GeoText
 File "C:\Python36\geo_test\lib\site-packages\geotext\geotext.py", line 87, in <module>
   class GeoText(object):
 File "C:\Python36\geo_test\lib\site-packages\geotext\geotext.py", line 103, in GeoText
   index = build_index()
 File "C:\Python36\geo_test\lib\site-packages\geotext\geotext.py", line 77, in build_index
   cities = read_table(get_data_path('cities15000.txt'), usecols=[1, 8])
 File "C:\Python36\geo_test\lib\site-packages\geotext\geotext.py", line 54, in read_table
   for line in lines:
 File "C:\Python36\geo_test\lib\site-packages\geotext\geotext.py", line 51, in <genexpr>
   lines = (line for line in f if not line.startswith(comment))
 File "C:\Python36\lib\encodings\cp1252.py", line 23, in decode
   return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 165: character maps to <undefined>
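What the last two frames show: the data file is being decoded with cp1252, the default text encoding on many Windows setups, and byte 0x81 has no character assigned in that code page. Here is a minimal sketch that reproduces the same kind of message without geotext. The sample character is my own choice for illustration and is not taken from cities15000.txt:

```python
# Standalone reproduction; geotext is not needed for this.
# Assumption: U+00C1 is just an illustrative character whose UTF-8 form contains 0x81.
raw = "\u00c1".encode("utf-8")  # two bytes: C3 81

try:
    raw.decode("cp1252")  # roughly what a default open() does on a cp1252 locale
except UnicodeDecodeError as exc:
    cp1252_error = str(exc)
else:
    cp1252_error = None

utf8_text = raw.decode("utf-8")  # decoding with the right codec succeeds

print(raw)
print(cp1252_error)
print(ascii(utf8_text))
```

So the file itself is fine; it is UTF-8 text being read with the wrong codec.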
Fix:
Line 45 geotext.py:
with open(filename, 'r') as f:
To:
with open(filename, 'r', encoding='utf-8') as f:
Test.
(geo_test) C:\Python36\geo_test
λ python
Python 3.6.0 (v3.6.0:41df79263a11, Dec 23 2016, 07:18:10) [MSC v.1900 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from geotext import GeoText
>>> places = GeoText("London is a great city")
>>> places.cities
['London']

>>> GeoText('New York, Texas, and also China').country_mentions
OrderedDict([('US', 2), ('CN', 1)])

>>> places = GeoText("Oslo is a great city")
>>> places.cities
['Oslo']
Edit:
There was already one bug report about this,
so I have passed this info on to the author of geotext on GitHub.
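A note on why the extra argument matters: when open() is called in text mode without encoding, Python falls back to the locale's preferred encoding, which is typically cp1252 on a Western European Windows install. That is why the same code can work on Linux or macOS and fail on Windows. A small sketch of reading with an explicit encoding follows; read_lines and the sample row are hypothetical and not part of geotext:

```python
import locale
import os
import tempfile

# The encoding open() uses when no encoding argument is given.
default_encoding = locale.getpreferredencoding(False)
print("default text encoding:", default_encoding)


def read_lines(path, encoding="utf-8"):
    """Hypothetical helper: read a text file with an explicit encoding."""
    with open(path, "r", encoding=encoding) as f:
        return [line.rstrip("\n") for line in f]


# Write a small UTF-8 file, then read it back independent of the locale.
sample = "S\u00e3o Paulo\tBR"  # made-up row, only for illustration
fd, path = tempfile.mkstemp(suffix=".txt")
try:
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(sample + "\n")
    lines = read_lines(path)
finally:
    os.remove(path)

print(len(lines))
```

Passing encoding explicitly for any file whose encoding you know is usually safer than relying on the platform default.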
#6
Here is the code:

# --- pythonTwitterTest.py --- The main client module which is using all other code
import tweepy
import json
from TwitterPoint.TwitterEngine import *
from dbLayer.dbHelper import rHelper
from process.processData import *

processAndCleanData('trumpFollowers')

print('breakpoint here')
# --- processData.py ---
from geotext import GeoText # for classifying and seperating City , Country and States/Provinces
#import pycountry # for en to English
import simplejson
#import babelfish
from dbLayer.dbHelper import rHelper
from langcodes import Language

def getLocations(array):
    return []

def processAndCleanData(tableName):
    rdb = rHelper('tw')  # gets data from rethinkdb (in JSON format)
    rawCursor = rdb.getRawDumpCursor(tableName)
    cleanData = []
    for raw in rawCursor:
        langPyCountry = Language.get(raw['lang']).language_name()
        # was GeoText(x.location), but x is not defined anywhere
        places = GeoText(raw['location'])

        cleanData.append({"id": raw['id_str'],
                          "name": raw['name'],
                          "location": raw['location'],
                          "locationExtract": places.cities,
                          "language": raw['lang'],
                          "LangExtract": langPyCountry})
    dumpToFile(cleanData)

def dumpToFile(array):
    # write the cleaned records out as JSON; the file is closed automatically
    with open('dump.txt', 'w') as f:
        simplejson.dump(array, f)
Quote:Traceback (most recent call last):
  File "pythonTwitterTest.py", line 5, in <module>
    from process.processData import *
  File "C:\OwaisWorkx\Courses\5th Semester\Project\pythonTwitterTest\pythonTwitterTest\process\processData.py", line 1, in <module>
    from geotext import GeoText # for classifying and seperating City , Country and States/Provinces
  File "c:\Python33\lib\site-packages\geotext\__init__.py", line 7, in <module>
    from .geotext import GeoText
  File "c:\Python33\lib\site-packages\geotext\geotext.py", line 87, in <module>
    class GeoText(object):
  File "c:\Python33\lib\site-packages\geotext\geotext.py", line 103, in GeoText
    index = build_index()
  File "c:\Python33\lib\site-packages\geotext\geotext.py", line 77, in build_index
    cities = read_table(get_data_path('cities15000.txt'), usecols=[1, 8])
  File "c:\Python33\lib\site-packages\geotext\geotext.py", line 54, in read_table
    for line in lines:
  File "c:\Python33\lib\site-packages\geotext\geotext.py", line 51, in <genexpr>
    lines = (line for line in f if not line.startswith(comment))
  File "c:\Python33\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 165: character maps to <undefined>

(Mar-21-2017, 10:18 PM)snippsat Wrote: Testing and fixing his code.

Fix:
Line 45 geotext.py:
with open(filename, 'r') as f:
To:
with open(filename, 'r', encoding='utf-8') as f:

Thanks, that solved the problem. This issue cost me a few days.
#7
Welcome to the world of programming!
If I had a dollar for every time I took five hours to solve a ten minute problem
I'd be a very rich man!
#8
(Mar-22-2017, 03:10 AM)Larz60+ Wrote: Welcome to the world of programming!
If I had a dollar for every time I took five hours to solve a ten minute problem
I'd be a very rich man!

Thanks, your help is much appreciated, guys.
That problem cost me a few days and I couldn't figure it out. I'm new to Python programming, although I have experience in static languages such as C# and Java, and some JavaScript. Python has its own pitfalls, but once you start to understand it you can appreciate the way things are done in Python, and you are open to a whole new world of possibilities.

Love and respect for you guys. Once again, thanks!
#9
Hi there! I did as you said and changed this:

Quote:with open(filename, 'r') as f:
    # skip initial lines
    for _ in range(skip):
        next(f)
To this:

Quote:with open(filename, 'rt', encoding=encoding) as f:
    # skip initial lines
    for _ in range(skip):
        next(f)
I saved geotext.py and ran it, and it worked, yet I still get the same error in the example file when I try to run the following:

Quote:from geotext import GeoText

places = GeoText("London is a great city")
places.cities

# should return "London"

Could you please help?
#10
Solution:
The module was installed in the Python 3.6.3 folder while I was using the 3.6.4 shell. I changed the shell and the module worked! :)
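For anyone hitting the same mismatch, a quick check of which interpreter is running and where a module would be imported from can save time. This is only a sketch, and where_is is a hypothetical helper name:

```python
import importlib.util
import sys


def where_is(module_name):
    """Hypothetical helper: path a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(module_name)
    return None if spec is None else spec.origin


print("interpreter:", sys.executable)
print("version:", sys.version.split()[0])
# Prints None if geotext is not installed for this interpreter.
print("geotext:", where_is("geotext"))
```

If the path points into a different Python folder than the interpreter shown, or comes back as None, the edited geotext.py is not the copy being imported. Installing with "python -m pip install ..." from the same interpreter keeps the two in step.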