Python Forum
FCC API: addressing unexpected error
#1
Hello!

My goal is to convert a number of latitudes and longitudes to Federal Information Processing Standard (FIPS) codes. To do so, I am using the following code, which I successfully tested on my computer using small (1000 rows) and large (100,000 rows) pieces of data:

from fcc.census_block_conversions import census_block_fips  # https://www.fcc.gov/general/census-block-conversions-api
# df is a pandas DataFrame with 'latitude' and 'longitude' columns
df['fips'] = df.apply(lambda row: census_block_fips(row['latitude'], row['longitude']), axis=1)

df.to_csv('df.csv')
Since my task involves converting almost 20 million (!) rows, I am using an Ubuntu EC2 instance with high network connectivity to run the code above. (I installed the FCC API using -sudo apt-get install fcc-.) However, after running for some time, the code gives me the following error:

Error:
ubuntu@euca-100-00-00-00:~/Twitter$ python 5_df__cancer_14.py
WARNING:root:Caught an Error: 'ascii' codec can't encode characters in position 71-72: ordinal not in range(128). Retrying in 1 seconds. ((32.35204448, -106.77485461, None, 'json', None), {})
WARNING:root:Caught an Error: 'ascii' codec can't encode characters in position 71-72: ordinal not in range(128). Retrying in 6 seconds. ((32.35204448, -106.77485461, None, 'json', None), {})
WARNING:root:Caught an Error: 'ascii' codec can't encode characters in position 71-72: ordinal not in range(128). Retrying in 36 seconds. ((32.35204448, -106.77485461, None, 'json', None), {})
WARNING:root:Caught an Error: 'ascii' codec can't encode characters in position 71-72: ordinal not in range(128). Retrying in 1 seconds. ((32.35204448, -106.77485461, None, 'json', None), {})
WARNING:root:Caught an Error: 'ascii' codec can't encode characters in position 71-72: ordinal not in range(128). Retrying in 6 seconds. ((32.35204448, -106.77485461, None, 'json', None), {})
WARNING:root:Caught an Error: 'ascii' codec can't encode characters in position 71-72: ordinal not in range(128). Retrying in 36 seconds. ((32.35204448, -106.77485461, None, 'json', None), {})
Traceback (most recent call last):
  File "5_df__cancer_14.py", line 10, in <module>
    df['fips'] = df.apply(lambda row: census_block_fips(row['latitude'], row['longitude']), axis=1)
  File "/usr/lib/python2.7/dist-packages/pandas/core/frame.py", line 3972, in apply
    return self._apply_standard(f, axis, reduce=reduce)
  File "/usr/lib/python2.7/dist-packages/pandas/core/frame.py", line 4064, in _apply_standard
    results[i] = func(v)
  File "5_df__cancer_14.py", line 10, in <lambda>
    df['fips'] = df.apply(lambda row: census_block_fips(row['latitude'], row['longitude']), axis=1)
  File "/home/ubuntu/.local/lib/python2.7/site-packages/fcc/census_block_conversions.py", line 48, in census_block_fips
    return census_block_dict(latitude, longitude, year)['Block']['FIPS']
  File "/home/ubuntu/.local/lib/python2.7/site-packages/fcc/census_block_conversions.py", line 39, in census_block_dict
    raise e
UnicodeEncodeError: 'ascii' codec can't encode characters in position 71-72: ordinal not in range(128)
Could you please help me determine the problem, and also suggest whether there is a way to "safeguard" against it (e.g., just drop "problematic" values if they occur)? Thank you in advance!
#2
Also, I'm currently trying to process the data in 100,000-row chunks, and one of them has resulted in the following error:

Error:
---------------------------------------------------------------------------
JSONDecodeError                           Traceback (most recent call last)
<ipython-input-3-8ead2ace4c8c> in <module>()
      1 from fcc.census_block_conversions import census_block_fips
----> 2 df3['fips'] = df3.apply(lambda row: census_block_fips(row['latitude'], row['longitude']), axis=1)
      3
      4 df3.to_csv('/Users/mymac/Documents/UB/Twitter/MAIN_DATA_TW/13_14_15_TW_CSV/df3.csv')

/Users/mymac/anaconda/lib/python3.6/site-packages/pandas/core/frame.py in apply(self, func, axis, broadcast, raw, reduce, args, **kwds)
   4150             if reduce is None:
   4151                 reduce = True
-> 4152             return self._apply_standard(f, axis, reduce=reduce)
   4153         else:
   4154             return self._apply_broadcast(f, axis)

/Users/mymac/anaconda/lib/python3.6/site-packages/pandas/core/frame.py in _apply_standard(self, func, axis, ignore_failures, reduce)
   4246         try:
   4247             for i, v in enumerate(series_gen):
-> 4248                 results[i] = func(v)
   4249                 keys.append(v.name)
   4250         except Exception as e:

<ipython-input-3-8ead2ace4c8c> in <lambda>(row)
      1 from fcc.census_block_conversions import census_block_fips
----> 2 df3['fips'] = df3.apply(lambda row: census_block_fips(row['latitude'], row['longitude']), axis=1)
      3
      4 df3.to_csv('/Users/mymac/Documents/UB/Twitter/MAIN_DATA_TW/13_14_15_TW_CSV/df3.csv')

/Users/mymac/anaconda/lib/python3.6/site-packages/fcc/census_block_conversions.py in census_block_fips(latitude, longitude, year)
     46     '''
     47     try:
---> 48         return census_block_dict(latitude, longitude, year)['Block']['FIPS']
     49     except TypeError as e:
     50         if str(e) == "'NoneType' object is not subscriptable":

/Users/mymac/anaconda/lib/python3.6/site-packages/fcc/census_block_conversions.py in census_block_dict(latitude, longitude, year, showall)
     37             logging.warning('{0}: {1}'.format(e, (latitude,longitude,year,showall)))
     38             return None
---> 39         raise e
     40
     41

/Users/mymac/anaconda/lib/python3.6/site-packages/fcc/census_block_conversions.py in census_block_dict(latitude, longitude, year, showall)
     32     '''Get the FCC API response as a Python dictionary.'''
     33     try:
---> 34         return json.loads(census_block(latitude, longitude, year, 'json', showall))
     35     except ValueError as e:
     36         if str(e) == 'No JSON object could be decoded':

/Users/mymac/anaconda/lib/python3.6/json/__init__.py in loads(s, encoding, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    352             parse_int is None and parse_float is None and
    353             parse_constant is None and object_pairs_hook is None and not kw):
--> 354         return _default_decoder.decode(s)
    355     if cls is None:
    356         cls = JSONDecoder

/Users/mymac/anaconda/lib/python3.6/json/decoder.py in decode(self, s, _w)
    337
    338         """
--> 339         obj, end = self.raw_decode(s, idx=_w(s, 0).end())
    340         end = _w(s, end).end()
    341         if end != len(s):

/Users/mymac/anaconda/lib/python3.6/json/decoder.py in raw_decode(self, s, idx)
    355             obj, end = self.scan_once(s, idx)
    356         except StopIteration as err:
--> 357             raise JSONDecodeError("Expecting value", s, err.value) from None
    358         return obj, end

JSONDecodeError: ('Expecting value: line 1 column 1 (char 0)', 'occurred at index 205093')
#3
(Jun-26-2017, 05:53 PM)kiton Wrote:
Could you please help me determine the problem and also suggest if there is a way to "safeguard" from it (e.g., just drop "problematic" values if they occur). Thank you in advance!

It looks like your JSON contains characters that your code cannot handle: either unescaped non-ASCII characters (if the data is in the US, look for Spanish names with an n-tilde) or plainly invalid characters caused by transmission errors.
Unless noted otherwise, code in my posts should be understood as "coding suggestions", and its use may require more neurones than the two necessary for Ctrl-C/Ctrl-V.
Your one-stop place for all your GIMP needs: gimp-forum.net
#4
Bingo!:

Trying this (this is your data above):
http://data.fcc.gov/api/block/find?format=json&latitude=32.35204448&longitude=-106.77485461&showall=true
Gives:
Output:
{"Block":{"FIPS":"350130001041007"},"County":{"FIPS":"35013","name":"Doña Ana"},"State":{"FIPS":"35","code":"NM","name":"New Mexico"},"status":"OK","executionTime":"1004"}
With the ñ conveniently in column 72.
Looking at the JSON standard, Unicode characters are accepted, and the response seems to be proper UTF-8, so it would be your code that doesn't handle UTF-8 properly (assumes ASCII only?).
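To illustrate the point, here is a minimal Python 3 sketch of how a name like "Doña Ana" trips an ASCII-only code path. It is only an illustration, not the fcc module's actual code. The second half shows one plausible (unconfirmed) explanation for why the warning reports two characters ("position 71-72") rather than one: if the UTF-8 bytes of ñ are mistaken for two single-byte characters, both of them are non-ASCII.

```python
# The county name returned by the API for the coordinates in the thread.
name = "Doña Ana"

# UTF-8 encodes ñ as two bytes (0xC3 0xB1); every other character here is one byte.
utf8_bytes = name.encode("utf-8")

# Encoding the correctly decoded text as ASCII fails on the single character ñ.
try:
    name.encode("ascii")
except UnicodeEncodeError as exc:
    ascii_error = exc

# Assumption for illustration: somewhere the UTF-8 bytes get decoded as Latin-1,
# which turns ñ into the two characters "Ã±". An ASCII encode then fails on both.
mojibake = utf8_bytes.decode("latin-1")
try:
    mojibake.encode("ascii")
except UnicodeEncodeError as exc:
    mojibake_error = exc

print(ascii_error)     # reports one offending character
print(mojibake_error)  # reports a range of two offending characters
```

Either way, the response itself is fine; the fix belongs on the client side, by decoding the bytes as UTF-8 and never forcing the text through ASCII.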
#5
Thank you very much for the clarification, Ofnuts. I do see the issue with encoding now. So, is there a way to prevent such errors? Maybe load some UTF-8 codec upfront?
#6
Quote:from fcc.census_block_conversions import census_block_fips
Show your import code.

You are using both Python 2.7 and Python 3.6 (Anaconda).
With 3.6 (Anaconda):
Quote:JSONDecodeError: ('Expecting value: line 1 column 1 (char 0)'
This usually means one of the following:
  • non-JSON conforming quoting
  • XML/HTML output (that is, a string starting with <), or
  • incompatible character encoding
The message tells you that at the very first position the string already doesn't conform to JSON.
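As a small sketch of what that message means in practice (Python 3, standard library only): an empty body or an HTML error page both fail at the very first character. The helper name parse_json_or_none is made up for this example; it simply returns None instead of raising, so one bad response does not abort a long run.

```python
import json


def parse_json_or_none(body):
    """Return the decoded JSON value, or None if body is not valid JSON."""
    try:
        return json.loads(body)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None


# An empty body reproduces the message from the traceback.
try:
    json.loads("")
except json.JSONDecodeError as exc:
    empty_message = str(exc)

print(empty_message)
print(parse_json_or_none("<html>Service Unavailable</html>"))
print(parse_json_or_none('{"status": "OK"}'))
```

Logging the first few characters of a body that fails to parse is usually the quickest way to see whether the server sent an error page, an empty reply, or something else.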

A test with Requests and Python 2.7.
Requests reads it without a Unicode error.
import requests
r = requests.get('http://data.fcc.gov/api/block/find?format=json&latitude=32.35204448&longitude=-106.77485461&showall=true')
Test:
>>> r.json()['County']
{u'FIPS': u'35013', u'name': u'Do\xf1a Ana'}
>>> print r.json()['County']['name']
Doña Ana
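Building on that test, here is a hedged Python 3 sketch of the "safeguard" asked for earlier in the thread: decode the body explicitly as UTF-8, parse it, and return None for any row that fails instead of stopping the whole run. It uses only the standard library rather than Requests or the fcc module. The names lookup_fips and fetch_bytes are invented for this example, and the endpoint URL is the one quoted in this thread, which may have changed since.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_URL = "http://data.fcc.gov/api/block/find"  # endpoint as quoted in the thread


def fetch_bytes(url, timeout=10):
    """Return the raw response body as bytes."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def lookup_fips(latitude, longitude, fetch=fetch_bytes):
    """Return the block FIPS code as a string, or None if the lookup fails."""
    query = urlencode({
        "format": "json",
        "latitude": latitude,
        "longitude": longitude,
        "showall": "true",
    })
    try:
        body = fetch(API_URL + "?" + query)
        data = json.loads(body.decode("utf-8"))  # explicit UTF-8, never ASCII
        return data["Block"]["FIPS"]
    except (OSError, ValueError, KeyError, TypeError):
        # Network errors, undecodable bytes, non-JSON bodies, unexpected shapes.
        return None
```

With a DataFrame, this could replace the original call as df.apply(lambda row: lookup_fips(row['latitude'], row['longitude']), axis=1); failed rows then show up as missing values that can be dropped or retried later. The fetch parameter exists only so the function can be exercised without network access.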
#7
snippsat, I appreciate your response. Here are some comments in relation to yours:

1. I am not sure how to show you the import code (since I never see it when importing). How can I do that?
2. Indeed, I prepared (and tested) the code using Python 3; however, to run it on the EC2 instance I am using the -python- command, because otherwise -from fcc.census_block_conversions import census_block_fips- does not work.
3. In an attempt to solve the Unicode issue, I tried to convert my data frame to ASCII only, using the examples provided here (https://stackoverflow.com/questions/1977...python-3-3), but nothing helped; the error remained.
#8
So, I assume that the error might originate from the fact that the FCC API is not "suitable" for Python 3 (as it does not run with -python3-), but is OK for Python 2 (it runs under -python-). Therefore, I (a) used 3to2 to convert my code written in Python 3 to Python 2, and (b) am trying to run that "converted" code now.

Update: the aforementioned approach did not work. It seems that the problem is beyond my ability to address. Therefore, I am currently working on another approach to complete the task. Details will be posted later.

I am thankful to all the contributors for their input.
#9
In Python 2 you can convert a sequence of UTF-8 bytes to Unicode; see the codecs module. But your root problem seems to be that the EC2 instance hasn't got the fcc module installed for Python 3, and this is what I would solve.
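A quick way to check that diagnosis is to ask each interpreter whether it can see the module. The sketch below is only a suggestion; the helper name module_available is made up for this example. Run it once with -python- and once with -python3- on the EC2 instance and compare the output.

```python
import importlib.util
import sys


def module_available(name):
    """Return True if this interpreter can find a top-level module called name."""
    return importlib.util.find_spec(name) is not None


print("interpreter:", sys.executable)
print("version:", sys.version_info[:2])
print("fcc importable:", module_available("fcc"))
```

If the Python 3 run prints False, the package was installed for a different interpreter, and installing it with that interpreter's own pip (python3 -m pip install followed by the package name) is the usual remedy.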