Python Forum
'utf-8' codec can't decode byte 0xe2 in position 122031: invalid continuation byte
#1
Hi,

I'm using Python 3.8.5 to call the Jira REST API, download the information for all tickets, and save it in a JSON file, AllIssues.json. To get all fields and comments of a Jira ticket, I have to make a separate REST API call for each ticket; each call returns JSON, which I save in a temp file, "tmp.json". I loop through all tickets in a project and append tmp.json to AllIssues.json.

The content of tmp.json is displayed correctly in Notepad++, accent marks included, but for one specific ticket, appending its JSON to AllIssues.json fails with "UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe2 in position 122031: invalid continuation byte". Please advise.

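# Call the Jira REST API to get a ticket's info and save it in tmp.json

# Read the ticket's JSON text back from the temp file
with open('tmp.json', encoding='utf-8') as file:
    issue = file.read()

# Append it to the combined file; the with block closes the file afterwards
with open('AllIssues.json', 'a', encoding='utf-8') as f:
    f.write(issue)

# some code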
Error:
-------Successfully created the file C:\Users\OPSWAT\OneDrive - OPSWAT\Jira2SF_Migration\ConversionJsonToCSV\Jira_Data\IssuesFieldsComments_0-10.json
IMP-225 IMP-228 IMP-229
Traceback (most recent call last):
  File "main_v6.py", line 61, in <module>
    GAI.getTicketFieldsAndComments(TicketKeysToProceed, ticketCommentJson_filePath)
  File "C:\Users\OPSWAT\OneDrive - OPSWAT\Jira2SF_Migration\Jira_v1\getAllIssues_v6.py", line 58, in getTicketFieldsAndComments
    issue = file.read()
  File "C:\Users\OPSWAT\AppData\Local\Programs\Python\Python38\lib\codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe2 in position 122031: invalid continuation byte
Thanks in advance,

Tien Tran
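
One way to see what is actually in the file at the failing offset is to read it as bytes and catch the decode error yourself. This is only a sketch: find_decode_error is a made-up helper name, and the sample bytes are hypothetical, chosen to trigger the same kind of error rather than taken from the real tmp.json.

```python
def find_decode_error(data: bytes, encoding: str = "utf-8", width: int = 20):
    """Return (offset, surrounding_bytes) for the first undecodable byte, or None."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError as exc:
        start = max(exc.start - width, 0)
        return exc.start, data[start:exc.end + width]
    return None


# Hypothetical sample: a lone 0xE2 byte followed by an ASCII backslash,
# which is not a valid UTF-8 continuation byte.
sample = b'{"body": "I\xe2\\u20ac\x99m on the phone"}'
result = find_decode_error(sample)
print(result)

# For a real file, pass its raw bytes instead, for example:
#     from pathlib import Path
#     find_decode_error(Path("tmp.json").read_bytes())
```

The returned slice shows the raw bytes around the offset, which usually makes it clear whether the file is really UTF-8 or was written in some other encoding.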
#2
Use the json module when working with JSON; writing is called dump in the json module.
The usual way to call an API in Python is with Requests, and if you need to write the result to disk, use the json module as well.
Example:
import json
import requests

time_line = requests.get('https://github.com/timeline.json')
response = time_line.json()
with open('time_line.json', 'w') as f:
    json.dump(response, f)
Reading it back:
with open("time_line.json", "r") as read_json:
    time_line = json.load(read_json)

>>> time_line
{'documentation_url': 'https://docs.github.com/v3/activity/events/#list-public-events',
 'message': 'Hello there, wayfaring stranger. If you’re reading this then you '
            'probably didn’t see our blog post a couple of years back '
            'announcing that this API would go away: http://git.io/17AROg Fear '
            'not, you should be able to get what you need from the shiny new '
            'Events API instead.'}

>>> time_line['documentation_url']
'https://docs.github.com/v3/activity/events/#list-public-events'
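
If the data contains non-ASCII text, it can also help to be explicit about the file encoding and to know what ensure_ascii does. The sketch below is an illustration only: roundtrip is a hypothetical helper and the sample text is made up. It writes the same object twice, once with ensure_ascii=True (the default) and once with ensure_ascii=False, and loads each file back.

```python
import json
import os
import tempfile


def roundtrip(obj, ensure_ascii):
    """Dump obj to a temporary UTF-8 file, then return (raw_text, loaded_obj)."""
    fd, path = tempfile.mkstemp(suffix=".json")
    os.close(fd)
    try:
        with open(path, "w", encoding="utf-8") as f:
            json.dump(obj, f, ensure_ascii=ensure_ascii)
        with open(path, encoding="utf-8") as f:
            raw_text = f.read()
        return raw_text, json.loads(raw_text)
    finally:
        os.remove(path)


# Made-up sample containing a right single quotation mark (U+2019)
sample = {"body": "I’m on the phone with support now."}

escaped_text, escaped_obj = roundtrip(sample, ensure_ascii=True)
literal_text, literal_obj = roundtrip(sample, ensure_ascii=False)

print(escaped_obj == sample, literal_obj == sample)
```

Both files load back to the same object; the only difference is whether the quotation mark is stored as the escape \u2019 or as the literal character.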
#3
for item in file:
       f.writelines(str(json.dumps(item.text,ensure_ascii=False)))
#4
@cnull, this does not make any sense and will almost certainly produce an error. @snippsat showed the right way to read/write JSON.
If you can't explain it to a six year old, you don't understand it yourself, Albert Einstein
How to Ask Questions The Smart Way: link and another link
Create MCV example
Debug small programs

#5
(Sep-18-2020, 06:49 AM)buran Wrote: @cnull, this does not make any sense and will almost certainly produce an error. @snippsat showed the right way to read/write JSON.
It's code I've used before and kept in my notes. "This does not make any sense": unreasonable according to what? I just wrote a different approach. I guess that's not a crime.
#6
1. In the original code that the OP showed, the whole JSON file has already been read into memory, so the file is at its end. If you add your code at that point, there is nothing left to iterate over, and your loop will not execute at all.

2. If you ignore that the OP reads the whole file into memory, iterating over the file yields one line at a time, each of them a str, and you will get AttributeError: 'str' object has no attribute 'text'.

You will get the same kind of error even if you manage to load the JSON properly and iterate over the resulting objects instead of raw text. I guess you used something like this to iterate over objects that have a .text property, like HTML tags from BeautifulSoup. Even if the object has a text property, it's quite likely you will get an error when you pass it to json.dumps.

All that said, json.dumps returns a str, so there is no need to cast to str explicitly. And if you write item by item (i.e. assuming item has a text property, etc.), the resulting file will not be valid JSON.

I must also say that the OP's attempt to append a JSON file to an already existing file will not produce valid JSON either.
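
To illustrate that last point: two JSON documents written back to back do not parse as one document, whereas collecting the per-ticket objects in a list and dumping once does. A minimal sketch; the ticket keys come from the OP's output, while the summary field and the wrapper keys total and issues are only placeholders.

```python
import json

# Placeholder ticket objects; only the keys are taken from the OP's output
issue_a = {"key": "IMP-225", "summary": "first"}
issue_b = {"key": "IMP-228", "summary": "second"}

# What appending each ticket's JSON to the same file effectively produces
concatenated = json.dumps(issue_a) + json.dumps(issue_b)
try:
    json.loads(concatenated)
    concatenated_is_valid = True
except json.JSONDecodeError:
    concatenated_is_valid = False

# Collect the tickets first, then serialize a single document
issues = [issue_a, issue_b]
combined = json.dumps({"total": len(issues), "issues": issues}, ensure_ascii=False)
reloaded = json.loads(combined)

print(concatenated_is_valid, reloaded["total"])
```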
#7
ensure_ascii=False
This part will work for him.

text/str part if html/div/class... due to the title shot use.
#8
I told you what the multiple issues with your code are and why I said it does not make any sense. It's irrelevant to the OP's problem. You can continue to believe whatever you want.
And please, don't write all posts in bold.
#9
(Sep-18-2020, 07:43 AM)buran Wrote: I told you what the multiple issues with your code are and why I said it does not make any sense. You can continue to believe whatever you want.
And please, don't write all posts in bold.
I don't want to be your interlocutor anymore.
#10
Hi Buran and Snippsat,

Thanks for your advice and quick replies. Actually, I am using the json module to work with JSON. I used json.dumps, but it didn't work: the non-ASCII characters aren't displayed correctly.

This is my previous code. I realized that the JSON I get from the Jira REST API is correct, but json.dumps() makes it incorrect.

def getTicketFieldsAndComments(TicketKeys, filepath):
    
    totalTickets = len(TicketKeys)
    f = Util.createFile(filepath)
    f.write('{ "total":' + str(totalTickets) + ', ')
    f.write('"smallest":"' + TicketKeys[0]  + '", ')
    f.write('"largest":"' + TicketKeys[totalTickets-1] + '", ')
    
    f.write('"issues":[')

    lastIssue =  totalTickets - 1
    for i, ticketNumber in enumerate(TicketKeys):
        
        issue = json.loads(subprocess.check_output('java -jar OAuthTutorialClient-1.0.jar request "https://impulsepoint.atlassian.net/rest/api/latest/issue/' + ticketNumber + '"' , shell=True, encoding="437"))
               
        if i == lastIssue:
            f.write(json.dumps(issue))             
        else:
            f.write(json.dumps(issue) + ',')    

    f.write(']}') 
   
    f.close()    
Cnull, I also tried ensure_ascii=False, and it didn't work either:

def getTicketFieldsAndComments(TicketKeys, filepath):
    
    totalTickets = len(TicketKeys)
    f = Util.createFile(filepath)
    f.write('{ "total":' + str(totalTickets) + ', ')
    f.write('"smallest":"' + TicketKeys[0]  + '", ')
    f.write('"largest":"' + TicketKeys[totalTickets-1] + '", ')
    
    f.write('"issues":[')

    lastIssue =  totalTickets - 1
    for i, ticketNumber in enumerate(TicketKeys):
        
        issue = json.loads(subprocess.check_output('java -jar OAuthTutorialClient-1.0.jar request "https://impulsepoint.atlassian.net/rest/api/latest/issue/' + ticketNumber + '"' , shell=True, encoding="437"))
               
        if i == lastIssue:            
            json_string = json.dumps(issue, ensure_ascii=False).encode('utf-8')
            f.write(json_string.decode())
           # f.write(json.dumps(issue))             
        else:
            json_string = json.dumps(issue, ensure_ascii=False).encode('utf-8')
            f.write(json_string.decode())
            f.write(",")
            #f.write(json.dumps(issue) + ',')    

    f.write(']}') 
   
    f.close()    
And this is the location in tmp.json that triggered the exception:
 "body": "Ashley Tarloski <[email protected]> commented:\n\nHi Dan,\n\nIxE2\u20acx99m on the phone with support now.\n\nThanks,\nAshley",
          "updated": "2017-08-03T13:58:41.682-0400"
The correct text should be

Ashley Tarloski <[email protected]> commented:\n\nHi Dan,\n\nI'm on the phone with support now.\n\nThanks,\nAshley
Regards,
Tien
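
One thing that may be worth checking in the code above is the encoding="437" passed to subprocess.check_output. If the Java client writes UTF-8 to stdout (an assumption; the thread does not say which encoding it uses), then decoding those bytes as code page 437 turns each non-ASCII character into several wrong ones before json.loads ever sees the text. A small sketch of that effect, using only in-memory strings and the expected comment text from above:

```python
original = "I’m on the phone with support now."
utf8_bytes = original.encode("utf-8")    # what a UTF-8 producer would emit

misread = utf8_bytes.decode("cp437")     # same bytes, wrong codec
correct = utf8_bytes.decode("utf-8")     # same bytes, matching codec

# Text misread this way can be recovered by reversing the wrong step,
# provided nothing has altered it in between.
repaired = misread.encode("cp437").decode("utf-8")

print(misread == original, correct == original, repaired == original)
```

If this turns out to be the cause, passing the encoding the client really uses to check_output would be the place to fix it, rather than json.dumps.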