Python Forum
'utf-8' codec can't decode byte 0xe2 in position 122031: invalid continuation byte
#1
Hi,

I'm using Python 3.8.5 to call the Jira REST API, download information for all tickets, and save it in a JSON file, AllIssues.json. To get all fields and comments of a Jira ticket, I have to make a separate REST API call for each ticket; I save the JSON from each call in a temp file, "tmp.json". I loop through all tickets in a project and append tmp.json to AllIssues.json.

The content of tmp.json is displayed correctly in Notepad++, accent marks included, but for one specific ticket, appending its JSON to AllIssues.json fails with "UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe2 in position 122031: invalid continuation byte". Please advise.

# Call the Jira REST API to get a ticket's info and save it in tmp.json

# Read the temp file, then append its content to the combined file
with open('tmp.json', encoding='utf-8') as file:
    issue = file.read()
with open('AllIssues.json', 'a', encoding='utf-8') as f:
    f.write(issue)

# some code
Error:
-------Successfully created the file C:\Users\OPSWAT\OneDrive - OPSWAT\Jira2SF_Migration\ConversionJsonToCSV\Jira_Data\IssuesFieldsComments_0-10.json
IMP-225
IMP-228
IMP-229
Traceback (most recent call last):
  File "main_v6.py", line 61, in <module>
    GAI.getTicketFieldsAndComments(TicketKeysToProceed, ticketCommentJson_filePath)
  File "C:\Users\OPSWAT\OneDrive - OPSWAT\Jira2SF_Migration\Jira_v1\getAllIssues_v6.py", line 58, in getTicketFieldsAndComments
    issue = file.read()
  File "C:\Users\OPSWAT\AppData\Local\Programs\Python\Python38\lib\codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe2 in position 122031: invalid continuation byte
Thanks in advance,

Tien Tran
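One possible way to see which bytes sit at the failing position is to read the file in binary mode and catch the decode error. This is only a sketch: the helper name find_bad_bytes and the sample bytes are made up for illustration, and the real tmp.json would be passed in instead.

```python
from pathlib import Path
import tempfile


def find_bad_bytes(path, context=10):
    """Return (position, surrounding bytes) of the first invalid UTF-8 byte, or None."""
    data = Path(path).read_bytes()
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        start = max(exc.start - context, 0)
        return exc.start, data[start:exc.end + context]
    return None


# Illustration with made-up content: a lone 0xE2 byte that is not followed
# by a UTF-8 continuation byte, similar to what the error message reports.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample = Path(tmp_dir) / "tmp.json"
    sample.write_bytes(b'{"body": "I\xe2\\u20ac\x99m on the phone"}')
    result = find_bad_bytes(sample)

print(result[0])
```

Looking at the raw bytes around the reported position usually shows whether the file is really UTF-8 or was written in another encoding.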
#2
Use the json module when working with JSON; writing is done with json.dump.
The usual way to call an API in Python is with Requests, and if you need to write the result to disk, add the json module.
Example:
import json
import requests

time_line = requests.get('https://github.com/timeline.json')
response = time_line.json()
with open('time_line.json', 'w') as f:
    json.dump(response, f)
Reading it back:
with open("time_line.json", "r") as read_json:
    time_line = json.load(read_json)

>>> time_line
{'documentation_url': 'https://docs.github.com/v3/activity/events/#list-public-events',
 'message': 'Hello there, wayfaring stranger. If you’re reading this then you '
            'probably didn’t see our blog post a couple of years back '
            'announcing that this API would go away: http://git.io/17AROg Fear '
            'not, you should be able to get what you need from the shiny new '
            'Events API instead.'}

>>> time_line['documentation_url']
'https://docs.github.com/v3/activity/events/#list-public-events'
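In Python 3.8, open() without an encoding argument falls back to the locale encoding (often cp1252 on Windows), so the round trip above can behave differently for non-ASCII text depending on the machine. A minimal sketch that pins the encoding and keeps characters unescaped follows; the sample dictionary and the temporary directory are invented for the example.

```python
import json
import os
import tempfile

# Made-up sample data containing a curly apostrophe (U+2019).
data = {"message": "If you\u2019re reading this, non-ASCII text survived."}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "time_line.json")

    # Write UTF-8 explicitly; ensure_ascii=False stores the character itself
    # instead of the \u2019 escape sequence.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, ensure_ascii=False)

    # Read back with the same encoding.
    with open(path, "r", encoding="utf-8") as f:
        loaded = json.load(f)

    # Raw bytes, to confirm what actually landed on disk.
    with open(path, "rb") as f:
        raw = f.read()

print(loaded == data)
```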
#3
for item in file:
       f.writelines(str(json.dumps(item.text,ensure_ascii=False)))
#4
@cnull, this does not make any sense and will almost certainly produce an error. @snippsat showed the right way to read/write JSON.
If you can't explain it to a six year old, you don't understand it yourself, Albert Einstein
How to Ask Questions The Smart Way: link and another link
Create MCV example
Debug small programs

#5
(Sep-18-2020, 06:49 AM)buran Wrote: @cnull, this does not make any sense and will almost certainly produce an error. @snippsat showed the right way to read/write JSON.
This is code I've used before and kept in my notes. "This does not make any sense": unreasonable according to what? I just offered a different approach. I guess that's not a crime.
#6
1. In the original code the OP showed, the whole JSON file has already been read into memory, so the file is at its end. If you add your code at that point, there is nothing left to iterate over and your loop will not execute at all.

2. If you ignore that the OP reads the whole file into memory, iterating over file yields one line at a time, each a str, and you will get AttributeError: 'str' object has no attribute 'text'.

You will get a similar AttributeError even if you manage to load the JSON properly and iterate over objects other than str. I guess you used something like this to iterate over objects that had a .text property, like HTML tags from BeautifulSoup. Even if the object has a text property, it's very likely you will get an error when you pass it to json.dumps.

All that said, json.dumps produces a str, so there is no need to cast to str explicitly. And if you write item by item (i.e. assuming item has a text property etc.), the resulting file will not be valid JSON.

I must also say that the OP's attempt to append a JSON file to an already existing file will not produce valid JSON either.
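The last point can be checked quickly: two JSON documents written back to back do not parse as one document. A common alternative, sketched here with made-up ticket data, is to collect the parsed objects in a list and serialize the list once.

```python
import json

# Two separate JSON documents, as they would come from two tmp.json files.
ticket_a = json.dumps({"key": "IMP-225", "summary": "first"})
ticket_b = json.dumps({"key": "IMP-228", "summary": "second"})

# Appending one document after another, as open(..., "a") would do.
appended = ticket_a + ticket_b
try:
    json.loads(appended)
    appended_is_valid = True
except json.JSONDecodeError:
    appended_is_valid = False

# Collect the parsed objects and dump them once as a single JSON array.
issues = [json.loads(ticket_a), json.loads(ticket_b)]
combined = json.dumps(issues, ensure_ascii=False)

print(appended_is_valid)
print(len(json.loads(combined)))
```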
#7
ensure_ascii=False
This part will work for him.

text/str part if html/div/class... due to the title shot use.
#8
I told you what the multiple issues with your code are and why I said it does not make any sense. It's irrelevant to the OP's problem. You can continue to believe whatever you want.
And please, don't write all posts in bold.
#9
(Sep-18-2020, 07:43 AM)buran Wrote: I told you what the multiple issues with your code are and why I said it does not make any sense. You can continue to believe whatever you want.
And please, don't write all posts in bold.
I don't want to be your interlocutor anymore.
#10
Hi Buran and Snippsat,

Thanks for your advice and quick replies. Actually, I am using the json module to work with JSON. I used json.dumps but it didn't work: the non-ASCII characters aren't displayed correctly.

This is my previous code. I realized that the JSON I get from the Jira REST API is correct, but json.dumps() makes it incorrect.

def getTicketFieldsAndComments(TicketKeys, filepath):
    
    totalTickets = len(TicketKeys)
    f = Util.createFile(filepath)
    f.write('{ "total":' + str(totalTickets) + ', ')
    f.write('"smallest":"' + TicketKeys[0]  + '", ')
    f.write('"largest":"' + TicketKeys[totalTickets-1] + '", ')
    
    f.write('"issues":[')

    lastIssue =  totalTickets - 1
    for i, ticketNumber in enumerate(TicketKeys):
        
        issue = json.loads(subprocess.check_output('java -jar OAuthTutorialClient-1.0.jar request "https://impulsepoint.atlassian.net/rest/api/latest/issue/' + ticketNumber + '"' , shell=True, encoding="437"))
               
        if i == lastIssue:
            f.write(json.dumps(issue))             
        else:
            f.write(json.dumps(issue) + ',')    

    f.write(']}') 
   
    f.close()    
Cnull, I also tried ensure_ascii=False and it didn't work either:

def getTicketFieldsAndComments(TicketKeys, filepath):
    
    totalTickets = len(TicketKeys)
    f = Util.createFile(filepath)
    f.write('{ "total":' + str(totalTickets) + ', ')
    f.write('"smallest":"' + TicketKeys[0]  + '", ')
    f.write('"largest":"' + TicketKeys[totalTickets-1] + '", ')
    
    f.write('"issues":[')

    lastIssue =  totalTickets - 1
    for i, ticketNumber in enumerate(TicketKeys):
        
        issue = json.loads(subprocess.check_output('java -jar OAuthTutorialClient-1.0.jar request "https://impulsepoint.atlassian.net/rest/api/latest/issue/' + ticketNumber + '"' , shell=True, encoding="437"))
               
        if i == lastIssue:            
            json_string = json.dumps(issue, ensure_ascii=False).encode('utf-8')
            f.write(json_string.decode())
           # f.write(json.dumps(issue))             
        else:
            json_string = json.dumps(issue, ensure_ascii=False).encode('utf-8')
            f.write(json_string.decode())
            f.write(",")
            #f.write(json.dumps(issue) + ',')    

    f.write(']}') 
   
    f.close()    
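Both versions above assemble the outer JSON by hand and serialize each issue separately; the encode('utf-8') followed by decode() in the second version is a round trip that gives back the same string. One alternative, sketched under assumptions, builds the whole structure as a Python dict and serializes it once. Here fetch_issue is a hypothetical callable standing in for the Java client call, and fake_fetch exists only so the example runs; the sketch does not address how the subprocess output is decoded.

```python
import io
import json


def write_tickets(ticket_keys, fetch_issue, out):
    """Serialize all tickets as one JSON document to the text stream `out`."""
    document = {
        "total": len(ticket_keys),
        "smallest": ticket_keys[0],
        "largest": ticket_keys[-1],
        "issues": [fetch_issue(key) for key in ticket_keys],
    }
    # A single dump call takes care of commas, quoting and escaping.
    json.dump(document, out, ensure_ascii=False)


def fake_fetch(key):
    # Hypothetical stand-in for the real REST call; returns made-up data.
    return {"key": key, "body": "I\u2019m on the phone with support now."}


buffer = io.StringIO()
write_tickets(["IMP-225", "IMP-228", "IMP-229"], fake_fetch, buffer)
parsed = json.loads(buffer.getvalue())
print(parsed["total"], parsed["smallest"], parsed["largest"])
```

For a real file, the stream could be opened with open(filepath, "w", encoding="utf-8") so the encoding on disk does not depend on the Windows locale.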
This is the location in tmp.json that triggered the exception:
 "body": "Ashley Tarloski <[email protected]> commented:\n\nHi Dan,\n\nIxE2\u20acx99m on the phone with support now.\n\nThanks,\nAshley",
          "updated": "2017-08-03T13:58:41.682-0400"
The correct text should be

Ashley Tarloski <[email protected]> commented:\n\nHi Dan,\n\nI'm on the phone with support now.\n\nThanks,\nAshley
Regards,
Tien
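The garbled apostrophe is consistent with a common pattern: the UTF-8 bytes of a curly apostrophe (E2 80 99) being decoded with a single-byte code page. Whether that is what happened here is an assumption; the sketch below only shows the effect in isolation, using cp437 because the code above passes encoding="437" and cp1252 because it is a typical Windows default.

```python
apostrophe = "\u2019"  # right single quotation mark, as in "I’m"
utf8_bytes = apostrophe.encode("utf-8")

# The same three bytes interpreted with three different codecs.
as_cp437 = utf8_bytes.decode("cp437")
as_cp1252 = utf8_bytes.decode("cp1252")
as_utf8 = utf8_bytes.decode("utf-8")

# ascii() keeps the output printable on any console encoding.
print(utf8_bytes)
print(ascii(as_cp437), ascii(as_cp1252), ascii(as_utf8))
```

Only the UTF-8 decode returns the original single character; the other two each produce three unrelated characters, which is the usual signature of this kind of mismatch.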