'utf-8' codec can't decode byte 0xe2 in position 122031: invalid continuation byte

tienttt · Sep-17-2020, 10:27 PM

Hi,

I'm using Python 3.8.5 to call the Jira REST API, download all ticket information, and save it in a JSON file, AllIssues.json. To get all fields and comments from a Jira ticket, I have to make a separate REST API call for each ticket; I save each response in a temp file, "tmp.json". I then loop through all tickets in a project and append tmp.json to AllIssues.json.

The content of tmp.json is displayed correctly in Notepad++, accent marks included, but for one specific ticket I get an error when appending its JSON to AllIssues.json: "UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe2 in position 122031: invalid continuation byte". Please advise.

# Call the Jira REST API to get a ticket's info and save it in tmp.json

with open('tmp.json', encoding='utf-8') as file:
    issue = file.read()
# Append to the combined file; the with block closes it when done
with open('AllIssues.json', 'a', encoding='utf-8') as f:
    f.write(issue)

#some code

Error:

-------Successfully created the file  C:\Users\OPSWAT\OneDrive - OPSWAT\Jira2SF_Migration\ConversionJsonToCSV\Jira_Data\IssuesFieldsComments_0-10.json
IMP-225
IMP-228
IMP-229
Traceback (most recent call last):
  File "main_v6.py", line 61, in <module>
    GAI.getTicketFieldsAndComments(TicketKeysToProceed, ticketCommentJson_filePath)
  File "C:\Users\OPSWAT\OneDrive - OPSWAT\Jira2SF_Migration\Jira_v1\getAllIssues_v6.py", line 58, in getTicketFieldsAndComments
    issue = file.read()
  File "C:\Users\OPSWAT\AppData\Local\Programs\Python\Python38\lib\codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe2 in position 122031: invalid continuation byte
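
One way to see what is really stored at the failing offset is to read the file as raw bytes instead of text. The sketch below is only illustrative: the helper `show_bytes_around` and the sample bytes are hypothetical stand-ins, not the actual contents of tmp.json. It relies on the `start` attribute that `UnicodeDecodeError` carries.

```python
import tempfile
from pathlib import Path


def show_bytes_around(path, position, width=8):
    """Return the raw bytes surrounding `position` without decoding them."""
    data = Path(path).read_bytes()
    start = max(0, position - width)
    return data[start:position + width]


# Hypothetical stand-in for tmp.json: a raw 0xE2 byte that is NOT followed
# by a valid UTF-8 continuation byte (it is followed by a backslash).
sample = b'{"body": "I\xe2\\u20ac\x99m on the phone"}'

with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = Path(tmp_dir) / "tmp.json"
    sample_path.write_bytes(sample)

    try:
        sample_path.read_text(encoding="utf-8")
        error = None
    except UnicodeDecodeError as exc:
        error = exc
        # exc.start is the offset reported in the error message
        context = show_bytes_around(sample_path, exc.start)

print(error)
print(context)
```

If the bytes around the offset are not a valid UTF-8 sequence, the file was not written as clean UTF-8, whatever an editor appears to display.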

Thanks in advance,

Tien Tran

***snippsat*** · (This post was last modified: Sep-17-2020, 11:22 PM by snippsat.)

Use the json module when working with JSON; writing is called dump there.
So the usual way to call an API in Python is with Requests, and if you need to write the result to disk, add the json module.
Example:

import json
import requests

time_line = requests.get('https://github.com/timeline.json')
response = time_line.json()
with open('time_line.json', 'w') as f:
    json.dump(response, f)

Reading it back:

with open("time_line.json", "r") as read_json:
    time_line = json.load(read_json)

>>> time_line
{'documentation_url': 'https://docs.github.com/v3/activity/events/#list-public-events',
 'message': 'Hello there, wayfaring stranger. If you’re reading this then you '
            'probably didn’t see our blog post a couple of years back '
            'announcing that this API would go away: http://git.io/17AROg Fear '
            'not, you should be able to get what you need from the shiny new '
            'Events API instead.'}

>>> time_line['documentation_url']
'https://docs.github.com/v3/activity/events/#list-public-events'
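
For data that contains non-ASCII characters, a variation on the same round trip may be useful. This is a minimal sketch, not part of the example above: the payload is hypothetical, and it assumes you want the characters stored as real UTF-8 (`ensure_ascii=False`) rather than as `\uXXXX` escapes. In that case the file should be opened with an explicit `encoding="utf-8"` for both writing and reading.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical payload containing a non-ASCII apostrophe (U+2019).
payload = {"message": "If you\u2019re reading this"}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "time_line.json"

    # Explicit encoding, so the result does not depend on the OS default.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(payload, f, ensure_ascii=False)

    raw = path.read_bytes()

    # Read back with the same encoding.
    with open(path, encoding="utf-8") as f:
        restored = json.load(f)

print(restored == payload)
```

With the default `ensure_ascii=True`, the output is pure ASCII and the file encoding matters much less.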

cnull · Sep-18-2020, 06:43 AM

for item in file:
       f.writelines(str(json.dumps(item.text,ensure_ascii=False)))

**buran** · (This post was last modified: Sep-18-2020, 06:49 AM by buran.)

@cnull, this does not make any sense and will almost certainly produce an error. @snippsat showed the right way to read/write JSON.

cnull · Sep-18-2020, 06:59 AM

(Sep-18-2020, 06:49 AM)buran Wrote: @cnull, this does not make any sense and will almost certainly produce an error. @snippsat showed the right way to read/write JSON.

It's code I've used before and kept in my notes. "This does not make any sense": unreasonable according to what? I just wrote a different approach. I guess that's not a crime.

**buran** · (This post was last modified: Sep-18-2020, 07:22 AM by buran.)

1. In the original code that OP showed, the whole JSON file has already been read into memory, so file is at its end. If you add your code at that point, there is nothing left to iterate over, and your loop will not execute at all.

2. Even if you ignore OP reading the whole file into memory, iterating over file yields one line at a time, each a plain str, so you will get AttributeError: 'str' object has no attribute 'text'.

You will get the same error even if you manage to load the JSON properly and get an iterator that yields other objects rather than strings. I guess you used something like this to iterate over objects that have a .text property, like HTML tags from BeautifulSoup. Even if the object has a text property, you will very likely get an error when you pass it to json.dumps.
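
The two points above can be checked with a small sketch. The file content here is a hypothetical one-line stand-in for tmp.json.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "tmp.json"
    path.write_text('{"key": "IMP-225"}\n', encoding="utf-8")

    with open(path, encoding="utf-8") as file:
        issue = file.read()      # consumes the whole file
        leftover = list(file)    # nothing left to iterate over

    with open(path, encoding="utf-8") as file:
        items = list(file)       # iterating a text file yields str lines

# Plain strings have no .text attribute, hence the AttributeError
has_text_attribute = hasattr(items[0], "text")
print(leftover, items, has_text_attribute)
```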

All that said, json.dumps produces a str, so there is no need to cast to str explicitly. And if you write item by item (i.e. assuming item has a text property etc.), the resulting file will not be valid JSON.

I must also say that OP's attempt to append a JSON file to an already existing file will not produce valid JSON either.
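
To illustrate why appending does not work, here is a minimal sketch with two hypothetical issue documents. Two JSON documents written back to back are not one JSON value, while collecting the parsed issues in a list and serializing once is.

```python
import json

first = json.dumps({"key": "IMP-225"})
second = json.dumps({"key": "IMP-228"})

# Appending one document after another gives text that is not one JSON value.
appended = first + second
try:
    json.loads(appended)
    appended_is_valid = True
except json.JSONDecodeError:
    appended_is_valid = False

# Collecting the parsed issues in a list and serializing once stays valid.
issues = [json.loads(first), json.loads(second)]
combined = json.dumps({"total": len(issues), "issues": issues})

print(appended_is_valid, json.loads(combined)["total"])
```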

cnull · Sep-18-2020, 07:38 AM

ensure_ascii=False

This part will work for him.

text/str part if html/div/class... due to the title shot use.

**buran** · (This post was last modified: Sep-18-2020, 07:44 AM by buran.)

I told you what the multiple issues with your code are and why I said it does not make any sense. It's irrelevant to OP's problem. You can continue to believe whatever you want.
And please, don't write all posts in bold.

cnull · Sep-18-2020, 07:49 AM

(Sep-18-2020, 07:43 AM)buran Wrote: I told you what the multiple issues with your code are and why I said it does not make any sense. You can continue to believe whatever you want.
And please, don't write all posts in bold.

I don't want to be your interlocutor anymore.

tienttt · (This post was last modified: Sep-18-2020, 03:25 PM by tienttt.)

Hi Buran and Snippsat,

Thanks for your advice and quick replies. Actually, I am using the json module to work with JSON. I used json.dumps but it didn't work: the non-ASCII characters aren't displayed correctly.

This is my previous code. I realized that the JSON I get from the Jira REST API is correct, but json.dumps() makes it incorrect.

def getTicketFieldsAndComments(TicketKeys, filepath):
    
    totalTickets = len(TicketKeys)
    f = Util.createFile(filepath)
    f.write('{ "total":' + str(totalTickets) + ', ')
    f.write('"smallest":"' + TicketKeys[0]  + '", ')
    f.write('"largest":"' + TicketKeys[totalTickets-1] + '", ')
    
    f.write('"issues":[')

    lastIssue =  totalTickets - 1
    for i, ticketNumber in enumerate(TicketKeys):
        
        issue = json.loads(subprocess.check_output('java -jar OAuthTutorialClient-1.0.jar request "https://impulsepoint.atlassian.net/rest/api/latest/issue/' + ticketNumber + '"' , shell=True, encoding="437"))
               
        if i == lastIssue:
            f.write(json.dumps(issue))             
        else:
            f.write(json.dumps(issue) + ',')    

    f.write(']}') 
   
    f.close()

Cnull, I also tried ensure_ascii=False and it didn't work either:

def getTicketFieldsAndComments(TicketKeys, filepath):
    
    totalTickets = len(TicketKeys)
    f = Util.createFile(filepath)
    f.write('{ "total":' + str(totalTickets) + ', ')
    f.write('"smallest":"' + TicketKeys[0]  + '", ')
    f.write('"largest":"' + TicketKeys[totalTickets-1] + '", ')
    
    f.write('"issues":[')

    lastIssue =  totalTickets - 1
    for i, ticketNumber in enumerate(TicketKeys):
        
        issue = json.loads(subprocess.check_output('java -jar OAuthTutorialClient-1.0.jar request "https://impulsepoint.atlassian.net/rest/api/latest/issue/' + ticketNumber + '"' , shell=True, encoding="437"))
               
        if i == lastIssue:            
            json_string = json.dumps(issue, ensure_ascii=False).encode('utf-8')
            f.write(json_string.decode())
           # f.write(json.dumps(issue))             
        else:
            json_string = json.dumps(issue, ensure_ascii=False).encode('utf-8')
            f.write(json_string.decode())
            f.write(",")
            #f.write(json.dumps(issue) + ',')    

    f.write(']}') 
   
    f.close()
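
As a possible alternative to writing the brackets and commas by hand, the whole document can be built as a Python dict and serialized once. This is a sketch under assumptions, not a drop-in fix: `write_issues` and `fake_fetch` are hypothetical names, and `fake_fetch` merely stands in for the Java client call, returning already-parsed JSON.

```python
import io
import json


def write_issues(ticket_keys, fetch_issue, out):
    """Collect every issue first, then serialize the whole document once."""
    issues = [fetch_issue(key) for key in ticket_keys]
    document = {
        "total": len(ticket_keys),
        "smallest": ticket_keys[0],
        "largest": ticket_keys[-1],
        "issues": issues,
    }
    # ensure_ascii=False keeps non-ASCII characters as-is in the output;
    # a real file must then be opened with encoding="utf-8".
    json.dump(document, out, ensure_ascii=False)


# Hypothetical stand-in for the Jira call.
def fake_fetch(key):
    return {"key": key, "body": "I\u2019m on the phone with support now."}


buffer = io.StringIO()
write_issues(["IMP-225", "IMP-228"], fake_fetch, buffer)
result = json.loads(buffer.getvalue())
print(result["total"], len(result["issues"]))
```

With a real file, `out` would be something like `open(filepath, "w", encoding="utf-8")`. This only addresses how the file is written; it does not repair text that was already decoded with the wrong encoding.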

And this is the location in tmp.json that triggered the exception:

 "body": "Ashley Tarloski <[email protected]> commented:\n\nHi Dan,\n\nIxE2\u20acx99m on the phone with support now.\n\nThanks,\nAshley",
          "updated": "2017-08-03T13:58:41.682-0400"

The correct text should be

Ashley Tarloski <[email protected]> commented:\n\nHi Dan,\n\nI'm on the phone with support now.\n\nThanks,\nAshley
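
A plausible explanation, offered here as an assumption the thread does not confirm, is that the apostrophe is a typographic one (U+2019) sent as UTF-8 and then decoded with a different code page. The sketch below shows what that one character turns into under cp437 (the `encoding="437"` used in the code above) and under cp1252, whose result contains the euro sign `\u20ac` seen in the excerpt.

```python
# U+2019 is RIGHT SINGLE QUOTATION MARK, the "curly" apostrophe.
original = "I\u2019m on the phone"
utf8_bytes = original.encode("utf-8")      # apostrophe becomes E2 80 99

as_cp437 = utf8_bytes.decode("cp437")      # what encoding="437" would give
as_cp1252 = utf8_bytes.decode("cp1252")    # contains U+20AC, the euro sign

# Decoding with the encoding the bytes were written in restores the text.
as_utf8 = utf8_bytes.decode("utf-8")

# ascii() keeps the output printable on any console.
print(ascii(as_cp437))
print(ascii(as_cp1252))
print(ascii(as_utf8))
```

If that is what happens here, decoding the tool's output as UTF-8 instead of cp437 would be the first thing to try.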

Regards,
Tien
