Python Forum
Strange Characters in JSON returned string
#1
Hi,

I'm getting some strange characters at the beginning of a string that I am building for a request payload.

In this example I am trying to read a text file containing one data item and then enumerate it.

import json

with open('list.txt', 'r') as f:
    d = {k: v.strip() for (k, v) in enumerate(f, start=1)}
    data = json.dumps(d)

    with open('out.txt', 'w') as f_out:
        f_out.write(data)
The resulting out.txt contains:

{"1": "\u00ef\u00bb\u00bf10841911101489"}

The output should read:

{"1": "10841911101489"}

Can anyone help please?

Many thanks,

Fiorano
#2
Hard to say without knowing what's in list.txt. Can you provide it?
#3
It seems as though it's due to the encoding of my source txt file.

By default my source app writes it as a UTF-8 file. If I open it and save it as an ANSI text file, the process works. Is there anything I can do at the start of my Python app to change the encoding?

Thanks for your help
#4
You are reading a file that is encoded as UTF-8 with a Byte Order Mark (BOM): EF BB BF

Use utf-8-sig as the encoding when you read the file.
Then the BOM is stripped away.
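
A minimal sketch of that change, applied to the code from the first post. The function name enumerate_lines_to_json is made up for illustration, and the explicit encoding="utf-8" on the output file is my own assumption rather than something asked for in the thread:

```python
import json


def enumerate_lines_to_json(src_path, dst_path):
    """Number the lines of src_path from 1 and write them to dst_path as JSON."""
    # utf-8-sig decodes UTF-8 and drops a leading BOM if one is present;
    # files without a BOM are read unchanged.
    with open(src_path, "r", encoding="utf-8-sig") as f:
        d = {k: v.strip() for k, v in enumerate(f, start=1)}

    data = json.dumps(d)

    # Assumption: the output should be plain UTF-8 without a BOM.
    with open(dst_path, "w", encoding="utf-8") as f_out:
        f_out.write(data)
    return data
```

Calling enumerate_lines_to_json('list.txt', 'out.txt') should then write the JSON without the three leading characters, whether or not the source file carries a BOM.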

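The \uxxxx prefix is just an escape notation for Unicode code points, used in JSON and also in Python.
What you see are the first three bytes of the file, the BOM, each decoded as a separate character because the file was read with a single-byte encoding. In UTF-8 the BOM does not actually define a byte order; it only marks the file as UTF-8, and it is usually left out.
I guess you will have to watch out for documents that still use the Byte Order Mark.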
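
To see where the three escapes come from, here is a small sketch that needs no file. It assumes the original read used cp1252, the usual "ANSI" default on Western Windows systems; the thread does not confirm which encoding was actually in effect:

```python
import codecs
import json

# The bytes as the source app saved them: UTF-8 BOM followed by the digits.
raw = codecs.BOM_UTF8 + b"10841911101489"

# Assumed default encoding: every BOM byte becomes its own character.
wrong = raw.decode("cp1252")

# utf-8-sig recognises the BOM and removes it.
right = raw.decode("utf-8-sig")

print(json.dumps({"1": wrong}))  # {"1": "\u00ef\u00bb\u00bf10841911101489"}
print(json.dumps({"1": right}))  # {"1": "10841911101489"}
```

Decoding with plain utf-8 instead would not raise an error either, but it would leave a single U+FEFF character at the start of the string, so utf-8-sig is the safer choice here.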
My code examples are always for Python >=3.6.0
Almost dead, but too lazy to die: https://sourceserver.info
All humans together. We don't need politicians!
#5
(Dec-02-2019, 03:56 PM)DeaD_EyE Wrote: You read a file, which has a UTF8 encoding with Byte Order Mark: EF BB BF

Use as encoding utf-8-sig when you read the file.
Then the BOM is stripped away.

The prefix \uxxxx is just a representation for Unicode code points in Json and also for Python.
What you see, are the first three bytes, which defines the Byte Order. Usually this is not used.
I guess you must seek for documents, which are still using the Byte Order Mark.

Perfect, thank you!

