Python Forum
Strange Characters in JSON returned string
#1
Hi,

I'm getting some strange characters at the beginning of a string that I am setting up for a request payload.

In this example I am trying to read a text file containing one data item and then enumerate it.

import json

with open('list.txt', 'r') as f:
    # Map 1-based line numbers to the stripped contents of each line
    d = {k: v.strip() for (k, v) in enumerate(f, start=1)}
data = json.dumps(d)

with open('out.txt', 'w') as f_out:
    f_out.write(data)
The resulting out.txt contains:

{"1": "\u00ef\u00bb\u00bf10841911101489"}

The output should read:

{"1": "10841911101489"}

Can anyone help please?

Many thanks,

Fiorano
#2
Hard to say without knowing what's in list.txt. Can you provide that?
#3
It seems as though it's due to the encoding of my source txt file.

By default, my source app outputs it as a UTF-8 file. If I open it and save it as an ANSI text file, the process works. Is there anything I can do at the start of my Python app to handle the encoding?

Thanks for your help
#4
You are reading a file that is UTF-8 encoded with a Byte Order Mark (BOM): EF BB BF.

Use encoding='utf-8-sig' when you open the file for reading.
The BOM is then stripped away.
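A minimal sketch of the original script with this fix applied. The function name lines_to_json and the temporary-folder demo are hypothetical, added only for illustration; the actual change is the encoding='utf-8-sig' argument to open(). Note that json.dumps turns the integer keys from enumerate into strings ("1").

```python
import json
import os
import tempfile


def lines_to_json(src, dst):
    """Number the lines of src from 1 and write them to dst as a JSON object."""
    # 'utf-8-sig' drops a leading UTF-8 BOM (EF BB BF) if there is one;
    # files without a BOM are decoded as ordinary UTF-8.
    with open(src, 'r', encoding='utf-8-sig') as f:
        d = {k: v.strip() for k, v in enumerate(f, start=1)}
    data = json.dumps(d)
    # Write the result explicitly as UTF-8 (no BOM is added).
    with open(dst, 'w', encoding='utf-8') as f_out:
        f_out.write(data)
    return data


if __name__ == '__main__':
    # Demo in a temporary folder: a hypothetical list.txt that starts with
    # a BOM, the way some Windows tools save "UTF-8" files.
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, 'list.txt')
        dst = os.path.join(tmp, 'out.txt')
        with open(src, 'wb') as fh:
            fh.write(b'\xef\xbb\xbf10841911101489\r\n')
        print(lines_to_json(src, dst))  # {"1": "10841911101489"}
```

Because 'utf-8-sig' also reads files without a BOM correctly, it is a reasonably safe choice when you can't be sure whether the exporting application adds one.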

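The \uxxxx prefix is just how JSON (and Python) write a Unicode code point as an escape sequence.
What you see are the three bytes of the BOM. The file was presumably opened with the platform's default encoding (on Windows usually cp1252, the "ANSI" code page), so each byte was decoded as its own character (ï, », ¿), which json.dumps then escaped. In UTF-8 the BOM does not actually define a byte order; it only marks the file as UTF-8, and it is usually not used.
You have to watch out for documents that still include a Byte Order Mark.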
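To see where the three characters come from, this sketch decodes the same bytes three ways. It assumes the original script ran with cp1252 as its default encoding (the usual Windows "ANSI" code page); the sample value is taken from the thread.

```python
import codecs
import json

# A UTF-8 BOM (EF BB BF) followed by the value from list.txt.
raw = codecs.BOM_UTF8 + b'10841911101489'
print(raw[:3].hex())  # efbbbf

# Assumption: the platform default was cp1252. A single-byte code page
# turns each BOM byte into its own character: U+00EF, U+00BB, U+00BF.
as_cp1252 = raw.decode('cp1252')
print(json.dumps({"1": as_cp1252}))    # {"1": "\u00ef\u00bb\u00bf10841911101489"}

# Plain 'utf-8' keeps the BOM as a single U+FEFF character...
as_utf8 = raw.decode('utf-8')
print(json.dumps({"1": as_utf8}))      # {"1": "\ufeff10841911101489"}

# ...while 'utf-8-sig' removes it.
as_utf8_sig = raw.decode('utf-8-sig')
print(json.dumps({"1": as_utf8_sig}))  # {"1": "10841911101489"}
```

This is also why switching to plain 'utf-8' alone would not be enough: the BOM would survive as \ufeff.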
#5
(Dec-02-2019, 03:56 PM)DeaD_EyE Wrote: You read a file, which has a UTF8 encoding with Byte Order Mark: EF BB BF

Perfect, thank you!!!