infinite JSON
#1
i want to parse an infinite stream of JSON. i also want to generate an infinite stream of JSON. this will be sent and received over a TCP connection (local to local for initial testing). i am looking for code to parse incoming JSON. obviously it will need to return each piece of data as it gets parsed, not wait for end-of-file to return all the data as one big structure. suggestions? comments? code?
#2
Don't do this. The task is interesting, but this format is the wrong one for the task: parsing is always a pain, and parsing text is very slow.

Use a message queue.

There are many. If you use, for example, ZMQ, which has Python bindings, the code is very short.
Here is an example from memory:


# PUBLISHER
from itertools import count
from time import sleep

import zmq


context = zmq.Context()
socket = context.socket(zmq.PUB)
socket.bind('tcp://127.0.0.1:5555')
# publish forever: topic b'frame', payload = the counter as bytes
for n in count(1):
    socket.send_multipart([b'frame', str(n).encode()])
    sleep(1)
# SUBSCRIBER
import zmq


context = zmq.Context()
socket = context.socket(zmq.SUB)
# subscribe before connecting; b'frame' filters messages by topic prefix
socket.subscribe(b'frame')
socket.connect('tcp://127.0.0.1:5555')
while True:
    # recv_multipart() blocks until a whole [topic, payload] message
    # arrives, so no sleep is needed in this loop
    topic, frame = socket.recv_multipart()
    print(frame.decode())
Instead of JSON you can pickle Python objects and send them over the message queue.
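A minimal sketch of that, using pyzmq's send_pyobj/recv_pyobj convenience helpers (they pickle and unpickle for you); the port number and payload here are just placeholders:

from time import sleep

import zmq


context = zmq.Context()

# publisher: send_pyobj pickles any Python object into one message
pub = context.socket(zmq.PUB)
pub.bind('tcp://127.0.0.1:5557')

# subscriber: an empty prefix means receive everything; recv_pyobj unpickles
sub = context.socket(zmq.SUB)
sub.subscribe(b'')
sub.connect('tcp://127.0.0.1:5557')

sleep(0.2)  # give the SUB socket time to connect (the "slow joiner" problem)
pub.send_pyobj({'frame': 1, 'values': [1.0, 2.0]})
print(sub.recv_pyobj())  # {'frame': 1, 'values': [1.0, 2.0]}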
Publisher > Subscriber is only one pattern.

There are different patterns for all kinds of tasks.
The connection is started inside the application, so ZMQ is not a server or a framework; it's a library.
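For instance, here is a minimal REQ/REP (request/reply) sketch, with both ends in one process just to show the lockstep send/recv pattern; the port is a placeholder:

import zmq


context = zmq.Context()

# server side: REP waits for a request, then must reply
rep = context.socket(zmq.REP)
rep.bind('tcp://127.0.0.1:5556')

# client side: REQ sends a request, then must wait for the reply
req = context.socket(zmq.REQ)
req.connect('tcp://127.0.0.1:5556')

req.send(b'ping')
print(rep.recv())  # b'ping'
rep.send(b'pong')
print(req.recv())  # b'pong'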

For example, I use ZMQ to send two signals (I/Q) from a radar transceiver to another worker process.
Inside the sensor process I have a time-critical read loop; if it doesn't read fast enough, the program crashes (buffer overflow in the hardware).
The receiving worker process distributes the data to the connected websocket clients.
At the same time, the receiving worker process also gets data from a second process (a Flask app).

I don't use JSON with ZMQ because, for my task, it is too slow and the data volume is too large.

This is only one example; there are many possible solutions.
#3
i have no choice. the data is already in JSON format as an infinite stream. i just want to insert a program to modify it. if this were a non-infinite data stream that came to an end fairly soon, then the json module in Python would be just right. but the data never comes to an end, except under some failures or exceptions at the source (after possibly hundreds of billions of data items over months of operation). i was hoping one of the 100,000+ modules available to Python could do this. i have seen a function set in C++ for this, but it would be difficult for me since i have zero experience in C++, and at that level i have only worked in C.

the plan was to parse the incoming stream, watch for the condition and data i need to change, make the change if needed (not that often), then rebuild the stream. compression is also involved, but that should not be hard to deal with.
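a sketch of what i mean, using only the stdlib json module's JSONDecoder.raw_decode, which parses one value off the front of a buffer and reports where it ended (the function name and the chunk source are made up for illustration):

import json


def iter_json(chunks):
    # 'chunks' is any iterable of str, e.g. decoded reads from a socket;
    # yields each top-level JSON value as soon as it is complete
    decoder = json.JSONDecoder()
    buf = ''
    for chunk in chunks:
        buf += chunk
        while True:
            buf = buf.lstrip()  # skip whitespace between values
            if not buf:
                break
            try:
                obj, end = decoder.raw_decode(buf)
            except json.JSONDecodeError:
                break  # value still incomplete; wait for more data
            yield obj
            buf = buf[end:]

each yielded object could then be modified and re-serialized with json.dumps() before being written back out.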

there might also be a future need to split the data, with some going to one "subscriber" and the rest going to another. so the ability to run 2 or more formatters at the same time (class instances or generators) would be a big plus (else i'll have to duplicate and filter).
#4
fugly!

why mess with such an elegant package?
You're thinking the json package is flawed; I suggest that perhaps your thinking is flawed.
#5
ijson, NAYA
#6
(Feb-18-2018, 03:04 AM)Larz60+ Wrote: fugly!

why mess with such an elegant package?
You're thinking the json package is flawed; I suggest that perhaps your thinking is flawed.

i'm not suggesting that, at all.

it just doesn't have methods for continuous streams of JSON. it only has methods for blocks of JSON, which have an end point. that is generally what you get in a file.

i'm just saying that the json library in Python doesn't have the interface (and almost certainly not the underlying logic) for continuous-stream JSON. for other kinds of JSON i would use what comes with Python; i have used it and have had no trouble with it.

XML has similar issues. processing continuous/infinite streams raises quite different problems than processing a data block.

(Feb-18-2018, 03:33 AM)snippsat Wrote: ijson, NAYA

these do look interesting. both have an iterative interface, which seems to be the most sensible approach.
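for example, ijson can be pointed at a file-like object and, assuming its multiple_values option (for back-to-back top-level values), it yields each document as soon as it is parsed; the host and port here are placeholders:

import socket

import ijson


# treat the TCP connection as a binary file-like object
sock = socket.create_connection(('127.0.0.1', 9000))
stream = sock.makefile('rb')

# prefix '' selects whole top-level values; multiple_values=True
# allows an endless sequence of them instead of a single document
for obj in ijson.items(stream, '', multiple_values=True):
    print(obj)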
#7
I think that even with continuous streams, there have to be cutoff points.
Take, for example, calls coming through fiber. In call-record processing, the stream is divided into segments of a predetermined size, depending on traffic, say once each hour. Those segments are processed in a continuous fashion, but are still static as far as the identification, accumulation, and billing processes are concerned. Like packages on a conveyor belt, or packets on a network.
I guess I'm just not seeing a use that hasn't already been addressed.
#8
well, there are cutoff points, but they may not be obvious, and you may need to test data components to find them.

i think it's just like any other infinite stream of data elements, but they chose to use JSON instead of, perhaps, XML. maybe JSON is not really suited for this and someone just forced it. or maybe it's something else that resembles JSON.
#9
it could be csv.DictWriter

