Python Forum
validate large json file with millions of records in batches
#1
Greetings...

I'm working on validating a large JSON file with millions of records in batches, using jsonschema's Draft202012Validator.

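import json
from azure.storage.blob import BlobServiceClient
from azure.identity import DefaultAzureCredential

account = "myAccount"
container = "myContainer"
blob_name = "myBlob.json"

# BlobServiceClient expects the full account URL rather than the bare account name
# (this URL form assumes the public Azure cloud endpoint)
account_url = f"https://{account}.blob.core.windows.net"

default_credential = DefaultAzureCredential()
blob_service_client = BlobServiceClient(account_url, credential=default_credential)
container_client = blob_service_client.get_container_client(container)
blob_client = container_client.get_blob_client(blob_name)

# readall() returns the whole blob as bytes; it is wrapped in a bytearray here
data = bytearray(blob_client.download_blob().readall())

batch_size = 1000

# Parse the whole document; the top-level JSON value is expected to be a list of records
process = json.loads(data)

# Walk the list in slices of batch_size records
for i in range(0, len(process), batch_size):
    batch = process[i:i + batch_size]
    # process data
    ...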
I'm running into an error about converting an object of type bytearray to a string (it is reported as not JSON serializable).
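For reference, the batch-validation step described in this post could be sketched roughly as below. This is only an illustration under stated assumptions, not the poster's actual code: `iter_batches`, `validate_in_batches` and `RequireIdValidator` are hypothetical names, and the stand-in validator exists only so the sketch runs without jsonschema installed. In real use the validator would be `Draft202012Validator(schema)` from jsonschema, which provides the same `iter_errors()` method.

```python
from types import SimpleNamespace
from typing import Iterator


def iter_batches(records: list, batch_size: int) -> Iterator[list]:
    """Yield consecutive slices of at most batch_size records."""
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]


def validate_in_batches(records: list, validator, batch_size: int = 1000) -> list:
    """Return (record_index, message) for every validation error found.

    `validator` is any object with an iter_errors(instance) method,
    for example jsonschema.Draft202012Validator(schema).
    """
    failures = []
    offset = 0  # index of the first record of the current batch
    for batch in iter_batches(records, batch_size):
        for position, record in enumerate(batch):
            for error in validator.iter_errors(record):
                failures.append((offset + position, error.message))
        offset += len(batch)
    return failures


class RequireIdValidator:
    """Hypothetical stand-in for a jsonschema validator (assumption for the demo)."""

    def iter_errors(self, record):
        if "id" not in record:
            yield SimpleNamespace(message="'id' is a required property")


records = [{"id": 1}, {"name": "x"}, {"id": 3}]
failures = validate_in_batches(records, RequireIdValidator(), batch_size=2)
```

Note that `json.loads()` has already put the whole file in memory by the time the batches are formed, so batching at this stage mainly organizes the validation work and error reporting; it does not reduce peak memory use.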
#2
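What type of data is coming back from .readall()? Does it look like ASCII?

When given bytes or a bytearray, loads() expects the data to be encoded as UTF-8 (UTF-16 and UTF-32 are also detected). If the data uses some other encoding, decode it to a str yourself before passing it to loads().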
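A minimal sketch of both cases, using a small in-memory payload as a stand-in for the downloaded blob (the sample data and variable names are assumptions for illustration):

```python
import json

# Sample payload standing in for blob_client.download_blob().readall()
raw = bytearray('[{"id": 1, "name": "café"}]', "utf-8")

# json.loads() accepts str, bytes and bytearray directly (Python 3.6+)
records = json.loads(raw)

# If the data is in some other encoding, decode it to str explicitly first
latin1_raw = '[{"id": 1, "name": "café"}]'.encode("latin-1")
records_latin1 = json.loads(latin1_raw.decode("latin-1"))
```

If loads() works on the bytearray like this, the error is probably raised somewhere else, so the full traceback would help.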
#3
(Dec-10-2022, 09:18 PM)bowlofred Wrote: What type of data is coming from the .readall()? Does it look like ascii?

By default loads() will try to decode the data as UTF-8, but you can force a different decode if that is better.

Hello, the data is actually UTF-8.
#4
What's the error you're getting?