Python Forum

Full Version: regex on json file
I'm trying to match a regular expression against the value of id in a JSON file. My goal is to iterate over a bunch of JSON files in a directory and replace the value of the id key with the regex match. The regex works well by itself when I try it on regex101; however, when I run it against the JSON files on my computer I get this error:
Error:
TypeError: expected string or bytes-like object
Any and all help appreciated.

import json
import os
import re


rootdir = r'C:\Users\homersimpson\jsondumps'

for entry in os.scandir(rootdir):
    with open(entry, "r") as file:
        json_data = json.load(file)
        # This call raises the TypeError: json_data is a dict or list, not a string
        extracted = re.findall(r'.+?(?<=\$apples\$)', json_data)
        print(extracted)
json.load() returns a Python object. For instance, it might be a deeply nested structure of dictionaries and lists.

re.findall takes a single string and operates on it, not a dict or list.

I wouldn't recommend regex on a JSON file, but if that's what you're trying to do, just read the file as text (open() followed by file.read()) instead of parsing it.
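A minimal sketch of the read-as-text route, for illustration only. The file sample.json and its contents are hypothetical and are written to a temporary directory so the example is self-contained; the pattern is the one from the question. It shows that re.findall accepts the string returned by file.read() but rejects the object returned by json.load().

```python
import json
import os
import re
import tempfile

# Hypothetical sample file, created here only so the sketch can run on its own.
sample_path = os.path.join(tempfile.mkdtemp(), "sample.json")
with open(sample_path, "w") as file:
    json.dump({"id": "name-1234-$apples$rest"}, file)

pattern = r'.+?(?<=\$apples\$)'  # pattern taken from the question

# Reading the file as plain text yields a str, which re.findall accepts.
with open(sample_path, "r") as file:
    text = file.read()
matches = re.findall(pattern, text)

# Parsing the same file yields a dict, which re.findall rejects with TypeError.
with open(sample_path, "r") as file:
    json_data = json.load(file)
try:
    re.findall(pattern, json_data)
    raised = False
except TypeError:
    raised = True

print(matches, raised)
```

Note that the match on raw text also picks up the surrounding JSON syntax (braces, quotes, the key name), which is one reason working on the parsed object tends to be easier.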
Load the JSON. Iterate over it. Match the keys (maybe with regex here). Replace the value. At the end, dump it back to the JSON file.
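The steps above could be sketched roughly as follows. This is not code from the thread: the helper name replace_values, the key pattern, the sample data, and the upper-casing transform are all hypothetical placeholders, since the real file layout has not been shown at this point.

```python
import json
import re


def replace_values(node, key_pattern, transform):
    """Walk nested dicts/lists and transform string values whose key matches."""
    if isinstance(node, dict):
        for key, value in node.items():
            if isinstance(value, str) and re.fullmatch(key_pattern, key):
                # Reassigning an existing key does not resize the dict,
                # so it is safe while iterating over items().
                node[key] = transform(value)
            else:
                replace_values(value, key_pattern, transform)
    elif isinstance(node, list):
        for item in node:
            replace_values(item, key_pattern, transform)
    return node


# Hypothetical input; in practice this would come from json.load(file).
raw = '{"id": "abc-$tail", "Rules": [{"id": "def-$tail", "Name": "Peak"}]}'
data = json.loads(raw)

# Placeholder transform: upper-case every value stored under an "id"/"Id" key.
replace_values(data, r'[Ii]d', str.upper)

# json.dumps produces the text that json.dump would write back to the file.
output_text = json.dumps(data, indent=2)
print(output_text)
```

The regex is applied to the keys here, not to the whole file, so the structure of the document never has to be matched as text.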
Well, I'm thinking maybe I can load the JSON as a dictionary, cycle through the values, modify them, and write the result out as JSON.

(May-05-2020, 07:57 AM) buran wrote: load the json. iterate over it. match the keys (maybe regex here). Replace the value. at the end - dump back to json file.

I load the JSON, but it won't let me iterate.
We don't know what your JSON file looks like. Can you show some sample data?
Quote:
{
"id": "companyName-channelruleprintermap-5746-$pc2$companyName$5746$uspc02",
"ChannelRuleCollections": [
{
"Name": "Peak",
"State": 0,
"Modified": "2020-01-10T08:17:11.9072155Z",
"Rules": [
{
"RuleType": "Item",
"Channel": "cafe",
"ChannelSource": "usrg01005746",
"RuleName": "Ground Coffee",
"Printer": "None",
"LabelPrinter": "None",
"ChitPrinter": "None",
"FormatName": "Label",
"ChitFormat": "OrderTicket"
},
{
"RuleType": "Item",
"Channel": "cafe",
"ChannelSource": "usrg01005746",
"RuleName": "Warmed Food",
"Printer": "Warming Printer",
"LabelPrinter": "Warming Printer",
"ChitPrinter": "None",
"FormatName": "Label",
"ChitFormat": "OrderTicket"
},
{
"RuleType": "Item",
"Channel": "cafe",
"ChannelSource": "usrg01005746",
"RuleName": "Blended Beverages",
"Printer": "Bar 1 (Closest to HOP)",
"LabelPrinter": "Bar 1 (Closest to HOP)",
"ChitPrinter": "None",
"FormatName": "Label",
"ChitFormat": "OrderTicket"
}...

So I'm trying to change

"id": "companyName-channelruleprintermap-5746-$pc2$companyName$5746$uspc02"

to

"id": "companyName-channelruleprintermap-5746"
Probably something like this:
json_data = json.load(file)
json_data['id'] = json_data['id'].split('-$')[0]
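To see what the split does and how it might fit into the directory loop from the first post, here is a minimal sketch. It assumes each file holds a single object with a top-level "id" key, as in the sample shown so far; the helper names trim_id and trim_ids_in_dir and the temporary demo directory are hypothetical.

```python
import json
import os
import tempfile


def trim_id(value):
    # Keep everything before the first '-$'; a value without '-$' comes back unchanged.
    return value.split('-$')[0]


def trim_ids_in_dir(rootdir):
    """Rewrite every .json file in rootdir with a trimmed top-level 'id'."""
    for entry in os.scandir(rootdir):
        if not entry.is_file() or not entry.name.endswith('.json'):
            continue
        with open(entry.path, 'r') as file:
            json_data = json.load(file)
        json_data['id'] = trim_id(json_data['id'])
        # Reopen in write mode to replace the file contents.
        with open(entry.path, 'w') as file:
            json.dump(json_data, file, indent=2)


# Hypothetical demo directory containing one file shaped like the sample data.
demo_dir = tempfile.mkdtemp()
demo_file = os.path.join(demo_dir, 'sample.json')
with open(demo_file, 'w') as file:
    json.dump(
        {"id": "companyName-channelruleprintermap-5746-$pc2$companyName$5746$uspc02"},
        file,
    )

trim_ids_in_dir(demo_dir)
with open(demo_file, 'r') as file:
    result = json.load(file)
print(result['id'])
```

Because the files are overwritten in place, it is probably worth trying this on a copy of the directory first.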
Thank you SO MUCH Buran, I have no idea why I thought regex would be the solution to this.

I had neglected to mention that there is a [ before { "id":...., so in essence it is in an array. Does this make it iterable?
[
{
"id": "companyName-channelruleprintermap-5746-$pc2$companyName$5746$uspc02",
"ChannelRuleCollections": [
{
"Name": "Peak",
"State": 0,
"Modified": "2020-01-10T08:17:11.9072155Z",
"Rules": [
{
"RuleType": "Item",
"Channel": "cafe",
"ChannelSource": "usrg01005746",
"RuleName": "Ground Coffee",
"Printer": "None",
"LabelPrinter": "None",
"ChitPrinter": "None",
"FormatName": "Label",
"ChitFormat": "OrderTicket"
},
{
"RuleType": "Item",
"Channel": "cafe",
"ChannelSource": "usrg01005746",
"RuleName": "Warmed Food",
"Printer": "Warming Printer",
"LabelPrinter": "Warming Printer",
"ChitPrinter": "None",
"FormatName": "Label",
"ChitFormat": "OrderTicket"
},
{
"RuleType": "Item",
"Channel": "cafe",
"ChannelSource": "usrg01005746",
"RuleName": "Blended Beverages",
"Printer": "Bar 1 (Closest to HOP)",
"LabelPrinter": "Bar 1 (Closest to HOP)",
"ChitPrinter": "None",
"FormatName": "Label",
"ChitFormat": "OrderTicket"
}...
]
That just adds another list around the JSON object. Assuming there's only one item inside, your content will be item number 0. Instead of json_data['id'], you'd access the ID as json_data[0]['id'].
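If the outer list can hold more than one object, looping over it avoids hard-coding index 0. A minimal sketch, reusing the split('-$') idea suggested earlier in the thread; the two-item input below is made up for illustration and only mimics the shape of the posted data.

```python
import json

# Hypothetical input shaped like the posted sample: a list of objects with an "id".
raw = '''[
  {"id": "company-channelruleprintermap-8913-$pc2$company$8913$uspc02"},
  {"id": "company-channelruleprintermap-5746-$pc2$company$5746$uspc02"}
]'''
json_data = json.loads(raw)

# With the list wrapper, a single object is reached by index first, then by key.
first_id = json_data[0]['id']

# Iterating over the list yields each object (a dict) in turn.
for doc in json_data:
    doc['id'] = doc['id'].split('-$')[0]

trimmed_ids = [doc['id'] for doc in json_data]
print(trimmed_ids)
```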
So, roughly speaking, all the objects that carry an "id" key sit in one big array. I have tried various variations of the simple code below, but I keep getting TypeError: the JSON object must be str, bytes or bytearray, not TextIOWrapper
import json

rootfile = r'C:\Users\homer\dumps\cosmodb-sample.json'
with open(rootfile, 'r') as file:
    # This line raises the TypeError: json.loads() is given the file object itself
    json_data = json.loads(file)
    for docs in json_data[0]:
        print(docs)
Quote:
[
{
"id": "company-channelruleprintermap-8913-$pc2$company$8913$uspc02",
"ChannelRuleCollections": [
{
"Name": "Peak",
"State": 0,
"Modified": "2020-01-10T08:34:34.8015346Z"
}
]
},
{
"id": "company-channelruleprintermap-8913-$pc2$company$8913$uspc02",
"ChannelRuleCollections": [
{
"Name": "Peak",
"State": 0,
"Modified": "2020-01-10T08:34:34.8015346Z"
}
]
},
]
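A note on the error message: json.loads() parses a string (or bytes), while json.load() reads from an open file object, so passing the file handle to json.loads() fails with exactly this kind of TypeError. The sketch below is an illustration and not a reply from the thread; the temporary file and its two-item content are hypothetical.

```python
import json
import os
import tempfile

# Hypothetical file holding a list of objects, similar in shape to the quoted data.
path = os.path.join(tempfile.mkdtemp(), 'sample.json')
with open(path, 'w') as file:
    file.write('[{"id": "a-$x"}, {"id": "b-$y"}]')

# json.loads() expects text, so handing it the file object raises TypeError.
with open(path, 'r') as file:
    try:
        json.loads(file)
        error_message = None
    except TypeError as exc:
        error_message = str(exc)

# json.load() takes the file object and parses its contents.
with open(path, 'r') as file:
    json_data = json.load(file)

# Iterating over json_data (not json_data[0]) visits each object in the array.
ids = [doc['id'] for doc in json_data]
print(error_message)
print(ids)
```

Looping over json_data[0], as in the posted code, would walk the keys of the first object only; looping over json_data itself gives one dict per document.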