Python Forum
regex on json file
#1
I'm trying to match a regex against the value of Id in a JSON file. My goal is to iterate over a bunch of JSON files in a directory and replace the value of the Id key with the regex match. The regex works well by itself when I try it on regex101, but when I run it against the JSON files on my computer I get:
Error:
TypeError: expected string or bytes-like object
Any and all help appreciated.

import json
import os
import re


rootdir = r'C:\\Users\\homersimpson\\jsondumps'

for files in os.scandir(rootdir):

        with open(files, "r") as file:
            json_data = json.load(file)
            extracted  = re.findall((r'.+?(?<=\$apples\$)'),json_data)
            print(something)
Reply
#2
json.load() returns a Python object. For instance, it might be a deeply nested structure of dictionaries and lists.

re.findall takes a single string and operates on it, not a dict or list.

I wouldn't recommend regex on a JSON file, but if that's what you're trying to do, just read the file as plain text (open() and read()) instead of parsing it with json.load().
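To illustrate the point, here is a minimal sketch that reproduces the error and then runs a regex on a string value instead. The sample text and the pattern are made up for the demonstration; they are not the original poster's file or regex.

```python
import json
import re

# Hypothetical sample standing in for the contents of one JSON file.
raw_text = '{"id": "companyName-channelruleprintermap-5746-$pc2$companyName$5746$uspc02"}'

# Parsing gives back a Python object (here a dict), not a string.
json_data = json.loads(raw_text)

# Passing the dict to re.findall raises the TypeError from the first post.
try:
    re.findall(r"\$\w+", json_data)
except TypeError as exc:
    error_message = str(exc)

# A regex is fine on a string, e.g. the value stored under "id".
matches = re.findall(r"\$\w+", json_data["id"])
print(error_message)
print(matches)
```

The same regex could also be run on raw_text directly, since that is still a plain string.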
Reply
#3
Load the JSON, iterate over it, match the keys (maybe with a regex here), and replace the value. At the end, dump it back to the JSON file.
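A rough sketch of those steps, assuming each file holds a single JSON object at the top level. The names rewrite_matching_values and KEY_PATTERN, the sample data, and the replacement rule are all hypothetical placeholders, and the demo runs in a temporary directory rather than on real files.

```python
import json
import re
import tempfile
from pathlib import Path

# Hypothetical key matcher: any key spelled "id", ignoring case.
KEY_PATTERN = re.compile(r"id", re.IGNORECASE)


def rewrite_matching_values(path, transform):
    """Load one JSON file, transform string values whose key matches, write it back."""
    with open(path, "r", encoding="utf-8") as fh:
        data = json.load(fh)  # assumption: the top level is a dict
    for key, value in data.items():
        if KEY_PATTERN.fullmatch(key) and isinstance(value, str):
            data[key] = transform(value)
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(data, fh, indent=2)
    return data


# Demo in a throwaway directory with made-up content.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = Path(tmp_dir) / "sample.json"
    sample_path.write_text(
        json.dumps({"Id": "abc-123-$x$y", "Name": "Peak"}), encoding="utf-8"
    )
    for entry in sorted(Path(tmp_dir).glob("*.json")):
        # Placeholder rule: keep only the text before the first "-$".
        rewrite_matching_values(entry, lambda text: text.partition("-$")[0])
    result = json.loads(sample_path.read_text(encoding="utf-8"))

print(result)
```

Since the files get overwritten in place, it is probably worth trying this on a copy of the directory first.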
If you can't explain it to a six year old, you don't understand it yourself, Albert Einstein
How to Ask Questions The Smart Way: link and another link
Create MCV example
Debug small programs

Reply
#4
Well, I'm thinking maybe I can load the JSON as a dictionary, cycle through the values, modify them, and write the result out as JSON.

(May-05-2020, 07:57 AM)buran Wrote: load the json. iterate over it. match the keys (maybe regex here). Replace the value. at the end - dump back to json file.

I load the JSON, but it won't let me iterate.
Reply
#5
We don't know what your JSON file looks like. Can you show some sample data?

Reply
#6
Quote:{
"id": "companyName-channelruleprintermap-5746-$pc2$companyName$5746$uspc02",
"ChannelRuleCollections": [
{
"Name": "Peak",
"State": 0,
"Modified": "2020-01-10T08:17:11.9072155Z",
"Rules": [
{
"RuleType": "Item",
"Channel": "cafe",
"ChannelSource": "usrg01005746",
"RuleName": "Ground Coffee",
"Printer": "None",
"LabelPrinter": "None",
"ChitPrinter": "None",
"FormatName": "Label",
"ChitFormat": "OrderTicket"
},
{
"RuleType": "Item",
"Channel": "cafe",
"ChannelSource": "usrg01005746",
"RuleName": "Warmed Food",
"Printer": "Warming Printer",
"LabelPrinter": "Warming Printer",
"ChitPrinter": "None",
"FormatName": "Label",
"ChitFormat": "OrderTicket"
},
{
"RuleType": "Item",
"Channel": "cafe",
"ChannelSource": "usrg01005746",
"RuleName": "Blended Beverages",
"Printer": "Bar 1 (Closest to HOP)",
"LabelPrinter": "Bar 1 (Closest to HOP)",
"ChitPrinter": "None",
"FormatName": "Label",
"ChitFormat": "OrderTicket"
}...

So I'm trying to change

"id": "companyName-channelruleprintermap-5746-$pc2$companyName$5746$uspc02"

to

"id": "companyName-channelruleprintermap-5746"
Reply
#7
Probably something like this:
json_data = json.load(file)
json_data['id'] = json_data['id'].split('-$')[0]

Reply
#8
Thank you SO MUCH, buran. I have no idea why I thought regex would be the solution to this.

I had neglected to mention that there is a [ before the { "id": ..., so in essence the object sits inside an array. Does this make it iterable?
[
{
"id": "companyName-channelruleprintermap-5746-$pc2$companyName$5746$uspc02",
"ChannelRuleCollections": [
{
"Name": "Peak",
"State": 0,
"Modified": "2020-01-10T08:17:11.9072155Z",
"Rules": [
{
"RuleType": "Item",
"Channel": "cafe",
"ChannelSource": "usrg01005746",
"RuleName": "Ground Coffee",
"Printer": "None",
"LabelPrinter": "None",
"ChitPrinter": "None",
"FormatName": "Label",
"ChitFormat": "OrderTicket"
},
{
"RuleType": "Item",
"Channel": "cafe",
"ChannelSource": "usrg01005746",
"RuleName": "Warmed Food",
"Printer": "Warming Printer",
"LabelPrinter": "Warming Printer",
"ChitPrinter": "None",
"FormatName": "Label",
"ChitFormat": "OrderTicket"
},
{
"RuleType": "Item",
"Channel": "cafe",
"ChannelSource": "usrg01005746",
"RuleName": "Blended Beverages",
"Printer": "Bar 1 (Closest to HOP)",
"LabelPrinter": "Bar 1 (Closest to HOP)",
"ChitPrinter": "None",
"FormatName": "Label",
"ChitFormat": "OrderTicket"
}...
]
Reply
#9
That just adds another list around the json object. Assuming there's only one item inside, your content will be item number 0. Instead of json_data['id'] you'd access the ID as json_data[0]['id']
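As a small sketch of that, assuming the top level is a list of objects that each carry an "id" key (the two documents below are made-up stand-ins for the real data), indexing and looping could look like this:

```python
import json

# Hypothetical sample: a JSON array holding two objects.
raw_text = """[
  {"id": "company-channelruleprintermap-8913-$pc2$company$8913$uspc02"},
  {"id": "company-channelruleprintermap-5746-$pc2$company$5746$uspc02"}
]"""

json_data = json.loads(raw_text)  # a list of dicts

# Item number 0 is the first object in the list.
first_id = json_data[0]["id"]

# If there is more than one object, loop over the list
# and trim each "id" the same way.
for doc in json_data:
    doc["id"] = doc["id"].split("-$")[0]

cleaned_ids = [doc["id"] for doc in json_data]
print(cleaned_ids)
```

Looping over the list also works when there is only one item inside, so it may be the safer default.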
Reply
#10
So, roughly speaking, all the objects with an "id" key sit in one big array. I have tried several variations of the simple code below, but I keep getting TypeError: the JSON object must be str, bytes or bytearray, not TextIOWrapper
import json
import os
  
rootfile = r'C:\\Users\\homer\\dumps\\cosmodb-sample.json'
with open(rootfile,'r') as file:
    json_data = json.loads(file) 
    for docs in json_data[0]:
        print(docs)
Quote:[
{
"id": "company-channelruleprintermap-8913-$pc2$company$8913$uspc02",
"ChannelRuleCollections": [
{
"Name": "Peak",
"State": 0,
"Modified": "2020-01-10T08:34:34.8015346Z"
}
]
},
{
"id": "company-channelruleprintermap-8913-$pc2$company$8913$uspc02",
"ChannelRuleCollections": [
{
"Name": "Peak",
"State": 0,
"Modified": "2020-01-10T08:34:34.8015346Z"
}
]
},
]
Reply
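For what it's worth, that TypeError comes from json.loads(), which expects a string (or bytes); json.load() is the variant that takes an open file object. Below is a minimal sketch of the difference, using a temporary file with made-up content instead of the real dump. Also note that iterating over json_data[0] walks the keys of the first object, not the objects in the array. If the real file has the trailing comma before the closing ] shown in the quoted sample, Python's json module would reject it as invalid JSON.

```python
import json
import os
import tempfile

# Hypothetical stand-in for the dump: an array of objects.
sample_docs = [
    {"id": "company-channelruleprintermap-8913-$pc2$company$8913$uspc02",
     "ChannelRuleCollections": [{"Name": "Peak", "State": 0}]},
]

with tempfile.NamedTemporaryFile(
    "w", suffix=".json", delete=False, encoding="utf-8"
) as tmp:
    json.dump(sample_docs, tmp)
    sample_path = tmp.name

try:
    # json.loads() wants text, so handing it a file object fails.
    with open(sample_path, "r", encoding="utf-8") as file:
        try:
            json.loads(file)
        except TypeError as exc:
            loads_error = str(exc)

    # json.load() reads from the file object directly.
    with open(sample_path, "r", encoding="utf-8") as file:
        json_data = json.load(file)
finally:
    os.remove(sample_path)

# Iterating a dict yields its keys...
keys_of_first_doc = list(json_data[0])
# ...while iterating the list yields each object.
ids = [doc["id"] for doc in json_data]
print(loads_error)
print(keys_of_first_doc)
print(ids)
```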

