Python Forum

Parsing large JSON
Hi!

I am very new to Python but love playing with it. What I want to accomplish is to get the content of the following JSON URL: https://raw.githubusercontent.com/superm...rkets.json

Then parse that content for the different supermarkets named in the first n value (for example AH, Aldi, Coop, etc.), then go through all the products under the d value and print all the links.

I already have code that extracts the content from the link in the JSON, so I only need the supermarket name and the product link.

Does anyone have any advice?

Greetings,
Jos
A JSON file defines a data structure. You don't "parse" a JSON file yourself; you use the json.load() function to reproduce the structure. Loading your JSON file will create a list of dictionaries, one per supermarket. In turn, each dictionary contains a list of dictionaries (the products, under the "d" key).
[
    {"n":"ah","d":[
        {"n":"'t IJ Kadoosje bierpakket","l":"wi394045/t-ij-kadoosje-bierpakket","p":17.99,"s":"6 x 0,33 l"},
        {"n":"&c","l":"wi410827/en-c","p":7.45,"s":"per stuk"},
        {"n":"&Then Cabernet sauvignon alcoholvrij","l":"wi549461/en-then-cabernet-sauvignon-alcoholvrij","p":6.99,"s":"0,75 l"},
        {"n":"&Then Chardonnay alcoholvrij","l":"wi549460/en-then-chardonnay-alcoholvrij","p":6.99,"s":"0,75 l"},
        {"n":"100 Watt Orchestra of angels","l":"wi437855/100-watt-orchestra-of-angels","p":3.29,"s":"0,33 l"},
        {"n":"100% Coconut grove","l":"wi415202/100-coconut-grove","p":1.99,"s":"1 l"},
        {"n":"1000 Stories Bourbon barrel aged Zinfandel","l":"wi473073/1000-stories-bourbon-barrel-aged-zinfandel","p":16.59,"s":"0,75 l"},
        {"n":"19 Crimes Chardonnay","l":"wi465846/19-crimes-chardonnay","p":9.49,"s":"0,75 l"},
        {"n":"19 Crimes Red blend","l":"wi465836/19-crimes-red-blend","p":9.49,"s":"0,75 l"},
        {"n":"19 Crimes Sauvignon blanc","l":"wi503579/19-crimes-sauvignon-blanc","p":9.49,"s":"0,75 l"},
...
The keys ("n", "d", "l", "p", "s") are not very descriptive.
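To see the structure json.load() builds, here is a minimal sketch that runs json.loads() (the string version of json.load()) on a trimmed copy of the excerpt above, keeping only two products. The meanings noted in the comments (n = name, d = products, l = link, p = price) come from later in this thread; reading s as a size or unit is an assumption.

```python
import json

# Trimmed copy of the start of the file: one supermarket, two products
raw = '''
[
    {"n": "ah", "d": [
        {"n": "'t IJ Kadoosje bierpakket", "l": "wi394045/t-ij-kadoosje-bierpakket", "p": 17.99, "s": "6 x 0,33 l"},
        {"n": "&c", "l": "wi410827/en-c", "p": 7.45, "s": "per stuk"}
    ]}
]
'''

data = json.loads(raw)

print(type(data))            # the top level is a list ...
print(type(data[0]))         # ... of dicts, one per supermarket
print(data[0]["n"])          # "n" on the outer dict: supermarket name
print(type(data[0]["d"]))    # "d": the products, another list ...
print(data[0]["d"][0]["l"])  # ... of dicts; "l" holds the relative product link
```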
Hi deanhystad,

Thanks for the explanation! The JSON file is not mine, but it is open to use. I believe the author shortened the keys to reduce the file size. Could you or someone else write example code that prints the supermarket name (n key) and the product URL (l key)?

Thanks!
Is there a supermarket name? Is it "ah"? I mostly see inventory "d". You can read about the json module here:

https://docs.python.org/3/library/json.html
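A minimal sketch of what was asked for: walk the list of supermarkets and yield each supermarket's name (n) together with every product link (l). It assumes the JSON has already been downloaded and saved as supermarkets.json (the filename used later in this thread); the small inline sample only keeps the keys the sketch needs, so it runs on its own. iter_links is a hypothetical helper name. The links are relative, so they would still need the supermarket's own base URL in front.

```python
def iter_links(data):
    """Yield (supermarket name, relative product link) for every product in every supermarket."""
    for market in data:              # one dict per supermarket
        for product in market["d"]:  # one dict per product
            yield market["n"], product["l"]

# Trimmed sample; with the real file use:
#     import json
#     with open("supermarkets.json", encoding="utf-8") as fp:
#         data = json.load(fp)
data = [
    {"n": "ah", "d": [
        {"l": "wi394045/t-ij-kadoosje-bierpakket"},
        {"l": "wi410827/en-c"},
    ]},
    {"n": "coop", "d": [{"l": "8714319929120"}]},
]

for name, link in iter_links(data):
    print(name, link)
```

Using a generator means the caller decides what to do with each pair (print it, write it to a file, build absolute URLs) without loading a second copy of a large file into memory.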
Hi,

Yes, the names of the supermarkets are in the top-level n key. The d key after it holds all the products that supermarket has. All the supermarkets are:
  • ah
  • aldi
  • coop
  • dekamarkt
  • dirk
  • hoogvliet
  • janlinders
  • jumbo
  • picnic
  • plus
  • spar
  • vomar
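Rather than typing this list by hand, the names can be read straight from the top-level n keys. A minimal sketch under the same assumption (file saved locally as supermarkets.json, replaced here by a trimmed inline sample); market_summary is a hypothetical name, and the product count is added only to show that each d is an ordinary list.

```python
def market_summary(data):
    """Return (name, number of products) for each supermarket, in file order."""
    return [(market["n"], len(market["d"])) for market in data]

# Trimmed sample; with the real file use:
#     import json
#     with open("supermarkets.json", encoding="utf-8") as fp:
#         data = json.load(fp)
data = [
    {"n": "ah", "d": [{"l": "wi394045/t-ij-kadoosje-bierpakket"}, {"l": "wi410827/en-c"}]},
    {"n": "coop", "d": [{"l": "8714319929120"}]},
]

for name, count in market_summary(data):
    print(f"{name}: {count} products")
```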
The structure of the JSON file is not very good, so you have to turn it into something that makes sense.
Use e.g. JSON Editor Online to see the structure better.
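One way to turn it into something that makes sense is to rebuild the list as a dictionary keyed by supermarket name, renaming the short keys to descriptive ones. This is a sketch, not the only option: restructure and FIELD_NAMES are hypothetical names, the long key names are chosen here for readability, and reading s as a size or unit description is an assumption based on values like "0,75 l" and "per stuk".

```python
# Descriptive names for the short keys ("s" as size/unit is an assumption)
FIELD_NAMES = {"n": "name", "l": "link", "p": "price", "s": "size"}

def restructure(data):
    """Turn [{"n": name, "d": [...]}, ...] into {name: [products with descriptive keys]}."""
    markets = {}
    for market in data:
        markets[market["n"]] = [
            # unknown keys are kept unchanged
            {FIELD_NAMES.get(key, key): value for key, value in product.items()}
            for product in market["d"]
        ]
    return markets

# Trimmed sample from the start of the file
data = [
    {"n": "ah", "d": [
        {"n": "&c", "l": "wi410827/en-c", "p": 7.45, "s": "per stuk"},
    ]},
]

markets = restructure(data)
print(markets["ah"][0]["link"], markets["ah"][0]["price"])
```

After this one pass, a supermarket's products are a single dictionary lookup such as markets["ah"], instead of a loop over every entry to find the right one.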
Here is an example of how you could start to make sense of it and search the data.
import json

# Load the file into Python objects: a list of dicts, one per supermarket
with open('supermarkets.json', encoding='utf-8') as fp:
    data = json.load(fp)

#print(data)

supermarkets = 'AH'
for market in data:
    if market['n'] == supermarkets:
        print(f'{supermarkets} is a market')
        # first 5 relative links ("l" key)
        for product in market['d'][:5]:
            print(product['l'])
Output:
AH is a market
wi394045/t-ij-kadoosje-bierpakket
wi410827/en-c
wi549461/en-then-cabernet-sauvignon-alcoholvrij
wi549460/en-then-chardonnay-alcoholvrij
wi437855/100-watt-orchestra-of-angels
So if you change it to supermarkets = 'Coop', you get the first 5 from Coop.
Coop just uses numbers for its relative links.
Output:
Coop is a market
8714319929120
8711757043012
8720182162120
8720182161970
9001442225977
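The supermarket list earlier in the thread and the JSON excerpt spell the names in lowercase (ah, coop), while the searches here use 'AH' and 'Coop'. String comparison in Python is case-sensitive, so a lookup that ignores case works whichever spelling the file actually uses. A sketch only; find_market is a hypothetical helper, and the sample keeps just the keys it needs.

```python
def find_market(data, name):
    """Return the supermarket dict whose "n" matches name, ignoring case, or None."""
    wanted = name.casefold()
    for market in data:
        if market["n"].casefold() == wanted:
            return market
    return None

# Trimmed sample (only the keys used here)
data = [
    {"n": "ah", "d": [{"l": "wi394045/t-ij-kadoosje-bierpakket"}]},
    {"n": "coop", "d": [{"l": "8714319929120"}, {"l": "8711757043012"}]},
]

coop = find_market(data, "Coop")
if coop is None:
    print("Coop is not in the file")
else:
    # slicing never raises, even if there are fewer than 5 products
    for product in coop["d"][:5]:
        print(product["l"])
```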
If you want the price instead, change the key to p:
supermarkets = 'Vomar'
for market in data:
    if market['n'] == supermarkets:
        print(f'{supermarkets} is a market')
        # first 5 prices ("p" key)
        for product in market['d'][:5]:
            print(product['p'])
Output:
Vomar is a market
2.09
2.89
2.89
2.39
0.89
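The link and price versions differ only in the key they read, so both can be folded into one function that takes the key as a parameter. A sketch; first_values and its parameters are hypothetical names, and like the code above it matches the supermarket name exactly. The sample is built from the Vomar output above and keeps only the "p" key.

```python
def first_values(data, name, key, count=5):
    """Return the first `count` values of `key` ("l" = link, "p" = price) for supermarket `name`."""
    for market in data:
        if market["n"] == name:
            return [product[key] for product in market["d"][:count]]
    return []  # no supermarket with that name

# Sample built from the Vomar prices shown above
data = [
    {"n": "Vomar", "d": [{"p": 2.09}, {"p": 2.89}, {"p": 2.89}, {"p": 2.39}, {"p": 0.89}]},
]

print(first_values(data, "Vomar", "p"))
print(first_values(data, "Vomar", "p", count=2))
```

With the real file, first_values(data, "AH", "l") would give the links and first_values(data, "Vomar", "p") the prices, without copying the loop for each key.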