Python Forum
extract specific data from a group of json-files
#1
Hello everyone!

I have a large collection of json-files (a few thousand), each containing metadata about a text post, such as the post-ID, the username (and full name, if made public by the user), timestamp and so on. I would like to extract this information from each file without having to do so manually, but I am not yet familiar enough with Python to figure out how to do this (I have only been able to follow one course so far, and unfortunately it isn't useful in this case: it only covered basic calculations in Python).
Another problem is that some of the files contain several sets of data under the same names, because someone commented on the post. However, I only need this information for the main post (that is, the first time it appears in the file).

Does anyone have any idea how I might be able to extract this information?
Thank you so much in advance!

Kind regards
Reply
#2
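First you should know the data structure of the json file.
You can inspect it by opening the file in Python and calling json.load() on the open file object (json.loads() is for strings, not files).

import json

# json.load() reads from an open file object and returns the parsed data
with open('your_data0001.json') as fd:
    data = json.load(fd)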
Usually json data is a dictionary with subdictionaries and lists.
To show the keys, you can use list(data.keys()).
If you find the right key, you can dig deeper.

For example, if you have the key 'metadata', accessing it is very easy:
data['metadata']
The value of 'metadata' could be a list or a dict or something else (int, float, str).
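
To get an overview of the nesting without clicking through every key by hand, a small helper like the one below may be useful. This is only a sketch: the function name describe, the max_depth parameter and the sample data are made up for illustration and are not taken from your files.

```python
import json


def describe(value, depth=0, max_depth=3):
    """Return a list of lines describing the nesting of parsed JSON data."""
    indent = "  " * depth
    lines = []
    if isinstance(value, dict):
        for key, item in value.items():
            lines.append(f"{indent}{key}: {type(item).__name__}")
            if depth < max_depth:
                lines.extend(describe(item, depth + 1, max_depth))
    elif isinstance(value, list) and value:
        # Only the first element is inspected, assuming the list is homogeneous.
        lines.append(
            f"{indent}[list of {len(value)}, first item: {type(value[0]).__name__}]"
        )
        if depth < max_depth:
            lines.extend(describe(value[0], depth + 1, max_depth))
    return lines


# Made-up sample standing in for one of the real files.
sample = json.loads('{"metadata": {"id": 1, "tags": ["a", "b"]}}')
print("\n".join(describe(sample)))
```

With real data you would pass the result of json.load(fd) instead of sample.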


After you know the structure, you can write a transformer function for it.
It should transform the json data into the form you want to have.

This is just an example and does not have to fit your data.
def transformer(mapping_from_json):
    """
    A generator which takes a mapping (dict)
    and yields name, age, active
    """
    for items in mapping_from_json['results']['metadata']:
        # if metadata is a list
        for element in items:
            name = element.get('name', 'NO NAME')
            age = element.get('age', 0)
            active = element.get('active', False)
            yield (name, age, active)
I used a generator because the logic is easier to understand. Whenever a function contains a yield, calling it returns a generator. Iterating over the generator yields the elements.

To consume it, pass it to anything that accepts an iterable, or use a for loop.
list(transformer(my_dict))
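
For a few thousand files, the same idea can be wrapped in a loop over a folder. The sketch below rests on assumptions: the key names in WANTED are hypothetical and must be replaced with the ones in your files, and find_first is a made-up helper that returns the first match it reaches, checking a dictionary's own keys before descending into nested values. If the main post's fields sit above or before the comments in the file, that picks the main post, but check this against your own structure.

```python
import json
from pathlib import Path

# Hypothetical key names; replace them with the ones found in your files.
WANTED = ("id", "username", "timestamp")


def find_first(value, key):
    """Return the first value stored under `key`, searching depth-first."""
    if isinstance(value, dict):
        if key in value:
            return value[key]
        children = value.values()
    elif isinstance(value, list):
        children = value
    else:
        return None
    for child in children:
        found = find_first(child, key)
        if found is not None:
            return found
    return None


def extract_folder(folder):
    """Build one row (a dict) per *.json file in `folder`."""
    rows = []
    for path in sorted(Path(folder).glob("*.json")):
        with open(path, encoding="utf-8") as fd:
            data = json.load(fd)
        row = {"file": path.name}
        for key in WANTED:
            row[key] = find_first(data, key)  # None if the key is missing
        rows.append(row)
    return rows
```

Calling extract_folder with the path of the folder that holds the files returns a list of dictionaries, which can then be printed or written out with the csv module.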
Reply
#3
Hi @DeaD_EyE! Thank you for helping me!
Where do I need to put my file to open it? Right now it's in my Documents folder, but when I tried to open it, I received the following error:
Error:
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-1-b22700e301c2> in <module>
      1 import json
      2
----> 3 with open('2015-05-14_16-35-57_UTC_janedoe.json') as fd:
      4     data=jsonloafs(fd)

FileNotFoundError: [Errno 2] No such file or directory: '2015-05-14_16-35-57_UTC_janedoe.json'
Reply
#4
Either give the full path to the file, or put it in Python's current working directory (often, but not always, the directory the script resides in).
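
A minimal sketch of the full-path approach using pathlib. The helper name load_json and its folder parameter are made up for illustration; the error message also prints the current working directory, which helps when a relative filename is not found.

```python
import json
from pathlib import Path


def load_json(filename, folder=None):
    """Load a JSON file from `folder` (default: the current working directory)."""
    folder = Path(folder) if folder is not None else Path.cwd()
    path = folder / filename
    if not path.is_file():
        raise FileNotFoundError(
            f"{path} not found; the working directory is {Path.cwd()}"
        )
    with open(path, encoding="utf-8") as fd:
        return json.load(fd)
```

Assuming the files really are in a Documents folder directly under your home directory, the call would look like load_json('2015-05-14_16-35-57_UTC_janedoe.json', Path.home() / 'Documents'). Adjust the folder if your system names it differently.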

(Dec-05-2019, 09:40 AM)DeaD_EyE Wrote: Usually json data is a dictionary with subdictionaries and lists.

I don't think it's reasonable to assume that. You can have an array as the top level thing, for example.
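
One way to avoid relying on that assumption is to check the type of the top-level value before digging into it. A small sketch, with iter_records as a made-up helper name:

```python
import json


def iter_records(data):
    """Yield dict records whether the top level is an object or an array."""
    if isinstance(data, dict):
        yield data
    elif isinstance(data, list):
        for item in data:
            if isinstance(item, dict):
                yield item
    else:
        raise TypeError(f"unexpected top-level type: {type(data).__name__}")


# Works for both shapes of input.
print(list(iter_records(json.loads('{"a": 1}'))))
print(list(iter_records(json.loads('[{"a": 1}, {"a": 2}]'))))
```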
Reply