Python Forum
How to extract specific information from json database? - Printable Version

+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/forum-7.html)
+--- Forum: Data Science (https://python-forum.io/forum-44.html)
+--- Thread: How to extract specific information from json database? (/thread-8069.html)

Pages: 1 2


How to extract specific information from json database? - Alberto - Feb-05-2018

Dear Python Users,

I have a json database "data_df" [8194 rows x 6 columns]. The sample of the data is uploaded below.

 nutrients  \
0     [{'code': '203cp1252', 'name': 'Proteincp1252'...   
1     [{'code': '203cp1252', 'name': 'Proteincp1252'...   
2     [{'code': '203cp1252', 'name': 'Proteincp1252'...   
3     [{'code': '203cp1252', 'name': 'Proteincp1252'.
...

 name  \
 0     {'long': 'Butter, saltedcp1252', 'common': ['c...   
 1     {'long': 'Butter, whipped, with saltcp1252', '...   
 2     {'long': 'Butter oil, anhydrouscp1252', 'commo...
 ...
What I want is to output a table that will show different parts of the group (Alanine, Glycine, Histidine etc) and the foods in which they are present (Gelatins, dry powder, beluga, meat..etc).

I have the following code:

AminoAcid = ['Alaninecp1252', 'Argininecp1252', 'Aspartic acidcp1252', 'Cystinecp1252', 'Glutamic acidcp1252', 'Glycinecp1252', 'Histidinecp1252',  'Hydroxyprolinecp1252',  'Isoleucinecp1252', 'Leucinecp1252', 'Lysinecp1252', 'Methioninecp1252', 'Phenylalaninecp1252', 'Prolinecp1252', 'Serinecp1252', 'Threoninecp1252', 'Tryptophancp1252', 'Tyrosinecp1252', 'Valinecp1252']    

for index, row in data_df.iterrows():
    group = row['group']
    food = row['name']['long']
    manufacturer = row['manufacturer']
    nutrients = row['nutrients']
    for nutrient in nutrients:
        nutrient_name = nutrient['name']
        if any(e == nutrient_name for e in AminoAcid):
            nutrient_val = nutrient['value']
            nutrient_unit = nutrient['units']
However, I dont know how to continue.. what I want is smth like this:
nutrient value Food
Alanine 8 Gelatins, dry powder

Please, advise me this issue


RE: How to extract specific information from json database? - Larz60+ - Feb-05-2018

Is it possible to download this file?
If not, take a usable sample of same and post here
it appears to be in dictionary format, but you have loaded into a list.
If kept in dictionary format, it's a piece of cake.
just for chuckles, try the following and see if it works:
with open('data_df') as f:
    amino_acid_dict = json.load(f)
count = 20
for key, value in amino_acid_dict.items():
    print('key: {}, value: {}'.format(key, value))
    count -= 1
    if count == 0:
        break
Not tested, hope there are no typo's

Please post (at least part of) results


RE: How to extract specific information from json database? - Alberto - Feb-05-2018

Here I upload the data that is a subsample of pandas df. I saved it as csv. I tried to run the code (skipping the first line as my df is not saved as a separate file on the hard disk) and got the error:

Error:
AttributeError: 'DataFrame' object has no attribute 'read'
The list of Amino Acids is as follows:

AminoAcid = ['Alaninecp1252', 'Argininecp1252', 'Aspartic acidcp1252', 'Cystinecp1252', 'Glutamic acidcp1252', 'Glycinecp1252', 'Histidinecp1252',  'Hydroxyprolinecp1252',  'Isoleucinecp1252', 'Leucinecp1252', 'Lysinecp1252', 'Methioninecp1252', 'Phenylalaninecp1252', 'Prolinecp1252', 'Serinecp1252', 'Threoninecp1252', 'Tryptophancp1252', 'Tyrosinecp1252', 'Valinecp1252']    
Can you, please take a look at the data and advise me how to proceed.


RE: How to extract specific information from json database? - Larz60+ - Feb-05-2018

Give me a few hours to play...

I do have a question before I start to look at this.
Your title
Quote:How to extract specific information from json database?
suggests that there is a json file somewhere.
Is there such a file?
If so, this is what I would be interested in. The data could be extracted directly from this file.
Also, the snippet of code I postes, assumed from your title that the file was in json format.


RE: How to extract specific information from json database? - Alberto - Feb-05-2018

Yes, you are right. I used the code below to convert it to pandas df (the file is 350MB):

                   
import pandas as pd

# read the entire file into a python array
with open('C:/Users/Alberto/nutrients.json', 'r') as f:
    data = f.readlines()

# remove the trailing "\n" from each line
data = map(lambda x: x.rstrip(), data)

# each element of 'data' is an individual JSON object.
# i want to convert it into an *array* of JSON objects
# which, in and of itself, is one large JSON object
# basically... add square brackets to the beginning
# and end, and have all the individual business JSON objects
# separated by a comma
data_json_str = "[" + ",".join(data) + "]"

# now, load it into pandas
data_df = pd.read_json(data_json_str)



RE: How to extract specific information from json database? - Larz60+ - Feb-05-2018

I would much rather have the json file, as it is already in the perfect format for extracting information
is it possible to get that data.

the snippet i posted should work with your json file
Just change the filename


RE: How to extract specific information from json database? - Alberto - Feb-05-2018

I tried and here what it gives me:

Error:
raise JSONDecodeError("Extra data", s, end) JSONDecodeError: Extra data



RE: How to extract specific information from json database? - Larz60+ - Feb-05-2018

I am suspect of your json file.
If it was truly json, you would be able to read it with:

with open(filename) as f:
    data_dict = json.load(f)
There is a pandas to dictionary command (see: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_dict.html)

In dictionary format, extraction the data that you are interested in is so easy.
I am not a pandas user.


RE: How to extract specific information from json database? - Alberto - Feb-05-2018

Thank you for your help!


RE: How to extract specific information from json database? - Larz60+ - Feb-06-2018

I will work with the data you provided, but you're thread title
again is to extract data from a json database.

Is there no possible way that you can get me a copy of that json
file? It would be very easy to get exactly what you wanted from that.
Taking the exported data and converting to a csv file is a snap.

Is the source the USDA Nutrition database?