How to extract specific information from json database? - Printable Version +- Python Forum (https://python-forum.io) +-- Forum: Python Coding (https://python-forum.io/forum-7.html) +--- Forum: Data Science (https://python-forum.io/forum-44.html) +--- Thread: How to extract specific information from json database? (/thread-8069.html) Pages:
1
2
|
How to extract specific information from json database? - Alberto - Feb-05-2018 Dear Python Users, I have a json database "data_df" [8194 rows x 6 columns]. The sample of the data is uploaded below. nutrients \ 0 [{'code': '203cp1252', 'name': 'Proteincp1252'... 1 [{'code': '203cp1252', 'name': 'Proteincp1252'... 2 [{'code': '203cp1252', 'name': 'Proteincp1252'... 3 [{'code': '203cp1252', 'name': 'Proteincp1252'. ... name \ 0 {'long': 'Butter, saltedcp1252', 'common': ['c... 1 {'long': 'Butter, whipped, with saltcp1252', '... 2 {'long': 'Butter oil, anhydrouscp1252', 'commo... ...What I want is to output a table that will show different parts of the group (Alanine, Glycine, Histidine etc) and the foods in which they are present (Gelatins, dry powder, beluga, meat..etc). I have the following code: AminoAcid = ['Alaninecp1252', 'Argininecp1252', 'Aspartic acidcp1252', 'Cystinecp1252', 'Glutamic acidcp1252', 'Glycinecp1252', 'Histidinecp1252', 'Hydroxyprolinecp1252', 'Isoleucinecp1252', 'Leucinecp1252', 'Lysinecp1252', 'Methioninecp1252', 'Phenylalaninecp1252', 'Prolinecp1252', 'Serinecp1252', 'Threoninecp1252', 'Tryptophancp1252', 'Tyrosinecp1252', 'Valinecp1252'] for index, row in data_df.iterrows(): group = row['group'] food = row['name']['long'] manufacturer = row['manufacturer'] nutrients = row['nutrients'] for nutrient in nutrients: nutrient_name = nutrient['name'] if any(e == nutrient_name for e in AminoAcid): nutrient_val = nutrient['value'] nutrient_unit = nutrient['units']However, I dont know how to continue.. what I want is smth like this: nutrient value Food Alanine 8 Gelatins, dry powder Please, advise me this issue RE: How to extract specific information from json database? - Larz60+ - Feb-05-2018 Is it possible to download this file? If not, take a usable sample of same and post here it appears to be in dictionary format, but you have loaded into a list. If kept in dictionary format, it's a piece of cake. just for chuckles, try the following and see if it works: with open('data_df') as f: amino_acid_dict = json.load(f) count = 20 for key, value in amino_acid_dict.items(): print('key: {}, value: {}'.format(key, value)) count -= 1 if count == 0: breakNot tested, hope there are no typo's Please post (at least part of) results RE: How to extract specific information from json database? - Alberto - Feb-05-2018 Here I upload the data that is a subsample of pandas df. I saved it as csv. I tried to run the code (skipping the first line as my df is not saved as a separate file on the hard disk) and got the error: The list of Amino Acids is as follows:AminoAcid = ['Alaninecp1252', 'Argininecp1252', 'Aspartic acidcp1252', 'Cystinecp1252', 'Glutamic acidcp1252', 'Glycinecp1252', 'Histidinecp1252', 'Hydroxyprolinecp1252', 'Isoleucinecp1252', 'Leucinecp1252', 'Lysinecp1252', 'Methioninecp1252', 'Phenylalaninecp1252', 'Prolinecp1252', 'Serinecp1252', 'Threoninecp1252', 'Tryptophancp1252', 'Tyrosinecp1252', 'Valinecp1252']Can you, please take a look at the data and advise me how to proceed. RE: How to extract specific information from json database? - Larz60+ - Feb-05-2018 Give me a few hours to play... I do have a question before I start to look at this. Your title Quote:How to extract specific information from json database?suggests that there is a json file somewhere. Is there such a file? If so, this is what I would be interested in. The data could be extracted directly from this file. Also, the snippet of code I postes, assumed from your title that the file was in json format. RE: How to extract specific information from json database? - Alberto - Feb-05-2018 Yes, you are right. I used the code below to convert it to pandas df (the file is 350MB): import pandas as pd # read the entire file into a python array with open('C:/Users/Alberto/nutrients.json', 'r') as f: data = f.readlines() # remove the trailing "\n" from each line data = map(lambda x: x.rstrip(), data) # each element of 'data' is an individual JSON object. # i want to convert it into an *array* of JSON objects # which, in and of itself, is one large JSON object # basically... add square brackets to the beginning # and end, and have all the individual business JSON objects # separated by a comma data_json_str = "[" + ",".join(data) + "]" # now, load it into pandas data_df = pd.read_json(data_json_str) RE: How to extract specific information from json database? - Larz60+ - Feb-05-2018 I would much rather have the json file, as it is already in the perfect format for extracting information is it possible to get that data. the snippet i posted should work with your json file Just change the filename RE: How to extract specific information from json database? - Alberto - Feb-05-2018 I tried and here what it gives me:
RE: How to extract specific information from json database? - Larz60+ - Feb-05-2018 I am suspect of your json file. If it was truly json, you would be able to read it with: with open(filename) as f: data_dict = json.load(f)There is a pandas to dictionary command (see: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_dict.html) In dictionary format, extraction the data that you are interested in is so easy. I am not a pandas user. RE: How to extract specific information from json database? - Alberto - Feb-05-2018 Thank you for your help! RE: How to extract specific information from json database? - Larz60+ - Feb-06-2018 I will work with the data you provided, but you're thread title again is to extract data from a json database. Is there no possible way that you can get me a copy of that json file? It would be very easy to get exactly what you wanted from that. Taking the exported data and converting to a csv file is a snap. Is the source the USDA Nutrition database? |