Hello everyone, I am new here so please be gentle. I need to parse some JSON and convert to a Dataframe but am having the difficulty in getting the right format. The ultimate goal is to put the data in a Google BigQuery table. The following is sample data but the idea is the same.
JSON:
JSON:
{ 'data': { 'max_age': '', 'now': '2020/02/18 12:21:35 GMT-0000', 'result': [ { 'age': 0, 'saved_question': 0, 'Applicant_id': '678345783', 'columns': [ { 'field_id': 5634564356, 'name': 'Applicant Name' }, { 'field_id': 35884245, 'name': 'Applicant Address' }, { 'field_id': 245676734, 'name': 'Applicant age' }, { 'field_id': 345694546, 'name': 'Count' } ], 'count_employees': 2100, 'id': 15036122, 'employee_count': 2100, 'rows': [ { 'eid': 2971220455, 'data': [ [ { 'text': 'John Smith' } ], [ { 'text': '21' } ], [ { 'text': 'Rock Star' } ], [ { 'text': '1' } ] ], 'tid': 3622122231 }, { 'eid': 18444835, 'data': [ [ { 'text': 'Jane Doe' } ], [ { 'text': '65' } ], [ { 'text': 'Retired' } ], [ { 'text': '1' } ] ], 'tid': 1301620287 },Code:
with open('signals.pickle', 'rb') as outfile: json_data = pickle.load(outfile) something = json_normalize(json_data, record_path =['data','result','rows'[0:]]) something2 = something.pivot(index='cid', columns=['data'[0:]], values=[['data'[0][0]]]) print(something.head(10))Traceback:
Error:Traceback (most recent call last):
File "test.py", line 90, in <module>
something2 = something.pivot(index='cid', columns=['data'[0:]], values=[['data'[0][0]]])
File "C:\XXXXX\XXXXX\XXXXX\XXXXX\XXXXX\XXXXX\XXXXX\XXXXX\XXXXX\XXXXX\XXXXX\XXXXX", line 5912, in pivot
return pivot(self, index=index, columns=columns, values=values)
File "C:\XXXXX\XXXXX\XXXXX\XXXXX\XXXXX\Python\Python37\lib\site-packages\pandas\core\reshape\pivot.py", line 426, in pivot
data[values].values, index=index, columns=values
File "C:\XXXXX\XXXXX\XXXXX\XXXXX\Programs\Python\Python37\lib\site-packages\pandas\core\frame.py", line 2981, in __getitem__
indexer = self.loc._convert_to_indexer(key, axis=1, raise_missing=True)
File "C:\XXXXX\XXXXX\XXXXX\Local\Programs\Python\Python37\lib\site-packages\pandas\core\indexing.py", line 1271, in _convert_to_indexer
return self._get_listlike_indexer(obj, axis, **kwargs)[1]
File "C:\XXXXX\XXXXX\XXXXX\XXXXX\Programs\Python\Python37\lib\site-packages\pandas\core\indexing.py", line 1078, in _get_listlike_indexer
keyarr, indexer, o._get_axis_number(axis), raise_missing=raise_missing
File "C:\XXXXX\XXXXX\XXXXX\XXXXX\Programs\Python\Python37\lib\site-packages\pandas\core\indexing.py", line 1163, in _validate_read_indexer
key=key, axis=self.obj._get_axis_name(axis)
KeyError: "None of [Index([('d',)], dtype='object')] are in the [columns]"
I have tried several different ways of doing this and decided this was probably the best. Any ideas are obviously welcome.