Python Forum
How to filter out Column data From Multiple rows data? - Printable Version

+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/forum-7.html)
+--- Forum: General Coding Help (https://python-forum.io/forum-8.html)
+--- Thread: How to filter out Column data From Multiple rows data? (/thread-32266.html)

Pages: 1 2


How to filter out Column data From Multiple rows data? - firaki12345 - Jan-31-2021

Good Evening


Hi everyone, so i got the following JSON file from Walmart regarding their product items and price.


so i loaded up jupyter notebook, imported pandas and then loaded it into a Data frame with custom columns as shown in the pics below.


[Image: walmart1.png]

[Image: walmart2.png]


now this is what i want to do:

make new columns named as min price and max price and load the data into it

how can i do that ?


Here is the code in jupyter notebook for reference.

i also want the offer price as some items don't have minprice and maxprice :)


EDIT: Here is the code and Json file

Json File: https://pastebin.com/sLGCFCDC


Python Code:

import json
import pandas as pd
     
     
with open("walmart.json") as f:
 data = json.load(f)
     
walmart = data["items"]
     
     
wdf = pd.DataFrame(walmart,columns=["productId","primaryOffer"])
     
     
print(wdf.loc[0,"primaryOffer"])
     
     
pd.set_option('display.max_colwidth', None)
     
print(wdf)



RE: How to filter out Column data From Multiple rows data? - eddywinch82 - Jan-31-2021

Hi firaki12345,

Try this :

I hope it works for you :-

wdf.join(wdf['primaryOffer'].str.split(':', 1, expand=True).rename(columns={0:'minPrice', 1:'maxPrice', 2:'offerPrice}))
I hope this works for you, let me know what output you get with that code, and will try to modify it if you don't get the result you want.

Edit :- I tried it and it didn't work for me, so I will work on a solution for you, sorry about this.

Best Regards

Eddie Winch


RE: How to filter out Column data From Multiple rows data? - firaki12345 - Feb-01-2021

(Jan-31-2021, 10:00 PM)eddywinch82 Wrote: Hi firaki12345,

Try this :

I hope it works for you :-

wdf.join(wdf['primaryOffer'].str.split(':', 1, expand=True).rename(columns={0:'minPrice', 1:'maxPrice', 2:'offerPrice}))
I hope this works for you, let me know what output you get with that code, and will try to modify it if you don't get the result you want.

Edit :- I tried it and it didn't work for me, so I will work on a solution for you, sorry about this.

Best Regards

Eddie Winch

hi, thanks for the quick reply, actually i found last night a better solution we all can use and it was hiding in plain sight.

json_normalize


RE: How to filter out Column data From Multiple rows data? - eddywinch82 - Feb-01-2021

Hi firaki12345,

Could you share the Python Code and output, of the solution, to what you wanted to achieve ?

I am very interested in knowing, if that is okay with you ?

Regards

Eddie Winch Smile


RE: How to filter out Column data From Multiple rows data? - eddywinch82 - Feb-01-2021

Hi firaki12345,

I have achieved what you wanted doing, using str.extract a regular expression, and a couple of other changes.

Here is the finished Code, let me know if you would like any other changes doing :-

import json
import pandas as pd
import re     
      
with open("walmart json file.txt") as f:
 data = json.load(f)
      
walmart = data["items"]
      
      
wdf = pd.DataFrame(walmart,columns=["productId","primaryOffer"])
      
      
print(wdf.loc[0,"primaryOffer"])
      
      
pd.set_option('display.max_colwidth', None)
      

wdf['primaryOffer'] = wdf['primaryOffer'].astype(str)

wdf['primaryOffer'] = wdf.primaryOffer.str.replace("'", '')

wdf['primaryOffer'] = wdf.primaryOffer.str.replace(",", '')

wdf['minPrice'] = wdf.primaryOffer.str.extract('.*minPrice:\s?(\d+[.]?\d*)', expand = False)
wdf['maxPrice'] = wdf.primaryOffer.str.extract('.*maxPrice:\s?(\d+[.]?\d*)', expand = False)
wdf['offerPrice'] = wdf.primaryOffer.str.extract('.*offerPrice:\s?(\d+[.]?\d*)', expand = False)

wdf=wdf.fillna('--')

wdf
The walmart json file.txt line of text, will need changing to what it is for you.

Best Regards

Eddie Winch Smile


RE: How to filter out Column data From Multiple rows data? - eddywinch82 - Feb-02-2021

Hi firaki12345,

Here is a more cleaned up output version, of the previous Code :-

I hope you like it ))

import json
import pandas as pd
import re     
      
with open("walmart json file.txt") as f:
 data = json.load(f)
      
walmart = data["items"]
      
      
wdf = pd.DataFrame(walmart,columns=["productId","primaryOffer"])
      
      
print(wdf.loc[0,"primaryOffer"])
      
      
pd.set_option('display.max_colwidth', None)
      

wdf['primaryOffer'] = wdf['primaryOffer'].astype(str)

wdf['primaryOffer'] = wdf.primaryOffer.str.replace("'", '')

wdf['primaryOffer'] = wdf.primaryOffer.str.replace(",", '')

wdf['primaryOffer'] = wdf.primaryOffer.str.replace("{", '') 


wdf['offerId'] = wdf.primaryOffer.str.extract('.*offerId:\s?(\w+)', expand = False)
wdf['minPrice(USD)'] = wdf.primaryOffer.str.extract('.*minPrice:\s?(\d+[.]?\d*)', expand = False)
wdf['maxPrice(USD)'] = wdf.primaryOffer.str.extract('.*maxPrice:\s?(\d+[.]?\d*)', expand = False)
wdf['offerPrice(USD)'] = wdf.primaryOffer.str.extract('.*offerPrice:\s?(\d+[.]?\d*)', expand = False)

wdf=wdf.fillna('--')

wdf.drop('primaryOffer', axis=1, inplace=True)

wdf
Regards

Eddie Winch


RE: How to filter out Column data From Multiple rows data? - firaki12345 - Feb-05-2021

(Feb-02-2021, 10:02 PM)eddywinch82 Wrote: Hi firaki12345,

Here is a more cleaned up output version, of the previous Code :-

I hope you like it ))

import json
import pandas as pd
import re     
      
with open("walmart json file.txt") as f:
 data = json.load(f)
      
walmart = data["items"]
      
      
wdf = pd.DataFrame(walmart,columns=["productId","primaryOffer"])
      
      
print(wdf.loc[0,"primaryOffer"])
      
      
pd.set_option('display.max_colwidth', None)
      

wdf['primaryOffer'] = wdf['primaryOffer'].astype(str)

wdf['primaryOffer'] = wdf.primaryOffer.str.replace("'", '')

wdf['primaryOffer'] = wdf.primaryOffer.str.replace(",", '')

wdf['primaryOffer'] = wdf.primaryOffer.str.replace("{", '') 


wdf['offerId'] = wdf.primaryOffer.str.extract('.*offerId:\s?(\w+)', expand = False)
wdf['minPrice(USD)'] = wdf.primaryOffer.str.extract('.*minPrice:\s?(\d+[.]?\d*)', expand = False)
wdf['maxPrice(USD)'] = wdf.primaryOffer.str.extract('.*maxPrice:\s?(\d+[.]?\d*)', expand = False)
wdf['offerPrice(USD)'] = wdf.primaryOffer.str.extract('.*offerPrice:\s?(\d+[.]?\d*)', expand = False)

wdf=wdf.fillna('--')

wdf.drop('primaryOffer', axis=1, inplace=True)

wdf
Regards

Eddie Winch


hi very very sorry for the late reply, but as i said i found an easier solution by using json_normalize
as it turns all jumbled up data into columns. so it becomes easy to use in pandas :)


RE: How to filter out Column data From Multiple rows data? - eddywinch82 - Feb-05-2021

No problem firaki12345,

I enjoyed coming up with the solutions, to what you wanted doing,
my pleasure.

Regards

Eddie Winch


RE: How to filter out Column data From Multiple rows data? - buran - Feb-05-2021

cross-posted https://stackoverflow.com/questions/65977198/how-to-filter-out-column-data-from-multiple-rows-data

this is a great example why cross-posting is bad and at least you should let us know about it.


RE: How to filter out Column data From Multiple rows data? - firaki12345 - Feb-06-2021

(Feb-05-2021, 03:47 PM)buran Wrote: cross-posted https://stackoverflow.com/questions/65977198/how-to-filter-out-column-data-from-multiple-rows-data

this is a great example why cross-posting is bad and at least you should let us know about it.

hi, i didn't cross post. i posted my question in web scraping thats it :)