Posts: 88
Threads: 33
Joined: Apr 2018
Hi All,
I have a csv file containing product details. From which I am extracting required informations.
There is one column in the csv-Options and it contains values like- [], [['Newborn (2.5-6kg)', 'Big Baby (6-9kg)']].
I have written an unnesting function and applying on Options so that values looks like= Newborn (2.5-6kg),Big Baby (6-9kg).
But th eproblem with this function is that it is removing the [] from the Options and so all rows containing Options as [] are removed and so I am loosing the products.
I need to accept empty Options values also so that there is no loss of product while unnesting.
PFB my unnesting function-
def unnest(df, col, col2,reset_index=False):
col_flat = pd.DataFrame([[i, x]
for i, y in df[col].apply(list).iteritems()
for x in y ], columns=['I', col]
)
col_flat = col_flat.set_index('I')
df = df.drop(col, 1)
df = df.merge(col_flat, left_index=True, right_index=True)
if reset_index:
df = df.reset_index(drop=True)
merchant_product_code = (df['Product code'] + '-' + df[col])
df['item_group_id'] = df['Product code']
df['Product code'] = merchant_product_code
return df I am applying above function as below-
full_df = unnest(full_df,'Options','product code') I have attached the full_df csv, please have a look and correct me what I am missing here.
Attached Files
full.csv (Size: 64.63 KB / Downloads: 73)
Posts: 1,950
Threads: 8
Joined: Jun 2018
Jul-20-2018, 08:16 AM
(This post was last modified: Jul-20-2018, 08:17 AM by perfringo.)
I have no idea what would you like to do with data, but in with csv-s its often easier to use screwdriver (csv module) instead of hammer (pandas).
I would import csv file into ordered dict and would do any manipulation there:
import csv
with open('full.csv', 'r', encoding='UTF-8') as f:
data = f.read().splitlines()
products = list(csv.DictReader(data, delimiter=',')) Now all rows are in variable 'products' which is OrderedDict in form of:
Output: OrderedDict([('Product code', 'PHBC'),
('Category', 'All/For Baby/Feeding /Pacifiers'),
('List price', '6.5'),
('Price', '6.5'),
('Weight', '0'),
('Quantity', '93'),
('Min quantity', '0'),
('Shipping freight', '0'),
('Date added', '07-01-2014 00:00'),
.....
One can access needed column by:
products[0]['Options'] which will give:
Output: "['Newborn (2.5-6kg)', 'Big Baby (6-9kg)']"
As empty lists are False, I would make needed conversions something like that (it will ignore []):
for row in products:
if row['Options']:
# do your magic here
As said, I have no idea what kind of task you would like to accomplish, so it may help you or be totally irrelevant.
I'm not 'in'-sane. Indeed, I am so far 'out' of sane that you appear a tiny blip on the distant coast of sanity. Bucky Katt, Get Fuzzy
Da Bishop: There's a dead bishop on the landing. I don't know who keeps bringing them in here. ....but society is to blame.
Posts: 8,152
Threads: 160
Joined: Sep 2016
Jul-20-2018, 08:28 AM
(This post was last modified: Jul-20-2018, 08:29 AM by buran.)
(Jul-20-2018, 08:16 AM)perfringo Wrote: As empty lists are False, I would make needed conversions something like that (it will ignore []):
Actually, because this comes from csv, this would be not list/empty list, but simple str, so "[]" would not be a empty list, thus your last snippet will not work as expected.
I guess, when information was scraped from the site and the csv was created the list of options was written as str.
Can you influence the format of the file?
Posts: 1,950
Threads: 8
Joined: Jun 2018
(Jul-20-2018, 08:28 AM)buran Wrote: (Jul-20-2018, 08:16 AM)perfringo Wrote: As empty lists are False, I would make needed conversions something like that (it will ignore []):
Actually, because this comes from csv, this would be not list/empty list, but simple str, so "[]" would not be a empty list, thus your last snippet will not work as expected.
You are right! I stand corrected. One should always inspect data types with precision.
I'm not 'in'-sane. Indeed, I am so far 'out' of sane that you appear a tiny blip on the distant coast of sanity. Bucky Katt, Get Fuzzy
Da Bishop: There's a dead bishop on the landing. I don't know who keeps bringing them in here. ....but society is to blame.
Posts: 8,152
Threads: 160
Joined: Sep 2016
in this case it's relatively safe to use eval
>>> product_options = eval("[['Newborn (2.5-6kg)', 'Big Baby (6-9kg)']]")
>>> product_options
[['Newborn (2.5-6kg)', 'Big Baby (6-9kg)']]
>>> product_options[0][0]
'Newborn (2.5-6kg)'
>>>
Posts: 1,950
Threads: 8
Joined: Jun 2018
(Jul-20-2018, 08:40 AM)perfringo Wrote: (Jul-20-2018, 08:28 AM)buran Wrote: Actually, because this comes from csv, this would be not list/empty list, but simple str, so "[]" would not be a empty list, thus your last snippet will not work as expected.
You are right! I stand corrected. One should always inspect data types with precision.
One way is to use re to parse list out of string.
import re
def string_to_list(s):
'''Parse string and return list inside string'''
refine = re.sub("[\[\],\s]","",s)
splitted = re.split("[\'\"]", refine)
result = [s for s in splitted if s != '']
return result Then:
for row in products:
if string_to_list(row['Options']):
# do your magic here Output on file provided is:
Output: ['Newborn(2.5-6kg)', 'BigBaby(6-9kg)']
['Newborn(2.5-6kg)', 'BigBaby(6-9kg)', 'Preemie(1.5-2.5kg)']
['Newborn(2.5-6kg)', 'BigBaby(6-9kg)']
['Newborn(2.5-6kg)', 'BigBaby(6-9kg)']
['Newborn(2.5-6kg)', 'BigBaby(6-9kg)']
['Newborn(2.5-6kg)', 'BigBaby(6-9kg)']
Note, that result of function is list; after doing magic it would be probably good idea to convert result back to string.
I'm not 'in'-sane. Indeed, I am so far 'out' of sane that you appear a tiny blip on the distant coast of sanity. Bucky Katt, Get Fuzzy
Da Bishop: There's a dead bishop on the landing. I don't know who keeps bringing them in here. ....but society is to blame.
Posts: 88
Threads: 33
Joined: Apr 2018
Hi All,
I am working on a ETL process using Python3.6.
For now I am facing one issue while unnesting 'Options' column in my dataframe.
Currently I am downloading two csv files from FTP location and then performing cleaning/transformation on them.
For one csv file my unnesting is working as expected by transforming the product code and options values as below-
But for the second csv file, options column is not unnested properly like below-
My code is in below link-
my python code
I have attached the dataframes generated from csvs with minimal columns and full dataframe also so that anyone can test.
Please see once and let me know if need anything from my side.
Attached Files
df1.csv (Size: 484 bytes / Downloads: 15)
df2.csv (Size: 526 bytes / Downloads: 17)
full_df.csv (Size: 390 bytes / Downloads: 16)
|