Python Forum
How to make my python 3.6 unnesting function correct?
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
How to make my python 3.6 unnesting function correct?
#1
Hi All,

I have a csv file containing product details. From which I am extracting required informations.
There is one column in the csv-Options and it contains values like- [], [['Newborn (2.5-6kg)', 'Big Baby (6-9kg)']].

I have written an unnesting function and applying on Options so that values looks like= Newborn (2.5-6kg),Big Baby (6-9kg).

But th eproblem with this function is that it is removing the [] from the Options and so all rows containing Options as [] are removed and so I am loosing the products.

I need to accept empty Options values also so that there is no loss of product while unnesting.

PFB my unnesting function-

def unnest(df, col, col2,reset_index=False):
   col_flat = pd.DataFrame([[i, x]
                      for i, y in df[col].apply(list).iteritems()
                          for x in y ], columns=['I', col]
                           )
   col_flat = col_flat.set_index('I')
   df = df.drop(col, 1)
   df = df.merge(col_flat, left_index=True, right_index=True)

   if reset_index:
       df = df.reset_index(drop=True)
   merchant_product_code = (df['Product code'] + '-' + df[col])
   df['item_group_id'] = df['Product code']
   df['Product code'] = merchant_product_code
   return df
I am applying above function as below-
full_df = unnest(full_df,'Options','product code')
I have attached the full_df csv, please have a look and correct me what I am missing here.

Attached Files

.csv   full.csv (Size: 64.63 KB / Downloads: 73)
Reply
#2
I have no idea what would you like to do with data, but in with csv-s its often easier to use screwdriver (csv module) instead of hammer (pandas).

I would import csv file into ordered dict and would do any manipulation there:

import csv

with open('full.csv', 'r', encoding='UTF-8') as f:
    data = f.read().splitlines()
    products = list(csv.DictReader(data, delimiter=','))
Now all rows are in variable 'products' which is OrderedDict in form of:

Output:
OrderedDict([('Product code', 'PHBC'), ('Category', 'All/For Baby/Feeding /Pacifiers'), ('List price', '6.5'), ('Price', '6.5'), ('Weight', '0'), ('Quantity', '93'), ('Min quantity', '0'), ('Shipping freight', '0'), ('Date added', '07-01-2014 00:00'), .....
One can access needed column by:

products[0]['Options']
which will give:
Output:
"['Newborn (2.5-6kg)', 'Big Baby (6-9kg)']"
As empty lists are False, I would make needed conversions something like that (it will ignore []):

for row in products:
    if row['Options']:             
         # do your magic here
 
As said, I have no idea what kind of task you would like to accomplish, so it may help you or be totally irrelevant.
I'm not 'in'-sane. Indeed, I am so far 'out' of sane that you appear a tiny blip on the distant coast of sanity. Bucky Katt, Get Fuzzy

Da Bishop: There's a dead bishop on the landing. I don't know who keeps bringing them in here. ....but society is to blame.
Reply
#3
(Jul-20-2018, 08:16 AM)perfringo Wrote: As empty lists are False, I would make needed conversions something like that (it will ignore []):

Actually, because this comes from csv, this would be not list/empty list, but simple str, so "[]" would not be a empty list, thus your last snippet will not work as expected.

I guess, when information was scraped from the site and the csv was created the list of options was written as str.
Can you influence the format of the file?
If you can't explain it to a six year old, you don't understand it yourself, Albert Einstein
How to Ask Questions The Smart Way: link and another link
Create MCV example
Debug small programs

Reply
#4
(Jul-20-2018, 08:28 AM)buran Wrote:
(Jul-20-2018, 08:16 AM)perfringo Wrote: As empty lists are False, I would make needed conversions something like that (it will ignore []):

Actually, because this comes from csv, this would be not list/empty list, but simple str, so "[]" would not be a empty list, thus your last snippet will not work as expected.

You are right! I stand corrected. One should always inspect data types with precision.
I'm not 'in'-sane. Indeed, I am so far 'out' of sane that you appear a tiny blip on the distant coast of sanity. Bucky Katt, Get Fuzzy

Da Bishop: There's a dead bishop on the landing. I don't know who keeps bringing them in here. ....but society is to blame.
Reply
#5
in this case it's relatively safe to use eval

>>> product_options = eval("[['Newborn (2.5-6kg)', 'Big Baby (6-9kg)']]")
>>> product_options
[['Newborn (2.5-6kg)', 'Big Baby (6-9kg)']]
>>> product_options[0][0]
'Newborn (2.5-6kg)'
>>>
If you can't explain it to a six year old, you don't understand it yourself, Albert Einstein
How to Ask Questions The Smart Way: link and another link
Create MCV example
Debug small programs

Reply
#6
(Jul-20-2018, 08:40 AM)perfringo Wrote:
(Jul-20-2018, 08:28 AM)buran Wrote: Actually, because this comes from csv, this would be not list/empty list, but simple str, so "[]" would not be a empty list, thus your last snippet will not work as expected.

You are right! I stand corrected. One should always inspect data types with precision.

One way is to use re to parse list out of string.

import re

def string_to_list(s):
    '''Parse string and return list inside string'''
    refine = re.sub("[\[\],\s]","",s)
    splitted = re.split("[\'\"]", refine)
    result = [s for s in splitted if s != '']
    return result
Then:

for row in products:
    if string_to_list(row['Options']):
        # do your magic here
Output on file provided is:

Output:
['Newborn(2.5-6kg)', 'BigBaby(6-9kg)'] ['Newborn(2.5-6kg)', 'BigBaby(6-9kg)', 'Preemie(1.5-2.5kg)'] ['Newborn(2.5-6kg)', 'BigBaby(6-9kg)'] ['Newborn(2.5-6kg)', 'BigBaby(6-9kg)'] ['Newborn(2.5-6kg)', 'BigBaby(6-9kg)'] ['Newborn(2.5-6kg)', 'BigBaby(6-9kg)']
Note, that result of function is list; after doing magic it would be probably good idea to convert result back to string.
I'm not 'in'-sane. Indeed, I am so far 'out' of sane that you appear a tiny blip on the distant coast of sanity. Bucky Katt, Get Fuzzy

Da Bishop: There's a dead bishop on the landing. I don't know who keeps bringing them in here. ....but society is to blame.
Reply
#7
Hi All,

I am working on a ETL process using Python3.6.
For now I am facing one issue while unnesting 'Options' column in my dataframe.

Currently I am downloading two csv files from FTP location and then performing cleaning/transformation on them.
For one csv file my unnesting is working as expected by transforming the product code and options values as below-
[Image: view?usp=sharing]

But for the second csv file, options column is not unnested properly like below-
[Image: view?usp=sharing]

My code is in below link-
my python code

I have attached the dataframes generated from csvs with minimal columns and full dataframe also so that anyone can test.

Please see once and let me know if need anything from my side.

Attached Files

.csv   df1.csv (Size: 484 bytes / Downloads: 15)
.csv   df2.csv (Size: 526 bytes / Downloads: 17)
.csv   full_df.csv (Size: 390 bytes / Downloads: 16)
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  Finding the correct Python framework with cmake kavyasahu 0 1,793 Jun-26-2019, 10:04 AM
Last Post: kavyasahu
  are these equation to PyOpenCV API function conversions correct? jasper13 0 2,290 Oct-06-2018, 03:03 PM
Last Post: jasper13

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020