Python Forum
Insert into sublist if sublist is not having same no of records. - Printable Version

+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/forum-7.html)
+--- Forum: General Coding Help (https://python-forum.io/forum-8.html)
+--- Thread: Insert into sublist if sublist is not having same no of records. (/thread-18124.html)

Pages: 1 2


Insert into sublist if sublist is not having same no of records. - parthi1705 - May-06-2019

Having huge no of records in original requirement.
Here a small peace of sample data.

results = [
    (Decimal('28000.0000'), 2010, 272), (Decimal('308000.0000'), 2011, 272),
    (Decimal('582000.0000'), 2012, 272), (Decimal('387000.0000'), 2013, 272),
    (Decimal('367000.0000'), 2010, 281), (Decimal('3037000.0000'), 2011, 281),
    (Decimal('4822000.0000'), 2012, 281), (Decimal('2936000.0000'), 2013, 281),
    (Decimal('637000.0000'), 2010, 282), (Decimal('3440000.0000'), 2011, 282),
    (Decimal('478000.0000'), 2010, 296), (Decimal('1209000.0000'), 2012, 296)
]

#Desired output

CalendarYear = [2010,2011,2012,2013]
Salesamount = [[28000.00,308000.00,582000.00,387000.00],[367000.00,3037000.00,4822000.00,2936000.00],[637000.00,3440000.00,0,0],[478000.00,0,1209000.00,0]]
Employee = [272,281,282,296]
How to add zeros or null to Salesamount list by forming sublist as here each sublist should have 4 records.
For the year 2012 and 2013 data is not there for Employee 282 and for the year 2011 and 2013 data is not there for Employee 296 , so i need to add 0 or NULL .

Got some google info, but it is hardcode value, where as mine is a dynamic one.

https://stackoverflow.com/questions/21447538/how-to-insert-an-item-into-a-sublist-if-sublist-is-a-certain-length]

Also this is different from
https://python-forum.io/Thread-Split-List-and-Sublist-from-Pyodbc
as here we are splitting data not adding zeros or null.

For better understanding having single/ same set of data to work around.


RE: Insert into sublist if sublist is not having same no of records. - micseydel - May-08-2019

What have you tried?


RE: Insert into sublist if sublist is not having same no of records. - parthi1705 - May-09-2019

Since my data is from pyodbc source i have modified my data there.
Have created database view with left join which has null value.


RE: Insert into sublist if sublist is not having same no of records. - parthi1705 - May-10-2019

Have tried some thing like below but it did not return my result
Had my data in pandas and tried, but no luck

>>> df
    SalesamountQuota  CalendarYear  Employeekey
0            28000.0          2010          272
1           308000.0          2011          272
2           582000.0          2012          272
3           387000.0          2013          272
4           367000.0          2010          281
5          3037000.0          2011          281
6          4822000.0          2012          281
7          2936000.0          2013          281
8           637000.0          2010          282
9          3440000.0          2011          282
10          478000.0          2010          296
11        12090000.0          2012          296

df_emp =df[['CalendarYear','Employeekey']]
emp=df['Employeekey'].unique()
empnew=pd.DataFrame(data=emp.flatten())
empnew.columns =['Employeekey']
for i in empnew:
    pd.merge(df_emp,df,left_on='Employeekey', right_on='CalendarYear',how='right')



RE: Insert into sublist if sublist is not having same no of records. - parthi1705 - May-16-2019

Is there a way to fill zero's?


RE: Insert into sublist if sublist is not having same no of records. - perfringo - May-16-2019

It's probably not the answer you looking for, but "the way to fill zero's" in case of iterables is:

>>> from itertools import zip_longest
>>> a = [1, 2, 3]
>>> b = [10, 20, 30, 40]
>>> list(zip_longest(a, b, fillvalue=0))
[(1, 10), (2, 20), (3, 30), (0, 40)]



RE: Insert into sublist if sublist is not having same no of records. - parthi1705 - May-16-2019

From Database i am able to do it by adding Zeros , unable to do it in python.

Select * from #FactSalesQuota

SalesamountQuota	CalendarYear	Employeekey
28000.00	2010	272
308000.00	2011	272
582000.00	2012	272
387000.00	2013	272
367000.00	2010	281
3037000.00	2011	281
4822000.00	2012	281
2936000.00	2013	281
637000.00	2010	282
3440000.00	2011	282
478000.00	2010	296
1209000.00	2012	296
CREATE TABLE #CalendarSales_final (SalesamountQuota DECIMAL(20,4),CalendarYear INT,Employeekey INT)

SELECT  Distinct Identity(INT,1,1) Rid,EmployeeKey INTO #temp  FROM FactSalesQuota

SELECT  DISTINCT Calendaryear INTO #dates  FROM FactSalesQuota
DECLARE @i INT,@j INT=1,@Emp INT
SELECT  @i =Rid FROM #temp

WHILE @j<=@i 
Begin
SELECT  @emp=EmployeeKey FROM #temp where Rid=@j
INSERT INTO #CalendarSales_final
SELECT  ISNULL(C.salesamountQuota,0)SalesamountQuota, d.*,@emp FROM FactSalesQuota c Right join #Dates as d   ON c.Calendaryear=d.Calendaryear 
and EmployeeKey  in (SELECT  t. EmployeeKey FROM #temp as t WHERE Rid=@j)
Set @j=@j+1
end

SELECT  * FROM #CalendarSales_final 



RE: Insert into sublist if sublist is not having same no of records. - parthi1705 - May-22-2019

Any suggestions how can we do it in python by filling zeros.


RE: Insert into sublist if sublist is not having same no of records. - parthi1705 - May-28-2019

https://stackoverflow.com/questions/21447538/how-to-insert-an-item-into-a-sublist-if-sublist-is-a-certain-length

The above link adds zero with default position, were as my requirement is different from the above how to add zero's in my case.
Any suggestion would be highly appreciated. Smile Smile


RE: Insert into sublist if sublist is not having same no of records. - michalmonday - May-28-2019

I added pandas way too, trying to make it a little more concise, it would be cool if some pandas wizard would post an optimal solution just to check how good it can possibly look like

def Decimal(x):
    return x

results = [
    (Decimal('28000.0000'), 2010, 272), (Decimal('308000.0000'), 2011, 272),
    (Decimal('582000.0000'), 2012, 272), (Decimal('387000.0000'), 2013, 272),
    (Decimal('367000.0000'), 2010, 281), (Decimal('3037000.0000'), 2011, 281),
    (Decimal('4822000.0000'), 2012, 281), (Decimal('2936000.0000'), 2013, 281),
    (Decimal('637000.0000'), 2010, 282), (Decimal('3440000.0000'), 2011, 282),
    (Decimal('478000.0000'), 2010, 296), (Decimal('1209000.0000'), 2012, 296)
]
 
#Desired output


years = sorted(set([x[1] for x in results]))
employees = sorted(set([x[2] for x in results]))

sales = []
for emp in employees:
    
    # get all the results for that particular employee
    emp_results = [x for x in results if x[2] == emp]

    # init/fill emp_sales with as many 0's as there are years
    emp_sales = [0] * len(years)
    
    for year_index, year in enumerate(years):
        for result_year, result_sales_amount in [(x[1], x[0]) for x in emp_results]:
            if year == result_year:
                emp_sales[year_index] = result_sales_amount
                
    sales.append(emp_sales)
    
print('years:\n', years, '\n')
print('employees:\n', employees, '\n')
print('sales:\n', sales, '\n')
Output:
years: [2010, 2011, 2012, 2013] employees: [272, 281, 282, 296] sales: [['28000.0000', '308000.0000', '582000.0000', '387000.0000'], ['367000.0000', '3037000.0000', '4822000.0000', '2936000.0000'], ['637000.0000', '3440000.0000', 0, 0], ['478000.0000', 0, '1209000.0000', 0]]
import pandas as pd

def Decimal(x):
    return x

results = [
    (Decimal('28000.0000'), 2010, 272), (Decimal('308000.0000'), 2011, 272),
    (Decimal('582000.0000'), 2012, 272), (Decimal('387000.0000'), 2013, 272),
    (Decimal('367000.0000'), 2010, 281), (Decimal('3037000.0000'), 2011, 281),
    (Decimal('4822000.0000'), 2012, 281), (Decimal('2936000.0000'), 2013, 281),
    (Decimal('637000.0000'), 2010, 282), (Decimal('3440000.0000'), 2011, 282),
    (Decimal('478000.0000'), 2010, 296), (Decimal('1209000.0000'), 2012, 296)
]

pd.set_option('display.float_format', '{:.2f}'.format)

df = pd.DataFrame(results, columns = ['sales' , 'year', 'employee'])

years = df['year'].unique()
employees = df['employee'].unique()

new_df = pd.DataFrame(index = employees, columns = years)

for emp in employees:
    emp_df = df.loc[df['employee'] == emp]

    new_df.loc[emp, : ] = [float(emp_df.loc[emp_df['year']==year]['sales']) if year in list(emp_df['year']) else 0 for year in years]

print(new_df)
Output:
2010 2011 2012 2013 272 28000.00 308000.00 582000.00 387000.00 281 367000.00 3037000.00 4822000.00 2936000.00 282 637000.00 3440000.00 0 0 296 478000.00 0 1209000.00 0