Python Forum

Full Version: Insert into sublist if sublist is not having same no of records.
You're currently viewing a stripped down version of our content. View the full version with proper formatting.
Pages: 1 2
Having huge no of records in original requirement.
Here a small peace of sample data.

results = [
    (Decimal('28000.0000'), 2010, 272), (Decimal('308000.0000'), 2011, 272),
    (Decimal('582000.0000'), 2012, 272), (Decimal('387000.0000'), 2013, 272),
    (Decimal('367000.0000'), 2010, 281), (Decimal('3037000.0000'), 2011, 281),
    (Decimal('4822000.0000'), 2012, 281), (Decimal('2936000.0000'), 2013, 281),
    (Decimal('637000.0000'), 2010, 282), (Decimal('3440000.0000'), 2011, 282),
    (Decimal('478000.0000'), 2010, 296), (Decimal('1209000.0000'), 2012, 296)
]

#Desired output

CalendarYear = [2010,2011,2012,2013]
Salesamount = [[28000.00,308000.00,582000.00,387000.00],[367000.00,3037000.00,4822000.00,2936000.00],[637000.00,3440000.00,0,0],[478000.00,0,1209000.00,0]]
Employee = [272,281,282,296]
How to add zeros or null to Salesamount list by forming sublist as here each sublist should have 4 records.
For the year 2012 and 2013 data is not there for Employee 282 and for the year 2011 and 2013 data is not there for Employee 296 , so i need to add 0 or NULL .

Got some google info, but it is hardcode value, where as mine is a dynamic one.

https://stackoverflow.com/questions/2144...ain-length]

Also this is different from
https://python-forum.io/Thread-Split-Lis...rom-Pyodbc
as here we are splitting data not adding zeros or null.

For better understanding having single/ same set of data to work around.
What have you tried?
Since my data is from pyodbc source i have modified my data there.
Have created database view with left join which has null value.
Have tried some thing like below but it did not return my result
Had my data in pandas and tried, but no luck

>>> df
    SalesamountQuota  CalendarYear  Employeekey
0            28000.0          2010          272
1           308000.0          2011          272
2           582000.0          2012          272
3           387000.0          2013          272
4           367000.0          2010          281
5          3037000.0          2011          281
6          4822000.0          2012          281
7          2936000.0          2013          281
8           637000.0          2010          282
9          3440000.0          2011          282
10          478000.0          2010          296
11        12090000.0          2012          296

df_emp =df[['CalendarYear','Employeekey']]
emp=df['Employeekey'].unique()
empnew=pd.DataFrame(data=emp.flatten())
empnew.columns =['Employeekey']
for i in empnew:
    pd.merge(df_emp,df,left_on='Employeekey', right_on='CalendarYear',how='right')
Is there a way to fill zero's?
It's probably not the answer you looking for, but "the way to fill zero's" in case of iterables is:

>>> from itertools import zip_longest
>>> a = [1, 2, 3]
>>> b = [10, 20, 30, 40]
>>> list(zip_longest(a, b, fillvalue=0))
[(1, 10), (2, 20), (3, 30), (0, 40)]
From Database i am able to do it by adding Zeros , unable to do it in python.

Select * from #FactSalesQuota

SalesamountQuota	CalendarYear	Employeekey
28000.00	2010	272
308000.00	2011	272
582000.00	2012	272
387000.00	2013	272
367000.00	2010	281
3037000.00	2011	281
4822000.00	2012	281
2936000.00	2013	281
637000.00	2010	282
3440000.00	2011	282
478000.00	2010	296
1209000.00	2012	296
CREATE TABLE #CalendarSales_final (SalesamountQuota DECIMAL(20,4),CalendarYear INT,Employeekey INT)

SELECT  Distinct Identity(INT,1,1) Rid,EmployeeKey INTO #temp  FROM FactSalesQuota

SELECT  DISTINCT Calendaryear INTO #dates  FROM FactSalesQuota
DECLARE @i INT,@j INT=1,@Emp INT
SELECT  @i =Rid FROM #temp

WHILE @j<=@i 
Begin
SELECT  @emp=EmployeeKey FROM #temp where Rid=@j
INSERT INTO #CalendarSales_final
SELECT  ISNULL(C.salesamountQuota,0)SalesamountQuota, d.*,@emp FROM FactSalesQuota c Right join #Dates as d   ON c.Calendaryear=d.Calendaryear 
and EmployeeKey  in (SELECT  t. EmployeeKey FROM #temp as t WHERE Rid=@j)
Set @j=@j+1
end

SELECT  * FROM #CalendarSales_final 
Any suggestions how can we do it in python by filling zeros.
https://stackoverflow.com/questions/2144...ain-length

The above link adds zero with default position, were as my requirement is different from the above how to add zero's in my case.
Any suggestion would be highly appreciated. Smile Smile
I added pandas way too, trying to make it a little more concise, it would be cool if some pandas wizard would post an optimal solution just to check how good it can possibly look like

def Decimal(x):
    return x

results = [
    (Decimal('28000.0000'), 2010, 272), (Decimal('308000.0000'), 2011, 272),
    (Decimal('582000.0000'), 2012, 272), (Decimal('387000.0000'), 2013, 272),
    (Decimal('367000.0000'), 2010, 281), (Decimal('3037000.0000'), 2011, 281),
    (Decimal('4822000.0000'), 2012, 281), (Decimal('2936000.0000'), 2013, 281),
    (Decimal('637000.0000'), 2010, 282), (Decimal('3440000.0000'), 2011, 282),
    (Decimal('478000.0000'), 2010, 296), (Decimal('1209000.0000'), 2012, 296)
]
 
#Desired output


years = sorted(set([x[1] for x in results]))
employees = sorted(set([x[2] for x in results]))

sales = []
for emp in employees:
    
    # get all the results for that particular employee
    emp_results = [x for x in results if x[2] == emp]

    # init/fill emp_sales with as many 0's as there are years
    emp_sales = [0] * len(years)
    
    for year_index, year in enumerate(years):
        for result_year, result_sales_amount in [(x[1], x[0]) for x in emp_results]:
            if year == result_year:
                emp_sales[year_index] = result_sales_amount
                
    sales.append(emp_sales)
    
print('years:\n', years, '\n')
print('employees:\n', employees, '\n')
print('sales:\n', sales, '\n')
Output:
years: [2010, 2011, 2012, 2013] employees: [272, 281, 282, 296] sales: [['28000.0000', '308000.0000', '582000.0000', '387000.0000'], ['367000.0000', '3037000.0000', '4822000.0000', '2936000.0000'], ['637000.0000', '3440000.0000', 0, 0], ['478000.0000', 0, '1209000.0000', 0]]
import pandas as pd

def Decimal(x):
    return x

results = [
    (Decimal('28000.0000'), 2010, 272), (Decimal('308000.0000'), 2011, 272),
    (Decimal('582000.0000'), 2012, 272), (Decimal('387000.0000'), 2013, 272),
    (Decimal('367000.0000'), 2010, 281), (Decimal('3037000.0000'), 2011, 281),
    (Decimal('4822000.0000'), 2012, 281), (Decimal('2936000.0000'), 2013, 281),
    (Decimal('637000.0000'), 2010, 282), (Decimal('3440000.0000'), 2011, 282),
    (Decimal('478000.0000'), 2010, 296), (Decimal('1209000.0000'), 2012, 296)
]

pd.set_option('display.float_format', '{:.2f}'.format)

df = pd.DataFrame(results, columns = ['sales' , 'year', 'employee'])

years = df['year'].unique()
employees = df['employee'].unique()

new_df = pd.DataFrame(index = employees, columns = years)

for emp in employees:
    emp_df = df.loc[df['employee'] == emp]

    new_df.loc[emp, : ] = [float(emp_df.loc[emp_df['year']==year]['sales']) if year in list(emp_df['year']) else 0 for year in years]

print(new_df)
Output:
2010 2011 2012 2013 272 28000.00 308000.00 582000.00 387000.00 281 367000.00 3037000.00 4822000.00 2936000.00 282 637000.00 3440000.00 0 0 296 478000.00 0 1209000.00 0
Pages: 1 2