Complete NEWB and openpyxl project

**buran** · (This post was last modified: Jan-10-2019, 08:04 PM by buran.)

Note that reorder codes in these dummy data are unique, i.e. no repeat values. In the previous file there were repeated values. That may affect implementation

**Gribouillis** · Jan-10-2019, 08:15 PM

Netopia Wrote:I'm stuck trying to figure out how to go from this general list of tasks to actually doing it.

Start with simple tasks. For example start with the goal of printing all the records that belong to group1.

Netopia · Jan-10-2019, 08:30 PM

(Jan-10-2019, 08:03 PM)buran Wrote: Note that reorder codes in these dummy data are unique, i.e. no repeat values. In the previous file there were repeated values. That may affect implementation

I'll add more records so there will be multiples. Thank you.

**buran** · (This post was last modified: Jan-10-2019, 08:36 PM by buran.)

Here is sample implementation (actually two, slightly different alternatives) how to read and sort the data. Hope it will kick-start you
Of course plenty can be improved.

from collections import defaultdict
from collections import namedtuple # used in OPTION 2
from openpyxl.reader.excel import load_workbook

filename = 'Example.xlsx'
wb = load_workbook(filename)
ws = wb.active

# read header and all data in two variables -> header and data
# Note the asteriks in frot of data
# This way data will hold everything except the first element in ws.values

header, *data = ws.values



# OPTION 1
print('\n\nOPTION 1 output')
# parse data into groups
groups = defaultdict(list)

for row in data:
    group = row[-1][-1]
    groups[group].append(row)

#print each group in desired order
for group, rows in sorted(groups.items()):
    print('\n\nGroup {}\n---------------\n'.format(group))
    for row in sorted(rows, key=lambda x: x[-1][:-1]):
        print(row)


# OPTION 2
print('\n\nOPTION 2 output')

# parse data into groups
groups = defaultdict(list)

# define namedtuple
fields = ('cust', 'first_name', 'last_name', 'full_name', 'reorder')
Record = namedtuple('Record', fields)

for row in data:
    # parse each row into Record namedtuple
    record = Record(*row)
    group = record.reorder[-1] # last char in the recorder property in the record
    groups[group].append(record)

#print each group in desired order
for group, records in sorted(groups.items()):
    print('\n\nGroup {}\n---------------\n'.format(group))
    for record in sorted(records, key=lambda rec: rec.reorder[:-1]):
        print(record)

OUTPUT:

Hide/Show

Output:OPTION 1 output


Group 1
---------------

(33011907.77726958, 'Vancito E.', 'Gumbs SR', 'Vancito E. Gumbs SR', 'ABCDE1')
(57625313.66239607, 'Ludovico', 'Orejola', 'Ludovico Orejola', 'AMZA1')
(12182451.503507763, 'William', 'Spence', 'William Spence', 'BHWDH1')
(52637815.97059171, 'Barbara', 'Bidwell', 'Barbara Bidwell', 'KVJRE1')
(50734695.10281905, 'Thomas', 'Burnickas', 'Thomas Burnickas', 'LSRHSZ1')
(77812033.03435843, 'Susanna', 'Glick', 'Susanna Glick', 'MBX1')
(26484711.23349142, 'William', 'Spence', 'William Spence', 'MNOPQ1')
(11377165.529150324, 'Yvonne', 'Vickers', 'Yvonne Vickers', 'ZYXWV1')


Group 2
---------------

(37933671.622074716, 'Diane', 'Spindler', 'Diane Spindler', 'ABCDE2')
(57580665.29832982, 'Peter', 'Chea', 'Peter Chea', 'AMZA2')
(15460408.966568317, 'William', 'Seifred', 'William Seifred', 'BHWDH2')
(89706907.37752022, 'Marcia', 'Hammond', 'Marcia Hammond', 'KVJRE2')
(34077016.74244265, 'Maureen', 'Wessel', 'Maureen Wessel', 'LSRHSZ2')
(14929469.194886446, 'Jeff', 'Hammond', 'Jeff Hammond', 'MBX2')
(98449662.99566273, 'Thomas', 'Ferris', 'Thomas Ferris', 'MNOPQ2')
(20809672.42084627, 'James', 'Dunlevy', 'James Dunlevy', 'ZYXWV2')


Group 3
---------------

(6314554.949454221, 'Atmaram', 'Patel', 'Atmaram Patel', 'ABCDE3')
(81544654.82920109, 'Monica', 'Howard', 'Monica Howard', 'AMZA3')
(6967720.845391856, 'Kathleen', 'Scanlan', 'Kathleen Scanlan', 'BHWDH3')
(80462211.22157924, 'Albert', 'Goodwin', 'Albert Goodwin', 'KVJRE3')
(63441201.600005, 'Billie', 'Greeke', 'Billie Greeke', 'LSRHSZ3')
(76206790.42156677, 'Izabella', 'Jedwabski', 'Izabella Jedwabski', 'MBX3')
(30455077.172918256, 'Joan', 'Donovan', 'Joan Donovan', 'MNOPQ3')
(29981174.056717496, 'Robert', 'Duncan', 'Robert Duncan', 'ZYXWV3')


Group 4
---------------

(55135877.35924682, 'Gregory', 'Lombardi', 'Gregory Lombardi', 'ABCDE4')
(56473829, 'Peter', 'Murphy', 'Peter Murphy', 'AMZA4')
(29463356.5493892, 'Mildred', 'Hogue', 'Mildred Hogue', 'BHWDH4')
(49935290.262403, 'Alice', 'Cataldo', 'Alice Cataldo', 'KVJRE4')
(98083780.20955186, 'John', 'Wittekind', 'John Wittekind', 'LSRHSZ4')
(88992104.2779649, 'Jane', 'Bilon', 'Jane Bilon', 'MBX4')
(19455007.061793316, 'Peter', 'Murphy', 'Peter Murphy', 'MNOPQ4')
(11319936.032599876, 'Edward', 'Durham', 'Edward Durham', 'ZYXWV4')


OPTION 2 output


Group 1
---------------

Record(cust=33011907.77726958, first_name='Vancito E.', last_name='Gumbs SR', full_name='Vancito E. Gumbs SR', reorder='ABCDE1')
Record(cust=57625313.66239607, first_name='Ludovico', last_name='Orejola', full_name='Ludovico Orejola', reorder='AMZA1')
Record(cust=12182451.503507763, first_name='William', last_name='Spence', full_name='William Spence', reorder='BHWDH1')
Record(cust=52637815.97059171, first_name='Barbara', last_name='Bidwell', full_name='Barbara Bidwell', reorder='KVJRE1')
Record(cust=50734695.10281905, first_name='Thomas', last_name='Burnickas', full_name='Thomas Burnickas', reorder='LSRHSZ1')
Record(cust=77812033.03435843, first_name='Susanna', last_name='Glick', full_name='Susanna Glick', reorder='MBX1')
Record(cust=26484711.23349142, first_name='William', last_name='Spence', full_name='William Spence', reorder='MNOPQ1')
Record(cust=11377165.529150324, first_name='Yvonne', last_name='Vickers', full_name='Yvonne Vickers', reorder='ZYXWV1')


Group 2
---------------

Record(cust=37933671.622074716, first_name='Diane', last_name='Spindler', full_name='Diane Spindler', reorder='ABCDE2')
Record(cust=57580665.29832982, first_name='Peter', last_name='Chea', full_name='Peter Chea', reorder='AMZA2')
Record(cust=15460408.966568317, first_name='William', last_name='Seifred', full_name='William Seifred', reorder='BHWDH2')
Record(cust=89706907.37752022, first_name='Marcia', last_name='Hammond', full_name='Marcia Hammond', reorder='KVJRE2')
Record(cust=34077016.74244265, first_name='Maureen', last_name='Wessel', full_name='Maureen Wessel', reorder='LSRHSZ2')
Record(cust=14929469.194886446, first_name='Jeff', last_name='Hammond', full_name='Jeff Hammond', reorder='MBX2')
Record(cust=98449662.99566273, first_name='Thomas', last_name='Ferris', full_name='Thomas Ferris', reorder='MNOPQ2')
Record(cust=20809672.42084627, first_name='James', last_name='Dunlevy', full_name='James Dunlevy', reorder='ZYXWV2')


Group 3
---------------

Record(cust=6314554.949454221, first_name='Atmaram', last_name='Patel', full_name='Atmaram Patel', reorder='ABCDE3')
Record(cust=81544654.82920109, first_name='Monica', last_name='Howard', full_name='Monica Howard', reorder='AMZA3')
Record(cust=6967720.845391856, first_name='Kathleen', last_name='Scanlan', full_name='Kathleen Scanlan', reorder='BHWDH3')
Record(cust=80462211.22157924, first_name='Albert', last_name='Goodwin', full_name='Albert Goodwin', reorder='KVJRE3')
Record(cust=63441201.600005, first_name='Billie', last_name='Greeke', full_name='Billie Greeke', reorder='LSRHSZ3')
Record(cust=76206790.42156677, first_name='Izabella', last_name='Jedwabski', full_name='Izabella Jedwabski', reorder='MBX3')
Record(cust=30455077.172918256, first_name='Joan', last_name='Donovan', full_name='Joan Donovan', reorder='MNOPQ3')
Record(cust=29981174.056717496, first_name='Robert', last_name='Duncan', full_name='Robert Duncan', reorder='ZYXWV3')


Group 4
---------------

Record(cust=55135877.35924682, first_name='Gregory', last_name='Lombardi', full_name='Gregory Lombardi', reorder='ABCDE4')
Record(cust=56473829, first_name='Peter', last_name='Murphy', full_name='Peter Murphy', reorder='AMZA4')
Record(cust=29463356.5493892, first_name='Mildred', last_name='Hogue', full_name='Mildred Hogue', reorder='BHWDH4')
Record(cust=49935290.262403, first_name='Alice', last_name='Cataldo', full_name='Alice Cataldo', reorder='KVJRE4')
Record(cust=98083780.20955186, first_name='John', last_name='Wittekind', full_name='John Wittekind', reorder='LSRHSZ4')
Record(cust=88992104.2779649, first_name='Jane', last_name='Bilon', full_name='Jane Bilon', reorder='MBX4')
Record(cust=19455007.061793316, first_name='Peter', last_name='Murphy', full_name='Peter Murphy', reorder='MNOPQ4')
Record(cust=11319936.032599876, first_name='Edward', last_name='Durham', full_name='Edward Durham', reorder='ZYXWV4')

Netopia · Jan-10-2019, 08:39 PM

Ok, I've modified it so that there are four of every Reorder value. Thank goodness for online random name generators! :) Not something I'd ever thought of googling before today.

Netopia · Jan-11-2019, 01:19 PM

Gribouillis,

I had some success implementing the sample code you gave. I can now get a list of the data, with the len equaling the number of rows! Thank you!

buran
I'm going to go through your examples line by line and see what I can understand. I'll post back about anything I don't understand so that I can learn, if that's ok.

To all - what is the difference I see in some places between something like:

import openpyxl

and

from openpyxl import load_workbook

and

from openpyxl.reader.excel import load_workbook

??

**buran** · (This post was last modified: Jan-11-2019, 01:44 PM by buran.)

Both are the same. If you look at the docs or at the source code you will see that openpyxl package is organised in sub-packages. One of these sub-packages is openpyxl.reader. in sub-package openpyxl.reader there is module excel.py and load_workbook() function is part of this module.
So using from openpyxl.reader.excel import load_workbook is very explicit from where you import it.

At the same time the __init__.py file for top-level openpyxl package
imports the load_workbook() effectively exposing it and makes it possible to use from openpyxl import load_workbook or if you do just import openpyxl to use wb = openpyxl.load_workbook(some_file)

It's perfectly fine to use any of these - do as you prefer.

Netopia · Jan-11-2019, 01:48 PM

Thank you for that explanation buran.

Already I'm learning! I puzzled over the line:

header, *data = ws.values

How did it know to pull the header out just by having a variable named "header"?

So, I played with:

this, that, those, *other = ws.values

And happily discovered that each variable = one row and that putting the * in front of the last one makes it skip rows for whatever number of other variables are already defined. Does this work for any collection of data, or just for openpyxl?

I'm having fun learning! Thank you guys for taking the time to help me. :)

**buran** · Jan-11-2019, 01:54 PM

pay attention to comments in the code (and ignore the typing errors :-))

Quote:# read header and all data in two variables -> header and data
# Note the asteriks in frot of data
# This way data will hold everything except the first element in ws.values

and if you are interested to read further
PEP 3132 -- Extended Iterable Unpacking

Netopia · (This post was last modified: Jan-11-2019, 02:19 PM by Netopia.)

(Jan-11-2019, 01:54 PM)buran Wrote: pay attention to comments in the code (and ignore the typing errors :-))

Quote:# read header and all data in two variables -> header and data
# Note the asteriks in frot of data
# This way data will hold everything except the first element in ws.values

and if you are interested to read further
PEP 3132 -- Extended Iterable Unpacking

I did read that. But even though you told me that, I didn't understand how it worked. How it 'knew' that header was the first row. I didn't know that having multiple variables corresponded to rows, one at a time. I didn't know that the * picked up from not just row 2, but from whatever number the last variable was. You pointed me in the right direction with your comments, and that helped me to learn more.

I promise to do my best to read whatever you have taken the time to write! :)

groups = defaultdict(list)

for row in data:
    group = row[-1][-1]
    groups[group].append(row)

I was wondering if the word 'row' had any significance other than being easy to understand. I replaced it temporarily with the word 'this', and everything still worked fine.

Do I understand then, that since you used (list) for defaultdict, that it created a list and so when you say 'for xxxx in yyy', python just grabs the largest unit size, in this case being an entire row? I really am new to this stuff, and this might be basic, but I'm trying to make sure that I don't assume something and then later be confused when something doesn't work the way I thought it did.

Joe

Possibly Related Threads…
Thread		Author	Replies	Views	Last Post
	Python newb need help	Fictile	1	943	Apr-02-2024, 03:28 AM Last Post: buran
	NameError issue with daughter's newb code	MrGonk	2	2,190	Sep-16-2021, 01:29 PM Last Post: BashBedlam
	Simple newb string question	Involute	2	2,992	Sep-08-2019, 12:50 AM Last Post: Involute
	please help this newb install pygame	iofhua	7	8,303	May-15-2019, 01:09 PM Last Post: buran
	Newb question: Debugging + Linting Python in Visual Studio Code	Drone4four	1	3,228	Apr-15-2019, 06:19 AM Last Post: perfringo
	Newb question about %02d %04d	bennylava	30	26,101	Mar-05-2019, 11:23 PM Last Post: snippsat
	Pthyon 3 question (newb)	bennylava	11	8,212	Feb-28-2019, 06:04 PM Last Post: buran
	newb selfie	PatM	5	4,984	Feb-19-2019, 12:20 AM Last Post: snippsat
	Newb Question - Threading in Crons	vvarrior	2	3,586	Jul-20-2018, 08:12 PM Last Post: vvarrior
	Matt's newb question 1	MattSS102	1	3,372	Aug-28-2017, 03:27 AM Last Post: BerlingSwe

Complete NEWB and openpyxl project

User Panel Messages

Announcements