Posts: 8,167
Threads: 160
Joined: Sep 2016
Jan-10-2019, 08:03 PM
(This post was last modified: Jan-10-2019, 08:04 PM by buran.)
Note that reorder codes in these dummy data are unique, i.e. no repeat values. In the previous file there were repeated values. That may affect implementation
Posts: 4,801
Threads: 77
Joined: Jan 2018
Netopia Wrote:I'm stuck trying to figure out how to go from this general list of tasks to actually doing it. Start with simple tasks. For example start with the goal of printing all the records that belong to group1.
Posts: 27
Threads: 2
Joined: Jan 2019
(Jan-10-2019, 08:03 PM)buran Wrote: Note that reorder codes in these dummy data are unique, i.e. no repeat values. In the previous file there were repeated values. That may affect implementation
I'll add more records so there will be multiples. Thank you.
Posts: 8,167
Threads: 160
Joined: Sep 2016
Jan-10-2019, 08:36 PM
(This post was last modified: Jan-10-2019, 08:36 PM by buran.)
Here is sample implementation (actually two, slightly different alternatives) how to read and sort the data. Hope it will kick-start you
Of course plenty can be improved.
from collections import defaultdict
from collections import namedtuple # used in OPTION 2
from openpyxl.reader.excel import load_workbook
filename = 'Example.xlsx'
wb = load_workbook(filename)
ws = wb.active
# read header and all data in two variables -> header and data
# Note the asteriks in frot of data
# This way data will hold everything except the first element in ws.values
header, *data = ws.values
# OPTION 1
print('\n\nOPTION 1 output')
# parse data into groups
groups = defaultdict(list)
for row in data:
group = row[-1][-1]
groups[group].append(row)
#print each group in desired order
for group, rows in sorted(groups.items()):
print('\n\nGroup {}\n---------------\n'.format(group))
for row in sorted(rows, key=lambda x: x[-1][:-1]):
print(row)
# OPTION 2
print('\n\nOPTION 2 output')
# parse data into groups
groups = defaultdict(list)
# define namedtuple
fields = ('cust', 'first_name', 'last_name', 'full_name', 'reorder')
Record = namedtuple('Record', fields)
for row in data:
# parse each row into Record namedtuple
record = Record(*row)
group = record.reorder[-1] # last char in the recorder property in the record
groups[group].append(record)
#print each group in desired order
for group, records in sorted(groups.items()):
print('\n\nGroup {}\n---------------\n'.format(group))
for record in sorted(records, key=lambda rec: rec.reorder[:-1]):
print(record) OUTPUT:
Output: OPTION 1 output
Group 1
---------------
(33011907.77726958, 'Vancito E.', 'Gumbs SR', 'Vancito E. Gumbs SR', 'ABCDE1')
(57625313.66239607, 'Ludovico', 'Orejola', 'Ludovico Orejola', 'AMZA1')
(12182451.503507763, 'William', 'Spence', 'William Spence', 'BHWDH1')
(52637815.97059171, 'Barbara', 'Bidwell', 'Barbara Bidwell', 'KVJRE1')
(50734695.10281905, 'Thomas', 'Burnickas', 'Thomas Burnickas', 'LSRHSZ1')
(77812033.03435843, 'Susanna', 'Glick', 'Susanna Glick', 'MBX1')
(26484711.23349142, 'William', 'Spence', 'William Spence', 'MNOPQ1')
(11377165.529150324, 'Yvonne', 'Vickers', 'Yvonne Vickers', 'ZYXWV1')
Group 2
---------------
(37933671.622074716, 'Diane', 'Spindler', 'Diane Spindler', 'ABCDE2')
(57580665.29832982, 'Peter', 'Chea', 'Peter Chea', 'AMZA2')
(15460408.966568317, 'William', 'Seifred', 'William Seifred', 'BHWDH2')
(89706907.37752022, 'Marcia', 'Hammond', 'Marcia Hammond', 'KVJRE2')
(34077016.74244265, 'Maureen', 'Wessel', 'Maureen Wessel', 'LSRHSZ2')
(14929469.194886446, 'Jeff', 'Hammond', 'Jeff Hammond', 'MBX2')
(98449662.99566273, 'Thomas', 'Ferris', 'Thomas Ferris', 'MNOPQ2')
(20809672.42084627, 'James', 'Dunlevy', 'James Dunlevy', 'ZYXWV2')
Group 3
---------------
(6314554.949454221, 'Atmaram', 'Patel', 'Atmaram Patel', 'ABCDE3')
(81544654.82920109, 'Monica', 'Howard', 'Monica Howard', 'AMZA3')
(6967720.845391856, 'Kathleen', 'Scanlan', 'Kathleen Scanlan', 'BHWDH3')
(80462211.22157924, 'Albert', 'Goodwin', 'Albert Goodwin', 'KVJRE3')
(63441201.600005, 'Billie', 'Greeke', 'Billie Greeke', 'LSRHSZ3')
(76206790.42156677, 'Izabella', 'Jedwabski', 'Izabella Jedwabski', 'MBX3')
(30455077.172918256, 'Joan', 'Donovan', 'Joan Donovan', 'MNOPQ3')
(29981174.056717496, 'Robert', 'Duncan', 'Robert Duncan', 'ZYXWV3')
Group 4
---------------
(55135877.35924682, 'Gregory', 'Lombardi', 'Gregory Lombardi', 'ABCDE4')
(56473829, 'Peter', 'Murphy', 'Peter Murphy', 'AMZA4')
(29463356.5493892, 'Mildred', 'Hogue', 'Mildred Hogue', 'BHWDH4')
(49935290.262403, 'Alice', 'Cataldo', 'Alice Cataldo', 'KVJRE4')
(98083780.20955186, 'John', 'Wittekind', 'John Wittekind', 'LSRHSZ4')
(88992104.2779649, 'Jane', 'Bilon', 'Jane Bilon', 'MBX4')
(19455007.061793316, 'Peter', 'Murphy', 'Peter Murphy', 'MNOPQ4')
(11319936.032599876, 'Edward', 'Durham', 'Edward Durham', 'ZYXWV4')
OPTION 2 output
Group 1
---------------
Record(cust=33011907.77726958, first_name='Vancito E.', last_name='Gumbs SR', full_name='Vancito E. Gumbs SR', reorder='ABCDE1')
Record(cust=57625313.66239607, first_name='Ludovico', last_name='Orejola', full_name='Ludovico Orejola', reorder='AMZA1')
Record(cust=12182451.503507763, first_name='William', last_name='Spence', full_name='William Spence', reorder='BHWDH1')
Record(cust=52637815.97059171, first_name='Barbara', last_name='Bidwell', full_name='Barbara Bidwell', reorder='KVJRE1')
Record(cust=50734695.10281905, first_name='Thomas', last_name='Burnickas', full_name='Thomas Burnickas', reorder='LSRHSZ1')
Record(cust=77812033.03435843, first_name='Susanna', last_name='Glick', full_name='Susanna Glick', reorder='MBX1')
Record(cust=26484711.23349142, first_name='William', last_name='Spence', full_name='William Spence', reorder='MNOPQ1')
Record(cust=11377165.529150324, first_name='Yvonne', last_name='Vickers', full_name='Yvonne Vickers', reorder='ZYXWV1')
Group 2
---------------
Record(cust=37933671.622074716, first_name='Diane', last_name='Spindler', full_name='Diane Spindler', reorder='ABCDE2')
Record(cust=57580665.29832982, first_name='Peter', last_name='Chea', full_name='Peter Chea', reorder='AMZA2')
Record(cust=15460408.966568317, first_name='William', last_name='Seifred', full_name='William Seifred', reorder='BHWDH2')
Record(cust=89706907.37752022, first_name='Marcia', last_name='Hammond', full_name='Marcia Hammond', reorder='KVJRE2')
Record(cust=34077016.74244265, first_name='Maureen', last_name='Wessel', full_name='Maureen Wessel', reorder='LSRHSZ2')
Record(cust=14929469.194886446, first_name='Jeff', last_name='Hammond', full_name='Jeff Hammond', reorder='MBX2')
Record(cust=98449662.99566273, first_name='Thomas', last_name='Ferris', full_name='Thomas Ferris', reorder='MNOPQ2')
Record(cust=20809672.42084627, first_name='James', last_name='Dunlevy', full_name='James Dunlevy', reorder='ZYXWV2')
Group 3
---------------
Record(cust=6314554.949454221, first_name='Atmaram', last_name='Patel', full_name='Atmaram Patel', reorder='ABCDE3')
Record(cust=81544654.82920109, first_name='Monica', last_name='Howard', full_name='Monica Howard', reorder='AMZA3')
Record(cust=6967720.845391856, first_name='Kathleen', last_name='Scanlan', full_name='Kathleen Scanlan', reorder='BHWDH3')
Record(cust=80462211.22157924, first_name='Albert', last_name='Goodwin', full_name='Albert Goodwin', reorder='KVJRE3')
Record(cust=63441201.600005, first_name='Billie', last_name='Greeke', full_name='Billie Greeke', reorder='LSRHSZ3')
Record(cust=76206790.42156677, first_name='Izabella', last_name='Jedwabski', full_name='Izabella Jedwabski', reorder='MBX3')
Record(cust=30455077.172918256, first_name='Joan', last_name='Donovan', full_name='Joan Donovan', reorder='MNOPQ3')
Record(cust=29981174.056717496, first_name='Robert', last_name='Duncan', full_name='Robert Duncan', reorder='ZYXWV3')
Group 4
---------------
Record(cust=55135877.35924682, first_name='Gregory', last_name='Lombardi', full_name='Gregory Lombardi', reorder='ABCDE4')
Record(cust=56473829, first_name='Peter', last_name='Murphy', full_name='Peter Murphy', reorder='AMZA4')
Record(cust=29463356.5493892, first_name='Mildred', last_name='Hogue', full_name='Mildred Hogue', reorder='BHWDH4')
Record(cust=49935290.262403, first_name='Alice', last_name='Cataldo', full_name='Alice Cataldo', reorder='KVJRE4')
Record(cust=98083780.20955186, first_name='John', last_name='Wittekind', full_name='John Wittekind', reorder='LSRHSZ4')
Record(cust=88992104.2779649, first_name='Jane', last_name='Bilon', full_name='Jane Bilon', reorder='MBX4')
Record(cust=19455007.061793316, first_name='Peter', last_name='Murphy', full_name='Peter Murphy', reorder='MNOPQ4')
Record(cust=11319936.032599876, first_name='Edward', last_name='Durham', full_name='Edward Durham', reorder='ZYXWV4')
Posts: 27
Threads: 2
Joined: Jan 2019
Ok, I've modified it so that there are four of every Reorder value. Thank goodness for online random name generators! :) Not something I'd ever thought of googling before today.
Posts: 27
Threads: 2
Joined: Jan 2019
Gribouillis,
I had some success implementing the sample code you gave. I can now get a list of the data, with the len equaling the number of rows! Thank you!
buran
I'm going to go through your examples line by line and see what I can understand. I'll post back about anything I don't understand so that I can learn, if that's ok.
To all - what is the difference I see in some places between something like:
import openpyxl
and
from openpyxl import load_workbook
and
from openpyxl.reader.excel import load_workbook
??
Posts: 8,167
Threads: 160
Joined: Sep 2016
Jan-11-2019, 01:44 PM
(This post was last modified: Jan-11-2019, 01:44 PM by buran.)
Both are the same. If you look at the docs or at the source code you will see that openpyxl package is organised in sub-packages. One of these sub-packages is openpyxl.reader . in sub-package openpyxl.reader there is module excel.py and load_workbook() function is part of this module.
So using from openpyxl.reader.excel import load_workbook is very explicit from where you import it.
At the same time the __init__.py file for top-level openpyxl package
imports the load_workbook() effectively exposing it and makes it possible to use from openpyxl import load_workbook or if you do just import openpyxl to use wb = openpyxl.load_workbook(some_file)
It's perfectly fine to use any of these - do as you prefer.
Posts: 27
Threads: 2
Joined: Jan 2019
Thank you for that explanation buran.
Already I'm learning! I puzzled over the line:
header, *data = ws.values How did it know to pull the header out just by having a variable named "header"?
So, I played with:
this, that, those, *other = ws.values And happily discovered that each variable = one row and that putting the * in front of the last one makes it skip rows for whatever number of other variables are already defined. Does this work for any collection of data, or just for openpyxl?
I'm having fun learning! Thank you guys for taking the time to help me. :)
Posts: 8,167
Threads: 160
Joined: Sep 2016
pay attention to comments in the code (and ignore the typing errors :-))
Quote:# read header and all data in two variables -> header and data
# Note the asteriks in frot of data
# This way data will hold everything except the first element in ws.values
and if you are interested to read further
PEP 3132 -- Extended Iterable Unpacking
Posts: 27
Threads: 2
Joined: Jan 2019
Jan-11-2019, 01:57 PM
(This post was last modified: Jan-11-2019, 02:19 PM by Netopia.)
(Jan-11-2019, 01:54 PM)buran Wrote: pay attention to comments in the code (and ignore the typing errors :-))
Quote:# read header and all data in two variables -> header and data
# Note the asteriks in frot of data
# This way data will hold everything except the first element in ws.values
and if you are interested to read further
PEP 3132 -- Extended Iterable Unpacking
I did read that. But even though you told me that, I didn't understand how it worked. How it 'knew' that header was the first row. I didn't know that having multiple variables corresponded to rows, one at a time. I didn't know that the * picked up from not just row 2, but from whatever number the last variable was. You pointed me in the right direction with your comments, and that helped me to learn more.
I promise to do my best to read whatever you have taken the time to write! :)
groups = defaultdict(list)
for row in data:
group = row[-1][-1]
groups[group].append(row) I was wondering if the word 'row' had any significance other than being easy to understand. I replaced it temporarily with the word 'this', and everything still worked fine.
Do I understand then, that since you used (list) for defaultdict, that it created a list and so when you say 'for xxxx in yyy', python just grabs the largest unit size, in this case being an entire row? I really am new to this stuff, and this might be basic, but I'm trying to make sure that I don't assume something and then later be confused when something doesn't work the way I thought it did.
Joe
|