Python Forum
Complete NEWB and openpyxl project
Thread Rating:
  • 6 Vote(s) - 3.5 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Complete NEWB and openpyxl project
#1
Greetings!

I've butchered various programming languages in the past, learning to hack my way through simple problems, but have never really become proficient in any of them. This year I decided that I would actually learn Python and stick with it. So, I'm a newb, but dedicated to continuing to learn.

My son-in-law helped me to create a solution for a previous employers of mine, which he wrote in Perl. I thought that, perhaps, that would be a nice basic challenge for me to redo in Python since I already had a basic idea of what might be involved, though the code would be much different.

The project takes an Excel workbook (only 1 sheet) that has rows of data with one column (named Reorder) containing values that are some number of alpha characters (less than 6, historically) and a single digit.

The desired outcome would be that all records would be filtered according to that digit on the end; all that ended with a '1' would be Group1, with a '2' would be Group2... and so on.

Once that was accomplished, each group would be sorted by the alpha values in the Reorder column.

All of the Group1 records would be put into a new Excel workbook named for the Group of records (e.g. Group1.xlsx, Group2.xlsx... etc).

Inside each of these Group workbooks would be separate worksheets (alphabetized) for each unique alpha/digit string (e.g. ABCDE1, BCD1, MNOP1, -OR- ABCDE2, JKM2, MNOP2... etc).

I've gotten through the baby steps of checking for the existence of the source folder and creating it if not there (originally set up so that different people could use the program the in exactly the same way, regardless of what computer it was on). I also have it check to make sure that there is one, and only one .xlsx file in this folder.

I've gotten it to create a new folder in the working folder that is named after the excel file without extension. Copy the original Excel file to that folder, and then change the working folder to that new folder.

From there, I've been able to get openpyxl assign a variable for the source workbook, assign a variable for the worksheet... and now I've sort of gotten lost.

In Perl, my son-in-law used (from Excel::Writer::XLSX) functions 'maxcol' and 'maxrow', so I tried to use max_col and max_row with openpyxl, but wasted a lot of time before realizing that they return the maximum rows and columns in Excel::Writer, but seem to assign them in openpyxl. I also wasted time reading older posts which referenced deprecated commands.

I've read through the documentation at https://openpyxl.readthedocs.io/en/stable/ but haven't gleaned much that seems like it would help.

I have found wb.calculate_dimension(), which I guess I can take the output from. Not sure how to go about the separating and sorting either.

Have I chosen the best tool with openpyxl, or is there something else I should be looking at? Does anyone know of a comprehensive listing of all of the functions/commands and their current names and usages within openpyxl? I have found precious little thus far.

I'll post an example of the type of data in a screenshot.

If you're still here, thanks for even reading this far! Any and all help is appreciated.

Joe

[Image: TestImmuneData.xlsx%20-%20Excel.png?dl=0]

Looks like you'll have to click the broken image icon to display the image. :(
Reply
#2
check https://openpyxl.readthedocs.io/en/stabl...many-cells and next section - https://openpyxl.readthedocs.io/en/stabl...alues-only

using tuple(ws.rows) or tuple(ws.values) you will get tuple of tuples and then you can sort/process, etc.
By the way, you may want to upload a workbook with dummy data as it will help - everyone who wants to help will work with same test data.
(Jan-09-2019, 08:06 PM)Netopia Wrote: Does anyone know of a comprehensive listing of all of the functions/commands and their current names and usages within openpyxl? I have found precious little thus far.

Here is the full API

https://openpyxl.readthedocs.io/en/stabl...npyxl.html

for example here are all properties and methods for worksheet https://openpyxl.readthedocs.io/en/stabl...sheet.html

by the way, according to docs, max_col and max_row will return:

max_column

The maximum column index containing data (1-based)
Type: int

max_row

The maximum row index containing data (1-based)
Type: int
If you can't explain it to a six year old, you don't understand it yourself, Albert Einstein
How to Ask Questions The Smart Way: link and another link
Create MCV example
Debug small programs

Reply
#3
buran,

Thank you VERY much for the reply. I thought I'd gone through the main page at that link, and most of the sidebar items, but your links show that I wasn't very thorough! I'll have to now read through them.

As per your suggestion, I'm uploading a test date Excel file.
Hmmm... don't see a way to upload, so I'll just link it.

Link To Test Data


Thank you again, and I look forward to learning from what you posted.

Joe
Reply
#4
Feel free to post your code and ask questions if you face problem. And when I asked for data I really mean dummy data. Are you sure it's OK to share all these personal data?
If you can't explain it to a six year old, you don't understand it yourself, Albert Einstein
How to Ask Questions The Smart Way: link and another link
Create MCV example
Debug small programs

Reply
#5
buran,

That is dummy data. None of the data is real. Thank you for your concern and consideration though, appreciated.

Joe

I saw this in the instructions, but couldn't make sense of it. Does the "Hello World" have any use here or is it just thrown in?


>>> ws = wb.active
>>> ws['C9'] = 'hello world'
>>> tuple(ws.rows)
((<Cell Sheet.A1>, <Cell Sheet.B1>, <Cell Sheet.C1>),
(<Cell Sheet.A2>, <Cell Sheet.B2>, <Cell Sheet.C2>),
(<Cell Sheet.A3>, <Cell Sheet.B3>, <Cell Sheet.C3>),
(<Cell Sheet.A4>, <Cell Sheet.B4>, <Cell Sheet.C4>),
(<Cell Sheet.A5>, <Cell Sheet.B5>, <Cell Sheet.C5>),
(<Cell Sheet.A6>, <Cell Sheet.B6>, <Cell Sheet.C6>),
(<Cell Sheet.A7>, <Cell Sheet.B7>, <Cell Sheet.C7>),
(<Cell Sheet.A8>, <Cell Sheet.B8>, <Cell Sheet.C8>),
(<Cell Sheet.A9>, <Cell Sheet.B9>, <Cell Sheet.C9>))


Joe
Reply
#6
No, it's just an example so that used range on the sheet is A1:C9 (i.e. the example assumes/starts from blank workbook/worksheet). And then you get tuple of tuples of cell objects.
and please, use proper tags when post code, traceback, output, etc. This time I have added tags for you.
See BBcode help for more info.
If you can't explain it to a six year old, you don't understand it yourself, Albert Einstein
How to Ask Questions The Smart Way: link and another link
Create MCV example
Debug small programs

Reply
#7
buran

SÅ­zhalyavam za tova. Blagodarim vi za pomoshtta!

Joe
Reply
#8
Link to corrected test data:

Corrected Test Data
Reply
#9
I wrote once a very simple function that I think could help you
def fetch_xlsx(filename, maxcols = 10):
    """Return a list of rows (lists) extracted from an xlsx workbook
    """
    from openpyxl.reader.excel import load_workbook
    database = []
    wb=load_workbook(filename)
    for sheet in wb.worksheets:
        for row in sheet.rows:
            values=[]
            for i, cell in enumerate(row):
                if i >= maxcols:
                    break
                values.append(cell.value)
            database.append(values)
    return database
Reply
#10
Gribouillis

Thank you so much for this. I've been banging my head all day just trying to figure out what I'm doing. My lack of experience and formal training is getting to me.

I'll go over this and see if I can understand exactly what it's doing.

I wonder if I'm even heading in the right direction. I think I should be trying to do something like this:
  • Assign a variable "header" and have it read the first row as a list
  • Assign a variable "data" and have it read in all the data from row 2 onward
  • Somehow divide that list into 4 separate lists (Group1, Group2, Group3, Group4) by only numerical of data[4]
  • Somehow sort those lists by only alpha of data[4]
  • Somehow divide those lists into unique lists based on value of data[4]
  • Create 4 new workbooks, each named for the Group
  • Add a tab for each unique value in a group to the group's workbook
  • Add the header to each of the tabs in each workbook
  • Add the data for each tab/workbook

I feel like I'm overthinking and not mentally using the power of loops, but I'm stuck trying to figure out how to go from this general list of tasks to actually doing it.

I've gotten the data into a single list, but then I find that the list only has a length of 1, so no way to do anything with that. :(

I'm not really looking for anyone to write this for me, but want to learn and understand what I'm doing. I am MOST appreciative of pushes in the right direction and examples that might turn the light on in my head.

I'm going to go play with what you posted now. THANK YOU!
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  NameError issue with daughter's newb code MrGonk 2 1,406 Sep-16-2021, 01:29 PM
Last Post: BashBedlam
  Simple newb string question Involute 2 2,169 Sep-08-2019, 12:50 AM
Last Post: Involute
  please help this newb install pygame iofhua 7 5,845 May-15-2019, 01:09 PM
Last Post: buran
  Newb question: Debugging + Linting Python in Visual Studio Code Drone4four 1 2,388 Apr-15-2019, 06:19 AM
Last Post: perfringo
  Newb question about %02d %04d bennylava 30 19,145 Mar-05-2019, 11:23 PM
Last Post: snippsat
  Pthyon 3 question (newb) bennylava 11 5,741 Feb-28-2019, 06:04 PM
Last Post: buran
  newb selfie PatM 5 3,544 Feb-19-2019, 12:20 AM
Last Post: snippsat
  Newb Question - Threading in Crons vvarrior 2 2,718 Jul-20-2018, 08:12 PM
Last Post: vvarrior
  Matt's newb question 1 MattSS102 1 2,669 Aug-28-2017, 03:27 AM
Last Post: BerlingSwe
  Newb: Simple Explicit Formula Duplicitous 1 3,108 May-05-2017, 07:03 PM
Last Post: buran

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020