Deleting rows based on cell value in Excel

azizrasul · Oct-09-2022, 10:07 PM

The following snippet of code deletes empty rows. How can I change the code such that a row is deleted if a value of 'x' appears in any row in column C?

for row in ws.iter_rows():
    if not any([cell.value for cell in row[1:]]):
        ws.delete_rows(row[0].row)

Pedroski55 · (This post was last modified: Oct-12-2022, 08:28 AM by Pedroski55.)

This Excel file contains some names containing X or x in column 2 and some None values in column 2 or 3

Before the Excel looks like this:

Output:>>> for row in sF[sFsheet].values:
    print(row)

    
('id', 'name', 'Age')
(1, 'Peter', 1)
(2, 'Xavier', 2)
(3, 'George', 3)
(4, None, None)
(5, 'Peggy-Sue', 4)
(6, 'Xander', 5)
(7, 'Legolas', 6)
(8, 'Happy', 7)
(9, None, None)
(10, 'Fatty', 8)
(11, 'Sox', 9)
(12, 'Justin', 10)

This deletes rows containing a name in which the letters X or x appear

def myApp():
    path2XL = '/home/pedro/myPython/openpyxl/xlsx_files/'
    file = 'some_X_or_x_in_column_2.xlsx'

    sF = openpyxl.load_workbook(path2XL + file)
    sFsheets = sF.sheetnames
    sFsheet = sFsheets[0]
    sFmaxRow = sF[sFsheet].max_row

    # open a new XL
    name = file.split('.')
    savename = name[0] + '_modified.xlsx'
    newXL = openpyxl.Workbook()
    sheets = newXL.sheetnames
    sheet = sheets[0]

    # before
    for row in sF[sFsheet].values:
        print(row)

    # you can choose the rows and columns you want to iterate
    # for row in sF[sFsheet].iter_rows(min_row=1, min_col=1, max_row=6, max_col=2):
    # look through column 2
    # if a name contains X or x delete that row
    # row[1] is the value in column 2 of the XL
    # if there is an X or x in a header name, the headers row will be gone! Fix that!
    for row in sF[sFsheet].iter_rows():
        if row[1].value == None:
            continue
        if 'X' in row[1].value or 'x' in row[1].value:
            sF[sFsheet].delete_rows(row[0].row)
        
    # after
    for row in sF[sFsheet].values:
        print(row)

    sF.save(path2XL + savename)

azizrasul · Oct-10-2022, 11:02 PM

Fantastic, many thanks.

Pedroski55 · Oct-11-2022, 02:44 AM

I find, in Python, there is almost always more than 1 way to do things.

This creates a new modified file, where names with X or x or None values are not copied.

def myApp():
    import openpyxl

    path2XL = '/home/pedro/myPython/openpyxl/xlsx_files/'
    file = 'some_X_or_x_in_column_2.xlsx'

    sF = openpyxl.load_workbook(path2XL + file)
    sFsheets = sF.sheetnames
    sFsheet = sFsheets[0]
    sFmaxRow = sF[sFsheet].max_row

    # open a new XL
    name = file.split('.')
    savename = name[0] + '_modified.xlsx'
    newXL = openpyxl.Workbook()
    sheets = newXL.sheetnames
    sheet = sheets[0]

    # copy straight from sF, except if you find X or x in column 2
    count = 1
    for row in sF[sFsheet].values:
        print(row)
        if count == 1:
            print('Inserting the headers ... ')
            for i in range(len(row)):
                newXL[sheet].cell(row=count, column=i+1).value=row[i]
            count +=1
            continue
        if row[1] == None:
            print('Found a None value ... ')
            continue
        elif 'X' in row[1] or 'x' in row[1]:
            print('Found an X or x in column 2 ... ')
            continue
        else:
            print('row is', row) 
            for i in range(len(row)):
                print('row[i] is', row[i])
                newXL[sheet].cell(row=count, column=i+1).value=row[i]
            count +=1

    newXL.save(path2XL + savename)

azizrasul · Oct-11-2022, 07:41 PM

I hadn't got a chance until now to run the code. I get an error: -

    if 'X' in row[1].value or 'x' in row[1].value:
TypeError: argument of type 'int' is not iterable

Pedroski55 · Oct-12-2022, 08:35 AM

Funny, it works for me.

In post #2 I should have saved sF, not newXL, or that will be empty. Actually, you don't need newXL at all in post #2

Below is the output in Idle for post #4

Output:>>> myApp()
('id', 'name', 'Age')
Inserting the headers ... 
(1, 'Peter', 1)
row is (1, 'Peter', 1)
row[i] is 1
row[i] is Peter
row[i] is 1
(2, 'Xavier', 2)
Found an X or x in column 2 ... 
(3, 'George', 3)
row is (3, 'George', 3)
row[i] is 3
row[i] is George
row[i] is 3
(4, None, None)
Found a None value ... 
(5, 'Peggy-Sue', 4)
row is (5, 'Peggy-Sue', 4)
row[i] is 5
row[i] is Peggy-Sue
row[i] is 4
(6, 'Xander', 5)
Found an X or x in column 2 ... 
(7, 'Legolas', 6)
row is (7, 'Legolas', 6)
row[i] is 7
row[i] is Legolas
row[i] is 6
(8, 'Happy', 7)
row is (8, 'Happy', 7)
row[i] is 8
row[i] is Happy
row[i] is 7
(9, None, None)
Found a None value ... 
(10, 'Fatty', 8)
row is (10, 'Fatty', 8)
row[i] is 10
row[i] is Fatty
row[i] is 8
(11, 'Sox', 9)
Found an X or x in column 2 ... 
(12, 'Justin', 10)
row is (12, 'Justin', 10)
row[i] is 12
row[i] is Justin
row[i] is 10

I can't even get Pycharm to work, or I'd try it there!

I'm sure you'll find the problem!

azizrasul · Oct-12-2022, 04:26 PM

OK thanks I will need to play with it and see why it's not working.

azizrasul · Oct-13-2022, 03:01 AM

I changed (based on post #2)

if 'X' in row[1].value or 'x' in row[1].value:

to

if row[1].value == 'X' or row[1].value=='x':

and it works now.

Referencing post #4, I get an error on line (which I have changed, in line with I have said above)

elif row[1].value == 'x' or row[1].value=='x':

I get the error

AttributeError: 'str' object has no attribute 'value'

What am I doing wrong?

Pedroski55 · Oct-13-2022, 06:53 AM

I don't know, sorry!

I copied and pasted Post #4 straight into Idle and it worked exactly as I hoped!

I didn't change anything.

Change it back and try again!

This is post #2 changed, because I didn't need a new Excel for post #2

def myApp():
    import openpyxl
    path2XL = '/home/pedro/myPython/openpyxl/xlsx_files/'
    file = 'some_X_or_x_in_column_2.xlsx'
    name = file.split('.')
    savename = name[0] + '_modified.xlsx'
 
    sF = openpyxl.load_workbook(path2XL + file)
    sFsheets = sF.sheetnames
    sFsheet = sFsheets[0]
    sFmaxRow = sF[sFsheet].max_row
 
    # before
    for row in sF[sFsheet].values:
        print(row)
 
    # you can choose the rows and columns you want to iterate
    # for row in sF[sFsheet].iter_rows(min_row=1, min_col=1, max_row=6, max_col=2):
    # look through column 2
    # if a name contains X or x delete that row
    # row[1] is the value in column 2 of the XL
    # if there is an X or x in a header name, the headers row will be gone! Fix that!
    for row in sF[sFsheet].iter_rows():
        if row[1].value == None:
            continue
        if 'X' in row[1].value or 'x' in row[1].value:
            sF[sFsheet].delete_rows(row[0].row)
         
    # after
    for row in sF[sFsheet].values:
        print(row)
 
    sF.save(path2XL + savename)

From this, you can see how .delete_rows works: it adds and empty row, then deletes, otherwise you would get and index error,

azizrasul · Oct-13-2022, 03:53 PM

It may be that I am using PyCharm environment that is causing the issue. Sigh.

Possibly Related Threads…
Thread		Author	Replies	Views	Last Post
	Copy Paste excel files based on the first letters of the file name	Viento	2	1,488	Feb-07-2024, 12:24 PM Last Post: Viento
	Color a table cell based on specific text	Creepy	11	5,057	Jul-27-2023, 02:48 PM Last Post: deanhystad
	How to properly format rows and columns in excel data from parsed .txt blocks	jh67	7	3,678	Dec-12-2022, 08:22 PM Last Post: jh67
	How to assign a value to pandas dataframe column rows based on a condition	klllmmm	0	1,800	Sep-08-2022, 06:32 AM Last Post: klllmmm
	The code I have written removes the desired number of rows, but wrong rows	Jdesi1983	0	2,093	Dec-08-2021, 04:42 AM Last Post: Jdesi1983
	Calculate next rows based on previous values of array	divon	0	2,861	Nov-23-2021, 04:44 AM Last Post: divon
	Using Excel Cell As A Variable In A Loop	knight2000	7	6,058	Aug-25-2021, 12:43 PM Last Post: snippsat
	Using Excel Cell As A Variable In A Loop	knight2000	7	7,819	Jul-18-2021, 10:52 AM Last Post: knight2000
	Pandas DataFrame combine rows by column value, where Date Rows are NULL	rhat398	0	2,675	May-04-2021, 10:51 PM Last Post: rhat398
	Set 'Time' format cell when writing data to excel and not 'custom'	limors	3	8,516	Mar-29-2021, 09:36 PM Last Post: Larz60+

Deleting rows based on cell value in Excel

User Panel Messages

Announcements