Python Forum

Full Version: comparison of two xlsx files
You're currently viewing a stripped down version of our content. View the full version with proper formatting.
Hello everyone!

I have 2 xlsx files and I need to search some words from one file in file two and write down founded information in third xlsx file. The problem is that when I make two xlsx verification files by my own all works great but when I take information from other xlsx files my program didn't work good.

Here is my program:

from openpyxl import load_workbook, Workbook
import re
from xlsxwriter import Workbook as XlsxWb


def reader(file_name, file_name2):

    book1 = load_workbook(file_name)
    book2 = load_workbook(file_name2)
    book_new = XlsxWb('pizza.xlsx')
    line = book1.active
    line2 = book2.active
    line_new = book_new.add_worksheet("Pakkumine")
    input_list = []
    one_piece_list = []
    one_piece_km_list = []
    i = 0
    j = 0
    h = 0
    for value in range(1, line.max_row + 1):  # I take values from first file
        highest_count = 0
        counter = 0
        input_list.clear()
        one_piece_list.clear()
        one_piece_km_list.clear()
        item = line[value][1].value
        pos = line[value][0].value
        unit = line[value][2].value
        amount = line[value][3].value
        if item is None:
            continue
        else:
            lower_case_item = item.lower()
            text_re = re.sub(r'[\([{})\]]', "", lower_case_item)
            item_to_list = list(text_re.strip().split(' ')) # I make list with words that containing in value

        for value2 in range(1, line2.max_row + 1): # Now we take value from second file
            item2 = line2[value2][1].value
            one_piece = line2[value2][2].value
            one_piece_km = line2[value2][3].value

            if item2 is None:
                continue
            else:
                lower_case_item2 = item2.lower()
                text_re2 = re.sub(r'[\([{})\]]', "", lower_case_item2)

                for word in item_to_list:
                    text = re.search(r'\b{}\b'.format(word), text_re2) #Here we are searching words from second file in words from first file
                    if text:
                        counter += 1 #Counting how much coincidences we have in this case
                    else:
                        continue

                if counter > highest_count: #This piece of code we need to write value with biggest amount of coincidence
                    highest_count = counter
                    input_list.append(item2)
                    one_piece_list.append(one_piece)
                    one_piece_km_list.append(one_piece_km)
                    counter = 0
                    line_new.write(i, 1, item)
                    line_new.write(i, 0, pos)

                elif counter > 0 and counter == highest_count:
                    input_list.append(item2)
                    one_piece_list.append(one_piece)
                    one_piece_km_list.append(one_piece_km)
                    counter = 0
                    line_new.write(i, 1, item)
                    line_new.write(i, 0, pos)

                else:
                    continue

        for x in input_list: #We are writing information about our coincidence
            line_new.write(i, 2, x) #Name of coincidence from second file
            line_new.write(i, 3, unit) #quantity name
            line_new.write(i, 4, amount) #How many we need
            i += 1
        for y in one_piece_list:
            if type(y) == str: #Check for string
                continue
            elif y is None: #Check for zero
                j += 1
                continue
            else:
                line_new.write(j, 5, float(y)) #Numbers about cost of one piece
                line_new.write(j, 7, float(y) * float(amount)) #Numbers about cost of all products
            j += 1
        for z in one_piece_km_list:
            if type(z) == str:
                continue
            elif z is None:
                h += 1
                continue
            else:
                line_new.write(h, 6, float(z))
                line_new.write(h, 8, float(z) * float(amount))
            h += 1
    book_new.close()


if __name__ == '__main__':
    reader('newin.xlsx', 'sqr.xlsx')
Can someone explain why my code didnt work properly? I attach my xlsx files too.
curious: if this is a one time operation, why not export each file to CSV and use diff?
(Sep-07-2021, 11:27 AM)Larz60+ Wrote: [ -> ]curious: if this is a one time operation, why not export each file to CSV and use diff?

Sorry I am new one in Python and I dont know a lot of things. I have all info in xlsx files and new info files will be in xlsx format.
What do you mean diff? Will it help and program will work fine?
you don't need package xlsxwriter as openpyxl van both read and write.

your goal is unclear to me, and code is difficult to follow.
Please show step by step the goal, and give me a few hours to respond.
(Sep-10-2021, 08:51 AM)Larz60+ Wrote: [ -> ]you don't need package xlsxwriter as openpyxl van both read and write.

your goal is unclear to me, and code is difficult to follow.
Please show step by step the goal, and give me a few hours to respond.

Sorry for the long answer. The goal of my program is to create a third xlsx file with products and prices. In first xlsx file I have list of products that I need to buy. In second xlsx file I have list of all products of shop with prices. I need to automate search process because it take a lot of time to find products by ctrl+f in xlsx file.

Do you need that I describe my code more accurately? If my problem can still be solved))))

When I use search with 1 repetition, program find a lot of unnecessary lines. The problem is that when the number of repetitions is increased, it does not work correctly.