Python Forum
comparison of two xlsx files
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
comparison of two xlsx files
#1
Hello everyone!

I have 2 xlsx files and I need to search some words from one file in file two and write down founded information in third xlsx file. The problem is that when I make two xlsx verification files by my own all works great but when I take information from other xlsx files my program didn't work good.

Here is my program:

from openpyxl import load_workbook, Workbook
import re
from xlsxwriter import Workbook as XlsxWb


def reader(file_name, file_name2):

    book1 = load_workbook(file_name)
    book2 = load_workbook(file_name2)
    book_new = XlsxWb('pizza.xlsx')
    line = book1.active
    line2 = book2.active
    line_new = book_new.add_worksheet("Pakkumine")
    input_list = []
    one_piece_list = []
    one_piece_km_list = []
    i = 0
    j = 0
    h = 0
    for value in range(1, line.max_row + 1):  # I take values from first file
        highest_count = 0
        counter = 0
        input_list.clear()
        one_piece_list.clear()
        one_piece_km_list.clear()
        item = line[value][1].value
        pos = line[value][0].value
        unit = line[value][2].value
        amount = line[value][3].value
        if item is None:
            continue
        else:
            lower_case_item = item.lower()
            text_re = re.sub(r'[\([{})\]]', "", lower_case_item)
            item_to_list = list(text_re.strip().split(' ')) # I make list with words that containing in value

        for value2 in range(1, line2.max_row + 1): # Now we take value from second file
            item2 = line2[value2][1].value
            one_piece = line2[value2][2].value
            one_piece_km = line2[value2][3].value

            if item2 is None:
                continue
            else:
                lower_case_item2 = item2.lower()
                text_re2 = re.sub(r'[\([{})\]]', "", lower_case_item2)

                for word in item_to_list:
                    text = re.search(r'\b{}\b'.format(word), text_re2) #Here we are searching words from second file in words from first file
                    if text:
                        counter += 1 #Counting how much coincidences we have in this case
                    else:
                        continue

                if counter > highest_count: #This piece of code we need to write value with biggest amount of coincidence
                    highest_count = counter
                    input_list.append(item2)
                    one_piece_list.append(one_piece)
                    one_piece_km_list.append(one_piece_km)
                    counter = 0
                    line_new.write(i, 1, item)
                    line_new.write(i, 0, pos)

                elif counter > 0 and counter == highest_count:
                    input_list.append(item2)
                    one_piece_list.append(one_piece)
                    one_piece_km_list.append(one_piece_km)
                    counter = 0
                    line_new.write(i, 1, item)
                    line_new.write(i, 0, pos)

                else:
                    continue

        for x in input_list: #We are writing information about our coincidence
            line_new.write(i, 2, x) #Name of coincidence from second file
            line_new.write(i, 3, unit) #quantity name
            line_new.write(i, 4, amount) #How many we need
            i += 1
        for y in one_piece_list:
            if type(y) == str: #Check for string
                continue
            elif y is None: #Check for zero
                j += 1
                continue
            else:
                line_new.write(j, 5, float(y)) #Numbers about cost of one piece
                line_new.write(j, 7, float(y) * float(amount)) #Numbers about cost of all products
            j += 1
        for z in one_piece_km_list:
            if type(z) == str:
                continue
            elif z is None:
                h += 1
                continue
            else:
                line_new.write(h, 6, float(z))
                line_new.write(h, 8, float(z) * float(amount))
            h += 1
    book_new.close()


if __name__ == '__main__':
    reader('newin.xlsx', 'sqr.xlsx')
Can someone explain why my code didnt work properly? I attach my xlsx files too.

Attached Files

.xlsx   newin.xlsx (Size: 8.69 KB / Downloads: 262)
.xlsx   sqr.xlsx (Size: 10.64 KB / Downloads: 239)
Reply
#2
curious: if this is a one time operation, why not export each file to CSV and use diff?
Reply
#3
(Sep-07-2021, 11:27 AM)Larz60+ Wrote: curious: if this is a one time operation, why not export each file to CSV and use diff?

Sorry I am new one in Python and I dont know a lot of things. I have all info in xlsx files and new info files will be in xlsx format.
What do you mean diff? Will it help and program will work fine?
Reply
#4
you don't need package xlsxwriter as openpyxl van both read and write.

your goal is unclear to me, and code is difficult to follow.
Please show step by step the goal, and give me a few hours to respond.
Reply
#5
(Sep-10-2021, 08:51 AM)Larz60+ Wrote: you don't need package xlsxwriter as openpyxl van both read and write.

your goal is unclear to me, and code is difficult to follow.
Please show step by step the goal, and give me a few hours to respond.

Sorry for the long answer. The goal of my program is to create a third xlsx file with products and prices. In first xlsx file I have list of products that I need to buy. In second xlsx file I have list of all products of shop with prices. I need to automate search process because it take a lot of time to find products by ctrl+f in xlsx file.

Do you need that I describe my code more accurately? If my problem can still be solved))))

When I use search with 1 repetition, program find a lot of unnecessary lines. The problem is that when the number of repetitions is increased, it does not work correctly.
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  .py pandas matplotlib .xlsx files QubeStory 1 801 Mar-23-2023, 09:38 AM
Last Post: buran
  Extracting information from .xlsx files hobbyist 0 1,606 Jan-06-2021, 07:20 PM
Last Post: hobbyist
  How can I speed up my openpyxl program reading Excel .xlsx files? deac33 0 3,415 May-04-2020, 08:02 PM
Last Post: deac33
  How to access all xlsx files in all subdirectories? Krszt 2 5,449 Mar-19-2019, 11:00 AM
Last Post: DeaD_EyE
  reading .xls files and saving as .xlsx jon0852 1 6,900 Oct-17-2017, 08:32 PM
Last Post: buran
  Looping .xlsx files in folder/subfolders copy pasting currentregion HarrisQ 4 5,576 Apr-17-2017, 06:35 AM
Last Post: HarrisQ

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020