Hm, I don't understand why you split the line if you are comparing whole lines.
Things to pay attention to:
- What is compared? Whole lines or only some columns in the lines?
- What should happen with empty lines?
- What should happen if a line has leading whitespace and the reference does not?
- What should happen if a line has trailing whitespace and the reference does not?
- Does the reference contain empty lines or leading whitespace?
If you just want to compare whole, non-empty lines with surrounding whitespace stripped:
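To see why these questions matter, here is a small sketch. The sample strings are made up for illustration and are not taken from your files. It shows how the same record compares differently depending on how whitespace is handled:

```python
reference = "03/28/2021,P,6,LINE2"

# Lines as they typically come out of a file: with a newline,
# and sometimes with extra leading or trailing spaces.
raw_lines = [
    "03/28/2021,P,6,LINE2\n",     # trailing newline only
    "  03/28/2021,P,6,LINE2\n",   # leading spaces
    "03/28/2021,P,6,LINE2   \n",  # trailing spaces
    "\n",                         # empty line
]

# Without stripping, no raw line equals the reference,
# because each one still carries its newline.
raw_matches = [line == reference for line in raw_lines]

# After str.strip the first three match; the empty line becomes "".
stripped_matches = [line.strip() == reference for line in raw_lines]

print(raw_matches)       # [False, False, False, False]
print(stripped_matches)  # [True, True, True, False]
```

So before comparing, decide which of these differences should count as a mismatch.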
    from io import StringIO

    test1 = """
    03/28/2021,P,6,LINE2
    03/28/2021,P,9,LINE4
    """

    test2 = """
    03/28/2021,P,16,LINE1
    03/28/2021,P,6,LINE2
    03/28/2021,P,9,LINE3
    03/28/2021,P,9,LINE4
    03/28/2021,P,8,LINE5
    03/28/2021,S,95,LINE6
    03/28/2021,S,1,LINE7
    03/28/2021,P,46,LINE8
    """

    file1 = StringIO(test1)
    file2 = StringIO(test2)
    # using StringIO to simulate an open file
    # file1 and file2 can also come from open()
    # TextIOWrapper, StringIO, BytesIO, ... support
    # line iteration
    # (BytesIO yields bytes, so it would need bytes.strip)


    def get_references(text):
        references = set()
        # we want to look up fast
        # preserving the order is not required for
        # the references
        # a set contains only unique elements

        # this removes leading and trailing white space
        for line in map(str.strip, text):
            if not line:
                # skip empty lines
                # because of str.strip
                # the line does not contain white space
                # bool(empty_string) -> False
                continue
            # set has no append.
            # instead you add objects to the set
            references.add(line)
        return references


    def show_not_matching(text, references):
        line_iter = map(str.strip, text)
        # to get line numbers, enumerate is used
        # it just iterates over the iterable and
        # yields (number, element_of_iterable)
        lines = enumerate(line_iter, start=1)
        for line_number, line in lines:
            if not line:
                continue
            if line not in references:
                # string formatting
                print(f"[{line_number:>5}] Not matching -> {line}")


    if __name__ == "__main__":
        # with test data in source code
        ref = get_references(file1)
        show_not_matching(file2, ref)

        # later with real files
        # with open("file1.txt") as fd_ref:
        #     refs = get_references(fd_ref)
        #
        # with open("file2.txt") as fd:
        #     show_not_matching(fd, refs)

And if you do not want to compare the date:
    from io import StringIO

    test1 = """
    03/28/2021,P,6,LINE2
    03/28/2021,P,9,LINE4
    """

    test2 = """
    03/28/2021,P,16,LINE1
    03/28/2021,P,6,LINE2
    03/28/2021,P,9,LINE3
    03/28/2021,P,9,LINE4
    03/28/2021,P,8,LINE5
    03/28/2021,S,95,LINE6
    03/28/2021,S,1,LINE7
    03/28/2021,P,46,LINE8
    """

    file1 = StringIO(test1)
    file2 = StringIO(test2)
    # using StringIO to simulate an open file
    # file1 and file2 can also come from open()
    # TextIOWrapper, StringIO, BytesIO, ... support
    # line iteration
    # (BytesIO yields bytes, so it would need bytes.strip)


    def get_references(text):
        references = set()
        # we want to look up fast
        # preserving the order is not required for
        # the references
        # a set contains only unique elements

        # this removes leading and trailing white space
        for line in map(str.strip, text):
            if not line:
                # skip empty lines
                # because of str.strip
                # the line does not contain white space
                # bool(empty_string) -> False
                continue
            # set has no append.
            # instead you add objects to the set

            # just removing the date from line
            # _ is a throw away name
            _, line = line.split(",", maxsplit=1)
            references.add(line)
        print("References:", references)
        return references


    def show_not_matching(text, references):
        line_iter = map(str.strip, text)
        # to get line numbers, enumerate is used
        # it just iterates over the iterable and
        # yields (number, element_of_iterable)
        lines = enumerate(line_iter, start=1)
        for line_number, line in lines:
            if not line:
                continue
            # here the same
            # we want to remove the date from the
            # line we want to compare with the references
            # where the date was also removed
            # but we keep the original line, for
            # printing it
            _, line_to_compare = line.split(",", maxsplit=1)
            # now use the modified line to look it up in references
            if line_to_compare not in references:
                # string formatting
                print(f"[{line_number:>5}] Not matching -> {line}")


    if __name__ == "__main__":
        # with test data in source code
        ref = get_references(file1)
        show_not_matching(file2, ref)

        # later with real files
        # with open("file1.txt") as fd_ref:
        #     refs = get_references(fd_ref)
        #
        # with open("file2.txt") as fd:
        #     show_not_matching(fd, refs)

This time without comments, but with real files:
    def get_references(text):
        references = set()
        for line in map(str.strip, text):
            if not line:
                continue
            _, line = line.split(",", maxsplit=1)
            references.add(line)
        return references


    def show_not_matching(text, references):
        line_iter = map(str.strip, text)
        lines = enumerate(line_iter, start=1)
        for line_number, line in lines:
            if not line:
                continue
            _, line_to_compare = line.split(",", maxsplit=1)
            if line_to_compare not in references:
                print(f"[{line_number:>5}] Not matching -> {line}")


    if __name__ == "__main__":
        with open("file1.txt") as fd_ref:
            refs = get_references(fd_ref)

        with open("file2.txt") as fd:
            show_not_matching(fd, refs)

Read the Python documentation if you see functions you don't know:
enumerate, map, str.split, set, and the in operator.
Also important for later use: str.strip (both sides), str.lstrip (left side), str.rstrip (right side).
To remove only trailing white space, use str.rstrip.
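A quick way to see the difference between the three strip methods is to print the results with repr. This is only a small sketch, and the sample string is made up for illustration:

```python
text = "  \t 03/28/2021,P,6,LINE2 \n"

both = text.strip()     # removes white space on both sides
left = text.lstrip()    # removes leading white space only
right = text.rstrip()   # removes trailing white space only

print(repr(both))   # '03/28/2021,P,6,LINE2'
print(repr(left))   # '03/28/2021,P,6,LINE2 \n'
print(repr(right))  # '  \t 03/28/2021,P,6,LINE2'

# With an argument, only the given characters are stripped,
# for example just the newline at the end of a line:
no_newline = text.rstrip("\n")
print(repr(no_newline))  # '  \t 03/28/2021,P,6,LINE2 '
```

repr is used here because it makes spaces, tabs and newlines at the ends of the string visible.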
Almost dead, but too lazy to die: https://sourceserver.info
All humans together. We don't need politicians!