Python Forum
failing to print not matched lines from second file
#11
Why were you confused by square brackets? You use them in your original post.
#12
The outer square brackets are a list comprehension, and the [3] is list indexing, not string slicing: it picks the fourth item from the list that split(',') returns.
Someone correct me if I am wrong please.
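To make the difference concrete, here is a minimal sketch using lines in the format discussed in this thread. The variable names are only illustrative.

```python
line = "03/28/2021,P,6,LINE2\n"

# str.split returns a list of fields
fields = line.strip().split(",")
print(fields)       # ['03/28/2021', 'P', '6', 'LINE2']

# [3] indexes that list: the fourth element, counting from 0
print(fields[3])    # LINE2

# the outer [...] with a "for" inside is a list comprehension
lines = ["03/28/2021,P,6,LINE2\n", "03/28/2021,P,9,LINE4\n"]
last_fields = [item.strip().split(",")[3] for item in lines]
print(last_fields)  # ['LINE2', 'LINE4']

# slicing uses a colon; applied to a string it returns a substring
date_part = line[0:10]
print(date_part)    # 03/28/2021
```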
#13
Hm, I do not understand why you split the line, if you compare whole lines.

Things to pay attention to:
  • What is compared? Whole lines or only some columns in the lines?
  • What should happen with empty lines?
  • What should happen if a line has leading whitespace and the reference does not?
  • What should happen if a line has trailing whitespace and the reference does not?
  • Do the references contain empty lines or leading whitespace?

If you just want to compare whole, non-empty lines with surrounding whitespace stripped:
from io import StringIO


test1 = """

03/28/2021,P,6,LINE2
03/28/2021,P,9,LINE4

"""


test2 = """

03/28/2021,P,16,LINE1
03/28/2021,P,6,LINE2
03/28/2021,P,9,LINE3
03/28/2021,P,9,LINE4
03/28/2021,P,8,LINE5
03/28/2021,S,95,LINE6
03/28/2021,S,1,LINE7
03/28/2021,P,46,LINE8

"""

file1 = StringIO(test1)
file2 = StringIO(test2)
# using StringIO to simulate an open text file


# file1 and file2 can also come from open()
# text file objects (TextIOWrapper, StringIO, ...) support
# line iteration and yield str lines


def get_references(text):
    references = set()
    # we want to look up fast
    # preserving the order is not required for
    # the references
    # a set contains only unique elements

    # this removes leading and trailing white spaces
    for line in map(str.strip, text):
        if not line:
            # skip empty lines
            # because of str.strip
            # the line does not contain white spaces
            # bool(empty_string) -> False
            continue
        # set has no append.
        # instead you add objects to the set
        references.add(line)
    return references


def show_not_matching(text, references):
    line_iter = map(str.strip, text)
    # to get line numbers, enumerate is used
    # it just iterates over the iterable and
    # yields (number, element_of_iterable)
    lines = enumerate(line_iter, start=1)
    for line_number, line in lines:
        if not line:
            continue
        if line not in references:
            # string formatting
            print(f"[{line_number:>5}] Not matching -> {line}")


if __name__ == "__main__":
    # with test data in source code
    ref = get_references(file1)
    show_not_matching(file2, ref)

    # later with real files
    # with open("file1.txt") as fd_ref:
    #     refs = get_references(fd_ref)
    #
    # with open("file2.txt") as fd:
    #     show_not_matching(fd, refs)
And if you do not want to compare the date:
from io import StringIO


test1 = """

03/28/2021,P,6,LINE2
03/28/2021,P,9,LINE4

"""


test2 = """

03/28/2021,P,16,LINE1
03/28/2021,P,6,LINE2
03/28/2021,P,9,LINE3
03/28/2021,P,9,LINE4
03/28/2021,P,8,LINE5
03/28/2021,S,95,LINE6
03/28/2021,S,1,LINE7
03/28/2021,P,46,LINE8

"""

file1 = StringIO(test1)
file2 = StringIO(test2)
# using StringIO to simulate an open text file


# file1 and file2 can also come from open()
# text file objects (TextIOWrapper, StringIO, ...) support
# line iteration and yield str lines


def get_references(text):
    references = set()
    # we want to look up fast
    # preserving the order is not required for
    # the references
    # a set contains only unique elements

    # this removes leading and trailing white spaces
    for line in map(str.strip, text):
        if not line:
            # skip empty lines
            # because of str.strip
            # the line does not contain white spaces
            # bool(empty_string) -> False
            continue
        # set has no append.
        # instead you add objects to the set

        # just removing the date from line
        # _ is a throw away name
        _, line = line.split(",", maxsplit=1)
        references.add(line)
    print("References:", references)
    return references


def show_not_matching(text, references):
    line_iter = map(str.strip, text)
    # to get line numbers, enumerate is used
    # it just iterates over the iterable and
    # yields (number, element_of_iterable)
    lines = enumerate(line_iter, start=1)
    for line_number, line in lines:
        if not line:
            continue
        # here the same
        # we want to remove the date from the
        # line we want to compare with the references
        # where the date was also removed
        # but we keep the original line, for
        # printing it
        _, line_to_compare = line.split(",", maxsplit=1)
        # now use the modified line to look it up in references
        if line_to_compare not in references:
            # string formatting
            print(f"[{line_number:>5}] Not matching -> {line}")


if __name__ == "__main__":
    # with test data in source code
    ref = get_references(file1)
    show_not_matching(file2, ref)

    # later with real files
    # with open("file1.txt") as fd_ref:
    #     refs = get_references(fd_ref)
    #
    # with open("file2.txt") as fd:
    #     show_not_matching(fd, refs)
This time without comments, but with real files:
def get_references(text):
    references = set()
    for line in map(str.strip, text):
        if not line:
            continue
        _, line = line.split(",", maxsplit=1)
        references.add(line)
    return references


def show_not_matching(text, references):
    line_iter = map(str.strip, text)
    lines = enumerate(line_iter, start=1)
    for line_number, line in lines:
        if not line:
            continue
        _, line_to_compare = line.split(",", maxsplit=1)
        if line_to_compare not in references:
            print(f"[{line_number:>5}] Not matching -> {line}")


if __name__ == "__main__":
    with open("file1.txt") as fd_ref:
        refs = get_references(fd_ref)

    with open("file2.txt") as fd:
        show_not_matching(fd, refs)
Read the Python documentation if you see functions you don't know:
enumerate, map, str.split, set, and the in operator.

Also important for later use: str.strip (both sides), str.lstrip (left side), str.rstrip (right side).
To remove only trailing whitespace, use str.rstrip.
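A small sketch of how the three methods differ on a line with whitespace on both sides. The sample string is only illustrative.

```python
text = "  03/28/2021,P,6,LINE2 \n"

both = text.strip()    # removes whitespace on both sides
left = text.lstrip()   # removes leading (left) whitespace only
right = text.rstrip()  # removes trailing (right) whitespace only

print(repr(both))   # '03/28/2021,P,6,LINE2'
print(repr(left))   # '03/28/2021,P,6,LINE2 \n'
print(repr(right))  # '  03/28/2021,P,6,LINE2'
```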
#14
By the way, you can also use the csv module.
This is also in the standard library.
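As a rough sketch of what that could look like, the same comparison (ignoring the date column) can be written with csv.reader. The function names and the choice to return a list instead of printing are assumptions for illustration, and the sample data is a shortened version of the test data above. It also assumes blank lines are completely empty, since csv.reader returns an empty list for those.

```python
import csv
from io import StringIO


def get_references(fileobj):
    """Collect every non-empty row, without its first (date) column."""
    references = set()
    for row in csv.reader(fileobj):
        if not row:
            # csv.reader yields [] for a completely empty line
            continue
        # tuples are hashable, lists are not, so store a tuple in the set
        references.add(tuple(field.strip() for field in row[1:]))
    return references


def find_not_matching(fileobj, references):
    """Return (line_number, row) pairs whose non-date columns are unknown."""
    not_matching = []
    reader = csv.reader(fileobj)
    for row in reader:
        if not row:
            continue
        key = tuple(field.strip() for field in row[1:])
        if key not in references:
            # reader.line_num is the number of source lines read so far
            not_matching.append((reader.line_num, row))
    return not_matching


# StringIO simulates open files, as in the earlier examples
file1 = StringIO("\n03/28/2021,P,6,LINE2\n03/28/2021,P,9,LINE4\n")
file2 = StringIO(
    "\n03/28/2021,P,16,LINE1\n03/28/2021,P,6,LINE2\n03/28/2021,P,9,LINE3\n"
)

refs = get_references(file1)
result = find_not_matching(file2, refs)
for line_number, row in result:
    print(f"[{line_number:>5}] Not matching -> {','.join(row)}")
```

With real files, the csv documentation recommends opening them with newline="", for example open("file1.txt", newline="").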
#15
Build a set of strings then read the second file - this will fix the problem that you are facing.

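from pathlib import Path

# assumption: both files are in the current directory; adjust DIR as needed
DIR = Path(".")

with open(DIR / 'lines_to_look_for.txt', 'r') as file:
    # collect the fourth comma-separated field of every line in a set
    lines = set([line.strip().split(',')[3] for line in file])

with open(DIR / 'check_for_lines.txt', 'r') as file:
    for line in file:
        line = line.strip()
        # compare only the fourth field against the set
        if line.split(',')[3] in lines:
            print(f'{line} MATCH')
        else:
            print(f'{line} NO MATCH')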
The outer square brackets are a list comprehension, as menator01 said; the [3] indexes the list returned by split(','), picking the fourth comma-separated field.