Python Forum
looking for sweeter code to compare parts of a list
#1
i have a big (nearly a million lines) file that is being read in, one line at a time.  each line is .split() and there are about 3 dozen tokens for each line.  several tokens are checked to select which lines are to be used.  the check is an equality check for a few different tokens.  the tokens being checked are not contiguous, such as 4, 5, 9, 10, and 12 are checked.  i am doing these checks with a big long if statement with lots of ands.  i am wondering if there is any sweeter way to code this kind of thing. a program i am working on today is doing a lot of this kind of thing, from lots of cloud data i have.

    datadict = {}
    for line in sys.stdin:
       tokens = line.rstrip().split()
       if tokens[4] == 'foo' and\
          tokens[5] == 'bar' and\
          tokens[9] == 'xyzzy' and\
          tokens[10] == 'yzzyx' and\
          tokens[12] == 'Skaperen':
           processed += 1
           datadict[tokens[0]] = (tokens[1],tokens[2],tokens[3])
       elif tokens[4] == 'bar' and\
            tokens[5] == 'foo' and\
            tokens[9] == 'yzzyx' and\
            tokens[11] == 'xyzzy' and\
            tokens[12] == 'Skapare':
           processed += 1
           datadict[tokens[0]] = (tokens[2],tokens[1],tokens[3])
       else:
          skipped += 1
comparing long slices is not practical since the data to be checked is on either side of data that can vary.
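For reference, one way to compare non-contiguous positions in a single expression is `operator.itemgetter` from the standard library, which pulls several indexes out of a list as a tuple. This is only a sketch: the names `pick`, `wanted`, `line_matches` and `sample` are made up for illustration, and the rule shown is the first branch of the example above.

```python
from operator import itemgetter

# Hypothetical rule: the positions to check and the values expected there.
pick = itemgetter(4, 5, 9, 10, 12)
wanted = ('foo', 'bar', 'xyzzy', 'yzzyx', 'Skaperen')

def line_matches(tokens):
    """Return True if the chosen positions hold exactly the wanted values."""
    try:
        return pick(tokens) == wanted
    except IndexError:
        # The line has too few tokens, so it cannot match.
        return False

# Made-up sample line with 13 tokens.
sample = 'k a b c foo bar x x x xyzzy yzzyx x Skaperen'.split()
print(line_matches(sample))
```

The tuple comparison replaces the chain of `and`s, and the varying tokens in between are simply never looked at.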
Tradition is peer pressure from dead people

What do you call someone who speaks three languages? Trilingual. Two languages? Bilingual. One language? American.
#2
in line 7, you check the item with index=10 and on line 14 - the one with index 11, is that really so?
#3
(Jun-18-2017, 06:08 AM)buran Wrote: in line 7, you check item with index=10 and on line 14 - the one with index 11, is that really so?
the posted code is an example, not a real case.  but it is much like a real case in that some selection rules do involve testing different tokens.  the test of tokens[10] vs. the test of tokens[11], while not a real case, was intended to show that test cases can vary like that.
#4
(Jun-18-2017, 07:46 AM)Skaperen Wrote: but it is much like a real case in that some selection rules do involve testing different tokens.
yeah, that was my question...
#5
maybe something like this

datadict = {}
options = [({4:'foo', 5:'bar', 9:'xyzzy', 10:'yzzyx', 14:'Skaperen'}, (1, 2, 3)),
           ({4:'bar', 5:'foo', 9:'yzzyx', 11:'xyzzy', 14:'Skaperen'}, (2, 1, 3))]
for line in sys.stdin:
   tokens = line.rstrip().split()
   for opt, get_tokens in options:
       if all(tokens[k] == value for k, value in opt.items()):
           processed += 1
           datadict[tokens[0]] = tuple((tokens[i] for i in get_tokens))
           break
       else:
          skipped += 1
note I assume the initial values of processed and skipped are set before the snippet you provide. Not sure if you really need processed (you can always check the len of datadict). maybe if you want to compare processed + skipped to the total number of records to process?
#6
i was thinking of making a function for each type of test, then applying them in order with the most likely first. that way the details of each type of test are kept away from the loop going through all the lines. turns out performance is not very bad. 600000 records only takes a few seconds, and it would still be fine if it took a few minutes (it will ultimately be run at most once an hour).
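A rough sketch of that idea, assuming each test function returns the value to store when it matches and None otherwise. The names `rule_a`, `rule_b`, `RULES` and `select` are invented for this sketch, the rules are the two from the example in post #1, and the 13-token minimum is an assumption based on the highest index they check.

```python
def rule_a(tokens):
    """Return the value to store if this rule matches, else None."""
    checked = (tokens[4], tokens[5], tokens[9], tokens[10], tokens[12])
    if checked == ('foo', 'bar', 'xyzzy', 'yzzyx', 'Skaperen'):
        return (tokens[1], tokens[2], tokens[3])
    return None

def rule_b(tokens):
    """Same idea, but checking a different set of positions."""
    checked = (tokens[4], tokens[5], tokens[9], tokens[11], tokens[12])
    if checked == ('bar', 'foo', 'yzzyx', 'xyzzy', 'Skapare'):
        return (tokens[2], tokens[1], tokens[3])
    return None

# Ordered with the rule expected to match most often first.
RULES = [rule_a, rule_b]

def select(lines):
    """Apply the rules to each line; return (datadict, skipped)."""
    datadict = {}
    skipped = 0
    for line in lines:
        tokens = line.split()
        if len(tokens) < 13:  # assumption: the rules index up to tokens[12]
            skipped += 1
            continue
        for rule in RULES:
            value = rule(tokens)
            if value is not None:
                datadict[tokens[0]] = value
                break
        else:
            # No rule matched this line.
            skipped += 1
    return datadict, skipped
```

With this layout the main loop stays the same when a rule is added; only `RULES` changes.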
#7
(Jun-19-2017, 02:54 AM)Skaperen Wrote: that way the details of each type of test are kept away from the loop going through all the lines.
Not sure I understand that...
it exits the loop after the first match, so not all tests are performed.
I was also thinking that if you put the check in a separate function, it can make the code clearer and more contained, although maybe a bit longer.
By the way - I see I have a mistake in the above code - the else is part of the if statement, but it should be part of the for loop (as was my intention). In the above code the count of skipped is wrong.

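import sys

datadict = {}
processed = skipped = 0  # counters, initialised here so the snippet runs on its own
# each option: ({token index: required value}, indexes of the tokens to store)
options = [({4:'foo', 5:'bar', 9:'xyzzy', 10:'yzzyx', 14:'Skaperen'}, (1, 2, 3)),
           ({4:'bar', 5:'foo', 9:'yzzyx', 11:'xyzzy', 14:'Skaperen'}, (2, 1, 3))]
for line in sys.stdin:
    tokens = line.rstrip().split()
    for opt, get_tokens in options:
        if all(tokens[k] == value for k, value in opt.items()):
            processed += 1
            datadict[tokens[0]] = tuple(tokens[i] for i in get_tokens)
            break
    else:
        # for/else: runs only when no option matched (the loop never hit break)
        skipped += 1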
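The separate-function variant mentioned above could look roughly like this. It is a sketch only: `first_match` is a made-up name, and treating an out-of-range index as a failed check (instead of letting it raise IndexError) is an added assumption, not something the code above does.

```python
def first_match(tokens, options):
    """Return the get_tokens tuple of the first option whose checks all pass, else None."""
    for opt, get_tokens in options:
        # An index past the end of tokens counts as a failed check.
        if all(k < len(tokens) and tokens[k] == value
               for k, value in opt.items()):
            return get_tokens
    return None

options = [({4:'foo', 5:'bar', 9:'xyzzy', 10:'yzzyx', 14:'Skaperen'}, (1, 2, 3)),
           ({4:'bar', 5:'foo', 9:'yzzyx', 11:'xyzzy', 14:'Skaperen'}, (2, 1, 3))]

# Made-up sample line that satisfies the second option.
tokens = 'key a b c bar foo x x x yzzyx x xyzzy x x Skaperen'.split()
picked = first_match(tokens, options)
if picked is not None:
    print(tuple(tokens[i] for i in picked))
```

The main loop then only has to test whether the result is None to decide between storing the record and counting it as skipped.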
#8
those counts were just trivial details i tossed in to help show a complete structure. the first ugly real code (more proof of concept) version did not do counts.