How can I make this as computationally efficient as possible?

**Gribouillis** · (This post was last modified: Apr-15-2018, 03:41 PM by Gribouillis.)

I think you can speed things up by generating the individual matching lines first. The following code prints a record for each word that appears in all five columns. Each record contains five lists, one per column, holding every index at which the word appears in that column of sa_data.

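```python
from collections import defaultdict, namedtuple

# One record per common word: the word, plus one list of row indexes per column.
Record = namedtuple('Record', 'word rownums')

def word_idx_dict(seq):
    """Map each word in seq to the list of indexes where it occurs."""
    di = defaultdict(list)
    for i, k in enumerate(seq):
        di[k].append(i)
    return di

def line_matches(sa_data):
    """Yield a Record for each word that appears in every column of sa_data."""
    dics = [word_idx_dict(seq) for seq in sa_data]
    # get words appearing in all the columns (iterating a dict yields its keys)
    common = set(dics[0]).intersection(*dics[1:])
    for w in sorted(common):
        yield Record(word=w, rownums=[d[w] for d in dics])

# sa_data is the list of columns from the original question
for rec in line_matches(sa_data):
    print(rec)
```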
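To see what the records look like, here is a self-contained run on a tiny made-up sa_data with five columns of three rows each. The data is purely illustrative (the real array from the question is not shown here), and the two functions are repeated so the snippet runs on its own.

```python
from collections import defaultdict, namedtuple

Record = namedtuple('Record', 'word rownums')

def word_idx_dict(seq):
    # word -> list of indexes where it occurs in this column
    di = defaultdict(list)
    for i, k in enumerate(seq):
        di[k].append(i)
    return di

def line_matches(sa_data):
    dics = [word_idx_dict(seq) for seq in sa_data]
    # words present in every column
    common = set(dics[0]).intersection(*dics[1:])
    for w in sorted(common):
        yield Record(word=w, rownums=[d[w] for d in dics])

# Hypothetical toy data: five columns, three rows each.
sa_data = [
    ['a', 'b', 'c'],
    ['b', 'a', 'a'],
    ['c', 'a', 'b'],
    ['a', 'b', 'd'],
    ['b', 'b', 'a'],
]

records = list(line_matches(sa_data))
for rec in records:
    print(rec)
# Record(word='a', rownums=[[0], [1, 2], [1], [0], [2]])
# Record(word='b', rownums=[[1], [0], [2], [1], [0, 1]])
```

Only 'a' and 'b' occur in all five columns, so 'c' and 'd' produce no record.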

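I think this sequence of records is fast to generate: each column is scanned only once to build its dictionary, and the common words come from a set intersection, so the work grows roughly linearly with the size of sa_data (plus sorting the common words). It should be a better starting point than the raw sa_data array.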
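If a "matching line" means picking one row index from each column such that all five hold the same word (an assumption, since the original question is not quoted here), the records can be expanded lazily with itertools.product. The helper name matching_lines is hypothetical, and the Record namedtuple is rebuilt with the same fields so the sketch runs on its own.

```python
from collections import namedtuple
from itertools import product

# Same shape as the records yielded by line_matches().
Record = namedtuple('Record', 'word rownums')

def matching_lines(records):
    """Hypothetical helper: yield (word, (i0, i1, ...)) for every way of
    choosing one row index per column where that column holds `word`."""
    for rec in records:
        # product() picks one index from each column's list, lazily
        for rows in product(*rec.rownums):
            yield rec.word, rows

records = [
    Record(word='a', rownums=[[0], [1, 2], [1], [0], [2]]),
    Record(word='b', rownums=[[1], [0], [2], [1], [0, 1]]),
]

lines = list(matching_lines(records))
for word, rows in lines:
    print(word, rows)
# a (0, 1, 1, 0, 2)
# a (0, 2, 1, 0, 2)
# b (1, 0, 2, 1, 0)
# b (1, 0, 2, 1, 1)
```

The number of combinations for a word is the product of its five list lengths, so it can grow quickly; keeping matching_lines a generator means they are produced one at a time instead of being stored all at once.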
