How can I improve this piece of code?

aquerci · (This post was last modified: Nov-16-2019, 10:03 PM by aquerci.)

Hi all,

In these days I wrote a script. It works very well but a piece of his code is very slow because of a custom function. you can see it below:

def split_rules(rules, *index):
    rule = ""
    if len(index) == 1:
        rules_splitted = rules.split("\n")
        first_rule_line = rules_splitted[0]
        index_0 = index[0]
        rule_name = first_rule_line.split()[index_0]
        for line in rules_splitted:
            list = line.split()
            if rule_name == list[index_0]:
                rule = rule + line + "\n"
            else:
                rule_name = line.split()[index_0]
                rule = rule + "\n" + line + "\n"
        rules = rule.rstrip("\n")   
    elif len(index) == 2:
        rules_splitted = rules.split("\n")
        first_rule_line = rules_splitted[0]
        first_rule_line_splitted = first_rule_line.split()
        index_0 = index[0]
        index_1 = index[1]
        rule_name_1 = first_rule_line_splitted[index_0]
        rule_name_2 = first_rule_line_splitted[index_1]
        for line in rules_splitted:
            list = line.split()
            if rule_name_1 == list[index_0] and rule_name_2 == list[index_1]:
                rule = rule + line + "\n"
            else:
                rule_name_1 = list[index_0]
                rule_name_2 = list[index_1]
                rule = rule + "\n" + line + "\n"
        rules = rule.rstrip("\n")    
    elif len(index) == 3:
        rules_splitted = rules.split("\n")
        first_rule_line = rules_splitted[0]
        first_rule_line_splitted = first_rule_line.split()
        index_0 = index[0]
        index_1 = index[1]
        index_2 = index[2] 
        rule_name_1 = first_rule_line_splitted[index_0]
        rule_name_2 = first_rule_line_splitted[index_1]
        rule_name_3 = first_rule_line_splitted[index_2]
        for line in rules_splitted:
            list = line.split()
            if rule_name_1 == list[index_0] and rule_name_2 == list[index_1] and rule_name_3 == list[index_2]: 
                rule = rule + line + "\n"
            else:
                rule_name_1 = list[index_0]
                rule_name_2 = list[index_1]
                rule_name_3 = list[index_2]
                rule = rule + "\n" + line + "\n"
        rules = rule.rstrip("\n")   
    return(rules)

the "split_rules" function splits a text file containing many policies. what do I mean with policies? I mean anything like that:

Quote:from-zone A to-zone B policy CICCIO ...
from-zone A to-zone B policy CICCIO ...
from-zone A to-zone B policy PIPPO ...
from-zone X to-zone Y policy PIPPO ...
from-zone X to-zone Y policy PIPPO ...
from-zone Z to-zone Z policy PLUTO ...
from-zone Z to-zone Z policy PLUTO ...
from-zone Z to-zone Z policy PLUTO ...
from-zone Z to-zone Z policy PAPERINO ...
from-zone Z to-zone Z policy PAPERINO ...
from-zone Z to-zone Z policy CICCIO ...
from-zone Z to-zone Z policy CICCIO ...

this text file is composed by five policies (you can identify them keeping in mind three parameters: source zone, destination zone and policy name). my custom function can split them with a duble new line and save them in a new variable. it's the result:

text = open("test.txt", "r")
text = text.read()

print(split_rules(text, 1, 3, 5))

# print output:
#
# from-zone A to-zone B policy CICCIO ...
# from-zone A to-zone B policy CICCIO ...
# 
# from-zone A to-zone B policy PIPPO ...
# 
# from-zone X to-zone Y policy PIPPO ...
# from-zone X to-zone Y policy PIPPO ...
# 
# from-zone Z to-zone Z policy PLUTO ...
# from-zone Z to-zone Z policy PLUTO ...
# from-zone Z to-zone Z policy PLUTO ...
# 
# from-zone Z to-zone Z policy PAPERINO ...
# from-zone Z to-zone Z policy PAPERINO ...
# 
# from-zone Z to-zone Z policy CICCIO ...
# from-zone Z to-zone Z policy CICCIO ...

this function works, it reaches his goal very well but it seems that it is not enough efficent when it have to process more than 40000 lines. with 40000 lines it takes around 2 minutes to process the policies, but with 150000 lines it takes around 15 minutes!

with many lines it's not scalable. how can I improve my custom function to become it scalable and more efficient than before?

**Gribouillis** · (This post was last modified: Nov-17-2019, 10:52 AM by Gribouillis.)

Among the many things that can be sped up, the most blatant one is the construction of the output string by the statements

rule = ''
...
rule = rule + morerule
...
return rule

Instead of this, you could append lines to a list

outrule = []
...
outrule.append(some_line)
...
return "\n".join(outrule)

**buran** · Nov-17-2019, 10:24 AM

in addition to @Gribouillis' advise, to simplify your code:
create a function that parse a single line. make it return a tuple. compare tuples, not single elements.

def parse_line(line, items):
    split_line = line.split()
    return tuple(split_line[idx] for idx in items)

def split_rules(rules, *items):
    result = []
    prev_rule = None
    for line in rules.split('\n'):
        rule = parse_line(line, items)
        if prev_rule and rule != prev_rule:
            result.append('')
        prev_rule = rule
        result.append(line)
    return result

rules = """from-zone A to-zone B policy CICCIO ...
from-zone A to-zone B policy CICCIO ...
from-zone A to-zone B policy PIPPO ...
from-zone X to-zone Y policy PIPPO ...
from-zone X to-zone Y policy PIPPO ...
from-zone Z to-zone Z policy PLUTO ...
from-zone Z to-zone Z policy PLUTO ...
from-zone Z to-zone Z policy PLUTO ...
from-zone Z to-zone Z policy PAPERINO ...
from-zone Z to-zone Z policy PAPERINO ...
from-zone Z to-zone Z policy CICCIO ...
from-zone Z to-zone Z policy CICCIO ..."""

print('\n'.join(split_rules(rules, 1, 3, 5)))

Output:from-zone A to-zone B policy CICCIO ...
from-zone A to-zone B policy CICCIO ...

from-zone A to-zone B policy PIPPO ...

from-zone X to-zone Y policy PIPPO ...
from-zone X to-zone Y policy PIPPO ...

from-zone Z to-zone Z policy PLUTO ...
from-zone Z to-zone Z policy PLUTO ...
from-zone Z to-zone Z policy PLUTO ...

from-zone Z to-zone Z policy PAPERINO ...
from-zone Z to-zone Z policy PAPERINO ...

from-zone Z to-zone Z policy CICCIO ...
from-zone Z to-zone Z policy CICCIO ...

**Gribouillis** · Nov-17-2019, 10:57 AM

In addition to buran's code, you could consider using itertools.groupby() to group lines together. Also the function could be made more versatile by accepting file handles for input and output instead of strings. Alternately, it could take an iterable as input and generate lines of output with yield statements.

Possibly Related Threads…
Thread		Author	Replies	Views	Last Post
	An unexplainable error in .format statement - but only in a larger piece of code?	ToniE	4	699	Sep-05-2023, 12:50 PM Last Post: ToniE
	First piece of code	Bearwulf	2	768	Aug-02-2023, 04:03 PM Last Post: deanhystad
	(OpenCV) Help to improve code for object detection and finding center of it	saoko	0	1,204	May-14-2022, 05:34 PM Last Post: saoko
	Understanding a piece of code	Michael1	4	1,408	Jan-20-2022, 07:14 PM Last Post: Michael1
	How to make this piece concise	leoahum	0	1,336	Sep-23-2021, 09:23 PM Last Post: leoahum
	.maketrans() - a piece of code which needs some explanation	InputOutput007	5	2,962	Jan-28-2021, 05:05 PM Last Post: buran
	how can I improve the code to get it faster?	aquerci	2	1,714	Feb-15-2020, 02:52 PM Last Post: aquerci
	I am a newbie.Help me with this simple piece of code	feynarun	3	2,811	Jan-08-2020, 12:40 PM Last Post: perfringo
	Help improve code efficiency	benbrown03	9	4,351	Feb-20-2019, 03:45 PM Last Post: ichabod801
	How to improve the quality of my code?	grobattac37	3	2,485	Jan-25-2019, 06:17 PM Last Post: ichabod801

How can I improve this piece of code?

User Panel Messages

Announcements