Python Forum
How can I improve this piece of code?
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
How can I improve this piece of code?
#1
Hi all,

In these days I wrote a script. It works very well but a piece of his code is very slow because of a custom function. you can see it below:
def split_rules(rules, *index):
    rule = ""
    if len(index) == 1:
        rules_splitted = rules.split("\n")
        first_rule_line = rules_splitted[0]
        index_0 = index[0]
        rule_name = first_rule_line.split()[index_0]
        for line in rules_splitted:
            list = line.split()
            if rule_name == list[index_0]:
                rule = rule + line + "\n"
            else:
                rule_name = line.split()[index_0]
                rule = rule + "\n" + line + "\n"
        rules = rule.rstrip("\n")   
    elif len(index) == 2:
        rules_splitted = rules.split("\n")
        first_rule_line = rules_splitted[0]
        first_rule_line_splitted = first_rule_line.split()
        index_0 = index[0]
        index_1 = index[1]
        rule_name_1 = first_rule_line_splitted[index_0]
        rule_name_2 = first_rule_line_splitted[index_1]
        for line in rules_splitted:
            list = line.split()
            if rule_name_1 == list[index_0] and rule_name_2 == list[index_1]:
                rule = rule + line + "\n"
            else:
                rule_name_1 = list[index_0]
                rule_name_2 = list[index_1]
                rule = rule + "\n" + line + "\n"
        rules = rule.rstrip("\n")    
    elif len(index) == 3:
        rules_splitted = rules.split("\n")
        first_rule_line = rules_splitted[0]
        first_rule_line_splitted = first_rule_line.split()
        index_0 = index[0]
        index_1 = index[1]
        index_2 = index[2] 
        rule_name_1 = first_rule_line_splitted[index_0]
        rule_name_2 = first_rule_line_splitted[index_1]
        rule_name_3 = first_rule_line_splitted[index_2]
        for line in rules_splitted:
            list = line.split()
            if rule_name_1 == list[index_0] and rule_name_2 == list[index_1] and rule_name_3 == list[index_2]: 
                rule = rule + line + "\n"
            else:
                rule_name_1 = list[index_0]
                rule_name_2 = list[index_1]
                rule_name_3 = list[index_2]
                rule = rule + "\n" + line + "\n"
        rules = rule.rstrip("\n")   
    return(rules)
the "split_rules" function splits a text file containing many policies. what do I mean with policies? I mean anything like that:
Quote:from-zone A to-zone B policy CICCIO ...
from-zone A to-zone B policy CICCIO ...
from-zone A to-zone B policy PIPPO ...
from-zone X to-zone Y policy PIPPO ...
from-zone X to-zone Y policy PIPPO ...
from-zone Z to-zone Z policy PLUTO ...
from-zone Z to-zone Z policy PLUTO ...
from-zone Z to-zone Z policy PLUTO ...
from-zone Z to-zone Z policy PAPERINO ...
from-zone Z to-zone Z policy PAPERINO ...
from-zone Z to-zone Z policy CICCIO ...
from-zone Z to-zone Z policy CICCIO ...

this text file is composed by five policies (you can identify them keeping in mind three parameters: source zone, destination zone and policy name). my custom function can split them with a duble new line and save them in a new variable. it's the result:
text = open("test.txt", "r")
text = text.read()

print(split_rules(text, 1, 3, 5))

# print output:
#
# from-zone A to-zone B policy CICCIO ...
# from-zone A to-zone B policy CICCIO ...
# 
# from-zone A to-zone B policy PIPPO ...
# 
# from-zone X to-zone Y policy PIPPO ...
# from-zone X to-zone Y policy PIPPO ...
# 
# from-zone Z to-zone Z policy PLUTO ...
# from-zone Z to-zone Z policy PLUTO ...
# from-zone Z to-zone Z policy PLUTO ...
# 
# from-zone Z to-zone Z policy PAPERINO ...
# from-zone Z to-zone Z policy PAPERINO ...
# 
# from-zone Z to-zone Z policy CICCIO ...
# from-zone Z to-zone Z policy CICCIO ...
this function works, it reaches his goal very well but it seems that it is not enough efficent when it have to process more than 40000 lines. with 40000 lines it takes around 2 minutes to process the policies, but with 150000 lines it takes around 15 minutes!

with many lines it's not scalable. how can I improve my custom function to become it scalable and more efficient than before?
Reply
#2
Among the many things that can be sped up, the most blatant one is the construction of the output string by the statements
rule = ''
...
rule = rule + morerule
...
return rule
Instead of this, you could append lines to a list
outrule = []
...
outrule.append(some_line)
...
return "\n".join(outrule)
Reply
#3
in addition to @Gribouillis' advise, to simplify your code:
create a function that parse a single line. make it return a tuple. compare tuples, not single elements.

def parse_line(line, items):
    split_line = line.split()
    return tuple(split_line[idx] for idx in items)

def split_rules(rules, *items):
    result = []
    prev_rule = None
    for line in rules.split('\n'):
        rule = parse_line(line, items)
        if prev_rule and rule != prev_rule:
            result.append('')
        prev_rule = rule
        result.append(line)
    return result

rules = """from-zone A to-zone B policy CICCIO ...
from-zone A to-zone B policy CICCIO ...
from-zone A to-zone B policy PIPPO ...
from-zone X to-zone Y policy PIPPO ...
from-zone X to-zone Y policy PIPPO ...
from-zone Z to-zone Z policy PLUTO ...
from-zone Z to-zone Z policy PLUTO ...
from-zone Z to-zone Z policy PLUTO ...
from-zone Z to-zone Z policy PAPERINO ...
from-zone Z to-zone Z policy PAPERINO ...
from-zone Z to-zone Z policy CICCIO ...
from-zone Z to-zone Z policy CICCIO ..."""

print('\n'.join(split_rules(rules, 1, 3, 5)))
Output:
from-zone A to-zone B policy CICCIO ... from-zone A to-zone B policy CICCIO ... from-zone A to-zone B policy PIPPO ... from-zone X to-zone Y policy PIPPO ... from-zone X to-zone Y policy PIPPO ... from-zone Z to-zone Z policy PLUTO ... from-zone Z to-zone Z policy PLUTO ... from-zone Z to-zone Z policy PLUTO ... from-zone Z to-zone Z policy PAPERINO ... from-zone Z to-zone Z policy PAPERINO ... from-zone Z to-zone Z policy CICCIO ... from-zone Z to-zone Z policy CICCIO ...
If you can't explain it to a six year old, you don't understand it yourself, Albert Einstein
How to Ask Questions The Smart Way: link and another link
Create MCV example
Debug small programs

Reply
#4
In addition to buran's code, you could consider using itertools.groupby() to group lines together. Also the function could be made more versatile by accepting file handles for input and output instead of strings. Alternately, it could take an iterable as input and generate lines of output with yield statements.
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  An unexplainable error in .format statement - but only in a larger piece of code? ToniE 4 699 Sep-05-2023, 12:50 PM
Last Post: ToniE
  First piece of code Bearwulf 2 768 Aug-02-2023, 04:03 PM
Last Post: deanhystad
  (OpenCV) Help to improve code for object detection and finding center of it saoko 0 1,204 May-14-2022, 05:34 PM
Last Post: saoko
  Understanding a piece of code Michael1 4 1,408 Jan-20-2022, 07:14 PM
Last Post: Michael1
  How to make this piece concise leoahum 0 1,336 Sep-23-2021, 09:23 PM
Last Post: leoahum
  .maketrans() - a piece of code which needs some explanation InputOutput007 5 2,962 Jan-28-2021, 05:05 PM
Last Post: buran
  how can I improve the code to get it faster? aquerci 2 1,714 Feb-15-2020, 02:52 PM
Last Post: aquerci
  I am a newbie.Help me with this simple piece of code feynarun 3 2,811 Jan-08-2020, 12:40 PM
Last Post: perfringo
  Help improve code efficiency benbrown03 9 4,351 Feb-20-2019, 03:45 PM
Last Post: ichabod801
  How to improve the quality of my code? grobattac37 3 2,485 Jan-25-2019, 06:17 PM
Last Post: ichabod801

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020