Python Forum
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Make my code more pythonic
#1
Hi,

I am a beginner in python. I've read the Fluent in Python book, and I am trying to dive deeper.
Here is a file based merge sort. I have a file that contains around 10 million numbers.Each number is in their own line. The program will read 1 million numbers at a time, sort them, and store them in temporary files. Finally, a merge process will merge all temporary files into one output file.

I am not very confident about what I coded. Can you guys help make this little program more pythonic? Is there anything I missed that could save more memory or cpu time? Any help is appreciated.

from itertools import zip_longest
import glob


def grouper(iterable, n, fillvalue=None):
    """Collect data into fixed-length chunks or blocks"""
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx"
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)


class MergeSort:
    # temp folder
    TEMP_FOLDER = "temp/"
    # number of lines to read at a time
    N = 1000000

    def __init__(self, input_file_name, output_file_name):
        self.input_file_name = input_file_name
        self.output_file_name = output_file_name

    def __divide(self):
        """read numbers to sort, divide into chunks, sort each chunk and write to temp files"""
        with open(self.input_file_name, "r") as fin:
            i = 1  # only used for logging purpose
            for lines in grouper(fin, self.N):
                print("Sorting batch " + str(i) + "...")
                data = sorted(lines, key=int)
                with open(self.TEMP_FOLDER + str(i) + ".text", "w+") as temp:
                    print("Writing batch " + str(i) + "...")
                    temp.writelines(data)
                i += 1

    def __merge(self):
        """merge sorted temp files into one"""
        files = []
        for f in glob.glob(self.TEMP_FOLDER + "*.text"):
            print(f)
            fin = open(f, "r")
            files.append(fin)

        with open(self.output_file_name, "w+") as fout:
            values = []
            for f in files:
                val = int(f.readline())
                values.append(val)

            while files:
                minval = min(values)
                index = values.index(minval)
                print(minval)
                fout.write(str(minval) + "\n")

                # read next line
                data = files[index].readline()
                if data:
                    del values[index]  # values.pop(index)
                    values.insert(index, int(data))
                else:
                    files[index].close()  # close temp file
                    del files[index]  # files.pop(index)
                    del values[index]  # values.pop(index)

    def mergesort(self):
        self.__divide()
        self.__merge()


if __name__ == '__main__':
    # file to sort
    INPUT_FILE_NAME = "numbers_to_sort.text"
    # sorted file
    OUTPUT_FILE_NAME = "sorted_numbers.text"
    MergeSort(INPUT_FILE_NAME, OUTPUT_FILE_NAME).mergesort()
Reply
#2
a great helper is to load:
pycodestyle: https://pypi.org/project/pycodestyle/
pip install pycodestyle
and
pylint: https://github.com/PyCQA/pylint
pip install pylint
any problems, see: pip install pylint

and run both against your code
Reply
#3
Two things stand out to me: one, class attributes should be lowercase with underscores for spaces. Two, don't use single letter variable names like 'i'. Use something more descriptive.
Craig "Ichabod" O'Brien - xenomind.com
I wish you happiness.
Recommended Tutorials: BBCode, functions, classes, text adventures
Reply
#4
(Sep-02-2018, 04:50 PM)ichabod801 Wrote: class attributes should be lowercase
Thank you. I meant to define those as constants. Is it a good place to put?
Reply
#5
(Sep-02-2018, 04:57 PM)aug828 Wrote: Thank you. I meant to define those as constants. Is it a good place to put?

I think TEMP_FOLDER is fine, since it is only used in the class. N i would think should be a parameter of __init__, with the current value as a default. It seems like someone might want to change that. If not, it's okay where it is.
Craig "Ichabod" O'Brien - xenomind.com
I wish you happiness.
Recommended Tutorials: BBCode, functions, classes, text adventures
Reply
#6
An excellent thing to do is to start writing a testing module for your module. Its role is to check automatically that
the module works as expected. Here is a starting point, supposing that your program's name is mymodule
""" testmymodule.py

unitary tests for mymodule
"""
import unittest as ut
from mymodule import grouper, MergeSort

class TestGrouper(ut.TestCase):

    def test_grouping_7_items_by_3(self):
        result = list(grouper('ABCDEFG', 3, 'x'))
        self.assertEqual(result, [
            ('A','B','C'), ('D', 'E', 'F'), ('G', 'x', 'x')])


if __name__ == '__main__':
    ut.main()
You can add tests for the MergeSort class and perhaps other tests for the grouper. The testing module can save you a lot of time later by detecting things that go wrong during the development phase of the main program.

The next great thing to do is to install the coverage module and to run it against your test module by typing something like
Output:
coverage run --omit=unittest,testmymodule testmymodule.py
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  "pythonic" way of splitting up my code? liggisten 4 602 Dec-30-2023, 08:23 PM
Last Post: Gribouillis
  hi need help to make this code work correctly atulkul1985 5 702 Nov-20-2023, 04:38 PM
Last Post: deanhystad
  newbie question - can't make code work tronic72 2 626 Oct-22-2023, 09:08 PM
Last Post: tronic72
  Cleaning my code to make it more efficient BSDevo 13 1,275 Sep-27-2023, 10:39 PM
Last Post: BSDevo
  how to make bot that sends instagram auto password reset code kraixx 2 1,283 Mar-04-2023, 09:59 PM
Last Post: jefsummers
  Make code non-blocking? Extra 0 1,099 Dec-03-2022, 10:07 PM
Last Post: Extra
  Make the code shorter quest 2 1,476 Mar-14-2022, 04:28 PM
Last Post: deanhystad
  How would you (as an python expert) make this code more efficient/simple coder_sw99 3 1,755 Feb-21-2022, 10:52 AM
Last Post: Gribouillis
  Pyspark - my code works but I want to make it better Kevin 1 1,745 Dec-01-2021, 05:04 AM
Last Post: Kevin
  List index out of range error when attempting to make a basic shift code djwilson0495 4 2,935 Aug-16-2020, 08:56 PM
Last Post: deanhystad

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020