Performance degradation with IO class inheritance - wsygzyx - Jun-17-2020

Hi all! I recently tried to build a new IO class from Python's built-in IO classes, and I noticed some performance degradation with the new classes. I am wondering if anyone knows the possible cause of this, or a fix.

from _io import FileIO, BufferedReader

class FileIO_new(FileIO):
    pass

class BufferedReader_new(BufferedReader):
    pass

def built_in_io():
    # a 3.5 MB file
    raw = FileIO('filename', 'r')
    buffer = BufferedReader(raw)
    lines = buffer.readlines()
    buffer.close()

def new_io():
    raw = FileIO_new('filename', 'r')
    buffer = BufferedReader_new(raw)
    lines = buffer.readlines()
    buffer.close()

>>>%timeit built_in_io()
1.79 ms ± 14.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
>>>%timeit new_io()
5.25 ms ± 87 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

RE: Performance degradation with IO class inheritance - DeaD_EyE - Jun-17-2020

It really is slower. If you use a ramdisk, you have fewer problems with caching and buffers during testing, but I got similar results.
Here is my test code:

import io
import os
import shlex
import timeit
from pathlib import Path
from subprocess import call
from contextlib import contextmanager

RAMDISK = Path("ramdisk")
TESTFILE = RAMDISK / "testfile.bin"

def mount_ramfs():
    # Mount a ramfs on ./ramdisk (needs sudo)
    RAMDISK.mkdir(exist_ok=True)
    cmd = shlex.split(f"sudo mount -t ramfs ramfs {RAMDISK}")
    call(cmd)

def umount_ramfs():
    cmd = shlex.split(f"sudo umount {RAMDISK}")
    call(cmd)
    RAMDISK.rmdir()

def make_test_file():
    # 4 MiB of random bytes
    with TESTFILE.open("wb") as fd:
        fd.write(os.urandom(4 * 1024 ** 2))

@contextmanager
def with_testfile():
    mount_ramfs()
    try:
        make_test_file()
        yield TESTFILE
    except Exception as e:
        print(repr(e))
    finally:
        # Always unmount, even if the setup or the timed code fails
        umount_ramfs()

## testcode ##

class FileIO_new(io.FileIO):
    pass

class BufferedReader_new(io.BufferedReader):
    pass

def built_in_io(testfile):
    raw = io.FileIO(testfile, 'r')
    buffer = io.BufferedReader(raw)
    lines = buffer.readlines()
    buffer.close()

def new_io(testfile):
    raw = FileIO_new(testfile, 'r')
    buffer = BufferedReader_new(raw)
    lines = buffer.readlines()
    buffer.close()

with with_testfile() as tf:
    print(tf, tf.stat().st_size / 1024 ** 2, "MiB")
    result_built_in = timeit.timeit("built_in_io(tf)", globals=globals(), number=1000)
    result_new_io = timeit.timeit("new_io(tf)", globals=globals(), number=1000)
    print(f"BuiltIn: {result_built_in:.3f}")
    print(f"NewIO: {result_new_io:.3f}")

The built-in _io is implemented in C. If you inherit from it, then Python comes into play. I guess this is why it's much slower. The time spent creating new instances does not seem to be the problem: I changed the test code a bit so that the instances are created only once and rewound with seek(0) instead. Same result, no improvement. I guess that's why the Borgbackup project implemented its functions for file access, chunking, etc. in Cython and C.
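The reuse experiment described above (create the objects once, rewind with seek(0)) could be sketched roughly as follows. This is not the code that was actually run in the thread: it uses a temporary file instead of the ramdisk, and the helper names make_test_file, time_reused and compare are made up for illustration. It also times all four combinations of built-in and subclassed raw/buffer classes, which may help narrow down which subclass the extra time is associated with.

```python
import io
import os
import tempfile
import timeit


class FileIO_new(io.FileIO):
    pass


class BufferedReader_new(io.BufferedReader):
    pass


def make_test_file(size=1024 ** 2):
    """Write `size` random bytes to a temporary file and return its path.

    Assumption: a plain temp file stands in for the ramdisk used in the thread.
    """
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(os.urandom(size))
    return path


def time_reused(raw_cls, buf_cls, path, number=20):
    """Open the file once, then time repeated seek(0) + readlines() calls."""
    buffer = buf_cls(raw_cls(path, "r"))
    try:
        def read_all():
            buffer.seek(0)  # rewind instead of creating new objects
            return buffer.readlines()

        n_lines = len(read_all())  # warm-up read, also gives the line count
        seconds = timeit.timeit(read_all, number=number)
    finally:
        buffer.close()
    return n_lines, seconds


def compare(path, number=20):
    """Time all four raw/buffer class combinations on the same file."""
    results = {}
    for raw_cls in (io.FileIO, FileIO_new):
        for buf_cls in (io.BufferedReader, BufferedReader_new):
            key = (raw_cls.__name__, buf_cls.__name__)
            results[key] = time_reused(raw_cls, buf_cls, path, number)
    return results


if __name__ == "__main__":
    path = make_test_file()
    try:
        for (raw_name, buf_name), (n_lines, seconds) in compare(path).items():
            print(f"{raw_name:12} + {buf_name:20} {n_lines} lines  {seconds:.3f} s")
    finally:
        os.remove(path)
```

If the slowdown is tied to only one of the two subclasses, it should appear in only some of the four rows. Absolute numbers will depend on the machine, the file size and the Python version.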
Borgbackup is a backup tool with deduplication, written in Python: https://github.com/borgbackup/borg

RE: Performance degradation with IO class inheritance - wsygzyx - Jun-18-2020

Thanks for answering! I just find it weird (and interesting) that the performance is not the same, even though the subclassed version still outperforms the pure-Python implementation of io (_pyio) by so much. Also, thanks for the tricks!