function to untabify - Printable Version

function to untabify - Skaperen - Aug-12-2022

Is there a function that comes with Python that can untabify a line of UTF-8 encoded text? It should take a parameter for the tab size and/or default to 8.

RE: function to untabify - ndc85430 - Aug-13-2022

Could you clarify what you expect such a function to do? My guess at the behaviour you want:

import unittest


class TestUntabify(unittest.TestCase):
    def test_it_removes_a_tab_of_the_given_size_from_a_line(self):
        tab_size = 4
        tab = " " * tab_size
        line = f"{tab}ぃかの"

        untabbed_line = untabify(line, tab_size)

        self.assertEqual(untabbed_line, "ぃかの")

    def test_it_does_not_remove_a_tab_that_is_smaller_than_the_given_size(self):
        tab_size = 4
        tab = " " * 2
        line = f"{tab}ぃかの"

        returned_line = untabify(line, tab_size)

        self.assertEqual(returned_line, line)

    def test_it_removes_tab_characters_from_the_line_if_their_number_match_the_given_size(self):
        tab_size = 2
        tab = "\t" * tab_size
        line = f"{tab}abc"

        untabbed_line = untabify(line, tab_size)

        self.assertEqual(untabbed_line, "abc")


if __name__ == "__main__":
    unittest.main()

I didn't implement the function, but I tend to use tests to clarify understanding of a problem. These were just the cases I could think of. Are there others?

RE: function to untabify - perfringo - Aug-13-2022

There is the built-in textwrap module, which might have the required functionality.
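
For what it's worth, here is a minimal sketch of what textwrap offers here. textwrap.wrap() accepts expand_tabs and tabsize options, but it also wraps long lines and normalises other whitespace, so it does more than plain untabifying. The sample line and the width of 70 are just assumptions for illustration.

```python
import textwrap

# Sample input (made up for illustration): one tab between two letters.
line = "a\tb"

# expand_tabs=True is the default; tabsize sets the tab stop width.
# wrap() returns a list of output lines and would also break the text
# at `width` columns if it were longer.
wrapped = textwrap.wrap(line, width=70, expand_tabs=True, tabsize=4)
print(wrapped)
```

If all that is wanted is tab expansion without any wrapping, this is probably heavier than necessary.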

RE: function to untabify - Skaperen - Aug-14-2022

We obviously overlooked str.expandtabs(). bytes has an expandtabs() method too, but it counts columns in bytes, so it will probably count each Unicode character as the number of bytes UTF-8 encodes it as. The str version is probably the one to use.
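
A quick sketch of the difference, assuming a made-up sample line that mixes ASCII with three Japanese characters (each is 3 bytes in UTF-8). Note that neither version accounts for how wide a character is actually drawn on screen; str.expandtabs() simply counts one column per character.

```python
# Sample line (made up for illustration): tabs around three 3-byte characters.
line = "a\tぃかの\tb"

# str.expandtabs() advances one column per character (code point).
as_str = line.expandtabs(8)

# bytes.expandtabs() advances one column per byte, so the 9 bytes of
# "ぃかの" push the second tab on to a later tab stop.
as_bytes = line.encode("utf-8").expandtabs(8).decode("utf-8")

print(repr(as_str))
print(repr(as_bytes))
print(len(as_str), len(as_bytes))
```

The two results differ in the padding after the non-ASCII text, which is why decoding to str first looks like the safer choice.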