English Grammar

***ichabod801*** · Jan-05-2020, 04:14 AM

When I was doing text processing, I found I needed a lot of functions for handling English grammar. Correct plurals were one thing. I might have something like print('You won {} bucks.'.format(winnings)), but if winnings was 1, that looked stupid. I also wanted to process lists with conjunctions like 'and' or 'or', so that [1, 1, 3, 5, 5] could be displayed as 'You rolled 1, 1, 3, 5, and 5.'.
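The list-with-conjunction case can be sketched as a plain function. This is a minimal illustration under stated assumptions: the name join_words and its parameters are hypothetical, not the utility module's API. The serial_comma flag switches between '1, 3, and 5' and '1, 3 and 5'.

```python
def join_words(items, conjunction='and', serial_comma=True):
    """Join items into an English list, e.g. '1, 3, and 5'. (Hypothetical helper.)"""
    words = [str(item) for item in items]
    if len(words) <= 2:
        # Zero, one, or two items need no commas: '', 'a', 'a and b'.
        return ' {} '.format(conjunction).join(words)
    # Three or more items: commas between all but the last pair.
    last_separator = ', ' if serial_comma else ' '
    return '{}{}{} {}'.format(', '.join(words[:-1]), last_separator, conjunction, words[-1])


print('You rolled {}.'.format(join_words([1, 1, 3, 5, 5])))
# You rolled 1, 1, 3, 5, and 5.
print('Pick {}.'.format(join_words(['red', 'green', 'blue'], 'or', serial_comma=False)))
# Pick red, green or blue.
```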

These functions were all in the utility module, which was used in most of the game modules. But this led to some rather unwieldy code, like:

        card_text = utility.plural(cards, 'card')
        play_text = utility.plural(in_play, 'card')
        discard_text = utility.plural(discards, 'card')
        text = 'Deck of cards with {} {}, plus {} {} in play and {} {} discarded'
        return text.format(cards, card_text, in_play, play_text, discards, discard_text)
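One way to shrink a snippet like this is a helper that returns the count and the noun together, so each count is passed only once. A minimal sketch follows; count_noun and deck_text are hypothetical names, and the -es rule for words ending in s, x, sh, or ch is the same naive rule that Text.plural uses.

```python
def count_noun(count, word, plural_form=None):
    """Return '<count> <word>', pluralizing the word unless count is 1. (Hypothetical helper.)"""
    if count == 1:
        noun = word
    elif plural_form is not None:
        # Irregular plurals must be given explicitly.
        noun = plural_form
    elif word.endswith(('s', 'x', 'sh', 'ch')):
        noun = word + 'es'
    else:
        noun = word + 's'
    return '{} {}'.format(count, noun)


def deck_text(cards, in_play, discards):
    # One call per count replaces the separate count/label pairs.
    text = 'Deck of cards with {}, plus {} in play and {} discarded'
    return text.format(count_noun(cards, 'card'), count_noun(in_play, 'card'),
                       count_noun(discards, 'card'))


print(deck_text(51, 1, 0))
# Deck of cards with 51 cards, plus 1 card in play and 0 cards discarded
```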

I'm working on making it easier to program games for t_games, so I wanted this sort of functionality in a form that is easier for the programmer to use. This is what I came up with, and I'd like to know what people think of it. Code first, then examples:

"""
text.py

An easy to use English grammar tool.

Constants:
NINETEEN: English words for 1-19. (list of str)
ORDINALS: Conversion of cardinal numbers to ordinal numbers. (dict of str: str)
TENS: English words for multiples of 10. (list of str)
THOUSAND_UP: English words for powers of one thousand. (list of str)

Classes:
Text: An easy to use English grammar tool.
"""

NINETEEN = ['zero', 'one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine', 'ten',
    'eleven', 'twelve', 'thirteen', 'fourteen', 'fifteen', 'sixteen', 'seventeen', 'eighteen',
    'nineteen']

ORDINALS = {'zero': 'zeroth', 'one': 'first', 'two': 'second', 'three': 'third', 'four': 'fourth',
    'five': 'fifth', 'six': 'sixth', 'seven': 'seventh', 'eight': 'eighth', 'nine': 'ninth', 'ten': 'tenth',
    'eleven': 'eleventh', 'twelve': 'twelfth', 'thirteen': 'thirteenth', 'fourteen': 'fourteenth',
    'fifteen': 'fifteenth', 'sixteen': 'sixteenth', 'seventeen': 'seventeenth', 'eighteen': 'eighteenth',
    'nineteen': 'nineteenth', 'twenty': 'twentieth', 'thirty': 'thirtieth', 'forty': 'fortieth',
    'fifty': 'fiftieth', 'sixty': 'sixtieth', 'seventy': 'seventieth', 'eighty': 'eightieth',
    'ninety': 'ninetieth', 'hundred': 'hundredth', 'thousand': 'thousandth', 'million': 'millionth',
    'billion': 'billionth', 'trillion': 'trillionth', 'quadrillion': 'quadrillionth',
    'quintillion': 'quintillionth', 'sextillion': 'sextillionth', 'septillion': 'septillionth',
    'octillion': 'octillionth', 'nonillion': 'nonillionth', 'decillion': 'decillionth',
    'undecillion': 'undecillionth', 'duodecillion': 'duodecillionth', 'tredecillion': 'tredecillionth',
    'quattuordecillion': 'quattuordecillionth', 'quindecillion': 'quindecillionth',
    'sexdecillion': 'sexdecillionth', 'septendecillion': 'septendecillionth',
    'octodecillion': 'octodecillionth', 'novemdecillion': 'novemdecillionth',
    'vigintillion': 'vigintillionth'}

TENS = ['', '', 'twenty', 'thirty', 'forty', 'fifty', 'sixty', 'seventy', 'eighty', 'ninety']

THOUSAND_UP = ['', 'thousand', 'million', 'billion', 'trillion', 'quadrillion', 'quintillion',
    'sextillion', 'septillion', 'octillion', 'nonillion', 'decillion', 'undecillion', 'duodecillion',
    'tredecillion', 'quattuordecillion', 'quindecillion', 'sexdecillion', 'septendecillion',
    'octodecillion', 'novemdecillion', 'vigintillion']


class Text(object):
    """
    An easy to use English grammar tool.

    Attributes:
    main: The main object to display. (str, int, or list)
    side: An optional secondary object to display. (str or int)
    mod: A modifier to the objects to display. (str)

    Methods:
    commas: Text representation of a list with commas. (str)
    default: Just return main as a string. (str)
    hundred_word: Give the word form of a number less than 100. (str)
    number: Text representation of a number. (str)
    number_plural: Text representation of a number with a plural. (str)
    plural: Pluralize a word. (str)
    thousand_word: Give the word form of a number less than 1000. (str)

    Overridden Methods:
    __init__
    __format__
    __str__
    """

    format_types = 'nosSwzZ'

    def __init__(self, main = '', side = None, mod = None):
        """
        Store the objects to be displayed. (None)

        Parameters:
        main: The main object to display. (str, int, or list)
        side: An optional secondary object to display. (str or int)
        mod: A modifier to the objects to display. (str)
        """
        # Set the specified attributes.
        self.main = main
        self.side = side
        self.mod = mod
        # Set the default mod based on main.
        if mod is None:
            if isinstance(main, str):
                self.mod = ''
            elif hasattr(self.main, '__len__'):
                self.mod = '{}'
            else:
                self.mod = None
        # Set the output type.
        if isinstance(main, str) and isinstance(side, int):
            self.output = self.plural
        elif isinstance(main, int) and isinstance(side, str):
            self.output = self.number_plural
        elif isinstance(main, int):
            self.output = self.number
        elif isinstance(main, str):
            # A plain string is displayed as is, not split into characters.
            self.output = self.default
        elif hasattr(self.main, '__len__'):
            self.output = self.commas
        else:
            self.output = self.default

    def __format__(self, format_spec):
        """
        Return a formatted text version of the text. (str)

        The additional recognized format types are:
            n: Force numbers to be numeric.
            o: Force numbers to be ordinal words.
            s: Use a serial comma with 'or'.
            S: Use a serial comma with 'and'.
            w: Force numbers to be words.
            z: No serial comma, with 'or'.
            Z: No serial comma, with 'and'.

        Parameters:
        format_spec: The format specification. (str)
        """
        # Use and remove format type if given.
        if format_spec and format_spec[-1] in self.format_types:
            target = self.output(format_spec[-1])
            format_spec = format_spec[:-1]
        else:
            target = self.output()
        # Return the text based on type with the rest of the format spec applied.
        format_text = '{{:{}}}'.format(format_spec)
        return format_text.format(target)

    def __str__(self):
        """Human readable text representation. (str)"""
        return self.output()

    def commas(self, format_type = 'S'):
        """
        Text representation of a list with commas. (str)

        Parameters:
        format_type: The type from the format specification. (str)
        """
        # Copy to a list, so sets and other unindexed sized objects work.
        items = list(self.main)
        # Handle an empty list.
        if not items:
            return ''
        # Determine the conjunction.
        if format_type.isupper():
            conjunction = 'and'
        else:
            conjunction = 'or'
        if len(items) == 1:
            # Handle single elements (format the element, not the whole list).
            return self.mod.format(items[0])
        elif len(items) == 2:
            # Handle a pair of elements.
            template = '{0} {1} {0}'.format(self.mod, conjunction)
            return template.format(*items)
        else:
            # Handle three or more elements.
            base = ', '.join(self.mod.format(word) for word in items[:-1])
            if format_type.lower() == 'z':
                template = '{} {} {}'
            else:
                template = '{}, {} {}'
            return template.format(base, conjunction, self.mod.format(items[-1]))

    def default(self, format_type = ''):
        """
        Just return main as a string. (str)

        Parameters:
        format_type: The type from the format specification. (str)
        """
        return str(self.main)

    def hundred_word(self, n):
        """
        Give the word form of a number less than 100. (str)

        Parameter:
        n: A number to give the word form of. (int)
        """
        n %= 100
        # Don't use zero for compound words.
        if not n:
            return ''
        # Numbers under twenty are predefined.
        elif n < 20:
            return NINETEEN[n]
        # Numbers over nineteen must be combined with tens place numbers.
        else:
            word = TENS[n // 10]
            if n % 10:
                word = '{}-{}'.format(word, NINETEEN[n % 10])
            return word

    def number(self, format_type = ''):
        """
        Text representation of a number. (str)

        Parameters:
        format_type: The type from the format specification. (str)
        """
        if not format_type:
            format_type = 'n' if self.main > 10 else 'w'
        if format_type.lower() == 'n':
            return str(self.main)
        else:
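            # Work with the absolute value; the sign is added back below.
            n = abs(self.main)
            # Handle zero.
            if not n:
                word = NINETEEN[n]
            else:
                # Loop through powers of one thousand.
                word = ''
                level = 0
                while n:
                    # Skip empty groups, so 1000000 is not 'one million thousand'.
                    if n % 1000:
                        # Add the thousand word with the word for the power of one thousand.
                        word = '{} {} {}'.format(self.thousand_word(n), THOUSAND_UP[level], word).strip()
                    n //= 1000
                    level += 1
            # Handle negative numbers.
            if self.main < 0:
                word = 'minus {}'.format(word)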
            # Convert to an ordinal number if requested.
            if format_type.lower() == 'o':
                words = word.split()
                if '-' in words[-1]:
                    parts = words[-1].split('-')
                    parts[-1] = ORDINALS[parts[-1]]
                    words[-1] = '-'.join(parts)
                else:
                    words[-1] = ORDINALS[words[-1]]
                word = ' '.join(words)
            return word

    def number_plural(self, format_type = ''):
        """
        Text representation of a number with a plural. (str)

        Parameters:
        format_type: The type from the format specification. (str)
        """
        return '{} {}'.format(self.number(format_type), self.plural(format_type))

    def plural(self, format_type = ''):
        """
        Pluralize a word. (str)

        Parameters:
        format_type: The type from the format specification. (str)
        """
        # Get the word and the number.
        if isinstance(self.main, str):
            word, number = self.main, self.side
        else:
            word, number = self.side, self.main
        # Handle the singular.
        if number == 1:
            return str(word)
        # Handle the plural.
        elif self.mod:
            return self.mod
        elif word[-1] in 'sx' or word[-2:] in ('sh', 'ch'):
            return '{}es'.format(word)
        else:
            return '{}s'.format(word)

    def thousand_word(self, n):
        """
        Give the word form of a number less than 1000. (str)

        Parameter:
        n: A number to give the word form of. (int)
        """
        # Force the number to be less than one thousand.
        n %= 1000
        # Handle less than one hundred.
        if n < 100:
            return self.hundred_word(n)
        # Handle above and below one hundred.
        elif n % 100:
            return '{} hundred {}'.format(NINETEEN[n // 100], self.hundred_word(n))
        # Handle no hundred words.
        else:
            return '{} hundred'.format(NINETEEN[n // 100])

Examples:

>>> from text import Text
>>> print(Text(1, 'dog'))                      # not a plural
one dog
>>> print(Text(5, 'dog'))                      # simple plural
five dogs
>>> print(Text(12, 'monkey'))                  # default is words under 11, numerals over 10.
12 monkeys
>>> print('{:n}'.format(Text(1, 'dog')))       # you can force numerals with the n format type.
1 dog
>>> print('{:w}'.format(Text(12, 'monkey')))   # you can force words with the w format type.
twelve monkeys
>>> print(Text(5, 'die', 'dice'))              # you can specify the plural form.
five dice
>>> print(Text('dog', 5))                      # you can get just the plural by reversing the parameters.
dogs
>>> print('{:o} element'.format(Text(5)))      # you can get an ordinal word with the o format type.
fifth element
>>> knights = Text(['Arthur', 'Lancelot', 'Robin', 'Bedevere', 'Galahad'])
>>> print(knights)                             # default list behavior is serial comma with 'and'.
Arthur, Lancelot, Robin, Bedevere, and Galahad
>>> print('{:s}'.format(knights))              # you can specify 'or' with a lower case format type (S is the default).
Arthur, Lancelot, Robin, Bedevere, or Galahad
>>> print('{:Z}'.format(knights))              # you can omit the serial comma with the Z (or z for 'or') format type.
Arthur, Lancelot, Robin, Bedevere and Galahad
>>> print(Text(['Arthur', 'Lancelot', 'Robin', 'Bedevere', 'Galahad'], mod = '{!r}'))  # you can format the words.
'Arthur', 'Lancelot', 'Robin', 'Bedevere', and 'Galahad'
>>> print('{:-^30w}'.format(Text(12, 'monkey')))    # The formats should work with other format specifications.
--------twelve monkeys--------
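The last example works because __format__ strips its own type letter and hands the rest of the spec (fill, alignment, width) to ordinary string formatting. Here is that pattern on its own, as a minimal sketch: the Shout class and its 'u' type are hypothetical, and it calls the built-in format() instead of building a '{:...}' template string, which should behave the same for specs like these.

```python
class Shout:
    """Hypothetical example: a custom 'u' format type that upper-cases the text."""

    custom_types = 'u'

    def __init__(self, text):
        self.text = text

    def __format__(self, format_spec):
        # Peel off a trailing custom type letter, if present.
        if format_spec and format_spec[-1] in self.custom_types:
            result = self.text.upper()
            format_spec = format_spec[:-1]
        else:
            result = self.text
        # Hand the rest (fill, align, width, ...) to the built-in str formatting.
        return format(result, format_spec)


print('{:-^20u}'.format(Shout('monkeys')))
# ------MONKEYS-------
```

One consequence of this design: any letter in the custom set is intercepted before Python sees it. Text includes 's', so a spec such as '10s' is read as the serial-comma type rather than the standard string type.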

Does this seem easy enough to use? Did I make it too confusing by putting it all in one class?

**Gribouillis** · (This post was last modified: Jan-05-2020, 08:32 AM by Gribouillis.)

I like the idea, but wouldn't it be a little lighter if, instead of a Text class, you defined a special type of format string, say fmtstr, so that you could write

s = fmtstr('{} had {:w}. The {:o} knight to enter the castle had {:w}').format(
              ['Arthur', 'Lancelot', 'Robin'], (2, 'monkey'), 1, (3, 'dog'))

The fmtstr.format() method would select its behavior depending on the types of its arguments: lists, tuples, numbers, etc.
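A minimal sketch of the fmtstr idea, built on the standard library's string.Formatter by overriding its format_field hook. The dispatch rules (lists joined with 'and', (count, noun) tuples pluralized with a naive -s, 'w' and 'o' spelling out integers only up to ten) and the name TypeDispatchFormatter are assumptions for illustration, not code from the thread.

```python
import string

# Hypothetical word tables, only covering 0-10 to keep the sketch short.
SMALL = ['zero', 'one', 'two', 'three', 'four', 'five', 'six', 'seven',
         'eight', 'nine', 'ten']
ORDINAL = ['zeroth', 'first', 'second', 'third', 'fourth', 'fifth', 'sixth',
           'seventh', 'eighth', 'ninth', 'tenth']


class TypeDispatchFormatter(string.Formatter):
    """Pick a grammar rule from the Python type of each argument."""

    def format_field(self, value, format_spec):
        if isinstance(value, list):
            # Lists become 'a, b, and c' (serial comma with 'and').
            words = [str(item) for item in value]
            if len(words) <= 2:
                return ' and '.join(words)
            return '{}, and {}'.format(', '.join(words[:-1]), words[-1])
        if isinstance(value, tuple):
            # (count, noun) tuples become '<count> <noun>' with a naive plural.
            count, noun = value
            noun = noun if count == 1 else noun + 's'
            return '{} {}'.format(self.format_field(count, format_spec), noun)
        if isinstance(value, int) and format_spec == 'w':
            return SMALL[value]
        if isinstance(value, int) and format_spec == 'o':
            return ORDINAL[value]
        # Anything else gets normal str.format behavior.
        return super().format_field(value, format_spec)


class fmtstr(str):
    """A str whose format() method uses TypeDispatchFormatter."""

    def format(self, *args, **kwargs):
        return TypeDispatchFormatter().format(str(self), *args, **kwargs)


s = fmtstr('{} had {:w}. The {:o} knight to enter the castle had {:w}').format(
    ['Arthur', 'Lancelot', 'Robin'], (2, 'monkey'), 1, (3, 'dog'))
print(s)
# Arthur, Lancelot, and Robin had two monkeys. The first knight to enter the castle had three dogs
```

Because this dispatch keys on list versus tuple, a tuple of three names would fail to unpack as a (count, noun) pair.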

**Larz60+** · Jan-05-2020, 09:41 AM

Also, see: http://www.nltk.org/howto/parse.html

***ichabod801*** · Jan-05-2020, 03:39 PM

(Jan-05-2020, 08:32 AM)Gribouillis Wrote: I like the idea, but wouldn't it be a little lighter if, instead of a Text class, you defined a special type of format string, say fmtstr, so that you could write
s = fmtstr('{} had {:w}. The {:o} knight to enter the castle had {:w}').format(
              ['Arthur', 'Lancelot', 'Robin'], (2, 'monkey'), 1, (3, 'dog'))
The fmtstr.format() method would select its behavior depending on the types of its arguments: lists, tuples, numbers, etc.

I like the idea of moving it out front, and I think that makes it a little clearer. But then you have to distinguish between lists and tuples, which I think is a problem. The code might have the knights in a tuple, and then they might be misinterpreted by the fmtstr.

Perhaps if there was no default behavior for lists or tuples, and everything had to have an explicit format type, that might work:

s = fmtstr('{:S} had {:w}. The {:o} knight to enter the castle had {:w}').format(
              ('Arthur', 'Lancelot', 'Robin'), [2, 'monkey'], 1, (3, 'dog'))

So here the knights are interpreted as a comma-separated list because of the S format type, not because they are in a list, and [2, 'monkey'] and (3, 'dog') can both be handled as numbers with plurals even though they have different types.
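A minimal sketch of this explicit-format-type variant, again on top of string.Formatter.format_field. Here the format letter alone picks the rule: S/s joins any sequence, w/n reads a bare int or any two-item (count, noun) sequence, o gives an ordinal, and anything without a grammar letter is formatted normally. The helper names, the 0-10 word tables, and the naive -s plural are all assumptions for illustration.

```python
import string

# Hypothetical word tables, only covering 0-10 to keep the sketch short.
SMALL = ['zero', 'one', 'two', 'three', 'four', 'five', 'six', 'seven',
         'eight', 'nine', 'ten']
ORDINAL = ['zeroth', 'first', 'second', 'third', 'fourth', 'fifth', 'sixth',
           'seventh', 'eighth', 'ninth', 'tenth']


def join_words(items, conjunction):
    """Serial-comma list: 'a', 'a and b', 'a, b, and c'."""
    words = [str(item) for item in items]
    if len(words) <= 2:
        return ' {} '.format(conjunction).join(words)
    return '{}, {} {}'.format(', '.join(words[:-1]), conjunction, words[-1])


def count_text(value, spell):
    """A bare int, or any two-item (count, noun) sequence, as text."""
    if isinstance(value, int):
        count, noun = value, None
    else:
        count, noun = value  # list or tuple makes no difference here
    text = SMALL[count] if spell else str(count)
    if noun is not None:
        text += ' ' + (noun if count == 1 else noun + 's')
    return text


class SpecDispatchFormatter(string.Formatter):
    """Pick a grammar rule from the format type, not the argument's type."""

    def format_field(self, value, format_spec):
        kind, rest = format_spec[-1:], format_spec[:-1]
        if kind in ('S', 's'):
            text = join_words(value, 'and' if kind == 'S' else 'or')
        elif kind in ('w', 'n'):
            text = count_text(value, spell=(kind == 'w'))
        elif kind == 'o':
            text = ORDINAL[value]
        else:
            # No grammar type: plain str.format behavior, no default for sequences.
            return super().format_field(value, format_spec)
        # Apply any remaining fill/align/width to the finished text.
        return format(text, rest)


class fmtstr(str):
    def format(self, *args, **kwargs):
        return SpecDispatchFormatter().format(str(self), *args, **kwargs)


s = fmtstr('{:S} had {:w}. The {:o} knight to enter the castle had {:w}').format(
    ('Arthur', 'Lancelot', 'Robin'), [2, 'monkey'], 1, (3, 'dog'))
print(s)
# Arthur, Lancelot, and Robin had two monkeys. The first knight to enter the castle had three dogs
```

The tradeoff is the same one Text has: 's' shadows the standard string presentation type, so '{:10s}' applied to a plain string would split it into its characters.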
