program wanted for Posix

Skaperen · May-11-2024, 11:54 PM

maybe you know the "head" command. maybe you know the "tail" command. i'd like to see someone create a "headtail" command that outputs the head of a file followed by the tail of that same file. if the file is so small that these would overlap or duplicate, then it should just output the whole file. arguments should allow specifying how many lines for the head part and how many lines for the tail part. and, of course, it must be written in Python3.
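
The overlap rule in the request can be sketched on an in-memory list of lines. This is only an illustration of the requested behaviour, not code from any poster; the name `headtail_lines` and the default of 10 lines are assumptions.

```python
def headtail_lines(lines, head=10, tail=10):
    """Return the first `head` and last `tail` lines, or every line if they would overlap."""
    if head < 0 or tail < 0:
        raise ValueError("nonnegative lengths expected")
    # Small input: head and tail would overlap or touch, so return everything once.
    if len(lines) <= head + tail:
        return list(lines)
    # Slice from an explicit index: lines[-tail:] would return everything when tail == 0.
    return list(lines[:head]) + list(lines[len(lines) - tail:])


if __name__ == "__main__":
    sample = [f"line {i}\n" for i in range(1, 31)]
    print("".join(headtail_lines(sample, head=3, tail=2)), end="")
```

Holding the whole file in a list is the simplest way to express the rule; a streaming version only needs to remember `head + tail` lines.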

**Gribouillis** · (This post was last modified: May-12-2024, 10:19 AM by Gribouillis.)

You could use this head_tail() function to implement it

import itertools as itt
import more_itertools as moi
import unittest as ut


def head_tail(iterable, h, t):
    """Compute the head and the tail of an iterable

    Generates two iterables 'head' and 'tail'. The first one
    attempts to generate the h first items of the original
    iterable if possible. When the first iterable is
    exhausted, the second iterable attempts to generate the
    t last items remaining in the original iterable if that
    is possible.

    It is necessary to consume the 'head' iterable before
    attempting to iterate on the 'tail' iterable. The
    function more_itertools.consume() can be used to consume
    an iterable.

    Example:

    >>> head, tail = head_tail('abcdefghi', 4, 3)
    >>> print(list(head)) # prints ['a', 'b', 'c', 'd']
    >>> print(list(tail)) # prints ['g', 'h', 'i']
    """
    if (h < 0) or (t < 0):
        raise ValueError("nonnegative lengths expected")
    iterable = iter(iterable)

    yield itt.islice(iterable, None, h)

    if (t == 0):
        yield iter(())
    else:
        yield moi.islice_extended(iterable, -t, None)

class TestHeadTail(ut.TestCase):
    def test_large_data(self):
        L = "abcdefghijkl"
        head, tail = head_tail(L, 4, 3)
        self.assertEqual(list(head), list("abcd"))
        self.assertEqual(list(tail), list("jkl"))

    def test_tail_truncated(self):
        L = "abcdefghi"
        head, tail = head_tail(L, 7, 4)
        self.assertEqual(list(head), list("abcdefg"))
        self.assertEqual(list(tail), list("hi"))

    def test_zero_head(self):
        L = "abcdefgh"
        head, tail = head_tail(L, 0, 4)
        self.assertEqual(list(head), [])
        self.assertEqual(list(tail), list("efgh"))

    def test_zero_tail(self):
        L = "abcdefgh"
        head, tail = head_tail(L, 4, 0)
        self.assertEqual(list(head), list("abcd"))
        self.assertEqual(list(tail), [])


if __name__ == "__main__":
    ut.main()
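
For readers without more_itertools installed, the same idea can be sketched with the standard library only. This is an assumption-laden variant, not the function above: the name `head_tail_stdlib` is hypothetical, and it returns two lists eagerly instead of two lazy iterables, so the "consume the head first" rule does not arise.

```python
from collections import deque
from itertools import islice


def head_tail_stdlib(iterable, h, t):
    """Return (head, tail) lists: the first h items, then up to t of the remaining items."""
    if h < 0 or t < 0:
        raise ValueError("nonnegative lengths expected")
    it = iter(iterable)
    # Take the first h items; the iterator is left positioned just after them.
    head = list(islice(it, h))
    # A bounded deque keeps only the last t of the remaining items.
    tail = list(deque(it, maxlen=t))
    return head, tail


if __name__ == "__main__":
    head, tail = head_tail_stdlib("abcdefghi", 4, 3)
    print(head, tail)
```

Because the tail is built only from what is left after the head, the two parts never overlap, and at most `h + t` items are held in memory at once.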

**Gribouillis** · (This post was last modified: May-13-2024, 08:38 AM by Gribouillis.)

Here is a complete script (for a single file argument). Assumes utf8 encoding of the target file

#!/usr/bin/env python
from argparse import ArgumentParser
import itertools as itt
import more_itertools as moi
import sys
import unittest as ut


def head_tail(iterable, h, t):
    """Compute the head and the tail of an iterable

    Generates two iterables 'head' and 'tail'. The first one
    attempts to generate the h first items of the original
    iterable if possible. When the first iterable is
    exhausted, the second iterable attempts to generate the
    t last items remaining in the original iterable if that
    is possible.

    It is necessary to consume the 'head' iterable before
    attempting to iterate on the 'tail' iterable. The
    function more_itertools.consume() can be used to consume
    an iterable.

    Example:

    >>> head, tail = head_tail('abcdefghi', 4, 3)
    >>> print(list(head)) # prints ['a', 'b', 'c', 'd']
    >>> print(list(tail)) # prints ['g', 'h', 'i']
    """
    if (h < 0) or (t < 0):
        raise ValueError("nonnegative lengths expected")
    iterable = iter(iterable)

    yield itt.islice(iterable, None, h)

    if (t == 0):
        yield iter(())
    else:
        yield moi.islice_extended(iterable, -t, None)



class TestHeadTail(ut.TestCase):
    def test_large_data(self):
        L = "abcdefghijkl"
        head, tail = head_tail(L, 4, 3)
        self.assertEqual(list(head), list("abcd"))
        self.assertEqual(list(tail), list("jkl"))

    def test_tail_truncated(self):
        L = "abcdefghi"
        head, tail = head_tail(L, 7, 4)
        self.assertEqual(list(head), list("abcdefg"))
        self.assertEqual(list(tail), list("hi"))

    def test_zero_head(self):
        L = "abcdefgh"
        head, tail = head_tail(L, 0, 4)
        self.assertEqual(list(head), [])
        self.assertEqual(list(tail), list("efgh"))

    def test_zero_tail(self):
        L = "abcdefgh"
        head, tail = head_tail(L, 4, 0)
        self.assertEqual(list(head), list("abcd"))
        self.assertEqual(list(tail), [])

def main():
    parser = ArgumentParser(description='Print head and tail of files')
    parser.add_argument('-n', '--headsize', type=int, default=10, dest='headsize', help='defaults to 10 lines')
    parser.add_argument('-m', '--tailsize', type=int, default=10, dest='tailsize', help='defaults to 10 lines')
    parser.add_argument('file', metavar='FILE')
    arg = parser.parse_args()
    with open(arg.file) as infile:
        print(f'==> (head) {arg.file} <==')
        head, tail = head_tail(infile, arg.headsize, arg.tailsize)
        sys.stdout.writelines(head)
        print(f'==> (tail) {arg.file} <==')
        sys.stdout.writelines(tail)



if __name__ == "__main__":
    main()

Output:
λ python paillasse/pf/headtail.py -h
usage: headtail.py [-h] [-n HEADSIZE] [-m TAILSIZE] FILE

Print head and tail of files

positional arguments:
  FILE

options:
  -h, --help            show this help message and exit
  -n HEADSIZE, --headsize HEADSIZE
                        defaults to 10 lines
  -m TAILSIZE, --tailsize TAILSIZE
                        defaults to 10 lines
λ

Example

Output:
λ python paillasse/pf/headtail.py paillasse/pf/headtail.py -n 5 -m 7
==> (head) paillasse/pf/headtail.py <==
from argparse import ArgumentParser
import itertools as itt
import more_itertools as moi
import sys
import unittest as ut
==> (tail) paillasse/pf/headtail.py <==
        print(f'==> (tail) {arg.file} <==')
        sys.stdout.writelines(tail)



if __name__ == "__main__":
    main()

***snippsat*** · (This post was last modified: May-13-2024, 03:06 PM by snippsat.)

Skaperen you should show some effort when posting these requests 💤

(May-13-2024, 08:32 AM)Gribouillis Wrote: Here is a complete script (for a single file argument). Assumes utf8 encoding of the target file

Good. To make it more robust, pass the encoding explicitly, as it sometimes fails on files with Unicode characters on Windows.
Windows guesses the encoding badly:

Error:
UnicodeDecodeError: 'charmap' codec can't decode byte .....

with open(arg.file, encoding='utf-8', errors='ignore') as infile:
    print(f'==> (head) {arg.file} <==')
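
The error can be reproduced on any platform with a small sketch. It assumes the Windows default encoding is cp1252, which depends on the system locale, and it uses one line from the sample data shown further down.

```python
# "Dhanbād": the character ā (U+0101) is encoded in UTF-8 as the bytes C4 81.
data = "Dhanb\u0101d;23.7998\n".encode("utf-8")

try:
    # cp1252 has no character assigned to byte 0x81, so decoding fails.
    data.decode("cp1252")
    failed = False
except UnicodeDecodeError as exc:
    failed = True
    print(exc)

# Decoding with the encoding the file was actually written in works.
text = data.decode("utf-8")
print(text, end="")
```

Note that `errors='ignore'` silently drops any bytes that cannot be decoded; `errors='replace'` keeps a visible marker in their place instead.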

Here is my take on this; I use Typer for the CLI application.

import typer
from collections import deque

app = typer.Typer()

@app.command()
def headtail(filename: str, head: int = 5, tail: int = 5):
    tail_deque = deque(maxlen=tail)
    head_list = []
    total_lines = 0
    try:
        with open(filename, encoding='utf-8', errors='ignore') as file:
            for i, line in enumerate(file):
                if i < head:
                    head_list.append(line)
                tail_deque.append(line)
                total_lines += 1
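        if total_lines <= head + tail:
            # Small file: head and tail overlap, so print every line exactly once.
            rest = list(tail_deque)[len(tail_deque) - (total_lines - len(head_list)):]
            for line in head_list + rest:
                print(line, end='')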
        else:
            for line in head_list:
                print(line, end='')
            if tail > 0:
                print("...........")
            for line in tail_deque:
                print(line, end='')
    except FileNotFoundError:
        typer.echo(f"Error: The file '{filename}' does not exist.", err=True)
    except Exception as e:
        typer.echo(f"An error occurred: {e}", err=True)

if __name__ == "__main__":
    app()

Compare the two to see that they work the same.

Output:
G:\div_code\reader_env
λ python head_tail_grib.py -n 2 -m 7 contry.txt
==> (head) contry.txt <==
Tokyo;35.6897
Jakarta;-6.1750
==> (tail) contry.txt <==
Nanyang;32.9987
Hangzhou;30.2500
Foshan;23.0292
Nagoya;35.1833
Taipei;25.0375
Tongshan;34.2610
Dhanbād;23.7998
G:\div_code\reader_env

λ python head_tail.py contry.txt --head 2 --tail 7
Tokyo;35.6897
Jakarta;-6.1750
............
Nanyang;32.9987
Hangzhou;30.2500
Foshan;23.0292
Nagoya;35.1833
Taipei;25.0375
Tongshan;34.2610
Dhanbād;23.7998
G:\div_code\reader_env

Update: I had two "with open" blocks; only one is needed.

Skaperen · May-18-2024, 06:21 PM

(May-13-2024, 11:08 AM)snippsat Wrote: Skaperen you should show some effort when posting these requests

i usually have, at least, made some effort offline. in this case i searched for such a program and found none. then i tried to think about what would be needed to do it, though i really didn't want to actually do it, or at least not make it public, because i would not want to support it.

Unicode and utf8 have been troubling at times. some bytes and byte combinations imply or mean things that Unicode considers to be invalid. i'm unsure how best to resolve things like that. one tool i have lets me display binary bytes interleaved with ascii characters. it lets me get an idea of what is in there. i just see the ascii between dots and "random" characters. if that text was utf8, it could be worse.
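
One possible way to handle bytes that are not valid UTF-8 is to pick a different error handler when decoding. The sketch below uses two standard Python handlers; the sample bytes are made up for illustration.

```python
# 0xFF and 0xFE can never appear in valid UTF-8.
raw = b"ok \xff\xfe bytes"

# backslashreplace shows each undecodable byte as a visible \xNN escape.
shown = raw.decode("utf-8", errors="backslashreplace")
print(shown)

# surrogateescape smuggles the bad bytes through as lone surrogates,
# so encoding with the same handler restores the original bytes exactly.
kept = raw.decode("utf-8", errors="surrogateescape")
restored = kept.encode("utf-8", errors="surrogateescape")
print(restored == raw)
```

`backslashreplace` suits display, much like the byte-dumping tool described above, while `surrogateescape` suits programs that must pass the data through unchanged.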
