a script i need: column alignment
#1
i often need a command/program/script that can read in a text file with many columns that are not lined up and line them up to make all the lines be consistent. clearly it must read in the entire file to know the width and alignment (right if every line has a number in that column, else left) in order to print out the first line (and all the remaining lines).

in almost all such cases, such poorly aligned text comes from a big pipeline or other complex command. so i need to be sure it can read from stdin, at least.
What do you call someone who speaks three languages? Trilingual. Two languages? Bilingual. One language? American.
#2
If wrapping is an option, you can make a best guess, and wrap anything encountered that is longer than the estimated size.
The only way I can see to make it fit every case is to read the file twice. Even then, some data may be too long to accommodate without wrapping.
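
A rough sketch of the wrapping idea, in case it helps. It assumes a single estimated width for every column; the helper name wrap_row and the width of 12 are made up for illustration, and only the standard library is used.

```python
import textwrap
from itertools import zip_longest


def wrap_row(cells, width):
    """Wrap each cell to `width` and return the physical lines for one row."""
    # textwrap.wrap returns [] for an empty string, so fall back to one blank piece.
    wrapped = [textwrap.wrap(cell, width) or [""] for cell in cells]
    lines = []
    # Walk the wrapped pieces side by side; short cells are padded with blanks.
    for parts in zip_longest(*wrapped, fillvalue=""):
        lines.append(" ".join(part.ljust(width) for part in parts).rstrip())
    return lines


if __name__ == "__main__":
    row = ["Bravely bold Sir Robin", "Rode forth from Camelot.", "5"]
    for line in wrap_row(row, width=12):
        print(line)
```

A cell longer than the estimate spills onto extra lines under its own column, so the other columns stay where they are.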

Have you looked on Pypi?
#3
For my tabulating/table needs I have used tabulate.
I'm not 'in'-sane. Indeed, I am so far 'out' of sane that you appear a tiny blip on the distant coast of sanity. Bucky Katt, Get Fuzzy
#4
(Jan-09-2019, 05:43 AM)Larz60+ Wrote: Have you looked on Pypi?

does pypi have a "what all it really really really does" list to look through? no, i have not looked through the list of pypi packages.

(Jan-09-2019, 09:03 AM)perfringo Wrote: For my tabulating/table needs I have used tabulate.
i was not really thinking of this as tabular data. i was trying to just neatly line up the output when i ran a shell command that scans through a bunch of tarballs, filtering for certain files from inside each tarball, prefixed by the path to that tarball. the output was very messy, despite having wide 172 character terminal. i don't do spreadsheets or else, maybe i would have thought along those lines. thanks for the clue and link.
#5
If you're on linux, there's the column program that you could pipe the output into. It's part of util-linux (useful if you want to look up the source), but I think it's included with any linux install.
#6
(Jan-09-2019, 04:46 PM)nilamo Wrote: If you're on linux, there's the column program that you could pipe the output into. It's part of util-linux (useful if you want to look up the source), but I think it's included with any linux install.
that command multi-columnizes its input, much like my columnize class did. what i am wanting in this thread is to line up existing columns, padding on the left for numbers, else padding on the right. the number of columns would stay the same.
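
The rule described here could be sketched roughly as follows. This is only one possible reading of it: align_columns and is_number are hypothetical names, "number" is taken to mean anything float() accepts, and columns are assumed to be separated by whitespace.

```python
def is_number(text):
    """Return True if float() accepts the text (an assumed definition of 'number')."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def align_columns(lines):
    """Two passes: measure every column first, then pad each cell."""
    rows = [line.split() for line in lines]
    ncols = max((len(row) for row in rows), default=0)
    widths = [0] * ncols
    numeric = [True] * ncols
    # Pass 1: widest cell per column, and whether every cell in it is a number.
    for row in rows:
        for i, cell in enumerate(row):
            widths[i] = max(widths[i], len(cell))
            numeric[i] = numeric[i] and is_number(cell)
    # Pass 2: pad numbers on the left, everything else on the right.
    aligned = []
    for row in rows:
        cells = [
            cell.rjust(widths[i]) if numeric[i] else cell.ljust(widths[i])
            for i, cell in enumerate(row)
        ]
        aligned.append(" ".join(cells).rstrip())
    return aligned


if __name__ == "__main__":
    sample = ["alpha 1 x", "b 22 yy", "ccc 333 z"]
    for line in align_columns(sample):
        print(line)
```

The number of columns in each line stays the same; only the padding changes.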
#7
Ok, so how's this for a start? It doesn't read the entire input stream (because that could be infinite, if you're piping), it just buffers a few lines at a time and makes sure those lines... line up.

def format_by_columns(stream, sep="\t", row_chunk_size=None):
    if not row_chunk_size:
        import shutil
        term_size = shutil.get_terminal_size()
        row_chunk_size = term_size.lines

    buffer = []

    def write_line(col_sizes=None):
        nonlocal buffer

        if not col_sizes:
            col_sizes = list(map(lambda row: list(map(len, row)), buffer))

        to_write = buffer.pop(0)
        line = []
        for ndx, col in enumerate(to_write):
            col_size = max(map(lambda row: row[ndx] if ndx<len(row) else 0, col_sizes))
            col = col.ljust(col_size)
            line.append(col)
        return " ".join(line), col_sizes

    for line in stream:
        # strip the trailing newline so lines read from a file or stdin pad correctly
        cols = line.rstrip("\n").split(sep)
        buffer.append(cols)

        if len(buffer) >= row_chunk_size:
            out, _ = write_line()
            yield out
    
    # finished processing input, so write everything
    col_sizes = None
    while buffer:
        out, col_sizes = write_line(col_sizes)
        yield out


if __name__ == "__main__":
    text = """first	second	third	fourth	fifth
1	2	3	4	5
spam	eggs	fried	fish	bacon
Bravely bold Sir Robin	Rode forth from Camelot.	He was not afraid to die,	Oh brave Sir Robin.
He was not at all afraid	To be killed in nasty ways.	Brave, brave, brave, brave Sir Robin."""

    for line in format_by_columns(text.split("\n")):
        print(line)
Output:
first                    second                      third                                 fourth              fifth
1                        2                           3                                     4                   5
spam                     eggs                        fried                                 fish                bacon
Bravely bold Sir Robin   Rode forth from Camelot.    He was not afraid to die,             Oh brave Sir Robin.
He was not at all afraid To be killed in nasty ways. Brave, brave, brave, brave Sir Robin.
#8
(Jan-09-2019, 03:57 PM)Skaperen Wrote: i don't do spreadsheets or else, maybe i would have thought along those lines. thanks for the clue and link.
What about the column/row master Pandas?
(Jan-09-2019, 03:57 PM)Skaperen Wrote: the output was very messy, despite having wide 172 character terminal.
One of Pandas' strengths is taking in messy data and presenting it in an orderly form.
I mean it takes in almost any data out there, and once the data is in a DataFrame, manipulating it (filter/group/sum, etc., and a lot more) is easier than with most other tools.
#9
i went ahead and coded my own. i'll post it below so you can red mark it for my bad coding. but it does work on test data and on a file with a mix of "tar tvf" listings of different tarballs, each line prefixed with the path to that tarball.

aligncolumns.py
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""stdin to stdout, align columns, right padded except for numbers."""
from __future__ import print_function
from sys import stdin, stdout

input = stdin
output = stdout

flag = []
wide = []
save = []

for line in input:
    cols = line.rstrip().split()
    save.append(cols)
    if len(flag) < len(cols):
        flag = flag + [0]*(len(cols)-len(flag))
        wide = wide + [0]*(len(cols)-len(wide))
    for x in range(len(cols)):
        wide[x] = max(wide[x],len(cols[x]))
        try:
            float(cols[x])
        except ValueError:
            # any non-numeric entry makes the whole column left-aligned
            flag[x] = 1
input.close()

for cols in save:
    for x in range(len(cols)):
        pad = ' '*(wide[x]-len(cols[x]))
        if flag[x]:
            cols[x] = cols[x]+pad
        else:
            cols[x] = pad+cols[x]
    print(' '.join(cols),file=output)
output.close()

exit(0)