Posts: 4,654
Threads: 1,497
Joined: Sep 2016
i often need a command/program/script that can read in a text file with many columns that are not lined up and line them up to make all the lines be consistent. clearly it must read in the entire file to know the width and alignment (right if every line has a number in that column, else left) in order to print out the first line (and all the remaining lines).
in almost all such cases, such poorly aligned text comes from a big pipeline or other complex command. so i need to be sure it can read from stdin, at least.
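For the stdin requirement, one option is the standard-library fileinput module, which reads the files named on the command line and falls back to stdin when none are given. A minimal sketch; read_rows is a made-up helper name, not something from this thread:

```python
import fileinput


def read_rows(files=None):
    """Return every input line split into whitespace-separated columns.

    read_rows is a hypothetical helper. With files=None, fileinput reads the
    paths listed in sys.argv[1:], or stdin when no paths were given.
    """
    with fileinput.input(files=files) as stream:
        return [line.split() for line in stream]
```

Calling read_rows() with no arguments at the end of a pipeline would then collect the whole input before anything is printed, which is what the width calculation needs.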
Tradition is peer pressure from dead people
What do you call someone who speaks three languages? Trilingual. Two languages? Bilingual. One language? American.
Posts: 12,050
Threads: 487
Joined: Sep 2016
If wrapping is an option, you can make a best guess, and wrap anything encountered that is longer than the estimated size.
The only way I can see getting it to fit every case is to read the file twice. Even then, some data may be too long to accommodate without wrapping.
Have you looked on Pypi?
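The wrapping idea could look roughly like this, using textwrap from the standard library. This is only a sketch: wrap_row and the single fixed width guess are assumptions made for illustration.

```python
import textwrap
from itertools import zip_longest


def wrap_row(cells, width):
    """Yield output lines for one row, wrapping any cell longer than width.

    wrap_row and the fixed `width` guess are illustrative assumptions.
    """
    # textwrap.wrap returns [] for an empty string, so fall back to [""].
    wrapped = [textwrap.wrap(cell, width) or [""] for cell in cells]
    # Cells that wrapped onto fewer lines are padded with blanks.
    for parts in zip_longest(*wrapped, fillvalue=""):
        yield " ".join(part.ljust(width) for part in parts).rstrip()
```

A cell that does not fit spills onto continuation lines under its own column, so the guess never has to be revised after output has started.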
Posts: 1,950
Threads: 8
Joined: Jun 2018
For my tabulating/table needs I have used tabulate.
I'm not 'in'-sane. Indeed, I am so far 'out' of sane that you appear a tiny blip on the distant coast of sanity. Bucky Katt, Get Fuzzy
Da Bishop: There's a dead bishop on the landing. I don't know who keeps bringing them in here. ....but society is to blame.
Posts: 4,654
Threads: 1,497
Joined: Sep 2016
Jan-09-2019, 03:57 PM
(This post was last modified: Jan-09-2019, 04:08 PM by Skaperen.)
(Jan-09-2019, 05:43 AM)Larz60+ Wrote: Have you looked on Pypi?
does pypi have a "what all it really really really does" list to look through? no, i have not looked through the list of pypi packages.
(Jan-09-2019, 09:03 AM)perfringo Wrote: For my tabulating/table needs I have used tabulate.
i was not really thinking of this as tabular data. i was trying to just neatly line up the output when i ran a shell command that scans through a bunch of tarballs, filtering for certain files from inside each tarball, prefixed by the path to that tarball. the output was very messy, despite having wide 172 character terminal. i don't do spreadsheets or else, maybe i would have thought along those lines. thanks for the clue and link.
Posts: 3,458
Threads: 101
Joined: Sep 2016
If you're on linux, there's the column program that you could pipe the output into. It's part of util-linux (useful if you want to look up the source), but I think it's included with any linux install.
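The relevant mode is column -t (table mode), which keeps the existing columns and pads them to a common width. It left-aligns every column by default, so numbers would not end up right-aligned. A small sketch of calling it from Python; align_with_column is a made-up name, and the code assumes column may or may not be on PATH:

```python
import shutil
import subprocess


def align_with_column(text):
    """Pipe text through `column -t`; return None if column is not installed."""
    if shutil.which("column") is None:
        return None
    result = subprocess.run(
        ["column", "-t"], input=text, capture_output=True, text=True, check=True
    )
    return result.stdout
```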
Posts: 4,654
Threads: 1,497
Joined: Sep 2016
(Jan-09-2019, 04:46 PM)nilamo Wrote: If you're on linux, there's the column program that you could pipe the output into. It's part of util-linux (useful if you want to look up the source), but I think it's included with any linux install.
that command multi-columnizes its input, much like my columnize class did. what i am wanting in this thread is to line up existing columns, padding on the left for numbers, else padding on the right. the number of columns would stay the same.
Posts: 3,458
Threads: 101
Joined: Sep 2016
Jan-11-2019, 04:24 PM
(This post was last modified: Jan-11-2019, 06:48 PM by nilamo.)
Ok, so how's this for a start? It doesn't read the entire input stream (because that could be infinite, if you're piping), it just buffers a few lines at a time and makes sure those lines... line up.
def format_by_columns(stream, sep="\t", row_chunk_size=None):
    # Default the look-ahead window to the terminal height.
    if not row_chunk_size:
        import shutil
        term_size = shutil.get_terminal_size()
        row_chunk_size = term_size.lines

    buffer = []

    def write_line(col_sizes=None):
        nonlocal buffer
        # Measure every buffered cell unless the caller passed sizes in.
        if not col_sizes:
            col_sizes = list(map(lambda row: list(map(len, row)), buffer))
        to_write = buffer.pop(0)
        line = []
        for ndx, col in enumerate(to_write):
            # Widest cell in this column among the measured rows.
            col_size = max(map(lambda row: row[ndx] if ndx<len(row) else 0, col_sizes))
            col = col.ljust(col_size)
            line.append(col)
        return " ".join(line), col_sizes

    for line in stream:
        cols = line.split(sep)
        buffer.append(cols)
        # Once the window is full, emit the oldest row.
        if len(buffer) >= row_chunk_size:
            out, _ = write_line()
            yield out

    # finished processing input, so write everything
    col_sizes = None
    while buffer:
        out, col_sizes = write_line(col_sizes)
        yield out

if __name__ == "__main__":
    text = """first second third fourth fifth
1 2 3 4 5
spam eggs fried fish bacon
Bravely bold Sir Robin Rode forth from Camelot. He was not afraid to die, Oh brave Sir Robin.
He was not at all afraid To be killed in nasty ways. Brave, brave, brave, brave Sir Robin."""
    for line in format_by_columns(text.split("\n")):
        print(line)
Output:
first second third fourth fifth
1 2 3 4 5
spam eggs fried fish bacon
Bravely bold Sir Robin Rode forth from Camelot. He was not afraid to die, Oh brave Sir Robin.
He was not at all afraid To be killed in nasty ways. Brave, brave, brave, brave Sir Robin.
Posts: 7,324
Threads: 123
Joined: Sep 2016
Jan-11-2019, 05:46 PM
(This post was last modified: Jan-11-2019, 05:46 PM by snippsat.)
(Jan-09-2019, 03:57 PM)Skaperen Wrote: i don't do spreadsheets or else, maybe i would have thought along those lines. thanks for the clue and link.
What about the column/row master Pandas?
(Jan-09-2019, 03:57 PM)Skaperen Wrote: the output was very messy, despite having wide 172 character terminal.
One of Pandas' strengths is taking messy data in and presenting it in an orderly form.
I mean it takes in almost any data out there, and once the data is in a DataFrame, manipulating it (filter/group/sum, etc., and a lot more) is easier than with most other tools.
Posts: 4,654
Threads: 1,497
Joined: Sep 2016
i went ahead and coded my own. i'll post it below so you can red-mark my bad coding. but it does work on test data and on a file with a mix of "tar tvf" listings of different tarballs, each line prefixed with the path to its tarball.
aligncolumns.py
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""stdin to stdout,align columns, right padded except for numbers."""
from __future__ import print_function
from sys import stdin, stdout
input = stdin
output = stdout
flag = []  # per column: 1 once a non-number is seen, else 0
wide = []  # per column: widest value seen so far
save = []  # every row, kept until all widths are known
for line in input:
    cols = line.rstrip().split()
    save.append(cols)
    # grow the per-column lists when a longer row shows up
    if len(flag) < len(cols):
        flag = flag + [0]*(len(cols)-len(flag))
        wide = wide + [0]*(len(cols)-len(wide))
    for x in range(len(cols)):
        wide[x] = max(wide[x],len(cols[x]))
        try:
            float(cols[x])
        except ValueError:
            flag[x] = 1
input.close()
for cols in save:
    for x in range(len(cols)):
        pad = ' '*(wide[x]-len(cols[x]))
        if flag[x]:
            cols[x] = cols[x]+pad  # text: pad on the right
        else:
            cols[x] = pad+cols[x]  # numbers: pad on the left
    print(' '.join(cols),file=output)
output.close()
exit(0)
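The same two-pass idea can be wrapped in a function, which makes it easier to test without a pipeline. This is a sketch rather than a replacement for the script: align_columns is a made-up name, it uses str.ljust and str.rjust for the padding, and unlike the script it strips trailing padding from each line. One thing to watch in both versions: float() also accepts strings such as "nan", "inf" and "1e5", so a column holding only those counts as numeric.

```python
def align_columns(lines):
    """Return lines with whitespace-separated columns padded to equal width.

    A column is right-aligned only if float() accepts every value in it.
    """
    rows = [line.split() for line in lines]
    ncols = max((len(row) for row in rows), default=0)
    widths = [0] * ncols
    numeric = [True] * ncols

    # First pass: record each column's width and whether it is all numbers.
    for row in rows:
        for i, cell in enumerate(row):
            widths[i] = max(widths[i], len(cell))
            try:
                float(cell)
            except ValueError:
                numeric[i] = False

    # Second pass: pad numbers on the left, everything else on the right.
    out = []
    for row in rows:
        cells = [
            cell.rjust(widths[i]) if numeric[i] else cell.ljust(widths[i])
            for i, cell in enumerate(row)
        ]
        out.append(" ".join(cells).rstrip())
    return out
```

It could be driven from a pipeline with something like print("\n".join(align_columns(sys.stdin))) after importing sys.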