all i want to do is count the lines in each file
Python Forum (https://python-forum.io), General Coding Help, Thread: /thread-33699.html
all i want to do is count the lines in each file - Skaperen - May-18-2021
all i want to do is count the lines in each file, but some files contain strange binary bytes or older ISO codes.

    from sfc import *

    def pf(t,n):
        # print the time, the line count and the file name
        if os.path.exists(n):
            with open(n) as f:
                c = len([x for x in f])
            print(t,str(c).rjust(8),n)

    fn = '.edit_log'
    argv.pop(0)
    cd() # be in home directory

    # collect the edit times logged for each file path
    with open(fn) as el:
        nt = {}
        for ee in el:
            t,e,n = ee.strip().split()[:3]
            if n in nt:
                nt[n].append(t)
            else:
                nt[n] = [t]

    ns = sorted(nt.keys())
    tn = {}
    for n in ns:
        nt[n].sort()
    # map the latest edit time of each file to its path
    for n in ns:
        t = nt[n][-1]
        if t[13]=='-':
            t=t[0:7]+t[8:13]+t[14:16]+t[17:]
        tn[t] = n

    for t,n in sorted([x for x in sorted(tn.items())]):
        if argv:
            for a in argv:
                if n.endswith(a):
                    if os.path.exists(n):
                        pf(t,n)
                    break
        else: # none requested so print all
            if os.path.exists(n):
                pf(t,n)

i think the only things it uses from sfc are os and cd (called with no args, cd changes directory to the user's home directory).

RE: all i want to do is count the lines in each file - Larz60+ - May-18-2021
You need to know the file encoding. Though not foolproof, you can usually find it with chardet: https://pypi.org/project/chardet/

RE: all i want to do is count the lines in each file - Skaperen - May-18-2021
maybe it will be simpler to read most of the file in binary and count the b'\n' in the bytes string i get. for really big files i don't need the exact number, just a general size.

RE: all i want to do is count the lines in each file - Larz60+ - May-18-2021
There are some strange encodings, and the easiest way to deal with them is to find the proper encoding.

RE: all i want to do is count the lines in each file - Skaperen - May-18-2021
if it's a strange encoding i don't care if this is accurate. this operates on my edit log, which has the time of each edit and a file path. this script lists recently edited files with the time. i'm inserting a column for the number of lines. for my Python source files and most other text and source, the encoding will be ASCII or UTF-8. the file causing the problem was a C file from 2006, when i was using ISO8859 for the copyright symbol.
if i cat that line today (the Linux kernel is doing UTF-8 reasonably well) the copyright symbol is just a question mark in an inverted cell. i've already done the binary read and have it showing ">9999" if the number of lines exceeds 9999, with the column narrowed to 5 characters. if the file exceeds 1048575 bytes then it prints ">####". i do f.read(1048576). i could reduce that.
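
The binary approach described in the post above could look roughly like the following. This is only a sketch, not Skaperen's actual code: the function name `line_count_field` and the constant names are made up for illustration, and only the 1048576-byte read, the ">9999" and ">####" markers and the 5-character width come from the post.

```python
import os
import tempfile

READ_LIMIT = 1048576   # bytes read per file, the f.read() size from the post
MAX_LINES = 9999       # largest count shown as a number in the 5-character column


def line_count_field(path, read_limit=READ_LIMIT, max_lines=MAX_LINES):
    """Return a 5-character field for the line-count column (hypothetical helper)."""
    # Binary mode means no decoding happens, so stray ISO-8859 bytes
    # cannot raise UnicodeDecodeError.
    with open(path, "rb") as f:
        data = f.read(read_limit)
    if len(data) >= read_limit:
        # The file is longer than read_limit - 1 bytes: do not count it.
        return ">####"
    count = data.count(b"\n")
    if count > max_lines:
        return ">9999"
    return str(count).rjust(5)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = os.path.join(tmp, "old.c")
        with open(sample, "wb") as f:
            # 0xA9 is the copyright sign in ISO-8859-1 and is not valid UTF-8 here
            f.write(b"/* \xa9 2006 */\nint main(void)\n{\n}\n")
        print(repr(line_count_field(sample)))   # prints '    4'
```

Because the count is of b'\n' bytes, a last line without a trailing newline is not counted, which matches what `wc -l` reports.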

RE: all i want to do is count the lines in each file - perfringo - May-19-2021
If I need to have a quick look at the number of lines in a file, I use wc in the terminal.

    cat my_filename | wc -l

It is easy to use in Python with subprocess. A specific example on a file named shakespeare.txt; check_output returns bytes, so I converted it to an integer:

    >>> import subprocess
    >>> source = subprocess.Popen(['cat', 'shakespeare.txt'], stdout=subprocess.PIPE)
    >>> lines = int(subprocess.check_output(['wc', '-l'], stdin=source.stdout))
    >>> lines
    4155

RE: all i want to do is count the lines in each file - Gribouillis - May-19-2021
An alternative with more_itertools.ilen()

    λ cat paillasse/sometest.py | wc -l
    60
    λ python
    ...
    >>> from more_itertools import ilen
    >>> with open('paillasse/sometest.py') as ifh:
    ...     print(ilen(ifh))
    ...
    60

RE: all i want to do is count the lines in each file - snippsat - May-19-2021
(May-18-2021, 11:42 PM)Skaperen Wrote: if it's a strange encoding i don't care if this is accurate. this is operation on my edit log which has time of edit and a file path.
You can just ignore encoding errors. open() has a parameter for this: errors="ignore" or errors='replace' (undecodable bytes become the replacement character U+FFFD, often displayed as ?). To show this, I can just copy my own code from this Thread and make a small change.

    import os

    def find_files(file_type, path):
        os.chdir(path)
        with os.scandir(path) as it:
            for entry in it:
                if entry.name.endswith(file_type) and entry.is_file():
                    yield entry.name

    def count_lines(files):
        for file in files:
            line_nr = 0  # stays 0 for an empty file
            with open(file, encoding='utf-8', errors="ignore") as f:
                # start at 1 so line_nr ends at the number of lines read
                for line_nr, _ in enumerate(f, 1):
                    pass
            yield file, line_nr

    if __name__ == '__main__':
        path = r'E:\div_code'
        file_type = '.txt'
        files = find_files(file_type, path)
        line_count = count_lines(files)
        print(list(line_count))

Compare with wc

    λ wc -l *.txt
     1396 W2Testfile.txt
     3599 alice_in_wonderland.txt
     1807 test.txt
     6802 total

RE: all i want to do is count the lines in each file - Pedroski55 - May-21-2021
Hi, I don't understand lambda in this post, can you please explain? I tried:

Quote: λ wc -l /home/pedro/summer2021/19BE/scansforOCR/*.text

in bash and just got:

Quote: pedro@pedro-HP:~$ λ wc /home/pedro/summer2021/19BE/scansforOCR/*.text

RE: all i want to do is count the lines in each file - snippsat - May-21-2021
(May-21-2021, 11:01 AM)Pedroski55 Wrote: Hi, I don't understand lambda in this post, can you please explain?
You should not type λ; it is just the default prompt sign because I use cmder.

    wc --help
    Usage: wc [OPTION]... [FILE]...
      or:  wc [OPTION]... --files0-from=F
    Print newline, word, and byte counts for each FILE, and a total line if
    more than one FILE is specified.  A word is a non-zero-length sequence of
    characters delimited by white space.

    With no FILE, or when FILE is -, read standard input.

    The options below may be used to select which counts are printed, always in
    the following order: newline, word, character, byte, maximum line length.
      -c, --bytes            print the byte counts
      -m, --chars            print the character counts
      -l, --lines            print the newline counts
          --files0-from=F    read input from the files specified by
                               NUL-terminated names in file F;
                               If F is - then read names from standard input
      -L, --max-line-length  print the maximum display width
      -w, --words            print the word counts
          --help     display this help and exit
          --version  output version information and exit

    GNU coreutils online help: <http://www.gnu.org/software/coreutils/>
    Report wc translation bugs to <http://translationproject.org/team/>
    Full documentation at: <http://www.gnu.org/software/coreutils/wc>
    or available locally via: info '(coreutils) wc invocation'
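
Since the help text says -l prints the newline counts, a Python count made by iterating over the file can come out one higher than wc when the last line has no trailing newline. The sketch below illustrates that difference; the function names `count_newlines` and `count_iterated_lines` and the chunk size are assumptions chosen for this example, not anything posted in the thread.

```python
import os
import tempfile


def count_newlines(path, chunk_size=65536):
    """Count newline bytes chunk by chunk, the quantity wc -l reports."""
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:          # empty bytes object means end of file
                break
            total += chunk.count(b"\n")
    return total


def count_iterated_lines(path):
    """Count the lines produced by iterating over the file in text mode."""
    # errors="replace" keeps undecodable bytes from raising UnicodeDecodeError
    with open(path, encoding="utf-8", errors="replace") as f:
        return sum(1 for _ in f)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = os.path.join(tmp, "no_final_newline.txt")
        with open(sample, "wb") as f:
            f.write(b"one\ntwo\nthree")      # last line has no newline
        print(count_newlines(sample))        # prints 2
        print(count_iterated_lines(sample))  # prints 3
```

Reading in chunks also keeps memory use flat for very large files, at the cost of a slightly longer function than a single read().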