it looks like the tarfile module is intended to always handle the individual files in an archive by only reading them from the file system (when creating a tar file) or writing them to the file system (when extracting from a tar file). is there a way to use that module to access the individual files directly? an example use case is a script that reads in a compressed tar file and uncompresses it, extracts all the files, uncompresses any contained files that are themselves compressed, rebuilds the tar file with those files now uncompressed, and compresses that new tar file.
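for what it's worth, tarfile can hand a member back as a file-like object (extractfile()) and take one in (addfile() with a fileobj), so the members never have to touch the file system. a rough, untested sketch of that use case, assuming the inner compressed files are gzip and spotted by a .gz suffix (the function and path names are made up):

import gzip
import io
import tarfile

def recompress(in_path, out_path):
    # read one compressed tar, write another; members stay in memory throughout
    with tarfile.open(in_path, "r:gz") as src, tarfile.open(out_path, "w:gz") as dst:
        for member in src.getmembers():
            if not member.isfile():
                dst.addfile(member)                # directories, links, etc. copied as-is
                continue
            data = src.extractfile(member).read()  # file-like object, read into memory
            if member.name.endswith(".gz"):
                data = gzip.decompress(data)       # uncompress the member itself
                member = tarfile.TarInfo(member.name[:-3])
            member.size = len(data)
            dst.addfile(member, io.BytesIO(data))

recompress("input.tar.gz", "output.tar.gz")

note that building a fresh TarInfo for the renamed member drops the original mode/mtime; copying those over is left out of the sketch for brevity.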
how do you get a file-like object to write a new member to an archive? also, how are the archive's raw contents provided to the input object and obtained from the output archive object, all at the same time? do i need to use coroutines, threads, or processes to avoid locking logic? let's say the input archive is arriving over a socket (size and member count unknown) and the output archive goes back out over the same socket as each compressed member is uncompressed (so the archive compression works better) ... all this being done with no writable file system space available.
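for the streaming part, tarfile's "r|gz" / "w|gz" stream modes never seek, so a socket's file objects can be plugged in directly and no file system space is needed. a rough, untested sketch (it still buffers each member in memory, and it assumes the peer keeps reading while this end writes; otherwise the threads/coroutines question is still open):

import gzip
import io
import socket
import tarfile

def relay(sock: socket.socket) -> None:
    # the '|' stream modes tell tarfile not to seek, so plain socket file
    # objects are enough and nothing is written to disk
    rfile = sock.makefile("rb")
    wfile = sock.makefile("wb")
    with tarfile.open(fileobj=rfile, mode="r|gz") as src, \
         tarfile.open(fileobj=wfile, mode="w|gz") as dst:
        for member in src:                          # members arrive one at a time, in order
            if not member.isfile():
                dst.addfile(member)
                continue
            data = src.extractfile(member).read()   # only the current member is readable
            if member.name.endswith(".gz"):
                data = gzip.decompress(data)
                member = tarfile.TarInfo(member.name[:-3])
            member.size = len(data)
            dst.addfile(member, io.BytesIO(data))
    wfile.flush()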
the way i did this in C was with stateful objects (an opaque pointer in the first arg of each method call) that have methods to provide data and obtain data. obtaining data is pretty simple to do in Python ... how much data is ready is how much you get (an empty sequence would mean nothing was ready yet, not EOF). providing data would be slightly more complicated. a mutable sequence can be reduced in size by how much the object was able to make use of in that call. the other way, with immutable sequences, would be to return the number of items used and let the caller do the slicing, or to return a slice with the remaining data. in C, i used ring buffers, which led me to develop the virtual ring buffer (VRB), a way to optimize ring buffers in virtual-memory environments. i think i prefer the mutable sequence (e.g. bytearray) way because it is easier on the caller.
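the stdlib compression objects are already stateful, so the calling convention above can be wrapped around one of them. a small, untested sketch of the bytearray style (the class and method names are made up; zlib's decompressor is the stateful object underneath):

import zlib

class GzipUnpacker:
    # feed() consumes from the front of a mutable buffer and deletes what it
    # used, so the caller can just keep appending; read() returns whatever
    # output is ready right now (b"" means nothing yet, not EOF)
    def __init__(self):
        self._dec = zlib.decompressobj(16 + zlib.MAX_WBITS)  # 16+ selects gzip framing
        self._ready = bytearray()

    def feed(self, buf: bytearray, limit: int = 65536) -> None:
        taken = bytes(buf[:limit])
        del buf[:len(taken)]
        self._ready += self._dec.decompress(taken)

    def read(self) -> bytes:
        out = bytes(self._ready)
        self._ready.clear()
        return out

the immutable-sequence variant would just return the count consumed from feed() instead of deleting from the caller's buffer.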
and i see no module supporting cpio. i have a few hundred cpio files i'd like to convert to tar format.
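right, there is nothing for cpio in the standard library. if the archives are the SVR4 "newc" variant (what cpio -H newc writes), the header is simple enough to parse by hand and tarfile can write the other side. a rough, untested sketch that only handles regular files, directories, and symlinks (the function names are made up):

import io
import tarfile

NEWC_MAGIC = b"070701"
HDR_LEN = 110          # 6-byte magic + 13 fields of 8 ASCII hex digits

def _pad4(n):
    # newc pads the header+name and the file data to 4-byte boundaries
    return (4 - n % 4) % 4

def cpio_newc_to_tar(cpio_path, tar_path):
    with open(cpio_path, "rb") as src, tarfile.open(tar_path, "w") as dst:
        while True:
            hdr = src.read(HDR_LEN)
            if len(hdr) < HDR_LEN or not hdr.startswith(NEWC_MAGIC):
                break
            fields = [int(hdr[6 + i * 8:14 + i * 8], 16) for i in range(13)]
            (_ino, mode, uid, gid, _nlink, mtime, filesize,
             _devmaj, _devmin, _rdevmaj, _rdevmin, namesize, _chk) = fields
            name = src.read(namesize)[:-1].decode()     # drop the trailing NUL
            src.read(_pad4(HDR_LEN + namesize))
            if name == "TRAILER!!!":
                break
            data = src.read(filesize)
            src.read(_pad4(filesize))
            info = tarfile.TarInfo(name)
            info.mode = mode & 0o7777
            info.uid, info.gid, info.mtime = uid, gid, mtime
            kind = mode & 0o170000
            if kind == 0o040000:                        # directory
                info.type = tarfile.DIRTYPE
                dst.addfile(info)
            elif kind == 0o120000:                      # symlink (target is stored as the data)
                info.type = tarfile.SYMTYPE
                info.linkname = data.decode()
                dst.addfile(info)
            else:                                       # treat everything else as a regular file
                info.size = filesize
                dst.addfile(info, io.BytesIO(data))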
i guess i will have to use linux tools. i'm still thinking of the C way, where re-implementing something doesn't make it slower. one of the goals i have is to re-implement as much as possible in an architecture-portable way that does not require re-compiling. if the tar command can't be re-implemented using the tar library, and the result would really just be a front-end to the tar command, it's not what i want to do. i can use the tar command myself (in the code), probably more efficiently.
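if the linux tools route is acceptable, driving them from the script is short. a sketch, assuming GNU cpio and tar are on the PATH and that there is scratch file system space to unpack into (unlike the no-writable-space case above):

import pathlib
import subprocess
import tempfile

def cpio_to_tar(cpio_path: pathlib.Path, tar_path: pathlib.Path) -> None:
    # unpack with cpio into a scratch directory, then repack that directory
    # with tar; both tools are just driven from the script via subprocess
    with tempfile.TemporaryDirectory() as tmp:
        with open(cpio_path, "rb") as src:
            subprocess.run(["cpio", "-idm"], stdin=src, cwd=tmp, check=True)
        subprocess.run(["tar", "-cf", str(tar_path.resolve()), "-C", tmp, "."],
                       check=True)

for p in pathlib.Path(".").glob("*.cpio"):
    cpio_to_tar(p, p.with_suffix(".tar"))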
one of my big projects is building a cloud run-time that is ready to go on new architectures. almost all architectures are running in some IaaS or SaaS cloud service somewhere. i saw 7 different ones in an early service about 15 years ago.