Python Forum
how many bytes in a file before zero padding
i have a bunch (2823) of large files, typically around 4GB in size but none larger than 2**33-1 (8589934591) bytes. these files typically have hundreds of MB to a few GB of actual data, followed by zero padding out to the end of the file.

i need to truncate all of these files to remove the zero padding without losing any non-zero data. because there is over 10 TB of data, i want to avoid reading all the files sequentially. but the data can still contain runs of zeros that are followed by more non-zero data, so a binary search is ruled out. my current thought is to read each file backwards, one page-aligned page at a time, and scan each page for non-zero bytes to find the position of the last non-zero byte. i will just be recording these sizes for now and will do the actual truncation at a later date (when i receive all the drives ... i just have a few samples for now).

is this something that is easy to do in Python or should i just do this in C?
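
A minimal sketch of the backward scan described above, in Python, to give a sense of how little code it needs. The function name `data_length` and the `chunk_size` parameter are hypothetical, and the 1 MiB default is only an assumption (any multiple of the page size keeps the reads aligned). It uses `bytes.rstrip(b"\x00")` to drop trailing zero bytes from each chunk, so the per-byte scan runs in C rather than in a Python loop.

```python
import os

def data_length(path, chunk_size=1 << 20):
    """Return the file size with trailing zero bytes removed.

    Reads the file backwards in chunks aligned to chunk_size (hypothetical
    default of 1 MiB, assumed to be a multiple of the page size) and stops
    at the first chunk that contains a non-zero byte.
    """
    with open(path, "rb") as f:
        end = f.seek(0, os.SEEK_END)  # current candidate for the data length
        while end > 0:
            # Start of the aligned chunk that contains byte end-1.
            start = (end - 1) // chunk_size * chunk_size
            f.seek(start)
            chunk = f.read(end - start)
            # Strip trailing zeros; anything left ends in a non-zero byte.
            stripped = chunk.rstrip(b"\x00")
            if stripped:
                return start + len(stripped)
            end = start  # whole chunk was zeros, move one chunk back
    return 0  # empty file, or nothing but zeros
```

Only the padding and the single chunk holding the last non-zero byte are read, so runs of zeros inside the data are never mistaken for the end. The recorded size could later be passed to `os.truncate(path, size)` when the actual truncation is done.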


Messages In This Thread
how many bytes in a file before zero padding - by Skaperen - Feb-02-2019, 02:07 AM
