Posts: 1,298
Threads: 38
Joined: Sep 2016
A .wav file (like most other audio and image formats, actually) consists of two main parts: a header and the actual data. For a basic PCM .wav file the 'header' is 44 bytes (0 - 43) and the data is the remainder of the file (bytes 44 - end). The 'header' is further divided into 'file' and 'format' bytes (chunks). The following is something I wrote to 'read' that information. It's somewhat long as I included a lot of comments. It only requires the built-in struct module. This is based on the information provided here: .WAV format. Hope this helps you understand what's going on 'under the hood'.
#!/usr/bin/env python3
import struct
"""
be = big endian, le = little endian, followed by field size in bytes
file_id = bytes 0-3; be, 4, ASCII, each character = 1 byte, default 'RIFF'
file_sz = bytes 4-7; le, 4, 32 bit integer, file size minus 8 bytes
file_type = bytes 8-11; be, 4, file type, default = 'WAVE'
file_format = bytes 12-15; be, 4, format chunk marker, default = 'fmt ', NOTE there is a space at the end of the marker
format_data = bytes 16-19; le, 4, length of the format data that follows (bytes 20-35), default 16
format_type = bytes 20-21; le, 2, type of format, default 1 (PCM), other number indicates compression
num_ch = bytes 22-23; le, 2, number of channels, 1=mono, 2=stereo (up to 6 channels ?)
sample_rate = bytes 24-27; le, 4, sample frames per sec (i.e. Hz), default 44100 (CD quality)
byte_rate = bytes 28-31; le, 4, (sample_rate * bits_sample * num_ch) / 8
blk_align = bytes 32-33; le, 2, (bits_sample * num_ch) / 8, rounded up to next whole number.
(A sample frame for a 16-bit mono wave is 2 bytes. A sample frame for a 16-bit stereo wave is 4 bytes. Etc.)
bits_sample = bytes 34-35; le, 2, bits per sample (i.e. a 16-bit waveform would have wBitsPerSample = 16)
d_tag = bytes 36-39; be, 4, marks beginning of data, default = 'data'
data_sz = bytes 40-43; le, 4, size of data section; (number of samples * num_ch * bits_sample) / 8
"""
def main(w_file):
    with open(w_file, 'rb') as bif:
        # The seek() calls are not strictly needed when reading in order;
        # they are kept to show the byte offset of every field.
        bif.seek(0)
        file_id = str(bif.read(4), encoding='utf-8')
        bif.seek(4)
        sz = bif.read(4)
        file_sz = struct.unpack('<I', sz)  # unpack() always returns a tuple
        bif.seek(8)
        file_type = str(bif.read(4), encoding='utf-8')
        bif.seek(12)
        file_format = str(bif.read(4), encoding='utf-8')
        bif.seek(16)
        fd = bif.read(4)
        format_data = struct.unpack('<I', fd)  # one 32-bit value, not two 16-bit ones
        bif.seek(20)
        ft = bif.read(2)
        fmt = struct.unpack('<H', ft)
        if fmt[0] == 1:
            format_type = 'PCM (Uncompressed)'
        else:
            format_type = 'Compressed'
        bif.seek(22)
        nc = bif.read(2)
        num_ch = struct.unpack('<H', nc)
        bif.seek(24)
        sr = bif.read(4)
        sample_rate = struct.unpack('<I', sr)
        bif.seek(28)
        br = bif.read(4)
        byte_rate = struct.unpack('<I', br)
        bif.seek(32)
        ba = bif.read(2)
        blk_algn = struct.unpack('<H', ba)
        bif.seek(34)
        bps = bif.read(2)
        bits_sample = struct.unpack('<H', bps)
        bif.seek(36)
        d_tag = str(bif.read(4), encoding='utf-8')
        bif.seek(40)
        ds = bif.read(4)
        data_size = struct.unpack('<I', ds)
    # file_sz excludes the first 8 bytes ('RIFF' + the size field), so add them back
    info = (file_id, file_sz[0] + 8, file_type, file_format, format_data[0], format_type, num_ch[0], sample_rate[0],
            byte_rate[0], blk_algn[0], bits_sample[0], d_tag, data_size[0])
    return info


if __name__ == '__main__':
    wav_file = 'SineWave_440Hz.wav'  # Use whatever wave file you'd like
    stats = main(wav_file)
    print("\n.WAV File information for ", wav_file)
    print("=" * 79, "\n")
    print("File ID: {} \nFile Size: {} \nFile Type: {} \nFile Format: {} \nLength of Format Data: {}"
          "\nType of Format: {} \nNumber of Channels: {} \nSample Rate: {} \nByte Rate: {} \nBlock Align: {}"
          "\nBits Per Sample: {} \nData Header: {} \nSize of Data Section: {}".format(*stats))
the output:
Output: C:\Python36\python.exe C:/Python/Sound/read_wave.py
.WAV File information for SineWave_440Hz.wav
===============================================================================
File ID: RIFF
File Size: 264644
File Type: WAVE
File Format: fmt
Length of Format Data: 16
Type of Format: PCM (Uncompressed)
Number of Channels: 1
Sample Rate: 44100
Byte Rate: 88200
Block Align: 2
Bits Per Sample: 16
Data Header: data
Size of Data Section: 264600
Process finished with exit code 0
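As a quick sanity check, the numbers in that output can be tested against the formulas from the header comments. This is only a sketch: the values are typed in by hand from the output, and the "expected_" names are mine.

```python
# Values copied by hand from the output above.
sample_rate = 44100
num_ch = 1
bits_sample = 16
byte_rate = 88200
blk_align = 2
data_size = 264600
file_size = 264644

# Bytes per sample frame (one sample for every channel).
expected_blk_align = (bits_sample * num_ch) // 8
# Bytes of audio per second.
expected_byte_rate = sample_rate * expected_blk_align

assert expected_blk_align == blk_align
assert expected_byte_rate == byte_rate
assert file_size == data_size + 44    # 44-byte header + data

frames = data_size // blk_align       # number of sample frames
duration = data_size / byte_rate      # seconds of audio
print(frames, duration)               # 132300 3.0
```

So the test file holds 132300 sample frames, which is exactly 3 seconds at 44100 Hz.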
I did not include the actual data, but the concept remains the same. Just seek(44), read the appropriate number of bytes per sample (i.e. 1, 2, 3 or 4 bytes for 8, 16, 24 or 32 bits), then unpack it. Writing the file is pretty much just the reverse, except you 'pack' instead of 'unpack'.
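To make that a bit more concrete, here is a minimal sketch of both directions for the simplest case only: 16-bit PCM with a plain 44-byte header. The function names read_pcm16 and write_pcm16 are made up for this example. Other sample sizes need different handling (8-bit samples are unsigned, so 'B' instead of 'h', and struct has no 3-byte code for 24-bit samples).

```python
import struct


def read_pcm16(path, data_offset=44):
    """Return (num_ch, samples) for a 16-bit PCM file with a plain 44-byte header."""
    with open(path, 'rb') as bif:
        header = bif.read(data_offset)
        num_ch, = struct.unpack_from('<H', header, 22)
        bits_sample, = struct.unpack_from('<H', header, 34)
        data_size, = struct.unpack_from('<I', header, 40)
        if bits_sample != 16:
            raise ValueError('this sketch only handles 16-bit samples')
        raw = bif.read(data_size)
    count = len(raw) // 2  # 2 bytes per 16-bit sample
    # Samples are signed little-endian; channels are interleaved.
    samples = struct.unpack('<{}h'.format(count), raw[:count * 2])
    return num_ch, samples


def write_pcm16(path, samples, num_ch=1, sample_rate=44100):
    """Write interleaved 16-bit samples with a plain 44-byte header."""
    raw = struct.pack('<{}h'.format(len(samples)), *samples)
    blk_align = num_ch * 2
    header = struct.pack('<4sI4s4sIHHIIHH4sI',
                         b'RIFF', 36 + len(raw), b'WAVE',      # bytes 0-11
                         b'fmt ', 16, 1, num_ch, sample_rate,  # bytes 12-27
                         sample_rate * blk_align, blk_align, 16,  # bytes 28-35
                         b'data', len(raw))                    # bytes 36-43
    with open(path, 'wb') as bof:
        bof.write(header + raw)
```

Something like write_pcm16('test.wav', [0, 1000, -1000]) followed by read_pcm16('test.wav') should hand the same three samples back.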
If it ain't broke, I just haven't gotten to it yet.
OS: Windows 10, openSuse 42.3, freeBSD 11, Raspian "Stretch"
Python 3.6.5, IDE: PyCharm 2018 Community Edition
Posts: 331
Threads: 2
Joined: Feb 2017
It's very nice to see how to parse information from a binary file!
I opened my test .wav file to look at it, and it's exactly as you found. The only difference is that between the fmt chunk and the data chunk there is a chunk with ID tags (a LIST chunk of type INFO), probably some later (and now common) extension to the .wav specification. The data chunk still starts with "data", so seeking to that marker would still work for finding the chunk size and the sample block.
top of wav
Output: 00000000 52 49 46 46 e8 f8 61 04 57 41 56 45 66 6d 74 20 |RIFF..a.WAVEfmt |
00000010 10 00 00 00 01 00 02 00 44 ac 00 00 10 b1 02 00 |........D.......|
00000020 04 00 10 00 4c 49 53 54 c0 00 00 00 49 4e 46 4f |....LIST....INFO|
00000030 49 41 52 54 17 00 00 00 42 61 63 68 2c 20 4a 6f |IART....Bach, Jo|
00000040 68 61 6e 6e 20 53 65 62 61 73 74 69 61 6e 00 00 |hann Sebastian..|
00000050 49 43 52 44 05 00 00 00 32 30 30 33 00 00 49 47 |ICRD....2003..IG|
00000060 4e 52 0a 00 00 00 43 6c 61 73 73 69 63 61 6c 00 |NR....Classical.|
00000070 49 4e 41 4d 30 00 00 00 56 69 6f 6c 69 6e 20 43 |INAM0...Violin C|
00000080 6f 6e 63 65 72 74 6f 20 4e 6f 2e 32 20 69 6e 20 |oncerto No.2 in |
00000090 45 2c 20 42 57 56 20 31 30 34 32 3a 20 31 2e 20 |E, BWV 1042: 1. |
000000a0 41 6c 6c 65 67 72 6f 00 49 50 52 44 1b 00 00 00 |Allegro.IPRD....|
000000b0 4a 2e 53 2e 42 61 63 68 3a 20 56 69 6f 6c 69 6e |J.S.Bach: Violin|
000000c0 20 43 6f 6e 63 65 72 74 6f 73 00 00 49 50 52 54 | Concertos..IPRT|
000000d0 02 00 00 00 31 00 49 53 46 54 0e 00 00 00 4c 61 |....1.ISFT....La|
000000e0 76 66 35 37 2e 35 36 2e 31 30 31 00 64 61 74 61 |vf57.56.101.data|
000000f0 fc f7 61 04 00 00 00 00 00 00 00 00 00 00 00 00 |..a.............|
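Going by the dump, each tag inside the LIST/INFO chunk looks like a four-character id, a 4-byte little-endian length, and a NUL-terminated string padded to an even number of bytes. A rough sketch of decoding that, using a payload rebuilt by hand from the first three tags in the dump (parse_info and sub_chunk are just names I picked for the example):

```python
import struct


def parse_info(payload):
    """Decode the body of a LIST chunk (the bytes after the LIST size field)."""
    if payload[:4] != b'INFO':
        raise ValueError('not an INFO list')
    tags = {}
    pos = 4
    while pos + 8 <= len(payload):
        tag, size = struct.unpack_from('<4sI', payload, pos)
        pos += 8
        text = payload[pos:pos + size].rstrip(b'\x00')
        tags[tag.decode('ascii')] = text.decode('utf-8', errors='replace')
        pos += size + (size & 1)  # skip the pad byte after odd-sized tags
    return tags


def sub_chunk(tag, text):
    """Build one tag the way it appears in the dump (helper for the demo only)."""
    body = text + b'\x00'
    pad = b'\x00' * (len(body) & 1)
    return struct.pack('<4sI', tag, len(body)) + body + pad


payload = (b'INFO'
           + sub_chunk(b'IART', b'Bach, Johann Sebastian')
           + sub_chunk(b'ICRD', b'2003')
           + sub_chunk(b'IGNR', b'Classical'))
print(parse_info(payload))
# {'IART': 'Bach, Johann Sebastian', 'ICRD': '2003', 'IGNR': 'Classical'}
```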
Feb-25-2017, 12:10 AM
(This post was last modified: Feb-25-2017, 12:26 AM by sparkz_alot.)
Looking at your output, it appears they added what, 200 bytes to the header (the 'data' tag sits at offset 236 instead of 36)? From what I've read (actually, glanced over), the Microsoft/IBM wave format has evolved over time to include a number of enhancements, including what is shown in your output. That said, since it buggers up the start of the 'data' section, I suppose one would first have to determine where "data" actually appears in newer files in order to know the size of the header and, in turn, the size of the actual data. For instance, as of 2007, MS has allowed for 18 different speakers (channels)...say whaaa?
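One way to find where "data" really starts, rather than assuming offset 44, would be to walk the chunks: every chunk after the 12-byte RIFF descriptor is a 4-byte id, a 4-byte little-endian size, then that many bytes (padded to an even length). A minimal sketch, with find_chunk being a name I made up and the demo file built in memory rather than read from disk:

```python
import io
import struct


def find_chunk(bif, wanted):
    """Return (offset of chunk data, size) for the first chunk with id `wanted`, or None."""
    bif.seek(12)  # skip 'RIFF', the file size and 'WAVE'
    while True:
        head = bif.read(8)
        if len(head) < 8:
            return None  # ran off the end without finding it
        chunk_id, size = struct.unpack('<4sI', head)
        if chunk_id == wanted:
            return bif.tell(), size
        bif.seek(size + (size & 1), 1)  # jump over this chunk (and its pad byte)


# Tiny made-up file: fmt chunk, a LIST chunk with one tag, then 4 bytes of data.
fmt = struct.pack('<4sIHHIIHH', b'fmt ', 16, 1, 2, 44100, 176400, 4, 16)
info = b'INFO' + struct.pack('<4sI', b'ICRD', 5) + b'2003\x00' + b'\x00'
lst = struct.pack('<4sI', b'LIST', len(info)) + info
data = struct.pack('<4sI', b'data', 4) + b'\x00' * 4
body = b'WAVE' + fmt + lst + data
demo = io.BytesIO(struct.pack('<4sI', b'RIFF', len(body)) + body)

print(find_chunk(demo, b'data'))  # (70, 4)
```

Without the LIST chunk the data would start at 44, as in the original script; here the 26 extra bytes push it to 70.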
I wonder if the additional bytes are the new 'standard' size for optional information and if so, what sort of info can be put in it?
I suppose I'll get some more recent wave files and see what I can see.
btw, the MS article is located here: https://msdn.microsoft.com/en-us/library...s.85).aspx
Seems I answered my own questions. I came across this document that explains (at least as of 1994) the newer format. You have to scroll down a bit to the .WAV section.
http://www-mmsp.ece.mcgill.ca/Documents/...IFFNEW.pdf