Python Forum

Full Version: From theory to practice
Hi,

I am rather new to the whole Python thing... that being said, I have dabbled in several areas of signal processing. I am an engineering student and had to do a few projects for my courses. These projects pretty much always worked the same way: you took what your professor showed you in the lecture and altered it...
One of my problems is this: the examples from the lectures generally included a huge amount of "toolset" that just made everything work (I assume) so that you could focus on actually creating filters... What I am looking for is not a magical line of code that fixes everything... I want to understand what I am doing...

So... now I wanted to do a project of my own, from scratch... Nothing too complicated for now: read in a .wav file, make it mono (if applicable), reduce the bit depth to 8 bits (maybe even less), write a new file. The signal-related actions aren't really the problem... but I just cannot get my head around how Python does things. At this point I am not even sure where my problem is...

So here is what I have so far:
import numpy as np
import time, wave, pyaudio
import scipy.io.wavfile

bitdepth = 8
input_file = "16AGR_The_Hunted_Percy(OUT).wav"
channels = 0

# width of one quantization step for a signal in the range -1.0 to 1.0
stepsize = (1.0 - (-1.0))/(2**bitdepth)

print("Reading file")
# wavfile.read returns (sample rate, data array)
original = scipy.io.wavfile.read(input_file)
audio = original[1]

print(audio)

This gives me this output:
Output:
Reading file
[[-1 -2]
 [ 1  1]
 [-4 -3]
 ...,
 [ 4 -2]
 [-4  2]
 [ 4 -1]]
So from what I have gathered, scipy.io.wavfile.read returns a tuple, the second element of which should contain the audio data (in this case stereo). Since there are about 200k samples in the file, the output does not show all of them.. fine.. but what is that? [-1 -2]? That is no audio sample I have ever seen... and all integers? I have no idea what I am seeing here... and at this point no idea where to look for answers...
Looking at the documentation for functions you use is always a good first step.
Looks like the data you get is a 2-dimensional array because you're reading a stereo file (first column is the left channel, the second right), and the data is signed ints because of the file format (either 16-bit PCM or 32-bit PCM, whatever those might be).

That's as much as I can tell, having no experience whatsoever when it comes to working with audio files.
I have never used scipy's wavfile before, but I tried it now and it worked very well ...

As you have noticed, wavfile.read returns a tuple with the sample rate and a numpy array containing the samples. If your wav file contains 16-bit PCM, the loaded data are indeed integers, in the range -32768 to 32767 (8-bit WAV data is unsigned, 0 to 255). And there are a lot of them, so it's quite possible that the first few thousand rows represent just silence at the start... but I know nothing about audio and how it's represented by sample values.
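To see what wavfile.read hands back without needing a real recording, here is a minimal sketch. The three stereo frames and the file name demo.wav are made up for illustration; the only library calls used are scipy.io.wavfile.write and scipy.io.wavfile.read.

```python
import os
import tempfile

import numpy as np
from scipy.io import wavfile

# Build a tiny stereo 16-bit signal so the example does not depend on a real file.
rate = 44100
samples = np.array([[-1, -2], [1, 1], [-4, -3]], dtype=np.int16)

# "demo.wav" is a hypothetical file name inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "demo.wav")
wavfile.write(path, rate, samples)

# wavfile.read returns a 2-tuple: (sample rate, data array).
read_rate, audio = wavfile.read(path)

print(read_rate)    # sample rate in Hz
print(audio.dtype)  # integer type matching the file's bit depth
print(audio.shape)  # (number of frames, number of channels)
```

Checking dtype and shape right after reading is usually enough to tell whether the file is mono or stereo and which integer range the samples live in.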

I tried it on one audio file: load, add the channels, divide, write. It worked and sounded pretty good (besides being mono). It seems that with some basic understanding of the "audio format" it would be quite easy to do "funny tricks" like speeding it up or down, resampling, limiting/boosting the volume, adding some noise ...

Input wav:
Torturing it:
Output wav:
okay... so re-reading the numpy array documentation I get that it actually has a field for the data type, which in this case is int16.. so 16-bit signed integer.

I was a bit hung up on this:

Output:
[[-1 -2]
 [ 1  1]
 [-4 -3]
 ...,
 [ 4 -2]
 [-4  2]
 [ 4 -1]]

(44100, array([[-1, -2],
       [ 1,  1],
       [-4, -3],
       ...,
       [ 4, -2],
       [-4,  2],
       [ 4, -1]], dtype=int16))
The first one is the result of printing "audio" and the second of printing "original" from the code in the original post... one has a lot of commas (the way I expected it to be after reading the manual) and the other doesn't... it just drops one in the middle where it doesn't really belong.
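The difference in commas most likely comes down to str versus repr: print(audio) uses the array's str form, while printing a tuple shows the repr of each element. A small sketch with made-up values:

```python
import numpy as np

audio = np.array([[-1, -2], [1, 1]], dtype=np.int16)
original = (44100, audio)  # same layout as the tuple from wavfile.read

print(str(audio))   # what print(audio) shows: rows without commas
print(repr(audio))  # "array(...)" form with commas and the dtype
print(original)     # a tuple prints the repr of each of its elements
```

Both forms describe the same data; only the text representation differs.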

Okay... so now that I have an actual idea of how the data is represented I can think about altering the way I want to.

First I wanted to have it converted to mono... so generally you take the left and right samples and average them:

length = audio.shape[0]
mono = (audio[0:length:1, 0]+audio[0:length:1, 1])/2
Now there are two things I am wondering about:
The shape of the new array is this:
Output:
(210331,)
This means it's still an n by 2 array, right?

And secondly what is the best way to convert the datatype into float? So that I can work with a range of values from -1 to 1.
Quote:what is the best way to convert the datatype into float? So that I can work with a range of values from -1 to 1.
You can use astype to convert numpy arrays.
>>> a = np.random.randint(0, 255, size=(5,5), dtype=np.uint8)
>>> a
array([[ 87, 184,  85,  57, 195],
       [129, 227,   1,  14,   9],
       [168, 157, 167,   9, 247],
       [  6, 136,  98,  70, 214],
       [105, 221, 123,  55,  54]], dtype=uint8)
>>> b = a.astype(np.float64)
>>> b
array([[  87.,  184.,   85.,   57.,  195.],
       [ 129.,  227.,    1.,   14.,    9.],
       [ 168.,  157.,  167.,    9.,  247.],
       [   6.,  136.,   98.,   70.,  214.],
       [ 105.,  221.,  123.,   55.,   54.]])
Now, you can probably find an appropriate function in some library to normalize for you but normalizing yourself isn't that bad anyway.

For example:  
>>> 2 * (a / 255.0) - 1
array([[-0.31764706,  0.44313725, -0.33333333, -0.55294118,  0.52941176],
       [ 0.01176471,  0.78039216, -0.99215686, -0.89019608, -0.92941176],
       [ 0.31764706,  0.23137255,  0.30980392, -0.92941176,  0.9372549 ],
       [-0.95294118,  0.06666667, -0.23137255, -0.45098039,  0.67843137],
       [-0.17647059,  0.73333333, -0.03529412, -0.56862745, -0.57647059]])
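The example above normalizes uint8 data. For the int16 samples discussed in this thread, one common convention (an assumption here, not something the thread settles on) is to divide by 32768 so that the values fall in the range -1.0 to just under 1.0. A sketch with made-up sample values:

```python
import numpy as np

# Made-up int16 samples covering the extremes of the type.
audio = np.array([[-32768, 0], [16384, 32767]], dtype=np.int16)

# Convert first, then scale: dividing by 2**15 maps int16 onto [-1.0, 1.0).
scaled = audio.astype(np.float32) / 32768.0

print(scaled.dtype)
print(scaled.min(), scaled.max())
```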
(Feb-24-2017, 04:36 PM)Mekire Wrote: [ -> ]Now, you can probably find an appropriate function in some library to normalize for you but normalizing yourself isn't that bad anyway.
As I said... the whole audio part is not really my problem.. it is the Python part... I have trouble using the tool and understanding how it works and what it does in the background

Speaking of normalizing...

audio = np.float32(audio)
max_value = max(np.absolute(audio))
This gives me an error that I do not understand in the slightest:

Output:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
Stackoverflow gives me many posts of people who have been getting this message while doing completely different things...
(Feb-24-2017, 03:53 PM)bertibott Wrote: [ -> ]
length = audio.shape[0]
mono = (audio[0:length:1, 0]+audio[0:length:1, 1])/2

You might consider using just : instead of 0:length:1. It's a little easier to type and you don't need to define a new variable...

And with fixed-width integer types there is always a danger of overflow: if you compute 20000+20000 with numpy int16, the result 40000 is greater than 32767, so it wraps around and you end up with -25536 (and there is no warning). So sometimes it's good to divide first and sum second, or to convert to a bigger int type and convert back at the end.
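A small sketch of the wrap-around described here, and of the "convert to a bigger int" workaround, using made-up sample values and the : slice:

```python
import numpy as np

# Made-up stereo frames: column 0 is left, column 1 is right.
audio = np.array([[20000, 20000], [-20000, -20000], [100, 50]], dtype=np.int16)

left = audio[:, 0]   # ":" selects every row, same as 0:length:1
right = audio[:, 1]

# Adding int16 arrays keeps the int16 type, so large sums wrap around silently.
wrapped = left + right

# Widening to int32 before adding avoids the overflow.
wide = left.astype(np.int32) + right.astype(np.int32)
mono = (wide // 2).astype(np.int16)  # the average fits in int16 again

print(wrapped)
print(mono)
```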

(Feb-24-2017, 03:53 PM)bertibott Wrote: [ -> ]Now there are two things I am wondering about:
The shape of the new array is this:
Output:
(210331,)
This means it's still an n by 2 array, right?

And secondly what is the best way to convert the datatype into float? So that I can work with a range of values from -1 to 1.

The shape of your array is (210331,), that is, a tuple with just one number, so it is a 1-D array with a single dimension of length 210331. So no, it's not an n by 2 array.

You can convert the datatype with the .astype() method, but if you check your mono's datatype, you might discover that it's already float64 due to implicit conversion (under Python 3, where / is true division).
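Both points can be checked directly on a small stand-in array. This sketch assumes Python 3, where / is true division, and uses made-up zero-valued samples:

```python
import numpy as np

# Stand-in for the stereo data: 5 frames, 2 channels.
audio = np.zeros((5, 2), dtype=np.int16)

mono = (audio[:, 0] + audio[:, 1]) / 2

print(audio.shape, audio.ndim)  # two entries in the shape tuple: 2-D
print(mono.shape, mono.ndim)    # one entry in the shape tuple: 1-D
print(mono.dtype)               # result type of the true division
```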

Quickstart numpy tutorial covers pretty much everything mentioned in this thread and much more ...
(Feb-24-2017, 04:51 PM)bertibott Wrote: [ -> ]This gives me an error that I do not understand in the slightest:
Error:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
Stackoverflow gives me many posts of people who have been getting this message while doing completely different things...
Generally you are going to want to use np.max and np.min in numpy contexts. The builtins work on one-dimensional arrays but not on multi-dimensional ones.
>>> a = np.array([4,5,6])
>>> max(a)
6
>>> a = np.array([[4,5,6],[5,6,7]])
>>> max(a)
Error:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
>>> np.max(a)
7
>>>
(Feb-24-2017, 04:51 PM)bertibott Wrote: [ -> ]
audio = np.float32(audio)
max_value = max(np.absolute(audio))
This gives me an error that I do not understand in the slightest:
Output:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

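Perhaps audio is a 2-D array, and poor max is confused: the builtin iterates over the rows and compares whole rows with >, which on numpy arrays produces an array of booleans rather than a single True or False, hence the complaint about an ambiguous truth value.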

In that case you need np.max() with optional axis parameter.
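A minimal sketch of the difference, with made-up float samples: the builtin max fails on a 2-D array, while np.max gives either one overall peak or, with axis=0, one peak per channel.

```python
import numpy as np

# Made-up stereo samples: rows are frames, columns are channels.
audio = np.array([[0.5, -0.25], [-0.9, 0.1]], dtype=np.float32)

# The builtin compares whole rows, which is ambiguous for numpy arrays.
builtin_failed = False
try:
    max(np.absolute(audio))
except ValueError as exc:
    builtin_failed = True
    print("builtin max failed:", exc)

# np.max without axis reduces over all elements.
peak = np.max(np.absolute(audio))

# axis=0 reduces over the rows, leaving one value per column (channel).
per_channel = np.max(np.absolute(audio), axis=0)

print(peak)
print(per_channel)
```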
okay guys.. thank you for now! I guess it was all a little more complicated than I thought it would be...
But sticking with numpy methods in a numpy environment makes sense.. I will have to read the full tutorial after all...

btw... with your help I was actually able to get a first rudimentary version of my project to work! :)
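The working version itself was not posted. Purely as an illustration of the steps described at the top of the thread (mono mixdown, then bit-depth reduction with the stepsize formula from the first post), a sketch could look like the following. The helper name reduce_bitdepth, the floor-based rounding, and the sample values are assumptions, not the poster's code.

```python
import numpy as np


def reduce_bitdepth(signal, bitdepth):
    """Quantize a float signal in [-1.0, 1.0] to 2**bitdepth levels.

    Hypothetical helper: rounds each sample down to the nearest step.
    """
    stepsize = (1.0 - (-1.0)) / (2 ** bitdepth)
    # Index of the quantization step each sample falls into.
    levels = np.floor(signal / stepsize)
    # Clip so that an input of exactly +1.0 does not create an extra level.
    levels = np.clip(levels, -(2 ** (bitdepth - 1)), 2 ** (bitdepth - 1) - 1)
    return levels * stepsize


# Made-up stereo int16 frames.
stereo = np.array([[0, 0], [16384, 16384], [-32768, 32767]], dtype=np.int16)

# Average the channels in float64 (no int16 overflow), then scale to [-1.0, 1.0).
mono = stereo.astype(np.float64).mean(axis=1) / 32768.0

crushed = reduce_bitdepth(mono, 8)
print(crushed)
```

The quantized signal could then be converted to a dtype that scipy.io.wavfile.write accepts, such as float32, and written to a new file.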