Python Forum

How to plot histogram from 2 arrays?
I have two arrays where the first array shows the unique values extracted from a column and the second array stores the frequency of these unique values. How can I correctly plot the histogram?

(array([0, 1, 2, 3, 4], dtype=uint8), array([ 1, 20, 20, 30, 45], dtype=int64))

0, 1, 2, 3, 4 are the unique values, and the numbers in the second array show the frequency of each value.

I tried the code below, but it does not produce the correct histogram:

import matplotlib.pyplot as plt

b = ([0,1,2,3,4],[1,20,20,39,45])
x = b[0]
print(x)
y = b[1]
print(y)
plt.hist([x,y], bins='auto')
plt.show()
Thank you
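In this case you don't need the hist function, because the frequencies are already computed. Try bar instead:

import matplotlib.pyplot as plt

b = ([0,1,2,3,4],[1,20,20,39,45])
plt.bar(*b)  # unpacks to plt.bar(x, height)
plt.show()
You may also want to normalize the bar heights so that they sum to 1:

b = ([0,1,2,3,4],[1,20,20,39,45])
total = sum(b[1])
b_normalized = (b[0], [k / total for k in b[1]])
plt.bar(*b_normalized)
plt.show()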
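If a histogram call is still preferred, one option is to pass the precomputed counts as weights. The sketch below assumes integer-valued categories and places one unit-wide bin around each value. The helper names bin_edges_for and plot_weighted_hist are illustrative and not from this thread.

```python
import numpy as np

values = np.array([0, 1, 2, 3, 4])
counts = np.array([1, 20, 20, 39, 45])

def bin_edges_for(values):
    """Return edges that put one unit-wide bin around each integer value."""
    return np.arange(values.min() - 0.5, values.max() + 1.5, 1.0)

edges = bin_edges_for(values)

# Each value appears once and is weighted by its count, so the bin
# heights equal the precomputed frequencies.
heights, _ = np.histogram(values, bins=edges, weights=counts)

def plot_weighted_hist(values, counts, edges):
    """Draw the same histogram; matplotlib is imported only when plotting."""
    import matplotlib.pyplot as plt
    plt.hist(values, bins=edges, weights=counts)
    plt.show()
```

Calling plot_weighted_hist(values, counts, edges) should draw bars matching plt.bar(values, counts) with width 1, so bar remains the simpler choice when the counts already exist.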
Thanks, but I think I need to elaborate on my problem. I have about 100 files, and I am trying to extract one column from each of them and plot the frequencies of the numbers appearing in that column. I tried using lists, but that takes too long, and with arrays I cannot get the intended output. The size of the array may change, as some files contain only the numbers 1 to 4, others 1 to 6, and so on. I am not sure how to append to the array as the loop moves on to the next file. Below is my complete code:

for file in hdf_file[1:100]:
    inputf = hdf_folder + file
    if re.search(text, file) and re.search(text1, file):
        with h5py.File(inputf, 'r') as g:
            a = np.array(np.unique(g['No_of_Books'], return_counts=True))

plt.bar(a[0], height=a[1])
plt.show()

(Mar-26-2019, 02:02 PM) python_newbie09 wrote: [post quoted in full above]

The output from this code is shown below. The idea is to sum, across all files, the counts for each of the numbers (0 to 6) shown in the first row of each result.

[[ 1 2 3 4 5 6]
[2348 51 10 3910 10 10]]
[[ 0 1 2 3 4 6]
[ 7 2022 50 10 11160 20]]
[[ 0 1 4 5 6]
[ 5 546 10829 10 10]]
[[ 0 4]
[ 2 7738]]
[[ 0 2 3 4]
[ 8 40 20 170324]]
[[ 0 1 2 3 4 6]
[ 3 3210 50 10 166969 10]]
[[ 0 2 3 4]
[ 6 40 10 8644]]
[[ 0 1 2 3 4]
[ 9 2035 50 10 1514]]
res = []
for file in hdf_file[1:100]:
    inputf = hdf_folder + file
    if re.search(text, file) and re.search(text1, file):
        with h5py.File(inputf,'r') as g:
            res.append(np.array(np.unique(g['No_of_Books'], return_counts=True)))
  

size = map(lambda x: len(x[0]), res)
acc = np.zeros(size)
for ix, vals in res:
    acc[ix] += vals
freq = acc / acc.sum()
scale = 100
plt.bar(np.arange(size), height=freq * scale)
plt.show()
This code was not tested.
(Mar-27-2019, 02:41 AM) scidam wrote: [code quoted in full above]

Thanks. I tried it, but it produces this error: TypeError: expected sequence object with len >= 0 or a single integer
Any idea why?
Because I don't have the data, I can't reproduce the problem or run any tests.

Suppose that res is predefined:

import numpy as np
import matplotlib.pyplot as plt

res = [[[1, 2, 3, 4, 5, 6],
        [2348, 51, 10, 3910, 10, 10]],
       [[1, 2, 3, 4, 5, 6],
        [7, 2022, 50, 10, 11160, 20]],
       [[1, 2, 3, 4, 5],
        [5, 546, 10829, 10, 10]],
       [[0, 4],
        [2, 7738]],
       [[1, 2, 3, 4],
        [8, 40, 20, 170324]],
       [[1, 2, 3, 4, 5, 6],
        [3, 3210, 50, 10, 166969, 10]],
       [[1, 2, 3, 4],
        [6, 40, 10, 8644]],
       [[1, 2, 3, 4, 5],
        [9, 2035, 50, 10, 1514]]]
size = max(map(lambda x: len(x[0]), res))
acc = np.zeros(size)
for ix, vals in res:
    acc[np.array(ix)-1] += vals  # shift values down by 1 to use them as indices
freq = acc / acc.sum()
scale = 100  # show frequencies as percentages
plt.bar(np.arange(size), height=freq * scale)
plt.show()
The code I just posted works fine on my computer. Note: because the values returned by numpy.unique(..., return_counts=True) for your data start from 0, you will need to change acc[np.array(ix)-1] += vals to acc[ix] += vals.
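One caveat when indexing the accumulator by value: a result such as [0 1 2 3 4 6] has six entries but needs index 6, so sizing the accumulator by the longest array may be too small. A minimal sketch that sizes it by the largest value instead, assuming the values are non-negative integers. Here res holds the first, second and fourth results from the output posted earlier, and accumulate_counts is an illustrative name.

```python
import numpy as np

# Per-file results shaped like np.unique(column, return_counts=True):
# (unique values, counts).
res = [
    (np.array([1, 2, 3, 4, 5, 6]), np.array([2348, 51, 10, 3910, 10, 10])),
    (np.array([0, 1, 2, 3, 4, 6]), np.array([7, 2022, 50, 10, 11160, 20])),
    (np.array([0, 4]), np.array([2, 7738])),
]

def accumulate_counts(results):
    """Sum counts per value across files, indexing the accumulator by value."""
    # Size by the largest value seen (+1 for index 0), not by array length,
    # so results with gaps such as [0, 1, 2, 3, 4, 6] still fit.
    size = max(int(values.max()) for values, _ in results) + 1
    acc = np.zeros(size, dtype=np.int64)
    for values, counts in results:
        acc[values] += counts  # values are unique within one file
    return acc

totals = accumulate_counts(res)
freq_percent = 100 * totals / totals.sum()
```

The totals can then be plotted with plt.bar(np.arange(len(totals)), freq_percent), which keeps each bar at the x position of the value it counts.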
There was an error in the line size = map(...); it should be size = max(map(...)).
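The TypeError reported earlier is consistent with this fix: in Python 3, map returns a lazy iterator rather than a number, and np.zeros needs an integer (or a sequence of integers) as its shape. A small check, using two entries of the res list above for brevity:

```python
import numpy as np

res = [[[0, 4], [2, 7738]],
       [[1, 2, 3, 4], [6, 40, 10, 8644]]]

lengths = map(lambda x: len(x[0]), res)  # a map object, not an int

try:
    np.zeros(lengths)
    failed = False
except TypeError:
    failed = True  # np.zeros cannot interpret a map object as a shape

size = max(map(lambda x: len(x[0]), res))  # the corrected line: a plain int
acc = np.zeros(size)
```

The exact wording of the error message may differ between NumPy versions.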