Not able to crack a simple visualization – missing something basic

Not able to crack a simple visualization – missing something basic – plz guide - darpInd - Mar-27-2020

Hello Readers,
I have been trying to build a simple visualization but have not been able to crack it, and I am getting demotivated. Please help!
Objective: I want to create a simple scatter plot in which the marker (dot) size reflects the number of observations (the 'obs' column).
Dataframe name: df1

Output:	price	rating	obs
0	3	0	4
1	2	0	1
2	1	0	4
3	3	1	8
4	2	1	21
5	1	1	20
6	3	2	26
7	2	2	22
8	1	2	23
9	3	3	15
10	2	3	12
11	1	3	9
12	3	4	7
13	2	4	4
14	1	4	4

I have tried the following:

import matplotlib.pyplot as plt
import pandas as pd
plt.scatter(df1.price,df1.rating)
plt.show()

Of course, this does produce a scatter plot, but the dots are all the same size.
So I tried the code below, passing the 'obs' column as the size argument (s), but it throws an error:

import matplotlib.pyplot as plt
import pandas as pd
plt.scatter(df1.price,df1.rating, s = df1.obs)
plt.show()

Then I tried the following to build my understanding.

import numpy as np
import matplotlib.pyplot as plt

N = 45
x, y = np.random.rand(2, N)
c = np.random.randint(1, 5, size=N)
s = np.random.randint(10, 220, size=N)
fig, ax = plt.subplots()
scatter = ax.scatter(x, y, c=c, s=s)
plt.show()

It gives a nice plot, so I tried to replicate it for my dataframe as below, but I got an error again :(

x = df1['price']
y = df1['rating']
s = df1['obs']
fig, ax = plt.subplots()
scatter = ax.scatter(x, y,  s=s)
plt.show()

Then I thought that maybe I have to convert the dataframe columns into numpy arrays first so that scatter() will recognise them, so I did the below, and it FAILED again!! Please help me crack it and provide some guidance on what I am missing. It must be something basic, as I am new.

import numpy as np
import matplotlib.pyplot as plt
x=df1['price'].to_numpy()
y = df1['rating'].to_numpy()
scale=df1['obs'].to_numpy()
fig,ax = plt.subplots()
scatter = ax.scatter (x,y,s=scale)
plt.show()

RE: Not able to crack a simple visualization – missing something basic – plz guide - buran - Mar-27-2020

(Mar-27-2020, 07:22 AM)darpInd Wrote: So I tried below by adding scale variable in scatter plot- but it throws error

What error does it show? Please post the full traceback. It should work:

import pandas
import matplotlib.pyplot as plt

df = pandas.DataFrame({'price':[3, 2, 1, 3, 2, 1, 3, 2, 1, 3, 2, 1, 3, 2, 1,],
      'rating':[0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
      'obs':[4, 1, 4, 8, 21, 20, 26, 22, 23, 15, 12, 9, 7, 4, 4]})

plt.scatter(df.price, df.rating, s=df.obs*10) # I scale df.obs by factor of 10 just to see the difference in size better
plt.show()

RE: Not able to crack a simple visualization – missing something basic – plz guide - darpInd - Mar-27-2020

Here is the error. By the way, I also tried df1.obs*10, and I get the same error as without the *10:

Error:
AttributeError                            Traceback (most recent call last)
<ipython-input-40-c5c9e7d1608e> in <module>
      1 import matplotlib.pyplot as plt
      2 import pandas as pd
----> 3 plt.scatter(df1.price,df1.rating, s = df1.obs*10)
      4 plt.show()

C:\ProgramData\Anaconda3\lib\site-packages\matplotlib\pyplot.py in scatter(x, y, s, c, marker, cmap, norm, vmin, vmax, alpha, linewidths, verts, edgecolors, plotnonfinite, data, **kwargs)
   2845         verts=verts, edgecolors=edgecolors,
   2846         plotnonfinite=plotnonfinite, **({"data": data} if data is not
-> 2847         None else {}), **kwargs)
   2848     sci(__ret)
   2849     return __ret

C:\ProgramData\Anaconda3\lib\site-packages\matplotlib\__init__.py in inner(ax, data, *args, **kwargs)
   1599     def inner(ax, *args, data=None, **kwargs):
   1600         if data is None:
-> 1601             return func(ax, *map(sanitize_sequence, args), **kwargs)
   1602 
   1603         bound = new_sig.bind(ax, *args, **kwargs)

C:\ProgramData\Anaconda3\lib\site-packages\matplotlib\axes\_axes.py in scatter(self, x, y, s, c, marker, cmap, norm, vmin, vmax, alpha, linewidths, verts, edgecolors, plotnonfinite, **kwargs)
   4496                 offsets=offsets,
   4497                 transOffset=kwargs.pop('transform', self.transData),
-> 4498                 alpha=alpha
   4499                 )
   4500         collection.set_transform(mtransforms.IdentityTransform())

C:\ProgramData\Anaconda3\lib\site-packages\matplotlib\collections.py in __init__(self, paths, sizes, **kwargs)
    883         Collection.__init__(self, **kwargs)
    884         self.set_paths(paths)
--> 885         self.set_sizes(sizes)
    886         self.stale = True
    887 

C:\ProgramData\Anaconda3\lib\site-packages\matplotlib\collections.py in set_sizes(self, sizes, dpi)
    855             self._sizes = np.asarray(sizes)
    856             self._transforms = np.zeros((len(self._sizes), 3, 3))
--> 857             scale = np.sqrt(self._sizes) * dpi / 72.0 * self._factor
    858             self._transforms[:, 0, 0] = scale
    859             self._transforms[:, 1, 1] = scale

AttributeError: 'str' object has no attribute 'sqrt'

RE: Not able to crack a simple visualization – missing something basic – plz guide - buran - Mar-27-2020

As the error indicates, you have strings in your dataframe. You probably read the data from a file without converting the types?
You need to convert to numbers (at least the column used to calculate the size).
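
A minimal sketch of that diagnosis and fix, assuming the 'obs' column holds digit strings such as "4" (the three-row df1 below is a hypothetical stand-in, not the poster's actual data). It also suggests why df1.obs*10 made no difference: multiplying a string only repeats it, so the failure still surfaces at the square root that the traceback shows matplotlib applying to the sizes. The exact exception type raised by np.sqrt may differ between NumPy versions, so the sketch catches both likely candidates.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df1: 'obs' holds digit strings, not integers.
df1 = pd.DataFrame({
    "price": [3, 2, 1],
    "rating": [0, 0, 0],
    "obs": ["4", "1", "4"],
})
print(df1.dtypes)  # check which dtype each column really has

# Multiplying a string repeats it, so obs*10 raises nothing by itself.
repeated = "4" * 10

# The square root applied to the sizes is where string values fail.
try:
    np.sqrt(np.array(["4", "1", "4"], dtype=object))
    sqrt_failed = False
except (AttributeError, TypeError):
    sqrt_failed = True

# Convert the size column to a numeric dtype before plotting.
df1["obs"] = pd.to_numeric(df1["obs"])
print(df1.dtypes)
```

After the conversion, a call such as plt.scatter(df1.price, df1.rating, s=df1.obs*10) should receive numeric sizes.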

RE: Not able to crack a simple visualization – missing something basic – plz guide - darpInd - Mar-27-2020

Thank you, buran!! That was indeed the case. I converted the column to numeric and that resolved it!! Somehow I had assumed that my dataframe contained only ints, hence the trouble. Good learning though!
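
For anyone hitting the same problem, here is one possible way to check the assumption up front instead of trusting that every column is an int. This is only a sketch: the CSV text and the stray "unknown" value are hypothetical, chosen to show how a single non-numeric entry can make a whole column load as text.

```python
import io
import pandas as pd

# Hypothetical file contents; the stray "unknown" makes 'obs' load as text.
csv_text = "price,rating,obs\n3,0,4\n2,0,unknown\n1,0,4\n"
df1 = pd.read_csv(io.StringIO(csv_text))

# List the columns that did not come through as numbers.
non_numeric = [col for col in df1.columns
               if not pd.api.types.is_numeric_dtype(df1[col])]
print(non_numeric)

# errors="coerce" turns unparseable entries into NaN instead of raising.
df1["obs"] = pd.to_numeric(df1["obs"], errors="coerce")

# Drop the rows whose size could not be parsed before plotting.
clean = df1.dropna(subset=["obs"])
```

Whether to drop, fill, or investigate the unparseable rows depends on the data; dropping is just the simplest choice for a plot.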