Python Forum
Not able to crack a simple visualization – missing something basic – plz guide - Printable Version



Not able to crack a simple visualization – missing something basic – plz guide - darpInd - Mar-27-2020

Hello readers,
I have been trying to build a simple visualization but cannot get it to work, and I am getting demotivated. Please help!
Objective: I want to create a simple scatter plot in which the marker (dot) size reflects the number of observations (the 'obs' column).
Dataframe name: df1

Output:
    price  rating  obs
0       3       0    4
1       2       0    1
2       1       0    4
3       3       1    8
4       2       1   21
5       1       1   20
6       3       2   26
7       2       2   22
8       1       2   23
9       3       3   15
10      2       3   12
11      1       3    9
12      3       4    7
13      2       4    4
14      1       4    4

I have tried the following:
import matplotlib.pyplot as plt
import pandas as pd
plt.scatter(df1.price,df1.rating)
plt.show()
Of course, this does produce a scatter plot, but every dot is the same size.
So I tried passing the 'obs' column as the size argument (s) of scatter, as below, but it throws an error:

import matplotlib.pyplot as plt
import pandas as pd
plt.scatter(df1.price,df1.rating, s = df1.obs)
plt.show()
Then I tried the following to build my understanding:

import numpy as np
import matplotlib.pyplot as plt

N = 45
x, y = np.random.rand(2, N)
c = np.random.randint(1, 5, size=N)
s = np.random.randint(10, 220, size=N)
fig, ax = plt.subplots()
scatter = ax.scatter(x, y, c=c, s=s)
plt.show()
It gives a nice plot, so I tried to replicate it for my dataframe as below, but I get the error again :(
x = df1['price']
y = df1['rating']
s = df1['obs']
fig, ax = plt.subplots()
scatter = ax.scatter(x, y,  s=s)
plt.show()
Then I thought that maybe I have to convert the dataframe columns into NumPy arrays first so that scatter() will recognise them, so I tried the code below, and it FAILED again! Please help me crack this and give me some guidance on what I am missing. It must be something basic, as I am new to this.

import numpy as np
import matplotlib.pyplot as plt
x = df1['price'].to_numpy()
y = df1['rating'].to_numpy()
scale = df1['obs'].to_numpy()
fig, ax = plt.subplots()
scatter = ax.scatter(x, y, s=scale)
plt.show()



RE: Not able to crack a simple visualization – missing something basic – plz guide - buran - Mar-27-2020

(Mar-27-2020, 07:22 AM)darpInd Wrote: So I tried below by adding scale variable in scatter plot- but it throws error
What error does it show? Please post the full traceback. It should work:
import pandas
import matplotlib.pyplot as plt

df = pandas.DataFrame({'price':[3, 2, 1, 3, 2, 1, 3, 2, 1, 3, 2, 1, 3, 2, 1,],
      'rating':[0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
      'obs':[4, 1, 4, 8, 21, 20, 26, 22, 23, 15, 12, 9, 7, 4, 4]})

plt.scatter(df.price, df.rating, s=df.obs*10) # I scale df.obs by factor of 10 just to see the difference in size better
plt.show()
[attachment=805]


RE: Not able to crack a simple visualization – missing something basic – plz guide - darpInd - Mar-27-2020

Here is the error. By the way, I also tried df1.obs*10, and it gives the same error I got without the *10:
Error:
AttributeError                            Traceback (most recent call last)
<ipython-input-40-c5c9e7d1608e> in <module>
      1 import matplotlib.pyplot as plt
      2 import pandas as pd
----> 3 plt.scatter(df1.price,df1.rating, s = df1.obs*10)
      4 plt.show()

C:\ProgramData\Anaconda3\lib\site-packages\matplotlib\pyplot.py in scatter(x, y, s, c, marker, cmap, norm, vmin, vmax, alpha, linewidths, verts, edgecolors, plotnonfinite, data, **kwargs)
   2845         verts=verts, edgecolors=edgecolors,
   2846         plotnonfinite=plotnonfinite, **({"data": data} if data is not
-> 2847         None else {}), **kwargs)
   2848     sci(__ret)
   2849     return __ret

C:\ProgramData\Anaconda3\lib\site-packages\matplotlib\__init__.py in inner(ax, data, *args, **kwargs)
   1599     def inner(ax, *args, data=None, **kwargs):
   1600         if data is None:
-> 1601             return func(ax, *map(sanitize_sequence, args), **kwargs)
   1602
   1603         bound = new_sig.bind(ax, *args, **kwargs)

C:\ProgramData\Anaconda3\lib\site-packages\matplotlib\axes\_axes.py in scatter(self, x, y, s, c, marker, cmap, norm, vmin, vmax, alpha, linewidths, verts, edgecolors, plotnonfinite, **kwargs)
   4496                 offsets=offsets,
   4497                 transOffset=kwargs.pop('transform', self.transData),
-> 4498                 alpha=alpha
   4499                 )
   4500         collection.set_transform(mtransforms.IdentityTransform())

C:\ProgramData\Anaconda3\lib\site-packages\matplotlib\collections.py in __init__(self, paths, sizes, **kwargs)
    883         Collection.__init__(self, **kwargs)
    884         self.set_paths(paths)
--> 885         self.set_sizes(sizes)
    886         self.stale = True
    887

C:\ProgramData\Anaconda3\lib\site-packages\matplotlib\collections.py in set_sizes(self, sizes, dpi)
    855             self._sizes = np.asarray(sizes)
    856             self._transforms = np.zeros((len(self._sizes), 3, 3))
--> 857             scale = np.sqrt(self._sizes) * dpi / 72.0 * self._factor
    858             self._transforms[:, 0, 0] = scale
    859             self._transforms[:, 1, 1] = scale

AttributeError: 'str' object has no attribute 'sqrt'



RE: Not able to crack a simple visualization – missing something basic – plz guide - buran - Mar-27-2020

As the error indicates, you have strings in your dataframe. You probably read the data from a file without converting the types?
You need to convert to a numeric type (at least the column used to calculate the size).
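
The check and conversion described above could look roughly like the sketch below. It is only an illustration: the small frame is made up to mimic the situation (an 'obs' column holding strings), and pd.to_numeric with errors="coerce" is one of several ways to convert; the thread does not say which method was actually used.

```python
import pandas as pd

# Hypothetical frame mimicking the problem: 'obs' holds strings, not integers.
df1 = pd.DataFrame({
    "price": [3, 2, 1],
    "rating": [0, 0, 0],
    "obs": ["4", "1", "4"],
})

# Inspect the column types first; 'obs' will not show up as an integer dtype.
print(df1.dtypes)

# Convert the size column to numbers. errors="coerce" turns any value that
# cannot be parsed into NaN instead of raising, so bad rows are easy to find.
df1["obs"] = pd.to_numeric(df1["obs"], errors="coerce")

print(df1.dtypes)
print(df1["obs"].isna().sum(), "values could not be converted")
```

If the count of unconverted values is not zero, those rows are worth inspecting before plotting, since they usually point to stray text in the source file.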


RE: Not able to crack a simple visualization – missing something basic – plz guide - darpInd - Mar-27-2020

(Mar-27-2020, 08:18 AM)buran Wrote: As the error indicates, you have strings in your dataframe. You probably read the data from a file without converting the types?
You need to convert to a numeric type (at least the column used to calculate the size).

Thank you, buran! That was indeed the case. I converted the column to numeric and that resolved it. Somehow I had assumed that my dataframe contained only ints, hence the trouble. Good learning, though!
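
For later readers, here is a minimal end-to-end sketch of the resolved case, under a few assumptions: the frame is a made-up three-row sample whose columns all start out as strings, every column is converted with pd.to_numeric, and the Agg backend is selected only so the snippet runs without a display. The factor of 10 follows buran's example; the s argument of scatter is the marker area in points squared.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample: all columns arrive as strings, e.g. after a careless file read.
df1 = pd.DataFrame({
    "price": ["3", "2", "1"],
    "rating": ["0", "1", "2"],
    "obs": ["4", "21", "26"],
})

# Convert every column to a numeric dtype before plotting.
df1 = df1.apply(pd.to_numeric)

fig, ax = plt.subplots()
# Scale 'obs' by 10 so the differences in marker size are easier to see.
points = ax.scatter(df1["price"], df1["rating"], s=df1["obs"] * 10)
ax.set_xlabel("price")
ax.set_ylabel("rating")
fig.canvas.draw()  # render once; use plt.show() in an interactive session
```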