Python Forum

Full Version: matplotlib Plotting smooth line with nans
Hello,

I want to automate a chart (we call it a snake chart, screenshot below) that we have so far been building in Excel.
It's a scatter plot with smoothed lines between the points, where the y-values are just rankings (1, 2, 3, 4, ...) that determine the order of the attributes.
It turns out that snake charts are not conventional (apparently we made this thing up?), and I can't figure out how to smooth the line given that there are NaNs in the list.
My chart is ready except for the lines between the dots: those should be smoothed wherever possible (that is, wherever they connect more than 2 dots).

I've read about a lot of options for smoothing lines, including the advice to "mask" the NaNs,
here: https://stackoverflow.com/questions/5283...ith-pyplot
and here: https://www.adamsmith.haus/python/answer...-in-python
and here: https://www.geeksforgeeks.org/how-to-plo...atplotlib/
and here: https://matplotlib.org/devdocs/gallery/l..._demo.html

...but none of those options seems to solve my issue. Does anyone know how I can do this?
Note: in practice I can have 5 values, then a NaN, and then 5 more values, so I can't simply skip the first NaN.

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

data = pd.DataFrame({"brand": ["a", "a", "a", "a", "b", "b", "b", "b"],
                     "attribute": ["attr1", "attr2", "attr3", "attr4", "attr1", "attr2", "attr3", "attr4"],
                     "score": [np.nan, 0.55, 0.25, 0.15, 0.26, 0.45, 0.20, 0.15],
                     "order": [1, 2, 3, 4, 1, 2, 3, 4]})

colours= pd.DataFrame({"brand": ["a", "b"], "hex_color": ["#859F84", "#F57921"]})

element_column = "brand"
elements = data[element_column].unique().tolist()

for element in elements:
    x = data.loc[data[element_column] == element, "score"]
    y = data.loc[data[element_column] == element, "order"]
    colour = colours.loc[colours[element_column] == element, "hex_color"].item()    
    # Mask the rows whose score is NaN ("order" itself never contains NaN)
    y = np.ma.masked_where(np.isnan(x), y)
    plt.scatter(x, y, c=colour)
    plt.plot(x, y, c=colour)

labels = data[['attribute', 'order']].drop_duplicates().copy()
plt.yticks(labels["order"], labels["attribute"])
plt.show()
What I want:
[Image: 275566857_4846251875423126_7378958004873...e=623032F0]
Can you get what you want if there are no NANs? Seems like a really small number of points for smoothing to work.
Hello deanhystad,

Yes, that is actually very simple with a very small tweak (cf. code below).
(source: https://www.geeksforgeeks.org/how-to-plo...atplotlib/)

Result of the code (this is without NaNs; what I actually want is for "attr1" to have no data point and no line for brand a):
[Image: 275603798_4848250668556580_9305822478926...e=6230BAFF]

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np 
from scipy.interpolate import make_interp_spline


data = pd.DataFrame({"brand": ["a", "a", "a", "a", "b", "b", "b", "b"],
                     "attribute": ["attr1", "attr2", "attr3", "attr4", "attr1", "attr2", "attr3", "attr4"],
                     "score": [0.20, 0.55, 0.25, 0.15, 0.26, 0.45, 0.20, 0.15],
                     "order": [1, 2, 3, 4, 1, 2, 3, 4]})
 
colours= pd.DataFrame({"brand": ["a", "b"], "hex_color": ["#859F84", "#F57921"]})
 
element_column = "brand"
elements = data[element_column].unique().tolist()

# Data must be sorted by "order": make_interp_spline needs increasing sample positions
data = data.sort_values(by=['order'])

for element in elements:
    x = data.loc[data[element_column] == element, "score"]
    y = data.loc[data[element_column] == element, "order"]
    colour = colours.loc[colours[element_column] == element, "hex_color"].item()   

    X_Y_Spline = make_interp_spline(y, x)
    Y_ = np.linspace(y.min(), y.max(), 500)
    X_ = X_Y_Spline(Y_)
    plt.plot(X_, Y_, c=colour)
    plt.scatter(x, y, c=colour)
 
labels = data[['attribute', 'order']].drop_duplicates().copy()
plt.yticks(labels["order"], labels["attribute"])
plt.show()
Are you OK with that? That is exactly what I thought would happen, and I think it is unacceptable: the smoothed line extends well beyond the range of the points.
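To make the overshoot concrete, here is a minimal sketch that samples the same spline for brand a and compares its range with the range of the data. `PchipInterpolator` is not used anywhere in this thread; it is swapped in here only as one possible shape-preserving alternative, on the assumption that a curve that stays inside the data range is preferable.

```python
import numpy as np
from scipy.interpolate import PchipInterpolator, make_interp_spline

# Brand a from the post above: attribute rank and score
order = np.array([1, 2, 3, 4])
score = np.array([0.20, 0.55, 0.25, 0.15])

# Sample both curves on the same dense grid of ranks
dense = np.linspace(order.min(), order.max(), 500)
spline_curve = make_interp_spline(order, score)(dense)  # cubic spline, as in the post
pchip_curve = PchipInterpolator(order, score)(dense)    # assumed alternative (PCHIP)

print("data  :", score.min(), score.max())
print("spline:", spline_curve.min(), spline_curve.max())
print("pchip :", pchip_curve.min(), pchip_curve.max())
```

The spline dips below the smallest score, while the PCHIP curve stays between the smallest and largest data values. Whether it looks smooth enough for a snake chart is a judgment call.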

If that is OK, you'll likely be fine replacing each NaN with a value interpolated from the surrounding points: b = (a + c) / 2.
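A minimal sketch of that replacement, assuming the scores of one brand are held in a pandas Series sorted by rank (the values below are made up for illustration, with "attr2" missing):

```python
import numpy as np
import pandas as pd

# Hypothetical scores for one brand, indexed by attribute rank; rank 2 is missing
scores = pd.Series([0.20, np.nan, 0.25, 0.15], index=[1, 2, 3, 4])

# Linear interpolation treats the rows as equally spaced, so a single interior
# NaN becomes the mean of its two neighbours: (0.20 + 0.25) / 2.
# limit_area="inside" leaves leading and trailing NaNs untouched.
filled = scores.interpolate(method="linear", limit_area="inside")
print(filled)
```

Note that the filled value is then drawn like a real data point, so this only fits if showing an estimated point in place of the gap is acceptable.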
Hm... I'm not sure I follow what you mean.
What I want is exactly the same graph as in my original post.
Let's assume that "attr2" was missing in my original data; then I expect:
* a single data point at "attr1" (but no line here)
* nothing at "attr2"
* a straight line going from "attr3" to "attr4"

Just to be clear: my previous post was an answer to "can you get what you want if there are no NaNs?". In that case, yes, the chart I showed in my reply is what I want.
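One way the behaviour listed above could be sketched is to split each brand's scores into runs of consecutive non-NaN values and treat each run separately: no line for a lone point, a straight line for two points, and a spline for three or more. The helper names `split_runs`, `snake_segments` and `draw_snake` are hypothetical, and the choice of a quadratic spline for a run of exactly three points is an assumption, not something settled in this thread.

```python
import numpy as np
from scipy.interpolate import make_interp_spline


def split_runs(order, score):
    """Yield (order, score) array pairs, one per run of consecutive non-NaN scores."""
    order = np.asarray(order, dtype=float)
    score = np.asarray(score, dtype=float)
    valid = ~np.isnan(score)
    # Positions where validity flips mark the borders between runs and gaps
    borders = np.flatnonzero(np.diff(valid.astype(int))) + 1
    for idx in np.split(np.arange(len(score)), borders):
        if idx.size and valid[idx[0]]:
            yield order[idx], score[idx]


def snake_segments(order, score, samples=200):
    """Return a list of (x, y) line segments, smoothed where a run has 3+ points."""
    segments = []
    for run_order, run_score in split_runs(order, score):
        if run_order.size < 2:
            continue  # a lone point gets a marker but no line
        if run_order.size == 2:
            segments.append((run_score, run_order))  # straight line
            continue
        k = min(3, run_order.size - 1)  # spline degree must be below the point count
        spline = make_interp_spline(run_order, run_score, k=k)
        dense_order = np.linspace(run_order[0], run_order[-1], samples)
        segments.append((spline(dense_order), dense_order))
    return segments


def draw_snake(ax, order, score, colour):
    """Draw a marker for every point and one line per run on a matplotlib Axes."""
    ax.scatter(score, order, c=colour)  # NaN scores are simply not drawn
    for xs, ys in snake_segments(order, score):
        ax.plot(xs, ys, c=colour)
```

Inside the existing loop over brands, this would be called as `draw_snake(plt.gca(), y, x, colour)` in place of the `plt.scatter` and `plt.plot` calls, with the data sorted by "order" first. The spline can still overshoot the data range within each run.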