Python Forum
Visualizing musician chart performance and ranking using pandas and matplotlib
#1
I am enrolled in a not-for-credit Udemy course by Colt Steele on data analysis in Python, using common data analysis and visualization modules such as pandas, matplotlib, plotly, and seaborn.

The task I am working on asks students to:

Quote: Create a line plot like the one in the attached image. It shows the chart performance (rank) of the following songs:
  • All I Want For Christmas Is You by Mariah Carey
  • Rockin' Around The Christmas Tree by Brenda Lee
  • Jingle Bell Rock by Bobby Helms
The date range spans from 2016-12-25 to 2021-01-01.
Notice the customized x-axis tick marks, the legend, the title, and the axis labels! Also, the figure is 10x7.
To invert the y-axis, use plt.gca().invert_yaxis()
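For reference, here is a minimal sketch of the plot the task describes, written with plain matplotlib calls instead of DataFrame.plot. It assumes the CSV columns are named date, rank and song, as in the code later in this post. The helper name plot_christmas_ranks, the title text, the yearly tick choice and the demo rows are illustrative assumptions, since the instructor's screenshot is not reproduced here.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; omit this in a notebook
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Song -> line color. Colors follow the original snippet; the expected
# figure's exact colors are an assumption.
SONGS = {
    "All I Want For Christmas Is You": "red",
    "Rockin' Around The Christmas Tree": "blue",
    "Jingle Bell Rock": "green",
}


def plot_christmas_ranks(df, start="2016-12-25", end="2021-01-01"):
    """Hypothetical helper: plot weekly rank per song between start and end."""
    df = df.copy()
    df["date"] = pd.to_datetime(df["date"])  # make sure dates are real datetimes
    in_range = df[df["date"].between(pd.Timestamp(start), pd.Timestamp(end))]

    fig, ax = plt.subplots(figsize=(10, 7))  # the task asks for a 10x7 figure
    for song, color in SONGS.items():
        subset = in_range[in_range["song"] == song].sort_values("date")
        ax.plot(subset["date"], subset["rank"], color=color, label=song)

    # Yearly date ticks (an assumption; the instructor's tick choice is not known here).
    ax.xaxis.set_major_locator(mdates.YearLocator())
    ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y"))
    ax.invert_yaxis()  # rank 1 at the top, same effect as plt.gca().invert_yaxis()
    ax.set_title("Christmas songs on the Billboard Hot 100")  # placeholder title
    ax.set_xlabel("Chart date")
    ax.set_ylabel("Rank")
    ax.legend()
    return fig, ax


# Made-up rows, only to exercise the function.
demo = pd.DataFrame({
    "date": ["2016-12-31", "2017-12-30"] * 3,
    "rank": [11, 2, 40, 9, 29, 17],
    "song": ["All I Want For Christmas Is You"] * 2
    + ["Rockin' Around The Christmas Tree"] * 2
    + ["Jingle Bell Rock"] * 2,
})
fig, ax = plot_christmas_ranks(demo)
```

Converting the date column inside the function means it behaves the same whether or not the CSV was read with parse_dates.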

This screenshot shows the expected data visualization and end product. See here:
   

This screenshot shows my valiant but evidently very broken, feeble attempt:
   

Here is my code snippet:

import pandas as pd
# import plotly.express as px
import matplotlib.pyplot as plt

hot_billboard = pd.read_csv('data/billboard_charts.csv')
# hot_billboard.info()

fig, ax = plt.subplots()
hot_billboard_dates_parsed = hot_billboard.loc[(hot_billboard['date'] >= '2016-12-25') & (hot_billboard['date'] <= '2020-01-01')]

rockin = hot_billboard_dates_parsed[hot_billboard_dates_parsed['song'] == "Rockin' Around The Christmas Tree"]
alliwant = hot_billboard_dates_parsed[hot_billboard_dates_parsed['song'] == "All I Want For Christmas Is You"]
jingle = hot_billboard_dates_parsed[hot_billboard_dates_parsed['song'] == "Jingle Bell Rock"]

jingle.plot.line(x='date',y='rank',ax=ax, color='green', legend=False)
alliwant.plot.line(x='date',y='rank',ax=ax, color='red', legend=False)
rockin.plot.line(x='date',y='rank',ax=ax, color='blue', legend=False)

plt.gca().invert_yaxis()
Take note: the instructor asks students to decorate the plot with custom x/y ticks and a legend, and to use certain colors. I am not concerned with these cosmetic settings at this point; one step at a time. For now I am focused on two problems: the overlapping x-axis tick labels, and the red (alliwant) data points extending beyond the bounds of the x-axis. Can anyone on these forums identify which variable or method I may be (mis)using that causes the red line to skew? What do I need to change in my code so that pandas and matplotlib keep all three songs' rank data within the bounds of the datetime x-axis (Christmas 2016 to 2021)?
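A minimal reproduction on made-up data may help show what is going on. The rows below are hypothetical, and the comments describe how pandas handles a non-numeric x column as far as I know: string dates are not treated as dates at all, so each plot.line call draws its line at row positions 0, 1, 2, ... and labels the ticks with the strings.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows; 'date' holds plain strings (object dtype), just as it does
# after pd.read_csv without parse_dates.
short = pd.DataFrame({"date": ["2016-12-31", "2017-01-07", "2017-12-30"],
                      "rank": [30, 45, 25]})
longer = pd.DataFrame({"date": ["2016-12-31", "2017-01-07", "2017-12-09",
                                "2017-12-16", "2017-12-23", "2017-12-30"],
                       "rank": [10, 20, 15, 8, 4, 3]})

fig, ax = plt.subplots()
short.plot.line(x="date", y="rank", ax=ax, color="green", legend=False)
longer.plot.line(x="date", y="rank", ax=ax, color="red", legend=False)

# Inspect where each line was actually drawn on the x-axis:
# row positions, not dates.
positions = [[int(x) for x in line.get_xdata()] for line in ax.get_lines()]
for pos in positions:
    print(pos)
```

Because the positions ignore the dates, a song with more chart weeks in the range simply runs further to the right, which matches the red line overshooting the others.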

Thanks.
#2
Got some data to try out? (billboard_charts.csv perhaps!?)

hot_billboard = pd.read_csv('data/billboard_charts.csv')
#3
Thank you @Pedroski55 for your reply!

That said, I’m not sure I understand your question.

If you are suggesting that I am missing the hot_billboard = pd.read_csv('data/billboard_charts.csv') line, then to clarify: I have already included it in my original post in my code snippet at line #4. So my code is complete and works.

However, if you are asking for my CSV file so you can test my code on your own computer, then I hesitate, because the data set reaches back 70 years from the present date. Even though it is a text file, it is about 18 MB, which is too large for the forum; it won’t let me upload the full raw file. So a few moments ago I removed all entries from 2015 and earlier (the exercise I am working on asks students to process data from 2016 to 2020). The forum still wouldn't accept it, because the cropped file is 1.9 MB and the forum limits uploads to 250 KB.
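Cropping by date can be done in pandas itself. This is a sketch rather than the exact steps used here: crop_from is a hypothetical helper, and the 2016-01-01 cutoff simply mirrors dropping 2015 and earlier. It assumes the date column is named date, as in the original snippet.

```python
import os

import pandas as pd


def crop_from(src_path, dst_path, start="2016-01-01"):
    """Hypothetical helper: keep only rows dated on or after `start`.

    Writes the cropped rows to dst_path and returns the new file size in
    bytes, which can be compared against an upload limit such as 250 KB.
    """
    df = pd.read_csv(src_path, parse_dates=["date"])
    cropped = df[df["date"] >= pd.Timestamp(start)]
    cropped.to_csv(dst_path, index=False, date_format="%Y-%m-%d")
    return os.path.getsize(dst_path)
```

For example, `crop_from("data/billboard_charts.csv", "data/billboard_2016_on.csv")` would write the cropped copy and return its size; the output filename is made up.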

Therefore I have uploaded the original data set in full to my personal Dropbox.

I hope that works.

Even so, I am not quite sure I have answered your question, Pedroski55. If you could kindly elaborate on what you mean by “Got some data to try out? (billboard_charts.csv perhaps!?)”, then I will do my best to follow up. Thanks.
#4
Yes, what I meant was: "Please supply some data for testing."

I think just a sample of your csv, maybe 100 lines would suffice, but I downloaded the big csv from your Dropbox anyway, just for fun!
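One way to get a sample small enough to post is to keep only the rows the exercise actually needs. A sketch, assuming the column names from the original snippet; christmas_sample is a hypothetical helper, and the end date follows the task text (2021-01-01).

```python
import pandas as pd

# The three songs from the exercise.
SONGS = [
    "All I Want For Christmas Is You",
    "Rockin' Around The Christmas Tree",
    "Jingle Bell Rock",
]


def christmas_sample(df, start="2016-12-25", end="2021-01-01"):
    """Hypothetical helper: rows for the three songs inside the date range."""
    dates = pd.to_datetime(df["date"])  # works for string or datetime columns
    in_range = dates.between(pd.Timestamp(start), pd.Timestamp(end))  # inclusive
    return df[df["song"].isin(SONGS) & in_range]
```

Something like `christmas_sample(hot_billboard).to_csv("christmas_sample.csv", index=False)` would then write just those rows; how many there are depends on the data.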

I will have a look when I have time!
#5
You must check that the types are right first.
import pandas as pd
# import plotly.express as px
import matplotlib.pyplot as plt

hot_billboard = pd.read_csv('billboard_charts.csv')
>>> hot_billboard.dtypes
date               object
rank                int64
song               object
artist             object
last-week         float64
peak-rank           int64
weeks-on-board      int64
dtype: object

>>> hot_billboard.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 330087 entries, 0 to 330086
Data columns (total 7 columns):
 #   Column          Non-Null Count   Dtype  
---  ------          --------------   -----  
 0   date            330087 non-null  object 
 1   rank            330087 non-null  int64  
 2   song            330087 non-null  object 
 3   artist          330087 non-null  object 
 4   last-week       297775 non-null  float64
 5   peak-rank       330087 non-null  int64  
 6   weeks-on-board  330087 non-null  int64  
dtypes: float64(1), int64(3), object(3)
memory usage: 17.6+ MB
So the problem is that date has dtype object (which here means strings).
No plot against dates will work properly like that, so you have to convert the column to pandas datetime64.
>>> hot_billboard['date'] = pd.to_datetime(hot_billboard['date'])
>>> hot_billboard.dtypes
date              datetime64[ns]
rank                       int64
song                      object
artist                    object
last-week                float64
peak-rank                  int64
weeks-on-board             int64
dtype: object
Add this line right after reading the billboard CSV, and then it should work.
hot_billboard['date'] = pd.to_datetime(hot_billboard['date']) 
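Put together with the original snippet, the fix looks roughly like this. It is a sketch: plot_songs is a hypothetical wrapper so the code can be exercised on any DataFrame, the loop replaces the three copy-pasted filters, and the end date follows the task text (2021-01-01) rather than the 2020-01-01 used in the question. The demo rows are made up.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; omit this in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Song -> color, as in the original snippet.
SONGS = {
    "Jingle Bell Rock": "green",
    "All I Want For Christmas Is You": "red",
    "Rockin' Around The Christmas Tree": "blue",
}


def plot_songs(hot_billboard, start="2016-12-25", end="2021-01-01"):
    """Hypothetical wrapper around the original snippet, with the date fix applied."""
    hot_billboard = hot_billboard.copy()
    # The fix: turn the date strings into datetime64 before filtering/plotting.
    hot_billboard["date"] = pd.to_datetime(hot_billboard["date"])
    dates_parsed = hot_billboard.loc[
        (hot_billboard["date"] >= pd.Timestamp(start))
        & (hot_billboard["date"] <= pd.Timestamp(end))
    ]

    fig, ax = plt.subplots()
    for song, color in SONGS.items():
        subset = dates_parsed[dates_parsed["song"] == song]
        subset.plot.line(x="date", y="rank", ax=ax, color=color, legend=False)
    ax.invert_yaxis()  # rank 1 at the top
    return fig, ax


# Made-up rows with a gap between seasons, like real holiday chart runs.
rows = [
    {"date": d, "song": song, "rank": r}
    for song in SONGS
    for d, r in [("2016-12-31", 10), ("2017-01-07", 20),
                 ("2017-12-16", 15), ("2017-12-23", 5)]
]
fig, ax = plot_songs(pd.DataFrame(rows))
```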
#6
Eureka! Thank you @snippsat!

That makes total sense. I should have known. There are a few ways to parse dates and times that are stored as strings. Here is what I ended up using to parse the date column while reading the CSV file, in the opening lines of my code snippet:

hot_billboard = pd.read_csv('data/billboard_charts.csv',parse_dates=['date'])
Now my graph appears exactly as I set out to achieve.
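Both approaches from this thread should end up with the same datetime64 column. Here is a quick side-by-side check on a couple of made-up rows in the same layout as billboard_charts.csv; the explicit format= argument is an optional addition, not something used above.

```python
import io

import pandas as pd

# A few made-up rows in the same layout as billboard_charts.csv.
CSV_TEXT = """date,rank,song
2016-12-31,11,Jingle Bell Rock
2017-01-07,20,Jingle Bell Rock
"""

# Option 1: parse while reading (the approach used above).
parsed_on_read = pd.read_csv(io.StringIO(CSV_TEXT), parse_dates=["date"])

# Option 2: read as strings, then convert (snippsat's line), here with an
# explicit format so pandas does not have to guess it.
converted_after = pd.read_csv(io.StringIO(CSV_TEXT))
converted_after["date"] = pd.to_datetime(converted_after["date"], format="%Y-%m-%d")

print(parsed_on_read["date"].dtype, converted_after["date"].dtype)
print((parsed_on_read["date"] == converted_after["date"]).all())
```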

Thanks again.