Thread: Visualizing musician chart performance and ranking using pandas and matplotlib (/thread-42047.html)
Python Forum (https://python-forum.io), Forum: Data Science (https://python-forum.io/forum-44.html)
Visualizing musician chart performance and ranking using pandas and matplotlib - Drone4four - Apr-28-2024

I am enrolled in a not-for-school-credit Udemy course by Colt Steele on data analysis in Python using common data visualization modules like pandas, matplotlib, plotly, and seaborn. The task I am working on calls on students to:

Quote: Create a line plot seen in the image attached. It shows the chart performance (rank) of the following songs:

This screenshot shows the expected data visualization and end product. See here: [attachment=2839]

This screenshot shows my valiant but evidently very broken, feeble attempt: [attachment=2840]

Here is my code snippet:

    import pandas as pd
    # import plotly.express as px
    import matplotlib.pyplot as plt
    hot_billboard = pd.read_csv('data/billboard_charts.csv')
    # hot_billboard.info()
    fig, ax = plt.subplots()
    hot_billboard_dates_parsed = hot_billboard.loc[(hot_billboard['date'] >= '2016-12-25') & (hot_billboard['date'] <= '2020-01-01')]
    rockin = hot_billboard_dates_parsed[hot_billboard_dates_parsed['song'] == "Rockin' Around The Christmas Tree"]
    alliwant = hot_billboard_dates_parsed[hot_billboard_dates_parsed['song'] == "All I Want For Christmas Is You"]
    jingle = hot_billboard_dates_parsed[hot_billboard_dates_parsed['song'] == "Jingle Bell Rock"]
    jingle.plot.line(x='date', y='rank', ax=ax, color='green', legend=False)
    alliwant.plot.line(x='date', y='rank', ax=ax, color='red', legend=False)
    rockin.plot.line(x='date', y='rank', ax=ax, color='blue', legend=False)
    plt.gca().invert_yaxis()

Take note: The instructor asks students to decorate the plot with custom x/y ticks, a legend, and certain colors. I am not concerned with these cosmetic configurations at this point. One step at a time. For now I am focused on resolving the overlapping x-axis increments and on keeping the red (alliwant) data points from extending beyond the bounds of the x-axis.
Can anyone on these forums identify which variable or method I may be (mis)using that is causing the red line to skew? What do I need to modify in my code to tell Python, pandas, and matplotlib to keep all three musical artists' #1 rank Series data within the bounds of the datetime x-axis (Christmas 2016-2021)? Thanks.

RE: Visualizing musician chart performance and ranking using pandas and matplotlib - Pedroski55 - Apr-29-2024

Got some data to try out? (billboard_charts.csv perhaps!?)

    hot_billboard = pd.read_csv('data/billboard_charts.csv')

RE: Visualizing musician chart performance and ranking using pandas and matplotlib - Drone4four - May-05-2024

Thank you @Pedroski55 for your reply! Although I'm not sure I understand your question. If you are suggesting that I am missing the hot_billboard = pd.read_csv('data/billboard_charts.csv') line, then to clarify: I have already included it in my original post, in my code snippet at line #4. So my code is complete and runs.

However, if you are asking for my csv file so you can test my code on your local computer, then I hesitate because the data set reaches back 70 years from the present date. Even though it is a text file, it is ~18 MB, which is too much for the forum to handle. They won't allow me to upload the full raw file. So moments ago I cropped out the entries from 2015 and earlier (the exercise I am working on calls on students to process data from 2016-2020). The forum still wouldn't allow the upload because the file is 1.9 MB and the forum limits uploads to 250 KB max. Therefore I have uploaded the original data set in full to my personal Dropbox. I hope that works.

Even still, I am not quite sure if I have answered your question, Pedroski55. If you could kindly elaborate further on what you mean when you say: "Got some data to try out? (billboard_charts.csv perhaps!?)", then I will do my best to follow up. Thanks.
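One way around an upload limit like the one described above is to cut the file down to only the rows the exercise needs before sharing it. The following is a minimal sketch, not something posted in the thread: the helper name `make_sample` and the in-memory rows are made up for illustration, and it assumes the dates are stored as ISO `YYYY-MM-DD` strings, so that comparing them as plain strings gives the same result as comparing dates. The dates are deliberately left unparsed so the sample behaves like the original file.

```python
import io

import pandas as pd

# Made-up stand-in rows; only the column names follow billboard_charts.csv.
CSV_TEXT = """date,rank,song,artist,last-week,peak-rank,weeks-on-board
2019-12-28,1,All I Want For Christmas Is You,Mariah Carey,1,1,37
2019-12-28,2,Rockin' Around The Christmas Tree,Brenda Lee,3,2,32
2019-12-28,9,Jingle Bell Rock,Bobby Helms,15,8,30
2019-12-28,50,Some Other Song,Some Artist,,50,1
2015-01-03,20,All I Want For Christmas Is You,Mariah Carey,22,11,18
"""

SONGS = [
    "Rockin' Around The Christmas Tree",
    "All I Want For Christmas Is You",
    "Jingle Bell Rock",
]


def make_sample(df, songs, start, end):
    """Hypothetical helper: keep rows for `songs` dated within [start, end]."""
    in_range = df["date"].between(start, end)  # string comparison (ISO dates assumed)
    wanted = df["song"].isin(songs)
    return df.loc[in_range & wanted].reset_index(drop=True)


# With the real file this would be pd.read_csv('data/billboard_charts.csv').
full = pd.read_csv(io.StringIO(CSV_TEXT))
sample = make_sample(full, SONGS, "2016-12-25", "2020-01-01")

# to_csv() without a path returns the CSV text; pass a filename to write a file.
sample_csv = sample.to_csv(index=False)
print(f"{len(sample)} rows, {len(sample_csv.encode())} bytes")
```

Filtering on both the songs and the date range keeps only the rows the plot actually uses, which should be far smaller than a crop by date alone.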
RE: Visualizing musician chart performance and ranking using pandas and matplotlib - Pedroski55 - May-05-2024

Yes, what I meant was: "Please supply some data for testing." I think just a sample of your csv, maybe 100 lines, would suffice, but I downloaded the big csv from your Dropbox anyway, just for fun! I will have a look when I have time!

RE: Visualizing musician chart performance and ranking using pandas and matplotlib - snippsat - May-07-2024

First, check that the types are right.

    import pandas as pd
    # import plotly.express as px
    import matplotlib.pyplot as plt
    hot_billboard = pd.read_csv('billboard_charts.csv')

    >>> hot_billboard.dtypes
    date               object
    rank                int64
    song               object
    artist             object
    last-week         float64
    peak-rank           int64
    weeks-on-board      int64
    dtype: object

    >>> hot_billboard.info()
    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 330087 entries, 0 to 330086
    Data columns (total 7 columns):
     #   Column          Non-Null Count   Dtype
    ---  ------          --------------   -----
     0   date            330087 non-null  object
     1   rank            330087 non-null  int64
     2   song            330087 non-null  object
     3   artist          330087 non-null  object
     4   last-week       297775 non-null  float64
     5   peak-rank       330087 non-null  int64
     6   weeks-on-board  330087 non-null  int64
    dtypes: float64(1), int64(3), object(3)
    memory usage: 17.6+ MB

So the problem is that date is an object (which means a string). No plot against date will work that way, so you have to convert the column to pandas datetime64.

    >>> hot_billboard['date'] = pd.to_datetime(hot_billboard['date'])
    >>> hot_billboard.dtypes
    date              datetime64[ns]
    rank                       int64
    song                      object
    artist                    object
    last-week                float64
    peak-rank                  int64
    weeks-on-board             int64
    dtype: object

Add this line after reading the billboard file, then it should work.

    hot_billboard['date'] = pd.to_datetime(hot_billboard['date'])

RE: Visualizing musician chart performance and ranking using pandas and matplotlib - Drone4four - May-12-2024

Eureka! Thank you @snippsat! That makes total sense. I should have known. There are a few ways to parse date and time objects when they are in their original string format.
Here is what I ended up using to properly cast and parse the date Series when reading the CSV file in the opening lines of my code snippet:

    hot_billboard = pd.read_csv('data/billboard_charts.csv', parse_dates=['date'])

Now my graph appears exactly as I set out to achieve. Thanks again.
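For anyone landing on this thread later, the two fixes mentioned above (converting with `pd.to_datetime` after reading, or passing `parse_dates` to `pd.read_csv`) can be compared side by side. This is only a sketch under stated assumptions: the rows are made up and mimic just three of the file's columns, the `Agg` backend is chosen so it runs without a display, and the loop over a color mapping is one possible way to structure the plot, not the course's solution. As far as I understand it, with string dates pandas plots each line against row positions rather than real dates, which would explain why the song with the most rows ran past the others.

```python
import io

import matplotlib

matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows for illustration; the real data comes from billboard_charts.csv.
CSV_TEXT = """date,rank,song
2018-12-29,7,All I Want For Christmas Is You
2019-01-05,3,All I Want For Christmas Is You
2019-12-28,1,All I Want For Christmas Is You
2019-01-05,9,Rockin' Around The Christmas Tree
2019-12-28,2,Rockin' Around The Christmas Tree
2019-12-28,8,Jingle Bell Rock
"""

# Option 1: read the dates as strings, then convert the column.
converted = pd.read_csv(io.StringIO(CSV_TEXT))
converted["date"] = pd.to_datetime(converted["date"])

# Option 2: let read_csv parse the column while loading.
parsed = pd.read_csv(io.StringIO(CSV_TEXT), parse_dates=["date"])

# Both routes should yield the same timestamps.
same_dates = bool((converted["date"] == parsed["date"]).all())

# Colors follow the original snippet: green, red, blue.
COLORS = {
    "Jingle Bell Rock": "green",
    "All I Want For Christmas Is You": "red",
    "Rockin' Around The Christmas Tree": "blue",
}

fig, ax = plt.subplots()
for song, color in COLORS.items():
    rows = parsed[parsed["song"] == song].sort_values("date")
    # Real datetimes on x, so every line shares one time axis.
    ax.plot(rows["date"], rows["rank"], color=color, label=song)

ax.invert_yaxis()    # rank 1 at the top
ax.legend()
fig.autofmt_xdate()  # slant the date labels so they do not overlap
```

In an interactive session, `plt.show()` or `fig.savefig(...)` would display or save the figure.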