Python Forum
Inconsistent sorting with the .sort_values() function
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Inconsistent sorting with the .sort_values() function
#1
Hi,

I am trying to sort rows of data in a pandas Dataframe using the .sort_values() function. The goal is to change the order so that the numbers are displayed from lowest to highest. When I select the column that I want (Inflation Yearly Basis) it reorders it in a randomized way. However, when I select another column from the same Dataframe (Inflation Monthly Basis) that has the same type of data, it displays properly from lowest to highest.

Appreciate any help –– thanks!

Here's the code:

import pandas as pd
import requests
import json
import pandas_datareader.data as webdata
import matplotlib as mpl
%matplotlib inline
import matplotlib.pyplot as plt
from matplotlib.pyplot import figure
import datetime
from datetime import date
import numpy as np
from bs4 import BeautifulSoup


url = 'https://www.global-rates.com/en/economic-indicators/inflation/consumer-prices/cpi/cpi.aspx'
r  = requests.get(url)
data = r.text
soup = BeautifulSoup(data, 'lxml')
data = []
table = soup.find('table', cellpadding="2")
table_body = table 

rows = table_body.find_all('tr', {'class': lambda x : x.startswith('tabledata')})
for row in rows:
    cols = row.find_all('td')
    cols = [ele.text.strip() for ele in cols]
    data.append([ele for ele in cols if ele]) # Get rid of empty values

global_cpi_rates = pd.DataFrame (data, columns = ['Name','Country/Region','Period','Inflation Monthly Basis','Inflation Yearly Basis'])    
global_cpi_rates = global_cpi_rates.sort_values('Inflation Yearly Basis')
global_cpi_rates
Reply
#2
An example would be nice. My guess without any data is that your inflation numbers are strings, not floats. Sorting numerical strings gives funny results like 1000 is less than 2 because in the ascii table "1" is lower than "2".
Reply
#3
Finally found the attachment function. Please see attachments for data examples.

Attached Files

.csv   monthly_sort.csv (Size: 1.99 KB / Downloads: 1)
.csv   yearly_sort.csv (Size: 1.99 KB / Downloads: 0)
.csv   original_data.csv (Size: 1.99 KB / Downloads: 0)
Reply
#4
Thanks again. I looked into what you said and you are correct. I changed the column to a numeric dtype using apply(pd.to_numeric) and was able to sort it correctly.

The only issue now is that the column name has been removed (assuming because it is no longer a string?) so I can't figure out how to combine the two columns (Country/Region, Inflation Yearly Basis) back into a dataframe so I can plot it. Any ideas?

I've attached the code.

Attached Files

.py   Sorting Issue.py (Size: 1.71 KB / Downloads: 1)
Reply
#5
I would do the conversion when reading the csv file.
import pandas as pd
import io

data = io.StringIO(
    """
Name, Percent
A ,5%
B, 100%
C, 20%
D ,3%
"""
)

def percent_to_float(valuestr):
    """Convert percent to floats.  Strip trailing %, convert to float and divide by 100"""
    return float(valuestr.replace("%", "")) / 100.0

df = pd.read_csv(data, converters={"Name": str.strip, "Percent": percent_to_float})
print(df.sort_values(by=["Percent"]))
Output:
Name Percent 3 D 0.03 0 A 0.05 2 C 0.20 1 B 1.00
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  ValueError: Found input variables with inconsistent numbers of samples: [5, 6] bongielondy 6 25,922 Jun-28-2021, 05:23 AM
Last Post: ricslato
  ValueError: Found input variables with inconsistent numbers of sample robert2joe 0 4,268 Mar-25-2020, 11:10 AM
Last Post: robert2joe
  ValueError: Found input variables with inconsistent numbers of samples: [0, 3] ayaz786amd 2 9,620 Nov-27-2018, 07:12 AM
Last Post: ayaz786amd

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020