Python Forum
Pandas Import CSV count between numerical values within 1 Column
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Pandas Import CSV count between numerical values within 1 Column
#1
fileuploadid completedate ... Difference successful (SUM)
0 156691956 03/06/19 00:00 ... 0 1
1 156691958 03/06/19 00:00 ... 0 1
2 156730185 10/06/19 00:00 ... 0 1
3 156815823 25/06/19 00:00 ... 0 1
4 156752103 14/06/19 00:00 ... -4 1

Hi There, fairly new to python. I have imported the csv above, head() output shown. I am been trying to start a script that will read the column 'Difference' and tell me how many instances of rows are there with a numerical value between 0 and 1.

Can I do this in one simple query or do you need to create a list of all rows that contain a value of 0 and 1 and then do a count of that list?

From there I can build the script to compute what normally is done through excel like this:

=COUNTIFS(F:F, ">=0", F:F, "<=1")

What I have seen from courses and online hasn't worked so far.

One of the examples I have seen is this:
df = pd.DataFrame({'a': [1, 2, 3]})
df[(df.a > 1) & (df.a < 3)].count()
But when applied to my datadrame it's not giving the output I am expecting.

Any assistance anybody can provide will be greatly appreciated.
Reply
#2
I managed to get a workaround but I don't think this this is the best practice:

import pandas as pd
tsjune = pd.read_csv("/Users/phil/Documents/datacsv/tsj.csv", delimiter = ',')

one = tsjune[(tsjune.Difference >= 0) & (tsjune.Difference <=1)]
two = tsjune[(tsjune.Difference > 1) & (tsjune.Difference <=3)]
three = tsjune[(tsjune.Difference > 3) & (tsjune.Difference <30)]
four = tsjune[tsjune.Difference >= 30]


onecount = one.count()
twocount = two.count()
threecount = three.count()
fourcount = four.count()

total = onecount + twocount + threecount + fourcount

onepercentage = round(onecount/total*100, 2)
twopercentage = round(twocount/total*100, 2)
threepercentage = round(threecount/total*100, 2)
fourpercentage = round(fourcount/total*100, 2)

print(onepercentage['Difference'])
print(twopercentage['Difference'])
print(threepercentage['Difference'])
print(fourpercentage['Difference'])
Which gives me a percentage output of:

26.89
28.29
32.8
12.01

Any better ways of writing this query?
Reply
#3
You need to work around .cut method.
As a starting point, consider the following:

df.groupby(pd.cut(df.x, [0, 1, 3, 30, df.Difference.max()], right=False)).count().values / len(df) * 100
Reply
#4
Thanks! I'll give this a shot and post the results.
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  Find duplicates in a pandas dataframe list column on other rows Calab 2 606 Sep-18-2024, 07:38 PM
Last Post: Calab
  Find strings by index from a list of indexes in a different Pandas dataframe column Calab 3 773 Aug-26-2024, 04:52 PM
Last Post: Calab
  attempt to split values from within a dataframe column mbrown009 9 3,962 Jun-20-2024, 07:59 PM
Last Post: AdamHensley
  Assigning conditional values in Pandas Scott 3 1,629 Dec-19-2023, 03:10 AM
Last Post: Larz60+
  HTML Decoder pandas dataframe column mbrown009 3 1,871 Sep-29-2023, 05:56 PM
Last Post: deanhystad
  Increase df column values decimals SriRajesh 2 1,491 Nov-14-2022, 05:20 PM
Last Post: deanhystad
  pandas column percentile nuncio 7 3,322 Aug-10-2022, 04:41 AM
Last Post: nuncio
  import csv adding a header with pandas Soares 0 1,805 Dec-16-2021, 12:16 PM
Last Post: Soares
  pandas: Compute the % of the unique values in a column JaneTan 1 2,163 Oct-25-2021, 07:55 PM
Last Post: jefsummers
  update values in one dataframe based on another dataframe - Pandas iliasb 2 11,010 Aug-14-2021, 12:38 PM
Last Post: jefsummers

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020