Python Forum

Full Version: Pandas Import CSV count between numerical values within 1 Column
You're currently viewing a stripped down version of our content. View the full version with proper formatting.
fileuploadid completedate ... Difference successful (SUM)
0 156691956 03/06/19 00:00 ... 0 1
1 156691958 03/06/19 00:00 ... 0 1
2 156730185 10/06/19 00:00 ... 0 1
3 156815823 25/06/19 00:00 ... 0 1
4 156752103 14/06/19 00:00 ... -4 1

Hi There, fairly new to python. I have imported the csv above, head() output shown. I am been trying to start a script that will read the column 'Difference' and tell me how many instances of rows are there with a numerical value between 0 and 1.

Can I do this in one simple query or do you need to create a list of all rows that contain a value of 0 and 1 and then do a count of that list?

From there I can build the script to compute what normally is done through excel like this:

=COUNTIFS(F:F, ">=0", F:F, "<=1")

What I have seen from courses and online hasn't worked so far.

One of the examples I have seen is this:
df = pd.DataFrame({'a': [1, 2, 3]})
df[(df.a > 1) & (df.a < 3)].count()
But when applied to my datadrame it's not giving the output I am expecting.

Any assistance anybody can provide will be greatly appreciated.
I managed to get a workaround but I don't think this this is the best practice:

import pandas as pd
tsjune = pd.read_csv("/Users/phil/Documents/datacsv/tsj.csv", delimiter = ',')

one = tsjune[(tsjune.Difference >= 0) & (tsjune.Difference <=1)]
two = tsjune[(tsjune.Difference > 1) & (tsjune.Difference <=3)]
three = tsjune[(tsjune.Difference > 3) & (tsjune.Difference <30)]
four = tsjune[tsjune.Difference >= 30]


onecount = one.count()
twocount = two.count()
threecount = three.count()
fourcount = four.count()

total = onecount + twocount + threecount + fourcount

onepercentage = round(onecount/total*100, 2)
twopercentage = round(twocount/total*100, 2)
threepercentage = round(threecount/total*100, 2)
fourpercentage = round(fourcount/total*100, 2)

print(onepercentage['Difference'])
print(twopercentage['Difference'])
print(threepercentage['Difference'])
print(fourpercentage['Difference'])
Which gives me a percentage output of:

26.89
28.29
32.8
12.01

Any better ways of writing this query?
You need to work around .cut method.
As a starting point, consider the following:

df.groupby(pd.cut(df.x, [0, 1, 3, 30, df.Difference.max()], right=False)).count().values / len(df) * 100
Thanks! I'll give this a shot and post the results.