Pandas Import CSV count between numerical values within 1 Column

ptaylor520 · (This post was last modified: Jul-16-2019, 03:03 AM by scidam.)

fileuploadid completedate ... Difference successful (SUM)
0 156691956 03/06/19 00:00 ... 0 1
1 156691958 03/06/19 00:00 ... 0 1
2 156730185 10/06/19 00:00 ... 0 1
3 156815823 25/06/19 00:00 ... 0 1
4 156752103 14/06/19 00:00 ... -4 1

Hi There, fairly new to python. I have imported the csv above, head() output shown. I am been trying to start a script that will read the column 'Difference' and tell me how many instances of rows are there with a numerical value between 0 and 1.

Can I do this in one simple query or do you need to create a list of all rows that contain a value of 0 and 1 and then do a count of that list?

From there I can build the script to compute what normally is done through excel like this:

=COUNTIFS(F:F, ">=0", F:F, "<=1")

What I have seen from courses and online hasn't worked so far.

One of the examples I have seen is this:

df = pd.DataFrame({'a': [1, 2, 3]})
df[(df.a > 1) & (df.a < 3)].count()

But when applied to my datadrame it's not giving the output I am expecting.

Any assistance anybody can provide will be greatly appreciated.

ptaylor520 · (This post was last modified: Jul-15-2019, 04:58 PM by ptaylor520.)

I managed to get a workaround but I don't think this this is the best practice:

import pandas as pd
tsjune = pd.read_csv("/Users/phil/Documents/datacsv/tsj.csv", delimiter = ',')

one = tsjune[(tsjune.Difference >= 0) & (tsjune.Difference <=1)]
two = tsjune[(tsjune.Difference > 1) & (tsjune.Difference <=3)]
three = tsjune[(tsjune.Difference > 3) & (tsjune.Difference <30)]
four = tsjune[tsjune.Difference >= 30]


onecount = one.count()
twocount = two.count()
threecount = three.count()
fourcount = four.count()

total = onecount + twocount + threecount + fourcount

onepercentage = round(onecount/total*100, 2)
twopercentage = round(twocount/total*100, 2)
threepercentage = round(threecount/total*100, 2)
fourpercentage = round(fourcount/total*100, 2)

print(onepercentage['Difference'])
print(twopercentage['Difference'])
print(threepercentage['Difference'])
print(fourpercentage['Difference'])

Which gives me a percentage output of:

26.89
28.29
32.8
12.01

Any better ways of writing this query?

**scidam** · Jul-16-2019, 03:02 AM

You need to work around .cut method.
As a starting point, consider the following:

df.groupby(pd.cut(df.x, [0, 1, 3, 30, df.Difference.max()], right=False)).count().values / len(df) * 100

ptaylor520 · Jul-16-2019, 08:13 AM

Thanks! I'll give this a shot and post the results.

Possibly Related Threads…
Thread		Author	Replies	Views	Last Post
	Assigning conditional values in Pandas	Scott	3	800	Dec-19-2023, 03:10 AM Last Post: Larz60+
	HTML Decoder pandas dataframe column	mbrown009	3	1,030	Sep-29-2023, 05:56 PM Last Post: deanhystad
	attempt to split values from within a dataframe column	mbrown009	8	2,354	Apr-10-2023, 02:06 AM Last Post: mbrown009
	Increase df column values decimals	SriRajesh	2	1,109	Nov-14-2022, 05:20 PM Last Post: deanhystad
	pandas column percentile	nuncio	7	2,445	Aug-10-2022, 04:41 AM Last Post: nuncio
	import csv adding a header with pandas	Soares	0	1,560	Dec-16-2021, 12:16 PM Last Post: Soares
	pandas: Compute the % of the unique values in a column	JaneTan	1	1,780	Oct-25-2021, 07:55 PM Last Post: jefsummers
	update values in one dataframe based on another dataframe - Pandas	iliasb	2	9,264	Aug-14-2021, 12:38 PM Last Post: jefsummers
	Pandas Data frame column condition check based on length of the value	aditi06	1	2,695	Jul-28-2021, 11:08 AM Last Post: jefsummers
	Help with Scipy optimize for numerical problem	Jbjbjb1	0	1,556	Jun-22-2021, 05:03 AM Last Post: Jbjbjb1

Pandas Import CSV count between numerical values within 1 Column

User Panel Messages

Announcements