Bottom Page

Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
 Pandas Import CSV count between numerical values within 1 Column
#1
fileuploadid completedate ... Difference successful (SUM)
0 156691956 03/06/19 00:00 ... 0 1
1 156691958 03/06/19 00:00 ... 0 1
2 156730185 10/06/19 00:00 ... 0 1
3 156815823 25/06/19 00:00 ... 0 1
4 156752103 14/06/19 00:00 ... -4 1

Hi There, fairly new to python. I have imported the csv above, head() output shown. I am been trying to start a script that will read the column 'Difference' and tell me how many instances of rows are there with a numerical value between 0 and 1.

Can I do this in one simple query or do you need to create a list of all rows that contain a value of 0 and 1 and then do a count of that list?

From there I can build the script to compute what normally is done through excel like this:

=COUNTIFS(F:F, ">=0", F:F, "<=1")

What I have seen from courses and online hasn't worked so far.

One of the examples I have seen is this:
df = pd.DataFrame({'a': [1, 2, 3]})
df[(df.a > 1) & (df.a < 3)].count()
But when applied to my datadrame it's not giving the output I am expecting.

Any assistance anybody can provide will be greatly appreciated.
Quote
#2
I managed to get a workaround but I don't think this this is the best practice:

import pandas as pd
tsjune = pd.read_csv("/Users/phil/Documents/datacsv/tsj.csv", delimiter = ',')

one = tsjune[(tsjune.Difference >= 0) & (tsjune.Difference <=1)]
two = tsjune[(tsjune.Difference > 1) & (tsjune.Difference <=3)]
three = tsjune[(tsjune.Difference > 3) & (tsjune.Difference <30)]
four = tsjune[tsjune.Difference >= 30]


onecount = one.count()
twocount = two.count()
threecount = three.count()
fourcount = four.count()

total = onecount + twocount + threecount + fourcount

onepercentage = round(onecount/total*100, 2)
twopercentage = round(twocount/total*100, 2)
threepercentage = round(threecount/total*100, 2)
fourpercentage = round(fourcount/total*100, 2)

print(onepercentage['Difference'])
print(twopercentage['Difference'])
print(threepercentage['Difference'])
print(fourpercentage['Difference'])
Which gives me a percentage output of:

26.89
28.29
32.8
12.01

Any better ways of writing this query?
Quote
#3
You need to work around .cut method.
As a starting point, consider the following:

df.groupby(pd.cut(df.x, [0, 1, 3, 30, df.Difference.max()], right=False)).count().values / len(df) * 100
Quote
#4
Thanks! I'll give this a shot and post the results.
Quote

Top Page

Possibly Related Threads...
Thread Author Replies Views Last Post
  Dropping a column from pandas dataframe marco_ita 6 281 Sep-07-2019, 08:36 AM
Last Post: marco_ita
  How to drop column in pandas SriMekala 3 294 Aug-26-2019, 06:36 PM
Last Post: snippsat
  How to use the excel filename as a value to populate new column, using Pandas? Dequanharrison 5 301 Jun-26-2019, 11:11 PM
Last Post: Dequanharrison
  pandas change row value an existing column with conditionals Gigux 1 248 Jun-22-2019, 08:04 PM
Last Post: Gigux
  Custom timeinterval converted to hourly values using Pandas? SinPy 1 595 Jun-07-2019, 05:06 AM
Last Post: heiner55
  Pandas: faster method to count occurrences frame 0 651 May-26-2019, 07:45 PM
Last Post: frame
  import pandas as pd not working in pclinuxos loren41 3 235 May-19-2019, 03:49 PM
Last Post: Larz60+
  How to delete column if entire column values are "nan" Sri 4 482 Apr-13-2019, 12:16 PM
Last Post: Sri
  Text to column pandas ms5573 0 530 Aug-25-2018, 08:18 PM
Last Post: ms5573
  Splitting values in column in a pandas dataframe based on a condition hey_arnold 1 1,309 Jul-24-2018, 02:18 PM
Last Post: hey_arnold

Forum Jump:


Users browsing this thread: 1 Guest(s)