Python Forum
Substr on Pandas Dataframe
#1
Hi everyone,

I have a DF and I want to set an if statement in a function to sum a value if the first part of a field = '10'. This would be easy in SAS with the substr function. Can I do it directly on the DataFrame, or do I need to put the column into an array and slice it?

I have pasted the DF below.

Output:
         HSC      Country   Month  Imports_(NZD)  Harmonised System Description
0  101210015  New Zealand  201903        191,550  Horses; live, pure-bred breeding animals, thor...
1  101210015  New Zealand  201904        190,550  Horses; live, pure-bred breeding animals, thor...
2  101290010  New Zealand  201903         76,660  Horses; live, other than pure-bred breeding an...
3  101290010  New Zealand  201904      1,187,430  Horses; live, other than pure-bred breeding an...
4  101290013  New Zealand  201904      1,257,700  Horses; live, other than pure-bred breeding an...

What I want is an output with Month as the index and a new variable holding Imports summed by month for the rows where substr(hsc,0,2) = '01'. I just need help with this first variable; after that I am going to create a few more sums based on the HSC, also grouped by month, and add them as new columns.

I hope that makes sense. Please let me know if you need more info.

Thanks
#2
(Sep-01-2019, 06:21 AM)Scott Wrote: I have a DF and I want to set an if statement in a function to sum a value if the first part of a field = '10'.
You need to convert the values to strings first and then use the .str.startswith method.

Take a look at the following minimal example I just wrote:

import pandas as pd

df = pd.DataFrame({"x": [100, 1000, 1000, 1919, 124], "y": [1, 2, 3, 4, 5]})
# Cast x to str, keep the rows where it starts with '10', then sum the matching y values.
df.loc[df.x.astype(str).str.startswith('10'), 'y'].sum()  # 6
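
To get closer to the monthly totals asked for in the first post, the same mask can be combined with groupby. This is only a rough sketch: it assumes HSC and Month are stored as integers and that Imports_(NZD) is stored as text with thousands separators, which is a guess based on the pasted output. The names imports and imports_hsc_10 are made up for illustration.

```python
import pandas as pd

# Sample rows copied from the first post. The dtypes are assumptions:
# HSC and Month as integers, Imports_(NZD) as text with commas.
df = pd.DataFrame({
    "HSC": [101210015, 101210015, 101290010, 101290010, 101290013],
    "Month": [201903, 201904, 201903, 201904, 201904],
    "Imports_(NZD)": ["191,550", "190,550", "76,660", "1,187,430", "1,257,700"],
})

# Strip the thousands separators so the imports can be summed as numbers.
# "imports" is a hypothetical helper column name.
df["imports"] = df["Imports_(NZD)"].str.replace(",", "", regex=False).astype(int)

# Rough equivalent of a SAS substr test: cast HSC to str and check the prefix.
mask = df["HSC"].astype(str).str.startswith("10")

# Sum the matching rows per month; Month becomes the index of the result.
result = (
    df[mask]
    .groupby("Month")["imports"]
    .sum()
    .rename("imports_hsc_10")  # hypothetical column name
    .to_frame()
)
print(result)
```

Further prefixes could be handled the same way, one mask per prefix, with each resulting column joined onto this frame by Month. Note that if the HSC codes really begin with a zero (the '01' in the question), an integer column will have lost it; casting to str and calling .str.zfill with the expected code length would restore it, but that length is an assumption about your data.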