Python Forum
Substr on Pandas Dataframe - Printable Version

+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/forum-7.html)
+--- Forum: Data Science (https://python-forum.io/forum-44.html)
+--- Thread: Substr on Pandas Dataframe (/thread-20818.html)



Substr on Pandas Dataframe - Scott - Sep-01-2019

Hi everyone,

I have a DF and I want to set an if statement in a function to sum a value if the first part of a field = '10'. This would be easy in SAS with the substr function. Can I do it in a dataframe or do I need to put it into an array and slice?

I have pasted the DF below, the column headers don't align well but you can make it out.

Output:
HSC Country Month Imports_(NZD) Harmonised System Description 0 101210015 New Zealand 201903 191,550 Horses; live, pure-bred breeding animals, thor... 1 101210015 New Zealand 201904 190,550 Horses; live, pure-bred breeding animals, thor... 2 101290010 New Zealand 201903 76,660 Horses; live, other than pure-bred breeding an... 3 101290010 New Zealand 201904 1,187,430 Horses; live, other than pure-bred breeding an... 4 101290013 New Zealand 201904 1,257,700 Horses; live, other than pure-bred breeding an...
What i want is an output with month as the index and then a new variable summed Import by is substr(hsc,0,2) = '01' which is grouped by month. I just want help with first variable and then I am going to create a few more summs based on the HSC that are grouped by month and have them as the new columns.

I hope that makes sense. Please let me know if you need more info.

Thanks


RE: Substr on Pandas Dataframe - scidam - Sep-02-2019

(Sep-01-2019, 06:21 AM)Scott Wrote: I have a DF and I want to set an if statement in a function to sum a value if the first part of a field = '10'.
You need to convert values to strings first and use .str.startswith method.

Take a look at the following minimal example I just wrote:

import pandas as pd
df = pd.DataFrame({"x": [100, 1000, 1000, 1919, 124], "y": [1, 2, 3, 4, 5]})
df.loc[df.x.astype(str).str.startswith('10'), 'y'].sum()