Python Forum
How to retrieve records in a DataFrame (Python/Pandas) that contains leading or trail
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
How to retrieve records in a DataFrame (Python/Pandas) that contains leading or trail
#1
I would like to know how to write a formula that would identify/display records of string/object data type on a Pandas DataFrame that contains leading or trailing spaces.

The purpose for this is to get an audit on a Jupyter notebook of such records before applying any strip functions.

The goal is for the script to identify these records automatically without having to type the name of the columns manually. The scope should be any column of str/object data type that contains a value that includes either a leading or trailing spaces or both.

Please notice. I would like to see the resulting output in a dataframe format.

Link to sample datafram data:

Thank you!
Reply
#2
Why don't you try? Then, if you encounter problems, ask here?

I would just save your csv as Excel and use openpyxl. Seems easy!

I always think, maybe wrongly, pandas is for BIG DATA or rocket scientists!

I didn't find any leading or trailing spaces in your file.

When I put some in, I found them

mystring1 = ' a space in front'
mystring2 = 'a space behind '
Reply
#3
(Sep-04-2022, 09:44 AM)Pedroski55 Wrote: I always think, maybe wrongly, pandas is for BIG DATA or rocket scientists!
That's just wrong Pandas can be used for all kind of big or small tasks.

@mmunozjr som tips for this task.
import pandas as pd

d = {
    'Quarters' : ['quarter1','quarter2','quarter3','quarter4'],
     'Description': ['AA year', 'BB year', ' CC year', 'DD year '],
     'Revenue': [23.5, 54.6, 5.45, 41.87]
}

df = pd.DataFrame(d)
>>> df
   Quarters Description  Revenue
0  quarter1     AA year    23.50
1  quarter2     BB year    54.60
2  quarter3     CC year     5.45
3  quarter4    DD year     41.87
So CC and DD has leading and trailing space.
Pandas has Regex and method build in that can be used for this.
Eg with startswith and endswith
>>> df[df['Description'].str.startswith(' ')]
   Quarters Description  Revenue
2  quarter3     CC year     5.45
>>> df[df['Description'].str.endswith(' ')]
   Quarters Description  Revenue
3  quarter4    DD year     41.87
Regex.
>>> df['Description'].str.findall(r'^\s.*')
0            []
1            []
2    [ CC year]
3            []
Name: Description, dtype: object
>>> df['Description'].str.findall(r'.*\s$')
0            []
1            []
2            []
3    [DD year ]

# Find  both
>>> df['Description'].str.findall(r'(^[ ].*)|(.*[ ]$)')
0                []
1                []
2    [( CC year, )]
3    [(, DD year )]

>>> df['Description'].str.findall(r'(^[ ].*)|(.*[ ]$)').sum()
[(' CC year', ''), ('', 'DD year ')]
Look at How to use Regex in Pandas
Reply
#4
Quote:That's just wrong Pandas can be used for all kind of big or small tasks.

You are right!

This picture Plimpton322 on this webpage shows a Babylonian clay tablet from about 4000 years ago!

But it is really a df!

The column on the right is just a row number index!!!
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  Python Alteryx QS-Passing pandas dataframe column inside SQL query where condition sanky1990 0 770 Dec-04-2023, 09:48 PM
Last Post: sanky1990
  Question on pandas.dataframe merging two colums shomikc 4 882 Jun-29-2023, 11:30 AM
Last Post: snippsat
  Removing leading whitespaces palladium 1 743 Mar-24-2023, 04:15 PM
Last Post: bowlofred
  Pandas AttributeError: 'DataFrame' object has no attribute 'concat' Sameer33 5 5,830 Feb-17-2023, 06:01 PM
Last Post: Sameer33
  help how to get size of pandas dataframe into MB\GB mg24 1 2,454 Jan-28-2023, 01:23 PM
Last Post: snippsat
  pandas dataframe into csv .... exponent issue mg24 10 1,889 Jan-20-2023, 08:15 PM
Last Post: deanhystad
  Removing leading\trailing spaces azizrasul 8 2,795 Oct-23-2022, 11:06 PM
Last Post: azizrasul
  How to assign a value to pandas dataframe column rows based on a condition klllmmm 0 876 Sep-08-2022, 06:32 AM
Last Post: klllmmm
  -i option changes sys.path (removes leading empty string '') markanth 6 2,033 Aug-26-2022, 09:27 PM
Last Post: markanth
  "Vlookup" in pandas dataframe doug2019 3 1,927 May-09-2022, 01:35 PM
Last Post: snippsat

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020