Python Forum
Vectorized parsing in dataFrame
#1
I wanted to know if there is a simpler way of doing this:
import pandas as pd

data = {'A': [1, 2, 3], 'B': ['123456', '789012', '345678']}
cd = pd.DataFrame(data)
cd
Out[183]: 
   A       B
0  1  123456
1  2  789012
2  3  345678

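f = lambda s: s[0:3]   # first three characters
fr = lambda s: s[-3:]  # last three characters

def codes(x):
    # apply both slices element by element to the Series x
    left = x.apply(f)
    right = x.apply(fr)
    return left, right

l, r = codes(cd['B'])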
By "vectorized" I mean something like cd['B'][?:, 0:3], maybe?

Thanks.

Moderator zivoni: please use code tags for future posts
#2
You can use .str.extract() with a regular expression pattern. Examples:
Output:
In [13]: df = pd.DataFrame({'C':['123456', '789012', '345678'], 'D':['12345678', '123', '2']})

In [14]: df
Out[14]:
        C         D
0  123456  12345678
1  789012       123
2  345678         2

In [15]: df.C.str.extract("(.{3})(.{3})")  # works only for strings with length 6
Out[15]:
     0    1
0  123  456
1  789  012
2  345  678

In [16]: df.D.str.extract("(?=(.{,3})).*?(.{,3}$)")  # should work for any length of s, same as s[:3] and s[-3:]
Out[16]:
     0    1
0  123  678
1  123  123
2    2    2
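If only fixed-position slices are needed, the .str accessor can also be sliced directly, which may be closer to what the question had in mind. This is a minimal sketch rather than the method proposed in this reply; the variable names left, right and short are illustrative only.

```python
import pandas as pd

cd = pd.DataFrame({'A': [1, 2, 3], 'B': ['123456', '789012', '345678']})

# Slicing the .str accessor applies the slice to every element of the Series.
left = cd['B'].str[:3]
right = cd['B'].str[-3:]

# .str.slice() is the method form of the same operation.
assert left.equals(cd['B'].str.slice(0, 3))

print(left.tolist())   # ['123', '789', '345']
print(right.tolist())  # ['456', '012', '678']

# Strings shorter than the slice are returned as-is, like s[:3] and s[-3:].
short = pd.Series(['12345678', '123', '2'])
print(short.str[:3].tolist())   # ['123', '123', '2']
print(short.str[-3:].tolist())  # ['678', '123', '2']
```

The regular expression approach is still the more flexible choice when the pieces are defined by a pattern rather than by position.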
#3
thanks. great!