Posts: 11
Threads: 4
Joined: Jun 2020
I am trying to write a regular expression to filter df1 (a DataFrame).
I want to remove the rows whose value in column 3 is related to NOPOP, NoPop or NONPOP.
To make the search quick, I set column 3 as the index of the DataFrame
and then used df.filter with a regex.
```python
import pandas as pd
k = [['a', 'b', 'c', 'NOPOP'], ['d', 'e', 'f', 'POP'], ['g', 'h', 'i', 'j'],
     ['k', 'l', 'm', 'Pop'], ['n', 'o', 'p', 'NoPop_AA'], ['q', 'r', 's', 'NONPOP']]
df_exp = pd.DataFrame(k)
df1 = df_exp.set_index([3])
df2 = df1.filter(regex='[^NOPOP]|[^NoPop]|[^NONPOP]', axis=0)
```
Output:
          0  1  2
3
NOPOP     a  b  c
POP       d  e  f
j         g  h  i
Pop       k  l  m
NoPop_AA  n  o  p
NONPOP    q  r  s
The result did not remove the NOPOP, NoPop and NONPOP rows. Why not?
My desired output is:
     0  1  2
3
POP  d  e  f
j    g  h  i
Pop  k  l  m
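A likely explanation for why nothing was removed: inside square brackets, [^NOPOP] is a negated character class. It matches any single character that is not N, O or P; it does not mean "not the word NOPOP". DataFrame.filter(regex=...) keeps every label for which re.search finds a match, and each label here has at least one character outside one of the three classes (for example, the uppercase O in NOPOP is not in the set excluded by [^NoPop]), so all six rows survive. Below is a minimal sketch of one way to do the exclusion with filter alone, using a negative lookahead anchored at the start of the label; the pattern is my own suggestion, not something from the thread.

```python
import re

import pandas as pd

k = [['a', 'b', 'c', 'NOPOP'], ['d', 'e', 'f', 'POP'], ['g', 'h', 'i', 'j'],
     ['k', 'l', 'm', 'Pop'], ['n', 'o', 'p', 'NoPop_AA'], ['q', 'r', 's', 'NONPOP']]
df1 = pd.DataFrame(k).set_index([3])

# [^NoPop] is a character class: ONE character that is not N, o, P or p.
# 'NOPOP' contains an uppercase 'O', which is not in that set, so this matches.
print(bool(re.search('[^NOPOP]|[^NoPop]|[^NONPOP]', 'NOPOP')))  # True

# Negative lookahead anchored at ^: a label matches only if it does NOT contain
# NOPOP, NoPop or NONPOP anywhere (the .* lets the lookahead scan the whole label).
df2 = df1.filter(regex=r'^(?!.*(?:NOPOP|NoPop|NONPOP))', axis=0)
print(df2)
```

Filtering the column before set_index (as in the reply below) is usually easier to read; the lookahead is mainly useful if you want to keep using filter on the index.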
Posts: 7,326
Threads: 123
Joined: Sep 2016
You can use str.contains for this.
```python
import pandas as pd
k = [
    ["a", "b", "c", "NOPOP"],
    ["d", "e", "f", "POP"],
    ["g", "h", "i", "j"],
    ["k", "l", "m", "Pop"],
    ["n", "o", "p", "NoPop_AA"],
    ["q", "r", "s", "NONPOP"],
]
df_exp = pd.DataFrame(k)
```
```python
>>> df_exp = df_exp[~df_exp[3].str.contains('NOPOP|NoPop|NONPOP')]
>>> df1 = df_exp.set_index([3])
>>> df1
     0  1  2
3
POP  d  e  f
j    g  h  i
Pop  k  l  m
```
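The same exclusion can be written a bit more compactly. This is a minimal sketch, assuming a case-insensitive match is acceptable: case=False and na=False are documented parameters of Series.str.contains, while the pattern NON?POP ("NO", an optional "N", then "POP") is my own condensation and also matches other casings such as "nopop". na=False treats missing values as non-matches, so the mask stays purely boolean if column 3 ever contains NaN.

```python
import pandas as pd

k = [
    ["a", "b", "c", "NOPOP"],
    ["d", "e", "f", "POP"],
    ["g", "h", "i", "j"],
    ["k", "l", "m", "Pop"],
    ["n", "o", "p", "NoPop_AA"],
    ["q", "r", "s", "NONPOP"],
]
df_exp = pd.DataFrame(k)

# NON?POP = "NO", an optional "N", then "POP"; case=False also catches "NoPop".
# na=False makes missing values count as "no match" so ~mask stays boolean.
mask = df_exp[3].str.contains(r"NON?POP", case=False, na=False)
df1 = df_exp[~mask].set_index([3])
print(df1)
```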
Posts: 11
Threads: 4
Joined: Jun 2020
Thank you for your quick reply. It works and achieves my goal.
(Jun-05-2020, 11:56 AM)snippsat Wrote: Can use str.contains for this.
Posts: 11
Threads: 4
Joined: Jun 2020
Sorry, another question.
Does .str.contains support the same regular expression features as the re module?
For example, '^AA' to match only strings that start with AA.
(Jun-05-2020, 11:56 AM)snippsat Wrote: Can use str.contains for this.
Posts: 7,326
Threads: 123
Joined: Sep 2016
Jun-12-2020, 10:14 AM
(This post was last modified: Jun-12-2020, 10:15 AM by snippsat.)
(Jun-12-2020, 09:35 AM)cools0607 Wrote: I wonder if .str.contains includes specified functions just like re module?
Yes, str.contains accepts regular expression patterns, just as the re module does.
Quote:For example: '^AA' expresses only searching words start with AA.
Yes, that would work. pandas has a lot built in, so there is also str.startswith.
If you wonder whether something works, it is best to test it.
```python
import pandas as pd
d = {
    'Quarters': ['quarter1', 'quarter2', 'quarter3', 'quarter4'],
    'Description': ['AA year', 'BB year', 'CC year', 'AA year'],
    'Revenue': [23.5, 54.6, 5.45, 41.87]
}
df = pd.DataFrame(d)
```
Test usage:
```python
>>> df[df['Description'].str.contains(r'^AA')]
  Description  Quarters  Revenue
0     AA year  quarter1    23.50
3     AA year  quarter4    41.87
>>> df[df['Description'].str.contains(r'^AA|BB')]
  Description  Quarters  Revenue
0     AA year  quarter1    23.50
1     BB year  quarter2    54.60
3     AA year  quarter4    41.87
>>>
>>> df[df['Description'].str.startswith('AA')]
  Description  Quarters  Revenue
0     AA year  quarter1    23.50
3     AA year  quarter4    41.87
>>> df[df['Description'].str.startswith(('AA', 'BB'))]
  Description  Quarters  Revenue
0     AA year  quarter1    23.50
1     BB year  quarter2    54.60
3     AA year  quarter4    41.87
```
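To check the "same as the re module" point directly, here is a small sketch. Series.str.contains applies re.search to each value, while Series.str.match applies re.match, which is already anchored at the start of the string, so str.match('AA') behaves like str.contains('^AA'). The re_mask list comprehension is only a sanity check against the plain re module, not something pandas needs.

```python
import re

import pandas as pd

d = {
    'Quarters': ['quarter1', 'quarter2', 'quarter3', 'quarter4'],
    'Description': ['AA year', 'BB year', 'CC year', 'AA year'],
    'Revenue': [23.5, 54.6, 5.45, 41.87],
}
df = pd.DataFrame(d)
desc = df['Description']

# str.contains uses re.search on each value, so '^' anchors at the start
contains_mask = desc.str.contains(r'^AA')
# str.match uses re.match, which is anchored at the start without '^'
match_mask = desc.str.match(r'AA')
# The same test written with the re module directly
re_mask = [re.search(r'^AA', s) is not None for s in desc]

print(contains_mask.tolist())  # [True, False, False, True]
print(match_mask.tolist() == contains_mask.tolist() == re_mask)  # True
print(df[match_mask])
```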
Posts: 11
Threads: 4
Joined: Jun 2020
Thank you for your reply. After trying your code, I got it. I think it is convenient for me to use .str.contains(r'^AA').
(Jun-12-2020, 10:14 AM)snippsat Wrote: (Jun-12-2020, 09:35 AM)cools0607 Wrote: I wonder if .str.contains includes specified functions just like re module? Yes str.contains can take regular expression patterns as in the re module.
Posts: 11
Threads: 4
Joined: Jun 2020
Jun-15-2020, 07:34 AM
(This post was last modified: Jun-15-2020, 07:39 AM by cools0607.)
Sorry, another question.
I need to search a lot of data from Excel. After importing the data into a list, I tried two methods:
1. Searching the list with the re module.
2. Converting the list to a DataFrame and then applying the .str.contains() method.
Both work, but the DataFrame approach is slower than the list approach. Is that expected?
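One plausible reason, offered as a hedged explanation rather than a measured result: on object (string) columns, the .str methods still call the regex once per value from Python, so they are usually not faster than a compiled re pattern in a list comprehension, and converting the list to a DataFrame and building a boolean mask adds overhead on top. A minimal timing sketch to compare the two approaches on your own machine; the words list, its size and the repeat count are made-up assumptions standing in for the Excel data.

```python
import re
import timeit

import pandas as pd

# Made-up sample data: 60,000 short strings standing in for the Excel values
words = ["NOPOP", "POP", "j", "Pop", "NoPop_AA", "NONPOP"] * 10_000
PATTERN = r"NOPOP|NoPop|NONPOP"
compiled = re.compile(PATTERN)


def with_list():
    # Method 1: compiled regex over a plain Python list
    return [w for w in words if not compiled.search(w)]


def with_pandas():
    # Method 2: list -> DataFrame, then .str.contains (includes conversion cost)
    df = pd.DataFrame({"word": words})
    kept = df[~df["word"].str.contains(PATTERN)]
    return kept["word"].tolist()


# Both methods should keep exactly the same strings, in the same order
assert with_list() == with_pandas()

for func in (with_list, with_pandas):
    seconds = timeit.timeit(func, number=5)
    print(f"{func.__name__}: {seconds:.3f} s for 5 runs")
```

If the data is only ever searched once, the list approach is a reasonable choice; the DataFrame pays off mostly when you go on to do other table operations with the results.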
PS: the Python console shows the following UserWarning:
```
UserWarning: This pattern has match groups. To actually get the groups, use str.extract.
  return func(self, *args, **kwargs)
```
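This warning appears when the pattern passed to str.contains has a capturing group, i.e. plain parentheses (...). str.contains only returns True/False, so pandas warns that the captured text is discarded. A minimal sketch, assuming the real pattern used parentheses only for grouping (the actual pattern is not shown in the thread): writing each group as a non-capturing group (?:...) keeps the same matches and avoids the warning. The helper name contains_with_warnings is hypothetical.

```python
import warnings

import pandas as pd

s = pd.Series(["NOPOP", "POP", "j", "Pop", "NoPop_AA", "NONPOP"])


def contains_with_warnings(pattern):
    """Hypothetical helper: run str.contains and collect any UserWarnings."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        mask = s.str.contains(pattern)
    return mask, [w for w in caught if issubclass(w.category, UserWarning)]


# Capturing group: pandas warns that the groups are ignored
mask_a, warns_a = contains_with_warnings(r"(NOPOP|NoPop|NONPOP)")
# Non-capturing group: same matches, no match-group warning
mask_b, warns_b = contains_with_warnings(r"(?:NOPOP|NoPop|NONPOP)")

print(mask_a.tolist() == mask_b.tolist())
print(len(warns_a), len(warns_b))
```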
(Jun-12-2020, 10:14 AM)snippsat Wrote: (Jun-12-2020, 09:35 AM)cools0607 Wrote: I wonder if .str.contains includes specified functions just like re module? Yes str.contains can take regular expression patterns as in the re module.