Posts: 11
Threads: 4
Joined: Jun 2020
I am trying to write a regular expression for df1 (a DataFrame).
I want to remove the rows whose column 3 contains NOPOP, NoPop, or NONPOP.
To make searching quick, I set column 3 as the index of the DataFrame
and then filtered it with df.filter and a regex.
import pandas as pd
k=[['a','b','c','NOPOP'],['d','e','f','POP'],['g','h','i','j'],['k','l','m','Pop'],['n','o','p','NoPop_AA'],['q','r','s','NONPOP']]
df_exp=pd.DataFrame(k)
df1=df_exp.set_index([3])
df2=df1.filter(regex='[^NOPOP]|[^NoPop]|[^NONPOP]', axis=0)
Output:
Out[263]:
          0  1  2
3
NOPOP     a  b  c
POP       d  e  f
j         g  h  i
Pop       k  l  m
NoPop_AA  n  o  p
NONPOP    q  r  s
The result did not remove the rows related to NOPOP, NoPop, and NONPOP. Why not?
My desired output is shown below.
Output:
     0  1  2
3
POP  d  e  f
j    g  h  i
Pop  k  l  m
Posts: 7,319
Threads: 123
Joined: Sep 2016
Can use str.contains for this.
import pandas as pd
k = [
["a", "b", "c", "NOPOP"],
["d", "e", "f", "POP"],
["g", "h", "i", "j"],
["k", "l", "m", "Pop"],
["n", "o", "p", "NoPop_AA"],
["q", "r", "s", "NONPOP"],
]
df_exp = pd.DataFrame(k)
>>> df_exp = df_exp[~df_exp[3].str.contains('NOPOP|NoPop|NONPOP')]
>>> df1 = df_exp.set_index([3])
>>> df1
     0  1  2
3
POP  d  e  f
j    g  h  i
Pop  k  l  m
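On the "why not?" part of the question, here is a minimal sketch, assuming standard `re` semantics. Square brackets define a character class, so `[^NOPOP]` does not mean "not the word NOPOP"; it means "any single character other than N, O, or P". Since `df.filter` keeps a label as soon as the pattern is found anywhere in it, every label in the example matches at least one of the three classes. The negative-lookahead pattern and the index mask below are illustrative alternatives, not something taken from the reply above, and the variable names are made up for this sketch.

```python
import re
import pandas as pd

k = [
    ["a", "b", "c", "NOPOP"],
    ["d", "e", "f", "POP"],
    ["g", "h", "i", "j"],
    ["k", "l", "m", "Pop"],
    ["n", "o", "p", "NoPop_AA"],
    ["q", "r", "s", "NONPOP"],
]
df1 = pd.DataFrame(k).set_index([3])

# The pattern from the question: three negated character classes.
# [^NoPop] matches any single character other than N, o, P, p,
# so the 'O' in "NOPOP" is already enough for a match.
original = "[^NOPOP]|[^NoPop]|[^NONPOP]"
all_match = all(re.search(original, label) for label in df1.index)

# Alternative 1: keep df.filter, but use a negative lookahead so a label
# matches only when none of the unwanted substrings occurs in it.
kept_by_filter = df1.filter(regex=r"^(?!.*(?:NOPOP|NoPop|NONPOP))", axis=0)

# Alternative 2: build a boolean mask on the index and invert it.
kept_by_mask = df1[~df1.index.str.contains("NOPOP|NoPop|NONPOP")]

print(all_match)
print(kept_by_filter)
print(kept_by_mask)
```

Both alternatives keep only the POP, j, and Pop rows here; which one reads better is a matter of taste.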
Posts: 11
Threads: 4
Joined: Jun 2020
Thank you for your quick reply. It works and achieves my goal.
Posts: 11
Threads: 4
Joined: Jun 2020
Sorry for another question.
Does .str.contains support the same pattern syntax as the re module?
For example, '^AA' to match only strings that start with AA.
Posts: 7,319
Threads: 123
Joined: Sep 2016
Jun-12-2020, 10:14 AM
(This post was last modified: Jun-12-2020, 10:15 AM by snippsat.)
(Jun-12-2020, 09:35 AM)cools0607 Wrote: I wonder if .str.contains includes specified functions just like re module?
Yes, str.contains accepts regular expression patterns, as in the re module.
Quote:For example: '^AA' expresses only searching words start with AA.
Yes, that would work. Pandas has a lot built in, so there is also str.startswith.
If you wonder whether something works, it is best to test it.
import pandas as pd
d = {
'Quarters' : ['quarter1','quarter2','quarter3','quarter4'],
'Description': ['AA year', 'BB year', 'CC year', 'AA year'],
'Revenue': [23.5, 54.6, 5.45, 41.87]
}
df = pd.DataFrame(d)
Test usage:
>>> df[df['Description'].str.contains(r'^AA')]
  Description  Quarters  Revenue
0     AA year  quarter1    23.50
3     AA year  quarter4    41.87
>>> df[df['Description'].str.contains(r'^AA|BB')]
  Description  Quarters  Revenue
0     AA year  quarter1    23.50
1     BB year  quarter2    54.60
3     AA year  quarter4    41.87
>>> # Using str.startswith
>>> df[df['Description'].str.startswith('AA')]
  Description  Quarters  Revenue
0     AA year  quarter1    23.50
3     AA year  quarter4    41.87
>>> df[df['Description'].str.startswith(('AA', 'BB'))]
  Description  Quarters  Revenue
0     AA year  quarter1    23.50
1     BB year  quarter2    54.60
3     AA year  quarter4    41.87
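One detail worth checking when combining `^` with `|`: in standard regex syntax the anchor applies only to the alternative it is written in, so `^AA|BB` reads as "starts with AA, or contains BB anywhere". The two forms give the same rows on the data above only because every BB happens to sit at the start. A small sketch with a made-up Series (the `"year BB"` value is an assumption added to show the difference):

```python
import pandas as pd

# Hypothetical data: "year BB" has BB at the end, not at the start.
s = pd.Series(["AA year", "BB year", "year BB", "CC year"])

# '^AA|BB' means: starts with AA, OR contains BB anywhere.
loose = s[s.str.contains(r"^AA|BB")].tolist()

# Grouping the alternatives anchors both of them to the start.
# (?:...) is a non-capturing group, which also avoids the pandas
# warning about match groups.
anchored = s[s.str.contains(r"^(?:AA|BB)")].tolist()

print(loose)     # ['AA year', 'BB year', 'year BB']
print(anchored)  # ['AA year', 'BB year']
```

If the intent is "starts with AA or BB", the grouped pattern matches what the str.startswith call with a tuple does.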
Posts: 11
Threads: 4
Joined: Jun 2020
Thank you for your reply. After trying your code, I got it. I think it is convenient for me to use .str.contains(r'^AA').
Posts: 11
Threads: 4
Joined: Jun 2020
Jun-15-2020, 07:34 AM
(This post was last modified: Jun-15-2020, 07:39 AM by cools0607.)
Sorry for another question.
I am searching a large amount of data from Excel. After importing the data into a list, I tried two methods:
1. Searching the list with the re module.
2. Converting the list to a DataFrame and then applying the .str.contains() method.
Both work, but the DataFrame method is slower than the list method. Is that expected?
PS: the Python console shows the user warning below:
UserWarning: This pattern has match groups. To actually get the groups, use str.extract.
return func(self, *args, **kwargs)
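Regarding the warning: pandas emits it when the pattern passed to str.contains has capturing groups, that is, plain parentheses such as `(AA|BB)`. The result is still a correct boolean mask; the message is only a hint that the captured text is discarded. A minimal sketch, assuming the pattern in question used parentheses for grouping (the actual pattern is not shown in the post, so the patterns and the helper name below are illustrative):

```python
import warnings
import pandas as pd

s = pd.Series(["AA year", "BB year", "CC year"])

def contains_with_warnings(pattern):
    """Run str.contains and collect any 'match groups' warnings it raises."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        mask = s.str.contains(pattern)
    hits = [w for w in caught if "match groups" in str(w.message)]
    return mask.tolist(), hits

# Capturing group: same mask, but pandas warns about the unused group.
mask_capturing, warned = contains_with_warnings(r"^(AA|BB)")

# Non-capturing group (?:...): same mask, no warning.
mask_plain, silent = contains_with_warnings(r"^(?:AA|BB)")

print(mask_capturing, len(warned))
print(mask_plain, len(silent))
```

On the speed question, it is plausible for .str.contains to be no faster than a loop over a plain list with a precompiled `re` pattern, since pandas string methods on ordinary string columns still apply the regex one element at a time and add some overhead for missing-value handling. Timing both on the real data, for example with `timeit`, is the reliable way to compare.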