Python Forum
Need your help
#1
Hi,

How can I extract only the NP chunks from the tree below?

I need only the NP subtrees, e.g. (NP Electrical/JJ power/NN)

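import nltk

# `line` holds one line of input text
toks = nltk.word_tokenize(line)
postoks = nltk.tag.pos_tag(toks)

# NP chunk: zero or more adjectives followed by one or more nouns
grammar = r""" NP:{<JJ>*<NN.*>+} """

chunker = nltk.RegexpParser(grammar)
tree = chunker.parse(postoks)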
Output:
(S   (NP Electrical/JJ power/NN)   can/MD   be/VB   generated/VBN   by/IN   (NP means/NNS)   of/IN   (NP nuclear/JJ power/NN)   ./.   In/IN   (NP nuclear/JJ power/NN station/NN)   ,/,   (NP electrical/JJ power/NN)   is/VBZ   generated/VBN   by/IN   (NP nuclear/JJ reaction/NN)   ./.   Here/RB   ,/,   (NP heavy/JJ radioactive/JJ elements/NNS such/JJ)   as/IN   (NP Uranium/NNP)   (/(   (NP U235/NNP)   )/)   or/CC   (NP Thorium/NNP)   (/(   (NP Th232/NNP)   )/)   are/VBP   subjected/VBN   to/TO   (NP nuclear/JJ fission/NN)   ./.   This/DT   (NP fission/NN)   is/VBZ   done/VBN   in/IN   a/DT   (NP special/JJ apparatus/NN)   called/VBN   as/IN   (NP reactor/NN)   ./.)
#2
Hi,

I am using "nltk.RegexpParser" to find noun phrases, but it is taking too much time. How can I reduce the running time?
Or is there an alternative method for doing this?
import nltk

# `line` holds one line of input text
toks = nltk.word_tokenize(line)
postoks = nltk.tag.pos_tag(toks)

grammar = r""" NP:{<JJ>*<NN.*>+} """

chunker = nltk.RegexpParser(grammar)
tree = chunker.parse(postoks)

# Join the words of each NP subtree into a plain string
NounPhrase = [
    " ".join(word for word, tag in subtree.leaves())
    for subtree in tree.subtrees(filter=lambda t: t.label() == 'NP')
]
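
One possible alternative, offered only as a sketch: if all that is needed is the phrase text, the pattern <JJ>*<NN.*>+ can be approximated with a single pass over the tagged tokens, without building a tree. The function name extract_noun_phrases and the hand-written sample list are hypothetical and not part of NLTK. The sketch takes already-tagged (word, tag) pairs, so it does nothing about the time spent in tokenizing and tagging, and it has not been timed against RegexpParser. Its output may also differ from RegexpParser's in edge cases.

```python
def extract_noun_phrases(tagged_tokens):
    """Collect runs of JJ tags followed by one or more NN* tags.

    Hypothetical helper: a hand-rolled approximation of the chunk
    pattern <JJ>*<NN.*>+, not an NLTK function.
    """
    phrases = []
    adjectives, nouns = [], []
    for word, tag in tagged_tokens:
        if tag.startswith("NN"):
            # NN, NNS, NNP, NNPS all extend the current noun run
            nouns.append(word)
            continue
        # Any non-noun tag closes a pending noun run
        if nouns:
            phrases.append(" ".join(adjectives + nouns))
            adjectives, nouns = [], []
        if tag == "JJ":
            adjectives.append(word)
        else:
            # Adjectives not followed by a noun are discarded
            adjectives = []
    if nouns:
        phrases.append(" ".join(adjectives + nouns))
    return phrases


# Sample input written by hand from the tags shown in the thread's output
sample = [
    ("Electrical", "JJ"), ("power", "NN"), ("can", "MD"), ("be", "VB"),
    ("generated", "VBN"), ("by", "IN"), ("means", "NNS"), ("of", "IN"),
    ("nuclear", "JJ"), ("power", "NN"), (".", "."),
]
print(extract_noun_phrases(sample))
# ['Electrical power', 'means', 'nuclear power']
```

In the original code, postoks could be passed in directly as tagged_tokens.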
#3
How about a regular expression?  
>>> import re
>>> regex = re.compile(r"(\(NP[^)]*\))")
>>> text = '''
... (S
...   (NP Electrical/JJ power/NN)
...   can/MD
...   be/VB
...   generated/VBN
...   by/IN
...   (NP means/NNS)
...   of/IN
...   (NP nuclear/JJ power/NN)
...   ./.
...   In/IN
...   (NP nuclear/JJ power/NN station/NN)
...   ,/,
...   (NP electrical/JJ power/NN)
...   is/VBZ
...   generated/VBN
...   by/IN
...   (NP nuclear/JJ reaction/NN)
...   ./.
...   Here/RB
...   ,/,
...   (NP heavy/JJ radioactive/JJ elements/NNS such/JJ)
...   as/IN
...   (NP Uranium/NNP)
...   (/(
...   (NP U235/NNP)
...   )/)
...   or/CC
...   (NP Thorium/NNP)
...   (/(
...   (NP Th232/NNP)
...   )/)
...   are/VBP
...   subjected/VBN
...   to/TO
...   (NP nuclear/JJ fission/NN)
...   ./.
...   This/DT
...   (NP fission/NN)
...   is/VBZ
...   done/VBN
...   in/IN
...   a/DT
...   (NP special/JJ apparatus/NN)
...   called/VBN
...   as/IN
...   (NP reactor/NN)
...   ./.)
... '''

>>> matches = regex.findall(text)
>>> for match in matches:
...   print(match)
...
(NP Electrical/JJ power/NN)
(NP means/NNS)
(NP nuclear/JJ power/NN)
(NP nuclear/JJ power/NN station/NN)
(NP electrical/JJ power/NN)
(NP nuclear/JJ reaction/NN)
(NP heavy/JJ radioactive/JJ elements/NNS such/JJ)
(NP Uranium/NNP)
(NP U235/NNP)
(NP Thorium/NNP)
(NP Th232/NNP)
(NP nuclear/JJ fission/NN)
(NP fission/NN)
(NP special/JJ apparatus/NN)
(NP reactor/NN)
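
If the plain phrases are wanted rather than the bracketed chunks, the same regular-expression idea can be extended to strip the POS tags. This is a minimal sketch: the names NP_PATTERN and phrases_from_tree_text are hypothetical, and it assumes the tree has been printed as text in the form shown above, with flat NP chunks and word/TAG tokens.

```python
import re

# Capture the body of each flat "(NP ...)" chunk; literal tokens such as
# "(/(" do not start with "(NP", so they are not matched.
NP_PATTERN = re.compile(r"\(NP\s+([^()]*)\)")


def phrases_from_tree_text(tree_text):
    """Return the words of each NP chunk with the /TAG suffix removed."""
    phrases = []
    for body in NP_PATTERN.findall(tree_text):
        # "power/NN" -> "power"; split on the last "/" only
        words = [token.rsplit("/", 1)[0] for token in body.split()]
        phrases.append(" ".join(words))
    return phrases


# A fragment of the printed tree from the thread
text = """
(S
  (NP Electrical/JJ power/NN)
  can/MD
  (NP Uranium/NNP)
  (/(
  (NP U235/NNP)
  )/)
  ./.)
"""
print(phrases_from_tree_text(text))
# ['Electrical power', 'Uranium', 'U235']
```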