Need your help

desul · Mar-24-2017, 05:38 PM

Hi,

How can I extract only the NP chunks from the tree below?

I need only these, e.g. (NP Electrical/JJ power/NN)

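import nltk

# `line` holds the input text to be chunked
toks = nltk.word_tokenize(line)
postoks = nltk.tag.pos_tag(toks)

# NP chunk: any number of adjectives followed by one or more nouns
grammar = r""" NP:{<JJ>*<NN.*>+} """

chunker = nltk.RegexpParser(grammar)
tree = chunker.parse(postoks)
print(tree)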

Output:
(S
  (NP Electrical/JJ power/NN)
  can/MD
  be/VB
  generated/VBN
  by/IN
  (NP means/NNS)
  of/IN
  (NP nuclear/JJ power/NN)
  ./.
  In/IN
  (NP nuclear/JJ power/NN station/NN)
  ,/,
  (NP electrical/JJ power/NN)
  is/VBZ
  generated/VBN
  by/IN
  (NP nuclear/JJ reaction/NN)
  ./.
  Here/RB
  ,/,
  (NP heavy/JJ radioactive/JJ elements/NNS such/JJ)
  as/IN
  (NP Uranium/NNP)
  (/(
  (NP U235/NNP)
  )/)
  or/CC
  (NP Thorium/NNP)
  (/(
  (NP Th232/NNP)
  )/)
  are/VBP
  subjected/VBN
  to/TO
  (NP nuclear/JJ fission/NN)
  ./.
  This/DT
  (NP fission/NN)
  is/VBZ
  done/VBN
  in/IN
  a/DT
  (NP special/JJ apparatus/NN)
  called/VBN
  as/IN
  (NP reactor/NN)
  ./.)

desul · Mar-24-2017, 06:47 PM

Hi,

I am using nltk.RegexpParser to find noun phrases, but it takes too much time. How can I reduce the running time?
Or is there an alternative method for doing this?

toks = nltk.word_tokenize(line)
postoks = nltk.tag.pos_tag(toks)

grammar = r""" NP:{<JJ>*<NN.*>+} """

chunker = nltk.RegexpParser(grammar)
tree = chunker.parse(postoks)
# Join the words of each NP subtree into one phrase string
NounPhrase = [
    " ".join(a for (a, b) in subtree.leaves())
    for subtree in tree.subtrees(filter=lambda t: t.label() == 'NP')
]
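
The post does not say which step is slow, so this is only a sketch of one thing to try: build the chunker once and reuse it for every line, rather than constructing it again each time through the loop. The helper name `noun_phrases` and the hand-tagged sample are illustrative assumptions; real input would come from `nltk.pos_tag` as in the code above.

```python
import nltk

# Compile the grammar once, outside any per-line loop.
GRAMMAR = r"NP: {<JJ>*<NN.*>+}"
CHUNKER = nltk.RegexpParser(GRAMMAR)


def noun_phrases(tagged_tokens):
    """Return the text of every NP chunk in a list of (word, tag) pairs."""
    tree = CHUNKER.parse(tagged_tokens)
    return [
        " ".join(word for word, _tag in subtree.leaves())
        for subtree in tree.subtrees(filter=lambda t: t.label() == "NP")
    ]


# Hand-tagged sample (an assumption) so the sketch runs without tagger data.
sample = [
    ("Electrical", "JJ"), ("power", "NN"), ("can", "MD"), ("be", "VB"),
    ("generated", "VBN"), ("by", "IN"), ("means", "NNS"),
]
phrases = noun_phrases(sample)
print(phrases)
```

If tagging turns out to be the slow part instead, timing `word_tokenize`, `pos_tag`, and `parse` separately on a few lines should show where the time actually goes.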

**nilamo** · Mar-24-2017, 07:06 PM

How about a regular expression?

>>> import re
>>> regex = re.compile(r"(\(NP[^)]*\))")
>>> text = '''
... (S
...   (NP Electrical/JJ power/NN)
...   can/MD
...   be/VB
...   generated/VBN
...   by/IN
...   (NP means/NNS)
...   of/IN
...   (NP nuclear/JJ power/NN)
...   ./.
...   In/IN
...   (NP nuclear/JJ power/NN station/NN)
...   ,/,
...   (NP electrical/JJ power/NN)
...   is/VBZ
...   generated/VBN
...   by/IN
...   (NP nuclear/JJ reaction/NN)
...   ./.
...   Here/RB
...   ,/,
...   (NP heavy/JJ radioactive/JJ elements/NNS such/JJ)
...   as/IN
...   (NP Uranium/NNP)
...   (/(
...   (NP U235/NNP)
...   )/)
...   or/CC
...   (NP Thorium/NNP)
...   (/(
...   (NP Th232/NNP)
...   )/)
...   are/VBP
...   subjected/VBN
...   to/TO
...   (NP nuclear/JJ fission/NN)
...   ./.
...   This/DT
...   (NP fission/NN)
...   is/VBZ
...   done/VBN
...   in/IN
...   a/DT
...   (NP special/JJ apparatus/NN)
...   called/VBN
...   as/IN
...   (NP reactor/NN)
...   ./.)
... '''

>>> matches = regex.findall(text)
>>> for match in matches:
...   print(match)
...
(NP Electrical/JJ power/NN)
(NP means/NNS)
(NP nuclear/JJ power/NN)
(NP nuclear/JJ power/NN station/NN)
(NP electrical/JJ power/NN)
(NP nuclear/JJ reaction/NN)
(NP heavy/JJ radioactive/JJ elements/NNS such/JJ)
(NP Uranium/NNP)
(NP U235/NNP)
(NP Thorium/NNP)
(NP Th232/NNP)
(NP nuclear/JJ fission/NN)
(NP fission/NN)
(NP special/JJ apparatus/NN)
(NP reactor/NN)
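
If only the words are wanted, without the POS tags, the same regular-expression idea can be extended a little. This is a minimal sketch using only the standard library; the name `np_words` and the shortened sample tree are illustrative assumptions. It relies on the NP chunks being flat and on no token inside a chunk containing a ")" character.

```python
import re

# Capture the body of each flat (NP ...) chunk, without the parentheses.
NP_PATTERN = re.compile(r"\(NP\s+([^)]*)\)")


def np_words(tree_text):
    """Return each NP chunk as a plain phrase with the /TAG suffixes removed."""
    phrases = []
    for body in NP_PATTERN.findall(tree_text):
        # Each token looks like word/TAG; keep the part before the last slash.
        words = [token.rsplit("/", 1)[0] for token in body.split()]
        phrases.append(" ".join(words))
    return phrases


# Shortened version of the tree from the question.
sample = """(S
  (NP Electrical/JJ power/NN)
  can/MD
  (/(
  (NP U235/NNP)
  )/)
  (NP nuclear/JJ fission/NN)
  ./.)"""
result = np_words(sample)
print(result)
```

Note that this works on the printed form of the tree, so it does not avoid the cost of tokenizing, tagging, and chunking in the first place.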
