Python Forum
serious n00b.. NLTK in python 2.7 and 3.5
#21
Yes, and "tokenized" should be passed to the function as an argument.
When you are testing things out, it is best to remove the try/except, so it looks like this:
import nltk
from nltk import pos_tag, PunktSentenceTokenizer
from nltk.corpus import state_union

def process_content(tokenized):
    for i in tokenized:
        words = nltk.word_tokenize(i)
        tagged = pos_tag(words)
        print(tagged)

if __name__ == '__main__':
    train_text = state_union.raw("2005-GWBush.txt")
    sample_text = state_union.raw("2006-GWBush.txt")
    custom_sent_tokenizer = PunktSentenceTokenizer(train_text)
    tokenized = custom_sent_tokenizer.tokenize(sample_text)
    process_content(tokenized)
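To illustrate why the advice above is to drop the blanket try/except while testing, here is a minimal pure-Python sketch. The helper names and the misspelled `undefined_tagger` are hypothetical stand-ins, not NLTK calls: catching `Exception` and printing `str(e)` keeps only the message, while letting the exception propagate also tells you the file, line number and offending source line.

```python
import traceback


def process_content(tokenized):
    """Stand-in for the thread's function, with a deliberate bug."""
    for sentence in tokenized:
        words = sentence.split()          # stand-in for nltk.word_tokenize
        tagged = undefined_tagger(words)  # hypothetical typo: this name does not exist
        print(tagged)


def run_swallowing(tokenized):
    """The pattern from the original code: catch everything, keep only the message."""
    try:
        process_content(tokenized)
    except Exception as e:
        return str(e)


def run_with_traceback(tokenized):
    """Roughly what Python shows when the exception is not caught (captured as text)."""
    try:
        process_content(tokenized)
    except Exception:
        return traceback.format_exc()


if __name__ == "__main__":
    print(run_swallowing(["Hello world ."]))      # just the message, no location
    print(run_with_traceback(["Hello world ."]))  # file, line number and source line
```

Once the code works, a try/except can go back in, ideally catching a specific exception type rather than `Exception`.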
#22
wow.

I literally do not understand the difference.
Would you be able to explain in general what just happened?

I ran your code and got the same result, so what benefit does your version have over the one I used?

I won't bore you with the code, since it would be repetitive, and the output is laboriously long and nonsensical without matplotlib, which I'm unable to use for some reason. That's for another question; one at a time.
#23
(Oct-20-2016, 06:58 PM) pythlang Wrote: I literally do not understand the difference.
Would you be able to explain in general what just happened?
"tokenized" is now passed to the function as an argument; this makes it clear that the function is meant to use it.
Global variables are usually not a good idea; they soon get messy.
#24
Here was my reply to you:
Thanks for the clarification.

If I were adding code like this:

def process_content():

    try:
        for i in tokenized:
            words = nltk.word_tokenize(i)
            tagged = nltk.pos_tag(words)
            
            chunkGram = r"""Chunk: {<RB.?>*<VB.?>*<NNP><NN>?}"""

            chunkParser = nltk.RegexpParser(chunkGram)
            chunked = chunkParser.parse(tagged)

            chunked.draw()

    except Exception as e:
        print(str(e))

process_content()
How would I write it using your approach instead, i.e. with:
def process_content(tokenized):

    for i in tokenized:
        words = nltk.word_tokenize(i)
        tagged = pos_tag(words)
        print(tagged)

if __name__ == '__main__':
    train_text = state_union.raw("2005-GWBush.txt")
    sample_text = state_union.raw("2006-GWBush.txt")
    custom_sent_tokenizer = PunktSentenceTokenizer(train_text)
    tokenized = custom_sent_tokenizer.tokenize(sample_text)
    process_content(tokenized)
By the way, this is where "chunked.draw()" was failing: I got neither output nor an error on the screen, and I'm not sure whether my matplotlib path was correct.

However, my diagram appeared out of nowhere a few minutes ago.

EDIT: when I try to move or resize the window the diagram appears in, I get an error that "Python has quit unexpectedly". Shieettttt
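For what it's worth, NLTK's `Tree.draw()` opens a Tkinter window rather than a matplotlib figure, so a crash when moving or resizing that window may point at the local Tk setup rather than matplotlib (a guess about this particular machine, not a diagnosis). While debugging, a text rendering avoids the GUI entirely. The sketch below runs the thread's chunk grammar on a hand-tagged sentence (the sentence and its tags are made up, so no corpus, tokenizer or tagger download is needed) and prints the resulting tree as text.

```python
from nltk import RegexpParser

# Hand-tagged example sentence (made up for illustration), standing in
# for the output of nltk.pos_tag.
tagged = [("George", "NNP"), ("W.", "NNP"), ("Bush", "NNP"),
          ("spoke", "VBD"), ("to", "TO"), ("Congress", "NNP"),
          ("today", "NN"), (".", ".")]

chunkGram = r"""Chunk: {<RB.?>*<VB.?>*<NNP>+<NN>?}"""
chunkParser = RegexpParser(chunkGram)
chunked = chunkParser.parse(tagged)  # an nltk Tree whose root label is "S"

# Bracketed text rendering in the terminal: no window involved.
print(chunked)

# chunked.draw()  # the Tkinter window used in the thread
```

Each chunk shows up as a `(Chunk word/TAG ...)` group, which is usually enough to check that the grammar matches what you expect before reaching for the diagram.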

EDIT2: I'm less concerned with the window than with adding my proposed code above to the code so nicely provided by snippsat.

Here's my attempt:

import nltk
from nltk import pos_tag, PunktSentenceTokenizer
from nltk.corpus import state_union

def process_content(tokenized):
    for i in tokenized:
        words = nltk.word_tokenize(i)
        tagged = pos_tag(words)
        print(tagged)

if __name__ == '__main__':
    train_text = state_union.raw("2005-GWBush.txt")
    sample_text = state_union.raw("2006-GWBush.txt")
    custom_sent_tokenizer = PunktSentenceTokenizer(train_text)
    tokenized = custom_sent_tokenizer.tokenize(sample_text)
                
    chunkGram = r"""Chunk: {<RB.?>*<VB.?>*<NNP>+<NN>?}"""

    chunkParser = nltk.RegexpParser(chunkGram)
    chunked = chunkParser.parse(tagged)

    chunked.draw()
    process_content(tokenized)
Error:
Traceback (most recent call last):
  File "/Users/jordanXXX/Documents/NLP/chunking2", line 20, in <module>
    chunked = chunkParser.parse(tagged)
NameError: name 'tagged' is not defined
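The traceback comes from Python's scoping rules: `tagged` is assigned inside `process_content()`, so it is a local variable that only exists while the function runs, and the code under `if __name__ == '__main__':` never sees it (that chunking code also runs before `process_content()` is even called). One way out is to do the chunking inside the loop, as the final version further down the thread does; another is to have the function `return` what it computed. Below is a pure-Python sketch of the second option; the `.split()` tokenizer and the dummy `"X"` tag are hypothetical stand-ins for `nltk.word_tokenize` and `nltk.pos_tag` so it runs without any NLTK data.

```python
def process_content(tokenized):
    """Tag every sentence and hand the results back to the caller."""
    all_tagged = []
    for sentence in tokenized:
        words = sentence.split()            # stand-in for nltk.word_tokenize
        tagged = [(w, "X") for w in words]  # stand-in for nltk.pos_tag; "X" is a dummy tag
        all_tagged.append(tagged)
    return all_tagged                       # `tagged` itself stays local to the function


if __name__ == "__main__":
    tokenized = ["Hello world .", "Thank you ."]
    tagged_sents = process_content(tokenized)  # use the returned value instead
    print(tagged_sents[0])

    try:
        print(tagged)  # the local name from inside the function
    except NameError as exc:
        print("Still not visible here:", exc)
```

With NLTK, each list in the returned value could then be passed to `chunkParser.parse()` at module level.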
#25
Alright guys, I stayed up until 4 am, when I passed out, really studying and working through some code. The following is what I was satisfied with after elaborating on snippsat's code:

import nltk
from nltk import pos_tag, PunktSentenceTokenizer, word_tokenize, RegexpParser
from nltk.corpus import state_union

def process_content(tokenized):
    # Build the chunk parser once instead of once per sentence
    chunkGram = r"""Chunk: {<RB.?>*<VB.?>*<NNP>+<NN>?}"""
    chunkParser = RegexpParser(chunkGram)

    for i in tokenized:
        words = word_tokenize(i)
        tagged = pos_tag(words)
        chunked = chunkParser.parse(tagged)

        # Opens one diagram window per sentence
        chunked.draw()

if __name__ == '__main__':
    train_text = state_union.raw("2005-GWBush.txt")
    sample_text = state_union.raw("2006-GWBush.txt")
    custom_sent_tokenizer = PunktSentenceTokenizer(train_text)
    tokenized = custom_sent_tokenizer.tokenize(sample_text)
    process_content(tokenized)
It was then successfully plotted with matplotlib and showed up as it should have.
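If you later want to do more than look at the diagrams, one option is to separate chunking from display: build the parser once, return the chunk trees, and let the caller decide whether to `draw()` or print them. The sketch below does that for sentences that are already tagged; the helper names `chunk_sentences` and `chunk_texts` and the hand-tagged sample are hypothetical. In the grammar, `<RB.?>*` matches zero or more adverbs (RB, RBR, RBS), `<VB.?>*` zero or more verbs (VB, VBD, VBZ, ...), `<NNP>+` one or more singular proper nouns, and `<NN>?` an optional singular common noun.

```python
from nltk import RegexpParser

CHUNK_GRAMMAR = r"""Chunk: {<RB.?>*<VB.?>*<NNP>+<NN>?}"""


def chunk_sentences(tagged_sents, grammar=CHUNK_GRAMMAR):
    """Return one chunk tree per (word, tag) sentence; nothing is drawn here."""
    parser = RegexpParser(grammar)  # compile the grammar once, not per sentence
    return [parser.parse(tagged) for tagged in tagged_sents]


def chunk_texts(tree):
    """Collect the words of every 'Chunk' subtree as plain strings."""
    return [" ".join(word for word, tag in sub.leaves())
            for sub in tree.subtrees(filter=lambda t: t.label() == "Chunk")]


if __name__ == "__main__":
    # Hand-tagged sample (made up), standing in for nltk.pos_tag output.
    tagged_sents = [
        [("Tonight", "NN"), ("President", "NNP"), ("Bush", "NNP"), (".", ".")],
        [("We", "PRP"), ("quickly", "RB"), ("visited", "VBD"), ("Iraq", "NNP"), (".", ".")],
    ]
    for tree in chunk_sentences(tagged_sents):
        print(chunk_texts(tree))
        # tree.draw()  # one window per sentence, as in the thread
```

Keeping the parser outside the loop and returning the trees also makes the function easy to test without opening any windows.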

A YUGE thanks to everyone who guided me on here!