Word co-occurrence matrix for a string (NLP) - JoeB - Feb-27-2018

I need to create a word co-occurrence matrix that shows how many times each word in a vocabulary precedes every other word in the vocabulary for a given corpus. The input sentence can be tokenized or not. The method has to scale to a sentence that is millions of words long, so it must be efficient.

test_sent = ['hello', 'i', 'am', 'hello', 'i', 'dont', 'want', 'to', 'i', 'dont']

I would want this to give an output of:

For example, the 2 in (row1, col2) shows that 'i' follows 'hello' twice.

How can I implement something like this using sklearn?

RE: Word co-occurrence matrix for a string (NLP) - Larz60+ - Feb-27-2018

Take a look at NLTK: https://www.nltk.org/

RE: Word co-occurrence matrix for a string (NLP) - Larz60+ - Feb-27-2018

Here's something that might help: https://stackoverflow.com/questions/37331708/nltk-find-occurrences-of-a-word-within-5-words-left-right-of-context-words-in
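
For reference, the "which word follows which" matrix described in the first post can be sketched with the standard library alone, without sklearn or NLTK. This is only one possible approach, not something proposed in the thread. The helper names follower_counts and to_matrix are made up for illustration, and ordering the vocabulary by first appearance is an assumption, since the thread does not say how rows and columns should be ordered.

```python
from collections import Counter


def follower_counts(tokens):
    """Count how often each token is immediately followed by another token.

    Returns a Counter keyed by (first_word, next_word) pairs, which acts as
    a sparse representation: pairs that never occur are simply absent.
    """
    return Counter(zip(tokens, tokens[1:]))


def to_matrix(tokens):
    """Build a dense matrix where matrix[r][c] is the number of times
    vocab[c] directly follows vocab[r]."""
    # Assumption: vocabulary ordered by first appearance in the input.
    vocab = list(dict.fromkeys(tokens))
    index = {word: i for i, word in enumerate(vocab)}
    matrix = [[0] * len(vocab) for _ in vocab]
    for (first, second), n in follower_counts(tokens).items():
        matrix[index[first]][index[second]] = n
    return vocab, matrix


test_sent = ['hello', 'i', 'am', 'hello', 'i', 'dont', 'want', 'to', 'i', 'dont']
vocab, matrix = to_matrix(test_sent)

print(vocab)
for word, row in zip(vocab, matrix):
    print(word, row)
```

The pair counting is a single pass over the tokens. The dense list-of-lists is only practical for a small vocabulary; for a corpus with millions of words and a large vocabulary, it is probably better to keep just the Counter, or load its entries into a sparse matrix type. An untokenized string would need to be split into tokens first, for example with str.split().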