Python Forum
Word co-occurrence matrix for a string (NLP)
#1
I need to create a word co-occurrence matrix that shows, for a given corpus, how many times each word in the vocabulary directly precedes every other word in the vocabulary.

The input sentence can be tokenized or not. The method has to scale to a sentence that is millions of words long, so it must be efficient.

test_sent = ['hello', 'i', 'am', 'hello', 'i', 'dont', 'want', 'to', 'i', 'dont']
I would want this to give an output of:

Output:
[[0. 2. 0. 0. 0. 0.]
 [0. 0. 0. 1. 0. 2.]
 [0. 1. 0. 0. 0. 0.]
 [0. 0. 0. 0. 1. 0.]
 [1. 0. 0. 0. 0. 0.]
 [0. 0. 1. 0. 0. 0.]]
For example, the 2 in (row1, col2) shows that 'i' follows 'hello' twice.

How can I implement something like this using sklearn?
#2
Take a look at NLTK: https://www.nltk.org/
#3
Here's something that might help: https://stackoverflow.com/questions/3733...t-words-in
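
For reference, the counting described in the first post can be sketched with the standard library alone. This is only a rough illustration, not sklearn's or NLTK's way of doing it. The function names (build_vocab, follower_counts, to_dense) are made up for this example, and ordering the vocabulary by first appearance is an assumption, so the row/column layout may not match the matrix posted above.

```python
from collections import Counter
from itertools import islice


def build_vocab(tokens):
    """Map each distinct token to an index, in order of first appearance (an assumption)."""
    vocab = {}
    for tok in tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab


def follower_counts(tokens):
    """Count adjacent (previous word, next word) pairs in one pass over the tokens."""
    # islice avoids copying the token list, which matters for very long inputs.
    return Counter(zip(tokens, islice(tokens, 1, None)))


def to_dense(pair_counts, vocab):
    """Dense matrix where entry [r][c] is how often word c directly follows word r."""
    size = len(vocab)
    matrix = [[0] * size for _ in range(size)]
    for (prev, nxt), n in pair_counts.items():
        matrix[vocab[prev]][vocab[nxt]] = n
    return matrix


test_sent = ['hello', 'i', 'am', 'hello', 'i', 'dont', 'want', 'to', 'i', 'dont']
vocab = build_vocab(test_sent)
counts = follower_counts(test_sent)
matrix = to_dense(counts, vocab)

for row in matrix:
    print(row)
```

The pair counting is a single linear pass, so it should cope with very long token lists. The dense matrix grows with the square of the vocabulary size, so for a large vocabulary it is probably better to keep the Counter as it is, or load the counts into a sparse matrix type, rather than call to_dense.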