 Word co-occurrence matrix for a string (NLP)
#1
I need to create a word co-occurrence matrix that shows, for a given corpus, how many times each word in the vocabulary immediately precedes each other word in the vocabulary.

The input sentence can be tokenized or not. The method has to scale to a sentence that is millions of words long, so it must be efficient.

test_sent = ['hello', 'i', 'am', 'hello', 'i', 'dont', 'want', 'to', 'i', 'dont']
I would like this to produce the following output:

Output:
[[0. 2. 0. 0. 0. 0.]
 [0. 0. 0. 1. 0. 2.]
 [0. 1. 0. 0. 0. 0.]
 [0. 0. 0. 0. 1. 0.]
 [1. 0. 0. 0. 0. 0.]
 [0. 0. 1. 0. 0. 0.]]
For example, the 2 in (row1, col2) shows that 'i' follows 'hello' twice.

How can I implement something like this using sklearn?
#2
Take a look at NLTK: https://www.nltk.org/
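The core of the task is counting adjacent word pairs (bigrams). Here is a minimal sketch using only the standard library; `zip(tokens, tokens[1:])` yields the same adjacent pairs that `nltk.bigrams` would. The function name `cooccurrence_matrix` and the choice to order the vocabulary by first appearance are my own assumptions, since the question does not say how the vocabulary is ordered. Because of that, the rows and columns will not necessarily line up with the matrix posted above.

```python
from collections import Counter

def cooccurrence_matrix(tokens):
    """Count how often each word is immediately followed by each other word.

    Hypothetical helper: rows are the preceding word, columns the following
    word, and the vocabulary is ordered by first appearance (an assumption).
    """
    vocab = list(dict.fromkeys(tokens))           # unique words, first-seen order
    index = {word: i for i, word in enumerate(vocab)}
    pair_counts = Counter(zip(tokens, tokens[1:]))  # adjacent (first, second) pairs
    matrix = [[0] * len(vocab) for _ in vocab]
    for (first, second), n in pair_counts.items():
        matrix[index[first]][index[second]] = n
    return vocab, matrix

test_sent = ['hello', 'i', 'am', 'hello', 'i', 'dont', 'want', 'to', 'i', 'dont']
vocab, matrix = cooccurrence_matrix(test_sent)
print(vocab)   # ['hello', 'i', 'am', 'dont', 'want', 'to']
for word, row in zip(vocab, matrix):
    print(word, row)
# the 'hello' row is [0, 2, 0, 0, 0, 0]: 'i' follows 'hello' twice
```

A dense matrix like this needs vocabulary-size squared cells, so it is only practical for small vocabularies.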
#3
Here's something that might help: https://stackoverflow.com/questions/3733...t-words-in
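For input that is millions of words long, one option is to read the tokens in a single pass and store only the pairs that actually occur, instead of a full matrix. This is a rough sketch under my own assumptions: `sparse_pair_counts` is a made-up name, and the untokenized input is split on whitespace, which may be too naive for real text.

```python
from collections import Counter

def sparse_pair_counts(token_stream):
    """Single pass over any iterable of tokens; keeps only nonzero cells.

    Returns (index, counts): index maps word -> integer id, and
    counts maps (preceding_id, following_id) -> number of occurrences.
    """
    index = {}
    counts = Counter()
    prev = None
    for word in token_stream:
        cur = index.setdefault(word, len(index))  # assign ids as words appear
        if prev is not None:
            counts[(prev, cur)] += 1
        prev = cur
    return index, counts

text = "hello i am hello i dont want to i dont"
index, counts = sparse_pair_counts(text.split())  # naive whitespace tokenization
print(counts[(index['hello'], index['i'])])   # 2
print(counts[(index['i'], index['dont'])])    # 2
```

The `(row, col) -> count` pairs are already in coordinate form, so they could be passed to something like `scipy.sparse.coo_matrix` if a matrix object is needed later.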