Mar-25-2017, 05:53 PM
Hi guys!
I am stuck on a statistics question. I have two vectors, vector_negative and vector_positive. These vectors are populated with the number of times certain words appear in movie reviews. The wordlist itself consists of about 17,000 words, and each position in the vectors refers to the same word within the wordlist. For example, wordlist[22] = 'cat', vector_negative[22] = 12, vector_positive[22] = 3. This would mean that the word 'cat' appears 12 times in my negative movie reviews and 3 times in my positive movie reviews.
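To make the layout concrete, here is a minimal toy sketch of the data. It assumes plain Python lists and uses 5 words instead of about 17,000; every word and count except the 'cat' values is made up for illustration, and 'cat' sits at index 2 here rather than 22.

```python
# Toy stand-in for the real data: 5 words instead of ~17,000.
# Words and counts are invented for illustration, apart from
# 'cat' (12 negative, 3 positive), which matches the example above.
wordlist = ['movie', 'great', 'cat', 'boring', 'plot']
vector_negative = [40, 2, 12, 9, 15]   # counts in negative reviews
vector_positive = [38, 11, 3, 1, 14]   # counts in positive reviews

# The three sequences are parallel: position i refers to the same word in each.
assert len(wordlist) == len(vector_negative) == len(vector_positive)

# Look up the counts for one word through its shared index.
i = wordlist.index('cat')
print(wordlist[i], vector_negative[i], vector_positive[i])
```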
So far so good. I need to make two histograms, one based on vector_negative and one based on vector_positive, and then implement a statistical method that tests whether the difference between these histograms is statistically significant.
I have no idea how to go about that.
I'm not very familiar with histograms and how they work.
Many thanks!