Python Forum
Statistics: Two histograms based on word frequency vectors
#1
Hi guys!

I am stuck on a statistics question. I have two vectors, vector_negative and vector_positive. These vectors are populated with the number of times certain words appear in movie reviews. The wordlist itself consists of about 17,000 words, and each position in the vectors refers to the same word within the wordlist. For example, wordlist[22] = 'cat', vector_negative[22] = 12, vector_positive[22] = 3. This would mean that the word 'cat' appears 12 times in my negative and 3 times in my positive movie reviews.

So far so good... I need to make two histograms based on vector_negative and vector_positive and then implement a statistical method that tests whether these histograms are statistically significant.

I have no idea how to go about that. I'm not very familiar with histograms and how they work.

Many thanks!
#2
Neither are we. This is a Python forum, not a statistics forum.

Now, in your histograms, what would be the values, and what would be the variable?
#3
You can plot a histogram with matplotlib.pyplot.hist. Because a histogram shows the distribution of numerical data, you will need to convert each vector by replacing the counts with repeated values (instead of v[22] = 3 you need 22, 22, 22); numpy.repeat can do this. If you want comparable histograms, use the same binning (the bins parameter) for both the negative and positive vectors.
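
A rough sketch of that conversion and of shared binning, using only NumPy. The small vectors here are made-up stand-ins for the real 17,000-entry ones, and the choice of two bins is arbitrary, so treat it as an illustration rather than a recipe:

```python
import numpy as np

# Hypothetical toy data standing in for the real word-count vectors.
vector_negative = np.array([12, 0, 5, 1])
vector_positive = np.array([3, 4, 0, 1])

# Replace each count with that many copies of the word's index,
# e.g. a count of 3 at index 0 becomes 0, 0, 0.
indices = np.arange(len(vector_negative))
samples_negative = np.repeat(indices, vector_negative)
samples_positive = np.repeat(indices, vector_positive)

# One set of bin edges shared by both histograms so they are comparable.
# Here: two bins covering indices 0-1 and 2-3 (an arbitrary choice).
edges = np.linspace(0, len(indices), 3)
hist_negative, _ = np.histogram(samples_negative, bins=edges)
hist_positive, _ = np.histogram(samples_positive, bins=edges)

print(hist_negative, hist_positive)
```

The same edges array can be passed as the bins argument of plt.hist for both sample arrays when plotting.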

Example histogram:
import matplotlib.pyplot as plt
from random import random

# 50 random values in [0, 1) as sample data
data = [random() for _ in range(50)]
plt.hist(data, edgecolor='black')  # black edges separate the bars visually
plt.show()
"Testing significance of histograms" does not make much sense; maybe it means some test applied to binned data?