Statistics: Two histograms based on word frequency vectors - Printable Version

Python Forum (https://python-forum.io)
Forum: Data Science (https://python-forum.io/forum-44.html)
Thread: Statistics: Two histograms based on word frequency vectors (/thread-2570.html)
Statistics: Two histograms based on word frequency vectors - fancy_panther - Mar-25-2017

Hi guys! I am stuck on a statistics question. I have two vectors, vector_negative and vector_positive. These vectors are populated with the number of times certain words appear in movie reviews. The wordlist itself consists of about 17,000 words, and each position in the vectors refers to the same word within the wordlist. For example, wordlist[22] = 'cat', vector_negative[22] = 12, vector_positive[22] = 3. This would mean that the word 'cat' appears 12 times in my negative and 3 times in my positive movie reviews.

So far so good... I need to make two histograms based on vector_negative and vector_positive and then implement a statistical method which tests whether these histograms are statistically significant. I have no idea how to go about that.

Many thanks!

RE: Statistics: Two histograms based on word frequency vectors - Ofnuts - Mar-26-2017

Neither do we. This is a Python forum, not a statistics forum. Now, in your histograms, what would be the values, and what would be the variable?

RE: Statistics: Two histograms based on word frequency vectors - zivoni - Mar-27-2017

You can plot a histogram with matplotlib.pyplot.hist. As a histogram shows the distribution of numerical data, you will need to convert your vector and replace value counts with repeated values (instead of v[22]=3 you need 22,22,22); numpy.repeat can do it. If you want to get "comparable" histograms, you should use the same binning (bins parameter) for both the negative and positive vectors.

Example histogram:

    import matplotlib.pyplot as plt
    from random import random

    # 50 uniform random values in [0, 1) as sample data
    data = [random() for x in range(50)]
    plt.hist(data, edgecolor='black')
    plt.show()

[Image: k1Tf0Ra.png]

"Testing significance of histograms" does not make much sense; maybe it means some test used on binned data?
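
The suggestions above (expanding counts with numpy.repeat, sharing one set of bins) could be sketched roughly as follows. This is only an illustration under assumptions: the six-element vectors are made-up toy data, not the poster's 17,000-word vectors, and the helper names expand_counts, chi_square_homogeneity and plot_histograms are hypothetical. The chi-square test of homogeneity is just one possible reading of "some test used on binned data"; the thread does not say which test was intended.

```python
import numpy as np

# Toy stand-ins for the poster's vectors (assumed data, six "words" only).
vector_negative = np.array([12, 0, 5, 3, 7, 1])
vector_positive = np.array([3, 4, 5, 9, 2, 6])


def expand_counts(counts):
    """Turn per-word counts into repeated word indices (v[2]=3 -> 2, 2, 2)."""
    return np.repeat(np.arange(len(counts)), counts)


def chi_square_homogeneity(counts_a, counts_b):
    """Chi-square statistic and degrees of freedom for two rows of bin counts."""
    observed = np.array([counts_a, counts_b], dtype=float)
    # Drop bins that are empty in both samples; their expected count would be 0.
    observed = observed[:, observed.sum(axis=0) > 0]
    row_totals = observed.sum(axis=1, keepdims=True)
    col_totals = observed.sum(axis=0, keepdims=True)
    expected = row_totals @ col_totals / observed.sum()
    chi2 = ((observed - expected) ** 2 / expected).sum()
    dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)
    return float(chi2), dof


def plot_histograms(neg, pos, bins):
    """Draw both histograms side by side with identical bin edges."""
    import matplotlib.pyplot as plt  # imported here so the rest runs without it

    fig, axes = plt.subplots(1, 2, sharey=True)
    axes[0].hist(neg, bins=bins, edgecolor='black')
    axes[0].set_title('negative')
    axes[1].hist(pos, bins=bins, edgecolor='black')
    axes[1].set_title('positive')
    plt.show()


expanded_negative = expand_counts(vector_negative)
expanded_positive = expand_counts(vector_positive)

# Same bin edges for both vectors, so the two histograms are comparable.
bins = np.linspace(0, len(vector_negative), 4)
binned_negative, _ = np.histogram(expanded_negative, bins=bins)
binned_positive, _ = np.histogram(expanded_positive, bins=bins)

chi2, dof = chi_square_homogeneity(binned_negative, binned_positive)
print('chi-square statistic:', chi2, 'degrees of freedom:', dof)
```

Calling plot_histograms(expanded_negative, expanded_positive, bins) would draw the two plots. If SciPy is available, a p-value for the statistic could be obtained with scipy.stats.chi2.sf(chi2, dof). Note that the test treats every word occurrence as an independent draw, which real review text only approximates, so the result should be read with some caution.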