Posts: 24
Threads: 12
Joined: Apr 2017
Feb-14-2019, 01:26 PM
(This post was last modified: Feb-14-2019, 02:38 PM by buran.)
Dear all,
I have a pandas dataframe, and I created an extra column containing the string length of a record with:
essay_cols = ["record0", "record1", ...]  # where 'recordX' is the name of a column
all_essays = all_essays[essay_cols].apply(lambda x: ' '.join(x), axis=1)  # I understand this joins the text of the different columns into a single string per row
all_data["essay_len"] = all_essays.apply(lambda x: len(x))
Does this give the word count of a string or the length of the string? And how can I make a field with the average word length and the frequency of certain words?
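On the first question: applied to a string, len counts characters (spaces included), not words; a word count needs the string split on whitespace first. Below is a minimal sketch with a made-up two-column frame standing in for all_data (the column contents and the word_count column name are illustrative only). It also assumes some essays may be missing (NaN), which ' '.join cannot handle, so they are replaced with empty strings before joining.

```python
import pandas as pd

# Hypothetical stand-in for all_data: two essay columns, one missing value
all_data = pd.DataFrame({
    "record0": ["i love cooking", "hello there"],
    "record1": ["and hiking", None],
})
essay_cols = ["record0", "record1"]

# ' '.join fails on NaN, so missing essays become empty strings first
all_essays = all_data[essay_cols].fillna("").apply(lambda row: " ".join(row), axis=1)

# len() on a string counts characters, spaces included ...
all_data["essay_len"] = all_essays.apply(len)
# ... whereas splitting on whitespace and counting the pieces gives a word count
all_data["word_count"] = all_essays.apply(lambda s: len(s.split()))
```

The first row joins to "i love cooking and hiking": 25 characters but only 5 words.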
I should count individual words in a string, get their length and divide them by the number of words. In the second case, I should retrieve specific words from a string and count them. And all as a lambda function. This is a bit beyond me.
Do you have any tips?
Thank you
Luigi
Posts: 817
Threads: 1
Joined: Mar 2018
(Feb-14-2019, 01:26 PM)Gigux Wrote: I should count individual words in a string, get their length and divide them by the number of words.
from collections import Counter
avg_words_len_in_str = lambda s: [len(k) / v for k,v in Counter(s.lower().split()).items()]
avg_words_len_in_str("abc abc abcd abcd abcde")
# Output: [1.5, 2.0, 5.0]
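As written, the lambda above returns one value per distinct word (that word's length divided by how many times it occurs), which is why the example yields three numbers. If what is wanted is a single mean word length per string, as described in the quoted question, one way is to sum the word lengths and divide by the number of words. A minimal sketch; the name mean_word_len is just illustrative:

```python
def mean_word_len(s):
    """Average length of the whitespace-separated words in s (0.0 for an empty string)."""
    words = s.split()
    if not words:
        return 0.0  # avoid dividing by zero on empty essays
    return sum(len(w) for w in words) / len(words)

print(mean_word_len("abc abc abcd abcd abcde"))  # 19 characters / 5 words = 3.8
```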
(Feb-14-2019, 01:26 PM)Gigux Wrote: In the second case, I should retrieve specific words from a string and count them.
from collections import Counter
count_spec_words = lambda s, specific: [v for k, v in Counter(s.lower().split()).items() if k in specific]
count_spec_words("abc abc abcd abcd abcde", ("abc",))
# Output: [2]
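Because the lambda above lowercases the text but not the target words, a target such as "I" never matches, and words that do not occur are simply left out of the result. A possible variant (the name count_words is hypothetical) lowercases the targets too and reports every requested word, with 0 for the absent ones:

```python
from collections import Counter

def count_words(s, targets):
    """Return {word: occurrences} for each target word, case-insensitively."""
    counts = Counter(s.lower().split())
    # Counter returns 0 for missing keys, so absent targets are reported as 0
    return {t.lower(): counts[t.lower()] for t in targets}

print(count_words("I think I can", ["i", "can", "cannot"]))
# {'i': 2, 'can': 1, 'cannot': 0}
```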
(Feb-14-2019, 01:26 PM)Gigux Wrote: And all as a lambda function.
Why? You can pass your own functions, defined with def, to the pandas apply/map methods.
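For example, a function written with def can be handed to Series.apply just like a lambda, and any extra arguments it needs can go through args= (positional) or as keyword arguments. A small sketch, where count_word and the sample essays are illustrative:

```python
from collections import Counter
import pandas as pd

def count_word(s, word):
    """Case-insensitive count of one specific word in s."""
    return Counter(s.lower().split())[word.lower()]

# Hypothetical column of joined essays
df = pd.DataFrame({"essay": ["I like it", "hello hello world"]})

# Extra positional arguments are passed with args= ...
df["count_i"] = df["essay"].apply(count_word, args=("i",))
# ... and keyword arguments are forwarded to the function as well
df["count_like"] = df["essay"].apply(count_word, word="like")
```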
Posts: 24
Threads: 12
Joined: Apr 2017
Thank you for your answer.
I selected the relevant column of the dataframe with:
x = df['field']
where x is:
x
Out[27]:
0 about me:<br />\n<br />\ni would love to think...
1 i am a chef: this is what that means.<br />\n1...
2 i'm not ashamed of much, but writing public te...
3 i work in a library and go to school. . . read...
4 hey how's it going? currently vague on the pro...
5 i'm an australian living in san francisco, but...
and passed it to your suggested function:
avg_words_len_in_str = lambda s: [len(k) / v for k,v in Counter(s.lower().split()).items()]
avg_words_len_in_str(x)
but I got:
avg_words_len_in_str(x)
Traceback (most recent call last):
File "<ipython-input-28-425c839e1d1d>", line 1, in <module>
avg_words_len_in_str(x)
File "<ipython-input-21-78c2dd2340ad>", line 1, in <lambda>
avg_words_len_in_str = lambda s: [len(k) / v for k,v in Counter(s.lower().split()).items()]
File "~/.local/lib/python3.6/site-packages/pandas/core/generic.py", line 5057, in __getattr__
return object.__getattribute__(self, name)
AttributeError: 'Series' object has no attribute 'lower'
The same thing happened when I applied:
count_spec_words = lambda s, specific: [v for k, v in Counter(s.lower().split()).items() if k in specific]
count_spec_words(x, ("I", ))
count_spec_words(x, ("i",))
count_spec_words(x, ("i"))
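The error occurs because both lambdas call .lower() on their argument, which works on a single str but not on a whole pandas Series; the function has to be applied to each element of the column instead. Note also that ("i") is just the string "i" (parentheses alone do not make a tuple), so ("i",) is the form a tuple of targets needs, and since the text is lowercased, "I" will never match. A minimal sketch, using a small made-up Series in place of df['field']:

```python
from collections import Counter
import pandas as pd

avg_words_len_in_str = lambda s: [len(k) / v for k, v in Counter(s.lower().split()).items()]
count_spec_words = lambda s, specific: [v for k, v in Counter(s.lower().split()).items() if k in specific]

# Hypothetical stand-in for x = df['field']
x = pd.Series(["i work in a library", "I am a chef and i cook"])

# Series.apply calls the function once per element, and each element is a str
per_row_avg = x.apply(avg_words_len_in_str)
per_row_i = x.apply(lambda s: count_spec_words(s, ("i",)))

# String methods on the whole column go through the .str accessor instead
lowered = x.str.lower()
```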
Posts: 817
Threads: 1
Joined: Mar 2018
Feb-17-2019, 06:15 AM
(This post was last modified: Feb-17-2019, 06:15 AM by scidam.)
First of all, you need to clean up the text (remove all HTML tags).
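One rough way to do that, assuming the only markup is simple tags such as <br />, is a regular expression that replaces anything between < and > with a space, followed by collapsing the leftover whitespace and newlines. This is a heuristic, not an HTML parser: entities such as &amp; are left as-is, and punctuation stays attached to words (e.g. "me:"), which will slightly affect word lengths. For messier markup, an HTML parser such as BeautifulSoup's get_text() is more robust. The names TAG_RE and strip_html are illustrative:

```python
import re
import pandas as pd

TAG_RE = re.compile(r"<[^>]+>")  # anything between < and >, e.g. <br />

def strip_html(s):
    """Replace HTML tags with spaces and collapse runs of whitespace."""
    no_tags = TAG_RE.sub(" ", s)
    return " ".join(no_tags.split())

# Hypothetical sample in the style of the first essay above
essays = pd.Series(["about me:<br />\n<br />\ni would love to think"])
clean = essays.apply(strip_html)
```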
Did you try using the apply method? E.g.
# all_essays is the Series of joined essays (one string per row); Counter comes from collections
all_data['new_column'] = all_essays.apply(lambda s: [len(k) / v for k, v in Counter(s.lower().split()).items()])