Python Forum

Dear all,
I have a pandas DataFrame, and I created an extra column containing the string length of each record with:

essay_cols = ["record0", "record1", ...]  # where 'recordX' is the name of a column
all_essays = all_essays[essay_cols].apply(lambda x: ' '.join(x), axis=1)  # I understand this joins the text of the different columns into a single string per row
all_data["essay_len"] = all_essays.apply(lambda x: len(x))  # does this give the word count of a string or its length in characters?
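To answer the question in the comment: len() on a string counts characters (spaces included), not words. A minimal sketch of the difference, using a made-up sample string in place of one joined essay:

```python
# Hypothetical sample text standing in for one joined essay
essay = "i love to cook and read"

char_len = len(essay)            # number of characters, spaces included
word_count = len(essay.split())  # split on whitespace, then count the pieces

print(char_len, word_count)  # 23 6
```

On the whole Series, all_essays.str.len() gives the same character count per row, and all_essays.str.split().str.len() gives the word count per row.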

But how can I make a field with the average word length and frequency of certain words?

I should count individual words in a string, get their length and divide them by the number of words. In the second case, I should retrieve specific words from a string and count them. And all as a lambda function. This is a bit beyond me.

Do you have any tips?

Thank you

Luigi

(Feb-14-2019, 01:26 PM) Gigux wrote: I should count individual words in a string, get their length and divide them by the number of words.

from collections import Counter
# one value per distinct word: len(word) / number of times that word occurs
avg_words_len_in_str = lambda s: [len(k) / v for k, v in Counter(s.lower().split()).items()]
avg_words_len_in_str("abc abc abcd abcd abcde")

Output:
[1.5, 2.0, 5.0]
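Because that lambda divides each distinct word's length by its frequency, it does not return the average word length of the whole string. A minimal sketch of the calculation described in the question (sum of word lengths divided by number of words), assuming words are simply whitespace-separated tokens; mean_word_length is a hypothetical name:

```python
def mean_word_length(text: str) -> float:
    """Average number of characters per whitespace-separated word."""
    words = text.split()
    if not words:  # avoid ZeroDivisionError on empty strings
        return 0.0
    return sum(len(w) for w in words) / len(words)

print(mean_word_length("abc abc abcd abcd abcde"))  # 3.8
```

Plain split() keeps punctuation attached to words, so clean the text first if that matters.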

(Feb-14-2019, 01:26 PM) Gigux wrote: In the second case, I should retrieve specific words from a string and count them.

from collections import Counter
# counts of the requested words, in order of first appearance; absent words are left out
count_spec_words = lambda s, specific: [v for k, v in Counter(s.lower().split()).items() if k in specific]
count_spec_words("abc abc abcd abcd abcde", ("abc", ))

Output:
[2]
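If you want exactly one number per requested word, including zero for words that never occur, a small variation returning a dict works. A sketch; count_words is a hypothetical name:

```python
from collections import Counter

def count_words(text, targets):
    """Return {word: occurrences} for each target word, 0 if absent."""
    counts = Counter(text.lower().split())
    # Counter returns 0 for missing keys instead of raising KeyError
    return {t: counts[t.lower()] for t in targets}

print(count_words("abc abc abcd abcd abcde", ("abc", "xyz")))  # {'abc': 2, 'xyz': 0}
```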

(Feb-14-2019, 01:26 PM) Gigux wrote: And all as a lambda function.

Why? You can pass your own functions, defined with def, to pandas' apply/map methods.
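For example, a minimal sketch with a small made-up DataFrame; the column names and the word_frequency helper are hypothetical. Series.apply calls the function once per cell (so it receives a plain str) and forwards extra keyword arguments to it, which covers functions that need a parameter such as the word to look for:

```python
import pandas as pd

def word_frequency(text: str, word: str) -> int:
    """Count case-insensitive occurrences of `word` among whitespace-separated tokens."""
    return text.lower().split().count(word.lower())

df = pd.DataFrame({"text": ["I think I can", "i am a chef", "hello world"]})

# apply calls word_frequency(cell, word="i") for every row of the column
df["i_count"] = df["text"].apply(word_frequency, word="i")
print(df)
```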

Thank you for your answer.
I selected the relevant column of the dataframe with:

x = df['field']
where x is:
x
Out[27]:
0 about me:<br />\n<br />\ni would love to think...
1 i am a chef: this is what that means.<br />\n1...
2 i'm not ashamed of much, but writing public te...
3 i work in a library and go to school. . . read...
4 hey how's it going? currently vague on the pro...
5 i'm an australian living in san francisco, but...

and passed it to your suggested function:

avg_words_len_in_str = lambda s: [len(k) / v for k,v in Counter(s.lower().split()).items()]
avg_words_len_in_str(x)

but I got:
avg_words_len_in_str(x)
Traceback (most recent call last):

  File "<ipython-input-28-425c839e1d1d>", line 1, in <module>
    avg_words_len_in_str(x)

  File "<ipython-input-21-78c2dd2340ad>", line 1, in <lambda>
    avg_words_len_in_str = lambda s: [len(k) / v for k,v in Counter(s.lower().split()).items()]

  File "~/.local/lib/python3.6/site-packages/pandas/core/generic.py", line 5057, in __getattr__
    return object.__getattribute__(self, name)

AttributeError: 'Series' object has no attribute 'lower'

I got the same error with:
count_spec_words = lambda s, specific: [v for k, v in Counter(s.lower().split()).items() if k in specific]
count_spec_words(x, ("I", ))
count_spec_words(x, ("i",))
count_spec_words(x, ("i"))
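The error occurs because both lambdas call s.lower(), which exists on a single str but not on a whole Series. A minimal sketch of the fix, applying the same lambdas to each cell with Series.apply; the sample strings are made up:

```python
from collections import Counter

import pandas as pd

# The two lambdas from this thread, unchanged
avg_words_len_in_str = lambda s: [len(k) / v for k, v in Counter(s.lower().split()).items()]
count_spec_words = lambda s, specific: [v for k, v in Counter(s.lower().split()).items() if k in specific]

x = pd.Series(["I think I can", "i am a chef"])

# Series.apply hands each string to the lambda, so s.lower() is a str method again
per_row_ratios = x.apply(avg_words_len_in_str)

# ("i",) is a one-element tuple; ("i") is just the string "i", so `in` would do a substring test.
# The text is lowercased first, so search for "i", not "I".
per_row_counts = x.apply(lambda s: count_spec_words(s, ("i",)))
print(per_row_counts.tolist())  # [[2], [1]]
```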

First of all, you need to clean up the text (remove all HTML tags).
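A minimal sketch of that cleanup, assuming a simple regular expression is good enough for fragments like the <br /> tags seen in the data (an HTML parser is more robust for arbitrary markup); strip_html is a hypothetical helper:

```python
import html
import re

TAG_RE = re.compile(r"<[^>]+>")  # anything between < and >, e.g. <br />

def strip_html(text: str) -> str:
    """Remove HTML tags, decode entities such as &amp;, and collapse whitespace."""
    text = TAG_RE.sub(" ", text)   # replace tags with a space so words don't merge
    text = html.unescape(text)     # &amp; -> &, &#39; -> '
    return " ".join(text.split())  # squeeze runs of spaces and newlines

print(strip_html("about me:<br />\n<br />\ni would love to think"))
# about me: i would love to think
```

On the column it would be used as df['field'].apply(strip_html).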

Did you try the apply method? For example:

from collections import Counter
# all_essays is the Series of joined strings, so apply passes one str per row
all_data['new_column'] = all_essays.apply(lambda s: [len(k) / v for k, v in Counter(s.lower().split()).items()])
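As an alternative to apply, pandas' vectorized .str methods can build such columns directly on the Series. A sketch with made-up data; the column names are hypothetical, and the pattern \bi\b is an assumption about how you want to match the standalone word "i":

```python
import pandas as pd

essays = pd.Series(["about me:<br />\ni love to cook", "i am a chef. i like it"])

clean = (
    essays.str.replace(r"<[^>]+>", " ", regex=True)  # drop HTML tags
    .str.lower()
)
words = clean.str.split()  # each cell becomes a list of whitespace-separated tokens

features = pd.DataFrame({
    "word_count": words.str.len(),
    # non-whitespace characters divided by number of words = mean word length
    "avg_word_len": clean.str.replace(r"\s+", "", regex=True).str.len() / words.str.len(),
    # \b is a word boundary, so the "i" inside "like" or "it" is not counted
    "i_count": clean.str.count(r"\bi\b"),
})
print(features)
```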

Posts in this thread: Gigux, scidam, Gigux, scidam