lambda functions
#1
Dear all,
I have a pandas DataFrame, and I created an extra column containing the string length of each record with:

essay_cols = ["record0", "record1", ...]  # where 'recordX' is the name of a column
# As I understand it, this joins the words of the different columns into a single string
all_essays = all_essays[essay_cols].apply(lambda x: ' '.join(x), axis=1)
# Does this give the word count of a string or the length of the string?
all_data["essay_len"] = all_essays.apply(lambda x: len(x))
But how can I make a field with the average word length and frequency of certain words?

I should count individual words in a string, get their length and divide them by the number of words. In the second case, I should retrieve specific words from a string and count them. And all as a lambda function. This is a bit beyond me.

Do you have any tips?

Thank you

Luigi
buran wrote Feb-14-2019, 02:38 PM:
Please use proper tags when posting code, tracebacks, output, etc. This time I have added the tags for you.
See BBcode help for more info.
#2
(Feb-14-2019, 01:26 PM)Gigux Wrote: I should count individual words in a string, get their length and divide them by the number of words.

from collections import Counter
avg_words_len_in_str = lambda s: [len(k) / v for k,v in Counter(s.lower().split()).items()]
avg_words_len_in_str("abc abc abcd abcd abcde")
Output:
[1.5, 2.0, 5.0]
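
Note that the list above holds len(word) divided by that word's count, one value per distinct word, which is not quite a single average word length. A minimal sketch of the mean, assuming words are separated by whitespace and the string is not empty (mean_word_len is an illustrative name, not something from the original post). It also shows that len() on a string counts characters, not words:

```python
# Sketch: mean word length of one string (whitespace-separated tokens).
# Assumes s contains at least one word; an empty string would divide by zero.
mean_word_len = lambda s: sum(len(w) for w in s.split()) / len(s.split())

sample = "abc abc abcd abcd abcde"
print(len(sample))            # number of characters, spaces included
print(len(sample.split()))    # number of words
print(mean_word_len(sample))  # total letters divided by number of words
```

For the sample string this gives 23 characters, 5 words and a mean word length of 3.8.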
(Feb-14-2019, 01:26 PM)Gigux Wrote: In the second case, I should retrieve specific words from a string and count them.


from collections import Counter
count_spec_words = lambda s, specific: [v for k, v in Counter(s.lower().split()).items() if k in specific]
count_spec_words("abc abc abcd abcd abcde", ("abc", ))
Output:
[2]
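
The list result does not say which count belongs to which word, and words that never occur are silently dropped. A possible variant, only a sketch, returns a dict instead (count_words is a made-up name). It also shows why the trailing comma in ("abc", ) matters:

```python
from collections import Counter

def count_words(text, wanted):
    """Return {word: count} for each word in `wanted`, 0 if absent."""
    counts = Counter(text.lower().split())
    # Counter returns 0 for missing keys, so no KeyError here.
    return {word: counts[word] for word in wanted}

print(count_words("abc abc abcd abcd abcde", ("abc", "xyz")))

# A one-element tuple needs the trailing comma: ("abc",). Without it,
# ("abc") is just the string "abc", and `in` becomes a substring test.
print("ab" in ("abc",))  # False: tuple membership
print("ab" in ("abc"))   # True: substring test on a string
```

Because the text is lowercased before counting, the words to look for should be given in lowercase as well.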
(Feb-14-2019, 01:26 PM)Gigux Wrote: And all as a lambda function.

Why? You can pass your own functions, defined with def, to the pandas apply/map methods.
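
As an illustration of that point, a named function can handle edge cases that are awkward to squeeze into a lambda. This is only a sketch: avg_word_length is a hypothetical helper, and the pandas call in the comment assumes all_essays is a Series of strings and all_data is the DataFrame from the first post.

```python
import math

def avg_word_length(text):
    """Mean length of whitespace-separated words; NaN when there are none."""
    # Missing values in a pandas column usually show up as float NaN, not str.
    if not isinstance(text, str):
        return math.nan
    words = text.split()
    if not words:
        return math.nan
    return sum(len(w) for w in words) / len(words)

# With a pandas Series of strings this could be used as:
#   all_data["avg_word_len"] = all_essays.apply(avg_word_length)
print([avg_word_length(t) for t in ["abc abcde", "", None]])
```

The function takes a single string, so apply (or a plain list comprehension, as here) calls it once per record.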
#3
Thank you for your answer.
I selected the relevant column of the dataframe with:

x = df['field']
where x is:
x
Out[27]:
0 about me:<br />\n<br />\ni would love to think...
1 i am a chef: this is what that means.<br />\n1...
2 i'm not ashamed of much, but writing public te...
3 i work in a library and go to school. . . read...
4 hey how's it going? currently vague on the pro...
5 i'm an australian living in san francisco, but...

and pass it to your suggested function:

avg_words_len_in_str = lambda s: [len(k) / v for k,v in Counter(s.lower().split()).items()]
avg_words_len_in_str(x)

but I got:
avg_words_len_in_str(x)
Traceback (most recent call last):

File "<ipython-input-28-425c839e1d1d>", line 1, in <module>
avg_words_len_in_str(x)

File "<ipython-input-21-78c2dd2340ad>", line 1, in <lambda>
avg_words_len_in_str = lambda s: [len(k) / v for k,v in Counter(s.lower().split()).items()]

File "~/.local/lib/python3.6/site-packages/pandas/core/generic.py", line 5057, in __getattr__
return object.__getattribute__(self, name)

AttributeError: 'Series' object has no attribute 'lower'

Same thing when I applied:
count_spec_words = lambda s, specific: [v for k, v in Counter(s.lower().split()).items() if k in specific]
count_spec_words(x, ("I", ))
count_spec_words(x, ("i",))
count_spec_words(x, ("i"))
#4
First of all, you need to clean up the text (remove all HTML tags).

Did you try using the apply method? For example:

# all_essays is the Series of joined strings, so apply calls the lambda once per string
all_data['new_column'] = all_essays.apply(lambda s: [len(k) / v for k, v in Counter(s.lower().split()).items()])
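
The AttributeError in the previous post comes from calling the lambda on the whole Series, so s.lower() is looked up on the Series rather than on a string; the function has to be called once per element. A rough sketch of the cleanup plus per-element counting, assuming a simple regex is good enough for tags like <br /> (it is not a full HTML parser). The names clean_text and word_counts and the two sample strings are made up for illustration:

```python
import re
from collections import Counter

TAG_RE = re.compile(r"<[^>]+>")  # crude tag matcher, not a full HTML parser

def clean_text(text):
    """Replace tags such as <br /> with spaces and lowercase the result."""
    return TAG_RE.sub(" ", text).lower()

def word_counts(text):
    """Counter of whitespace-separated words in the cleaned text."""
    return Counter(clean_text(text).split())

# Made-up sample rows standing in for the values of a text column.
rows = ["about me:<br />\n<br />\ni would love", "I am a chef: this is what I do"]

# The function takes one string, so it is applied element by element.
# With a pandas Series x of strings, the equivalent would be x.apply(word_counts).
for row in rows:
    print(word_counts(row)["i"])
```

Punctuation stays attached to the words here ("me:" and "chef:" count as words), so a real cleanup would probably strip that too.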