lambda functions
Python Forum (https://python-forum.io), Forum: Python Coding, Data Science. Thread: lambda functions (/thread-16111.html)
lambda functions - Gigux - Feb-14-2019

Dear all,
I have a pandas dataframe and I created an extra column containing the string length of a record with:

    essay_cols=["record0", "record1"...]  # where 'recordX' is the name of a column
    all_essays = all_essays[essay_cols].apply(lambda x: ' '.join(x), axis=1)  # I understand this joins all the words of the different columns in a single string
    all_data["essay_len"] = all_essays.apply(lambda x: len(x))  # does this give the word count of a string or the length of a string?

But how can I make a field with the average word length and frequency of certain words? I should count individual words in a string, get their length and divide them by the number of words. In the second case, I should retrieve specific words from a string and count them. And all as a lambda function. This is a bit beyond me. Do you have any tips?
Thank you
Luigi

RE: lambda functions - scidam - Feb-15-2019

(Feb-14-2019, 01:26 PM)Gigux Wrote: I should count individual words in a string, get their length and divide them by the number of words.

    from collections import Counter
    avg_words_len_in_str = lambda s: [len(k) / v for k,v in Counter(s.lower().split()).items()]
    avg_words_len_in_str("abc abc abcd abcd abcde")
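On the question in the first post: `len(x)` on a string returns the number of characters, not the number of words. The one-liner above returns, for each distinct word, its length divided by how often it occurs, which may not be the single average that was asked for. A minimal sketch of one way to get the mean word length, assuming words are separated by whitespace (the name `avg_word_len` is only illustrative):

```python
def avg_word_len(text):
    """Mean length of the whitespace-separated words in text (0.0 if there are none)."""
    words = text.split()
    if not words:
        return 0.0
    return sum(len(w) for w in words) / len(words)


sample = "abc abc abcd abcd abcde"
print(len(sample))           # number of characters, spaces included
print(len(sample.split()))   # number of words
print(avg_word_len(sample))  # total letters divided by number of words
```

With pandas, a function like this can be passed directly, e.g. `all_essays.apply(avg_word_len)`.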
(Feb-14-2019, 01:26 PM)Gigux Wrote: In the second case, I should retrieve specific words from a string and count them.

    from collections import Counter
    count_spec_words = lambda s, specific: [v for k, v in Counter(s.lower().split()).items() if k in specific]
    count_spec_words("abc abc abcd abcd abcde", ("abc", ))
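The list returned by this lambda carries only the counts, without the words they belong to, and it skips requested words that never occur. A small variant, assuming a dict result is acceptable (the name `count_words` is hypothetical), could look like this:

```python
from collections import Counter


def count_words(text, specific):
    """Return {word: count} for each word in specific, ignoring case."""
    counts = Counter(text.lower().split())
    # A Counter returns 0 for missing keys, so absent words are reported as 0.
    return {word: counts[word.lower()] for word in specific}


print(count_words("abc abc abcd abcd abcde", ("abc", "xyz")))
```

Keeping the word as the key makes the result easier to turn into separate dataframe columns later.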
(Feb-14-2019, 01:26 PM)Gigux Wrote: And all as a lambda function.

Why? You can pass your own functions, defined with def, to the pandas apply/map methods.
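As a rough illustration of that point, a named function works with `Series.apply` just like a lambda does. The sample data and the function name `word_count` below are made up for the example:

```python
import pandas as pd


def word_count(text):
    """Number of whitespace-separated words in one string."""
    return len(text.split())


essays = pd.Series(["i am a chef", "hey how's it going?"])

# apply calls word_count once per element of the Series
counts = essays.apply(word_count)
print(counts.tolist())
```

A def also leaves room for a docstring and several statements, which gets awkward inside a lambda.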
RE: lambda functions - Gigux - Feb-15-2019

Thank you for your answer. I selected the relevant column of the dataframe with:

    x = df['field']

where x is:

    x
    Out[27]:
    0    about me:<br />\n<br />\ni would love to think...
    1    i am a chef: this is what that means.<br />\n1...
    2    i'm not ashamed of much, but writing public te...
    3    i work in a library and go to school. . . read...
    4    hey how's it going? currently vague on the pro...
    5    i'm an australian living in san francisco, but...

and passed it to your suggested function:

    avg_words_len_in_str = lambda s: [len(k) / v for k,v in Counter(s.lower().split()).items()]
    avg_words_len_in_str(x)

but I got:

    avg_words_len_in_str(x)
    Traceback (most recent call last):
      File "<ipython-input-28-425c839e1d1d>", line 1, in <module>
        avg_words_len_in_str(x)
      File "<ipython-input-21-78c2dd2340ad>", line 1, in <lambda>
        avg_words_len_in_str = lambda s: [len(k) / v for k,v in Counter(s.lower().split()).items()]
      File "~/.local/lib/python3.6/site-packages/pandas/core/generic.py", line 5057, in __getattr__
        return object.__getattribute__(self, name)
    AttributeError: 'Series' object has no attribute 'lower'

The same thing happened when I applied:

    count_spec_words = lambda s, specific: [v for k, v in Counter(s.lower().split()).items() if k in specific]
    count_spec_words(x, ("I", ))
    count_spec_words(x, ("i",))
    count_spec_words(x, ("i"))
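The traceback suggests the lambda received the whole Series as `s`, and a Series has no `.lower()` method; the functions were written for a single string. A sketch of calling one of them once per row through `Series.apply`, using a tiny made-up Series in place of `df['field']`:

```python
from collections import Counter

import pandas as pd

count_spec_words = lambda s, specific: [
    v for k, v in Counter(s.lower().split()).items() if k in specific
]

x = pd.Series(["i think i am", "you and i"])  # stand-in data, not the real column

# Each element handed to the inner call is a plain str, so s.lower() works
result = x.apply(lambda s: count_spec_words(s, ("i",)))
print(result.tolist())
```

Two details in the calls above are worth noting: `("i")` is just the string `"i"`, not a tuple, so `k in specific` becomes a substring test, and it is the trailing comma in `("i",)` that makes a one-element tuple. Because the text is lower-cased before counting, `("I", )` can never match.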
RE: lambda functions - scidam - Feb-17-2019

First of all, you need to clean up the text (remove all HTML tags). Did you try using the apply method? E.g.:

    all_essays.loc[:, 'new_column'] = all_essays[essay_cols].apply(lambda s: [len(k) / v for k,v in Counter(s.lower().split()).items()])
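One possible way to do the cleanup mentioned here, assuming a simple regular expression is good enough for tags like `<br />` (a real HTML parser would be more robust, and `strip_tags` is a hypothetical helper name):

```python
import re

# Matches anything that looks like an HTML tag, e.g. <br /> or <b>
TAG_RE = re.compile(r"<[^>]+>")


def strip_tags(text):
    """Replace tag-like substrings with a space, then collapse runs of whitespace."""
    without_tags = TAG_RE.sub(" ", text)
    return " ".join(without_tags.split())


print(strip_tags("about me:<br />\n<br />\ni would love to think"))
```

The cleaned strings can then go through apply one row at a time, for instance `all_essays.apply(strip_tags)` on the joined essay Series, since the word-counting lambdas in this thread expect a single string rather than a whole column.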