Python Forum

Full Version: Linguistic measures on corporate filings
Hi,

I have an Excel file that lists the name and year of each txt file. I am trying to run a textual analysis of corporate disclosures (a set of txt files at the firm-year level), but I am having difficulty generating the following measures. Can anyone provide some guidance? Thank you!

1. The number of words in sentences that include at least one 4-word phrase that is shared by at least 75% of all firms in a given fiscal year.

2. The number of words in sentences that include at least one 8-word phrase that is identical to a phrase used in the prior year’s 10-K.
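
A rough sketch of how both measures could be computed, not a tested solution for real filings. Everything named here is hypothetical: `docs` is assumed to be a dict mapping `(firm, year)` to the text of that filing, with `year` as an integer. The firm identifier is an assumption, since the Excel file is described as holding only the file name and year, but measure 2 needs some way to find the same firm's prior-year filing. The sketch also assumes that a phrase never crosses a sentence boundary, that matching is case-insensitive, and that a naive regex split is good enough for sentences (abbreviations such as "Inc." will break it, so a proper sentence tokenizer may be needed).

```python
import re
from collections import Counter, defaultdict


def sentences(text):
    # Naive split after ., ! or ? followed by whitespace (assumption).
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def words(sentence):
    # Lowercased word tokens; punctuation is dropped.
    return re.findall(r"[a-z0-9']+", sentence.lower())


def ngrams(tokens, n):
    # Set of all n-word phrases in one tokenized sentence.
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def doc_ngrams(text, n):
    # All n-word phrases in a filing, never crossing a sentence boundary.
    grams = set()
    for s in sentences(text):
        grams |= ngrams(words(s), n)
    return grams


def words_in_matching_sentences(text, phrases, n):
    # Total words in sentences that contain at least one phrase in `phrases`.
    total = 0
    for s in sentences(text):
        toks = words(s)
        if ngrams(toks, n) & phrases:
            total += len(toks)
    return total


def boilerplate_words(docs, n=4, share=0.75):
    # Measure 1: phrases used by at least `share` of the firms in a year.
    by_year = defaultdict(dict)
    for (firm, year), text in docs.items():
        by_year[year][firm] = text
    result = {}
    for year, firm_texts in by_year.items():
        counts = Counter()
        for text in firm_texts.values():
            counts.update(doc_ngrams(text, n))  # one vote per firm per phrase
        cutoff = share * len(firm_texts)
        common = {g for g, c in counts.items() if c >= cutoff}
        for firm, text in firm_texts.items():
            result[(firm, year)] = words_in_matching_sentences(text, common, n)
    return result


def sticky_words(docs, n=8):
    # Measure 2: phrases repeated from the same firm's filing one year earlier.
    result = {}
    for (firm, year), text in docs.items():
        prior = docs.get((firm, year - 1))
        if prior is None:
            result[(firm, year)] = None  # no prior-year filing available
        else:
            prior_grams = doc_ngrams(prior, n)
            result[(firm, year)] = words_in_matching_sentences(text, prior_grams, n)
    return result
```

Building `docs` is left out. One option would be to read the Excel file with `pandas.read_excel` and open each listed txt file in a loop. Both functions return a dict keyed by `(firm, year)`, which can be merged back into the spreadsheet. Holding every 4-word phrase for every filing in memory may get heavy with many long 10-Ks, so the per-year grouping in `boilerplate_words` is there to keep only one year's counts alive at a time.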
Is this a homework assignment?
You should show what you have attempted and where you are having difficulty.
Please don't double post.
Same post as https://python-forum.io/Thread-Textual-a...-txt-files