 Help with code customization
Hello folks,

I am currently running some experiments in topic modeling, using code from a tutorial. So far everything works, but I would like to make one adjustment that I am stuck on at the moment.


import pandas as pd

def format_topics_sentences(ldamodel=lda_model, corpus=corpus, texts=data):
    # Init output: collect one row per document
    rows = []

    # Get main topic in each document
    for i, row in enumerate(ldamodel[corpus]):
        # Sort the (topic_num, prop_topic) pairs by contribution, highest first
        row = sorted(row, key=lambda x: x[1], reverse=True)
        # Get the Dominant topic, Perc Contribution and Keywords for each document
        for j, (topic_num, prop_topic) in enumerate(row):
            if j == 0:  # => dominant topic
                wp = ldamodel.show_topic(topic_num)
                topic_keywords = ", ".join([word for word, prop in wp])
                rows.append([int(topic_num), round(prop_topic, 4), topic_keywords])
            else:
                break
    # DataFrame.append was removed in pandas 2.0, so build the frame in one step
    sent_topics_df = pd.DataFrame(rows, columns=['Dominant_Topic', 'Perc_Contribution', 'Topic_Keywords'])

    # Add original text to the end of the output
    contents = pd.Series(texts)
    sent_topics_df = pd.concat([sent_topics_df, contents], axis=1)
    return sent_topics_df
df_topic_sents_keywords = format_topics_sentences(ldamodel=optimal_model, corpus=corpus, texts=data)

# Format
df_dominant_topic = df_topic_sents_keywords.reset_index()
df_dominant_topic.columns = ['Document_No', 'Dominant_Topic', 'Topic_Perc_Contrib', 'Keywords', 'Text']

#Save .CSV File
#Save .XLSX File
df_dominant_topic.to_excel('OUTPUT/Topic_Overview.xlsx', sheet_name='Data_Overview')

# Show

At the moment, the pandas DataFrame is written to an Excel file with the following 6 columns:

Column 1: Continuous index
Column 2: Document number
Column 3: Topic number
Column 4: Topic percentage
Column 5: Keywords of the corresponding topic
Column 6: Document



At the moment, only the topic with the highest percentage out of the x topics is shown for each document. (Screenshot_01)

Example: for document 0, topic 1 is listed with 0.5491 because that is the highest of the document's x topic percentages (Topic_Perc_Contrib).

What I would like is a variable that sets how many topics exist, and then to output all topics with their corresponding values for each document. (Screenshot_02)

As an example, here is a manually created table with 4 topics, but I would like to be able to change this number so that the output changes accordingly. Of course, this should be repeated for all documents.
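
One possible shape for this, as a minimal sketch rather than a finished solution: the helper below takes the per-document topic distributions and a num_topics variable and builds one row per document with one percentage column per topic. The function name all_topic_rows, the column names and the sample values (apart from 0.5491 for topic 1 of document 0) are made up for illustration. It uses plain Python so it runs without a trained model.

```python
def all_topic_rows(doc_topics, texts, num_topics):
    """Build one row per document with a contribution column for every topic.

    doc_topics: one list of (topic_num, proportion) pairs per document
    texts:      the original documents, in the same order
    num_topics: how many topic columns to output (topics 0 .. num_topics - 1)
    """
    header = (
        ["Document_No"]
        + [f"Topic_{t}_Perc_Contrib" for t in range(num_topics)]
        + ["Text"]
    )
    rows = []
    for doc_no, (pairs, text) in enumerate(zip(doc_topics, texts)):
        # Topics that are missing from a document's list count as 0.0
        contrib = dict(pairs)
        percs = [round(float(contrib.get(t, 0.0)), 4) for t in range(num_topics)]
        rows.append([doc_no] + percs + [text])
    return header, rows


# Hypothetical stand-in for what the model returns for two documents
example_doc_topics = [
    [(0, 0.2012), (1, 0.5491), (3, 0.2497)],
    [(2, 0.9)],
]
example_texts = ["first document", "second document"]

header, rows = all_topic_rows(example_doc_topics, example_texts, num_topics=4)
print(header)
for row in rows:
    print(row)
```

With gensim, the distributions could presumably come from ldamodel.get_document_topics(bow, minimum_probability=0.0) for each bow in corpus, so that low-probability topics are not filtered out, and the result could be turned into a table with pd.DataFrame(rows, columns=header) before calling to_excel.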



Is there someone who can quickly see through this and help me adapt it?

Thank you for your answers.
