Python Forum
Help with code customization
Hello folks,



I am currently running some experiments in topic modeling, using code from a tutorial. So far everything works, but I would like to make one adjustment that I am stuck on.

#SOURCE: https://www.machinelearningplus.com/nlp/topic-modeling-gensim-python/

import pandas as pd

def format_topics_sentences(ldamodel=lda_model, corpus=corpus, texts=data):
    # Collect one row per document and build the DataFrame once at the end
    # (DataFrame.append was removed in pandas 2.0)
    rows = []

    # Get main topic in each document
    for row in ldamodel[corpus]:
        # Sort the (topic_num, prop_topic) pairs by proportion, highest first
        row = sorted(row, key=lambda x: x[1], reverse=True)
        # Get the Dominant topic, Perc Contribution and Keywords for each document
        topic_num, prop_topic = row[0]  # => dominant topic
        wp = ldamodel.show_topic(topic_num)
        topic_keywords = ", ".join(word for word, prop in wp)
        rows.append([int(topic_num), round(prop_topic, 4), topic_keywords])

    sent_topics_df = pd.DataFrame(rows, columns=['Dominant_Topic', 'Perc_Contribution', 'Topic_Keywords'])

    # Add original text to the end of the output
    contents = pd.Series(texts)
    sent_topics_df = pd.concat([sent_topics_df, contents], axis=1)
    return sent_topics_df


df_topic_sents_keywords = format_topics_sentences(ldamodel=optimal_model, corpus=corpus, texts=data)

# Format
df_dominant_topic = df_topic_sents_keywords.reset_index()
df_dominant_topic.columns = ['Document_No', 'Dominant_Topic', 'Topic_Perc_Contrib', 'Keywords', 'Text']

#Save .CSV File
df_dominant_topic.to_csv('OUTPUT/Topic_Overview.csv')
#Save .XLSX File
df_dominant_topic.to_excel('OUTPUT/Topic_Overview.xlsx', sheet_name='Data_Overview')

# Show
df_dominant_topic.head()

At the moment, the pandas DataFrame is written to an Excel file with the following 6 columns:

Column 1: Continuous Index

Column 2: Document number

Column 3: Topic number

Column 4: Topic percentage contribution

Column 5: Keywords of the corresponding topic

Column 6: Document

[Image: 1zyd7jk.jpg]
Screenshot_01

At the moment, only the topic with the highest percentage out of the x topics is shown for each document (Screenshot_01).

Example: For document 0, topic 1 is listed with 0.5491 because this is the highest Topic_Perc_Contrib value among all x topics for that document.
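To make that selection concrete, here is a minimal sketch with made-up numbers. Only the value 0.5491 for topic 1 is taken from the example above; the other three contributions are invented for illustration.

```python
# Hypothetical topic distribution for one document as (topic_num, prop_topic) pairs.
# Only 0.5491 for topic 1 comes from the post; the other values are invented.
doc_topics = [(0, 0.2104), (1, 0.5491), (2, 0.1502), (3, 0.0903)]

# Same idea as the sort-and-take-first step in format_topics_sentences:
# keep only the pair with the highest proportion.
dominant_topic, contribution = max(doc_topics, key=lambda pair: pair[1])

print(dominant_topic, contribution)
```

All other pairs are discarded at this point, which is why the output has only one topic per document.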



What I would like instead is a variable that sets how many topics exist, so that every topic is output with its corresponding value for each document (Screenshot_02).



Here is a manually created example with 4 topics. I would like to be able to change this number so that the output changes accordingly. Of course, this should be done for every document.



[Image: 2zebcwm.png]
Screenshot_02
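One possible direction, sketched under assumptions rather than taken from the tutorial: instead of keeping only the first pair, keep the contribution of every topic and write one column per topic. The helper name `all_topics_rows` and the column names `Topic_0`, `Topic_1`, ... are hypothetical, and the input below is made-up data standing in for the model output so that the sketch runs without gensim.

```python
def all_topics_rows(doc_topics, texts, num_topics):
    """Build one row per document with a contribution column for every topic.

    doc_topics: per-document lists of (topic_num, prop_topic) pairs
    texts:      the original documents, in the same order
    num_topics: how many topic columns to output (the adjustable variable)
    """
    rows = []
    for doc_no, (pairs, text) in enumerate(zip(doc_topics, texts)):
        contrib = dict(pairs)
        row = {"Document_No": doc_no}
        for topic_num in range(num_topics):
            # Topics that are missing from the pairs are reported as 0.0
            row[f"Topic_{topic_num}"] = round(contrib.get(topic_num, 0.0), 4)
        row["Text"] = text
        rows.append(row)
    return rows


# Made-up input standing in for the model output (not real results)
example_topics = [
    [(0, 0.2104), (1, 0.5491), (2, 0.1502), (3, 0.0903)],
    [(0, 0.7012), (3, 0.2988)],
]
example_texts = ["first document", "second document"]

rows = all_topics_rows(example_topics, example_texts, num_topics=4)
for row in rows:
    print(row)
```

If the model is a gensim `LdaModel`, the per-document pairs could probably be collected with `[ldamodel.get_document_topics(bow, minimum_probability=0.0) for bow in corpus]`, so that low-probability topics are not filtered out. Whether the same call works for `optimal_model` depends on what kind of model it is, which the post does not say. The resulting list of dicts can then be passed to `pd.DataFrame(rows)` and saved with `to_csv` or `to_excel` as before.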


Is there someone who can see through this quickly and help me adapt the code?

Thank you for your answers.


Messages In This Thread
Help with code customization - by Nicson - Feb-05-2019, 11:32 PM
