(Jan-05-2018, 09:35 PM)karaokelove Wrote: Is there a good way to reply or tag someone without quoting their entire message?
There is a "Quote highlighted text" button to the left of the "Quote" button.

(Jan-05-2018, 05:45 PM)karaokelove Wrote: For instance, we have a lot of "If the model is a [car-model], what is the make?". Basically, questions that are totally different, but contain 90%+ similar wording. A simple percent-based comparison algorithm would flag these as duplicates and delete usable questions.
You may need to read the Excel file and do the comparison yourself to get what you want. If @Gribouillis's method can be used, it is certainly fast.
The binary Excel format does not make much sense until it has been read in by a library.
Python has several good libraries for this, e.g. Pandas, openpyxl, pyexcel.
I like Pandas because once a file has been read in, Pandas gives you a lot of power and several ways to remove duplicates.
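To give a rough idea of what "several ways" can look like, here is a minimal sketch using an in-memory DataFrame. The names and addresses are made up for illustration and are not from any real file; `drop_duplicates` and `duplicated` are standard Pandas methods.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cy"],
    "email": ["a@example.com", "b@example.com", "a@example.com"],
})

# Way 1: keep the first row for each email and drop later repeats.
first_only = df.drop_duplicates(subset="email", keep="first")

# Way 2: drop every row whose email occurs more than once.
no_repeats = df[~df.duplicated(subset="email", keep=False)]

print(first_only)
print(no_repeats)
```

Which one fits depends on whether the first occurrence of a repeated value should survive.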
Example: email_excel.xlsx, where there are two duplicate emails.
Output:
Name        message          email
John Smith  Hello Mr. Smith  [email protected]
John Doe    Hello Mr. Doe    [email protected]
Ms. Foo     Hello Ms. Foo    [email protected]
Fire up Pandas and fix it.
G:\Anaconda3
λ python -m ptpython
>>> import pandas as pd
>>> df = pd.read_excel('email_excel.xlsx', sheet_name=0)
>>> df.head()
         Name          message             email
0  John Smith  Hello Mr. Smith  [email protected]
1    John Doe    Hello Mr. Doe  [email protected]
2     Ms. Foo    Hello Ms. Foo  [email protected]
>>> remove_dup = df[~df.stack().duplicated().unstack().any(axis=1)]
>>> remove_dup
         Name          message             email
0  John Smith  Hello Mr. Smith  [email protected]
1    John Doe    Hello Mr. Doe  [email protected]
>>> remove_dup.to_excel('email_nodup.xlsx')

Now we have this in email_nodup.xlsx.
Output:
Name        message          email
John Smith  Hello Mr. Smith  [email protected]
John Doe    Hello Mr. Doe    [email protected]
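If the one-liner above is hard to read, it can be split into steps. This is only a sketch: the DataFrame below is a hypothetical stand-in for the spreadsheet, and the addresses are made up so that the third row repeats the first row's email.

```python
import pandas as pd

# Hypothetical stand-in for the spreadsheet; the addresses are invented.
df = pd.DataFrame({
    "Name": ["John Smith", "John Doe", "Ms. Foo"],
    "message": ["Hello Mr. Smith", "Hello Mr. Doe", "Hello Ms. Foo"],
    "email": ["smith@example.com", "doe@example.com", "smith@example.com"],
})

stacked = df.stack()                 # one long Series indexed by (row, column)
dup_cells = stacked.duplicated()     # True where a value was already seen earlier
dup_table = dup_cells.unstack()      # back to the original row/column shape
row_has_dup = dup_table.any(axis=1)  # True for rows containing any repeated cell

remove_dup = df[~row_has_dup]        # keep only rows without a repeated cell
print(remove_dup)
```

Note that this compares values across all columns, so a value that repeats in a different column also flags the row. If only the email column matters, `df.drop_duplicates(subset='email')` is the narrower option. Also, `to_excel` writes the index as an extra column unless `index=False` is passed.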