Python Forum
Excel Question
#12
(Jan-05-2018, 09:35 PM)karaokelove Wrote: Is there a good way to reply or tag someone without quoting their entire message?
There is a "Quote highlighted text" button to the left of the Quote button.
(Jan-05-2018, 05:45 PM)karaokelove Wrote: For instance, we have a lot of "If the model is a [car-model], what is the make?". Basically, questions that are totally different, but contain 90%+ similar wording. A simple percent-based comparison algorithm would flag these as duplicates and delete usable questions.
You may need to read the Excel file and compare the values yourself to get what you want. If @Gribouillis's method can be used, it is certainly fast.
The binary Excel format may not make much sense until it has been read in.
Python has several good libraries for this, e.g. Pandas, openpyxl, pyexcel.
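To illustrate the concern quoted above, here is a minimal sketch using only the standard library. The car models (Civic, Camry) are made-up examples, not taken from the real data. It shows that two different questions can score very high on a percent-based comparison, while an exact comparison only treats the true repeat as a duplicate.

```python
from difflib import SequenceMatcher

# Hypothetical questions; the model names are invented for illustration.
questions = [
    "If the model is a Civic, what is the make?",
    "If the model is a Camry, what is the make?",
    "If the model is a Civic, what is the make?",  # exact repeat of the first
]

# Percent-based comparison: the first two questions differ only in the
# model name, so their similarity ratio is very high.
ratio = SequenceMatcher(None, questions[0], questions[1]).ratio()
print(f"similarity: {ratio:.2f}")

# Exact comparison: dict.fromkeys keeps the first occurrence of each
# string and preserves order, so only the true repeat is dropped.
unique = list(dict.fromkeys(questions))
print(unique)
```

With exact matching both the Civic and the Camry question survive, which is the behaviour wanted here.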

I like Pandas because once a file is read in, Pandas has a lot of power and several ways to remove duplicates.
Example: email_excel.xlsx, where there are two duplicate emails.
Output:
name        message          email
John Smith  Hello Mr. Smith  [email protected]
John Doe    Hello Mr. Doe    [email protected]
Ms. Foo     Hello Ms. Foo    [email protected]
Fire up Pandas and fix it.
G:\Anaconda3
λ python -m ptpython
>>> import pandas as pd

>>> df = pd.read_excel('email_excel.xlsx', sheet_name=0)
>>> df.head()
         Name          message            email
0  John Smith  Hello Mr. Smith  [email protected]
1    John Doe    Hello Mr. Doe  [email protected]
2     Ms. Foo    Hello Ms. Foo  [email protected]


>>> remove_dup = df[~df.stack().duplicated().unstack().any(axis=1)]
>>> remove_dup
         Name          message            email
0  John Smith  Hello Mr. Smith  [email protected]
1    John Doe    Hello Mr. Doe  [email protected]

>>> remove_dup.to_excel('email_nodup.xlsx')
Now we have this in email_nodup.xlsx.
Output:
Name        message          email
John Smith  Hello Mr. Smith  [email protected]
John Doe    Hello Mr. Doe    [email protected]
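As one more of the several ways Pandas can remove duplicates, here is a rough sketch that compares only the email column with drop_duplicates. The DataFrame is an in-memory stand-in for email_excel.xlsx, and the addresses are invented (I am assuming Ms. Foo repeats the first address), so adjust the column name to your own file.

```python
import pandas as pd

# Hypothetical stand-in for email_excel.xlsx; the addresses are made up.
df = pd.DataFrame({
    "Name": ["John Smith", "John Doe", "Ms. Foo"],
    "message": ["Hello Mr. Smith", "Hello Mr. Doe", "Hello Ms. Foo"],
    "email": ["smith@example.com", "doe@example.com", "smith@example.com"],
})

# Boolean mask: True for every row whose email already appeared earlier.
dup_mask = df.duplicated(subset="email")

# Keep the first row for each email and drop the later repeats.
no_dup = df.drop_duplicates(subset="email", keep="first")
print(no_dup)
```

Unlike the stack()/unstack() approach, this only looks at the one column, so a repeated value in Name or message would not remove a row.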


Messages In This Thread
Excel Question - by karaokelove - Jan-05-2018, 05:45 PM
RE: Excel Question - by hshivaraj - Jan-05-2018, 06:04 PM
RE: Excel Question - by karaokelove - Jan-05-2018, 06:08 PM
RE: Excel Question - by Povellesto - Jan-05-2018, 06:24 PM
RE: Excel Question - by karaokelove - Jan-05-2018, 06:31 PM
RE: Excel Question - by Gribouillis - Jan-05-2018, 07:41 PM
RE: Excel Question - by karaokelove - Jan-05-2018, 09:03 PM
RE: Excel Question - by Gribouillis - Jan-05-2018, 09:27 PM
RE: Excel Question - by karaokelove - Jan-05-2018, 09:35 PM
RE: Excel Question - by Gribouillis - Jan-05-2018, 09:38 PM
RE: Excel Question - by karaokelove - Jan-05-2018, 09:46 PM
RE: Excel Question - by snippsat - Jan-05-2018, 11:15 PM
RE: Excel Question - by karaokelove - Jan-05-2018, 11:17 PM
