Jun-19-2023, 03:20 PM
I have a data set with 10,000 rows and 3,000 columns. My task is to detect abnormal data by comparing it row by row, item by item.
import matplotlib.pyplot as plt
import pandas as pd

file = "E:/Document/python/Book1.xlsx"
df = pd.read_excel(file)

# Styling for each box-plot element
outliers = dict(marker="o", mfc="w", mec="r", mew=1.5)
medianprops = dict(lw=1.5, c="b")
meanprops = dict(marker="+", ms=10, mew=3, c="g")
whiskerprops = dict(ls="--", lw=1.5, c="k")
capprops = dict(lw=1.5, c="orange")
boxprops = dict(lw=1.5)

# One box plot per data column, grouped by the "Tester" column
cols = df.columns[3:]
for col in cols:
    df.boxplot(col, by="Tester", grid=False, showfliers=True,
               flierprops=outliers, boxprops=boxprops,
               medianprops=medianprops, showmeans=True,
               meanprops=meanprops, whiskerprops=whiskerprops,
               capprops=capprops)
    plt.suptitle("")
    plt.xlabel("")
    plt.show()

My code shows the abnormal data for a small data set like this:
We can't use this on very big data. My idea is to detect the abnormal data first and then plot a figure to show it.
But I don't know how to do this.

Does anyone have any good ideas? Any help would be much appreciated.
Thanks for reading!
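
For what it's worth, here is a minimal sketch of the "detect first, then plot" idea, not a definitive solution. It assumes that "abnormal" means the same thing the box plot marks as a flier (a value outside 1.5 × IQR of its own "Tester" group), that the data columns are numeric, and that the grouping column is called "Tester" as in the code above. The helper names `flag_iqr_outliers` and `columns_with_outliers` and the tiny demo table are made up for illustration.

```python
import pandas as pd


def flag_iqr_outliers(df, value_cols, by="Tester", k=1.5):
    """Return a boolean DataFrame: True where a value lies outside the
    k * IQR fences of its own group (one set of fences per `by` value
    and per column). Assumes the value columns are numeric."""
    value_cols = list(value_cols)
    grouped = df.groupby(by)[value_cols]
    # Per-group quartiles, expanded back to one row per original row.
    q1 = grouped.quantile(0.25).reindex(df[by]).to_numpy()
    q3 = grouped.quantile(0.75).reindex(df[by]).to_numpy()
    iqr = q3 - q1
    values = df[value_cols].to_numpy(dtype=float)
    mask = (values < q1 - k * iqr) | (values > q3 + k * iqr)
    return pd.DataFrame(mask, index=df.index, columns=value_cols)


def columns_with_outliers(mask):
    """Names of the columns that contain at least one flagged value."""
    counts = mask.sum(axis=0)
    return counts[counts > 0].index.tolist()


# Tiny made-up table standing in for the real sheet (hypothetical data).
demo = pd.DataFrame({
    "Tester": ["A"] * 5 + ["B"] * 5,
    "x": [1, 2, 3, 4, 100, 10, 11, 12, 13, 14],
    "y": [5, 5, 5, 5, 5, 1, 2, 3, 4, 5],
})
mask = flag_iqr_outliers(demo, ["x", "y"])
print(columns_with_outliers(mask))  # columns worth plotting
print(demo[mask.any(axis=1)])       # rows holding at least one flagged value
```

With something like this, the plotting loop could run over `columns_with_outliers(mask)` instead of every entry in `df.columns[3:]`, so only the columns that actually contain suspicious values get a figure. Whether 1.5 × IQR per tester is the right definition of "abnormal" for this data is a judgment call; the factor `k` is there to tune it.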