Anomaly Detection: An Introduction

Source of slides:
- Tutorial at the American Statistical Association (ASA 2008)
- Jiawei Han, Data Mining: Concepts and Techniques
- Tutorial at the European Conference on Principles and Practice of Knowledge Discovery in Databases

Speaker: Wentao Li

Outline
- Definition
- Application
- Methods

Time is limited, so I will only sketch the overall picture of anomaly detection. For more detail, please refer to the paper.

What are Anomalies?
- An anomaly is a pattern in the data that does not conform to the expected behavior.
- An anomaly is a data object that deviates significantly from the normal objects, as if it were generated by a different mechanism.
- Also referred to as outliers, exceptions, peculiarities, surprises, etc.
- Anomalies translate to significant (often critical) real-life entities:
  - Cyber intrusions
  - Credit card fraud
  - Faults in mechanical systems

Related problems
- Outliers are different from noise.
  - Noise is random error or variance in a measured variable.
  - Noise should be removed before outlier detection.
- Outliers are interesting: they violate the mechanism that generates the normal data.
- Outlier detection vs. novelty detection: at an early stage a novel pattern is treated as an outlier, but it is later merged into the model.

Key Challenges
- Defining a representative normal region is challenging.
- The boundary between normal and outlying behavior is often not precise.
- Labeled data for training and validation is not always available.
- The exact notion of an outlier differs between application domains.
- Data might contain noise.
- Normal behavior keeps evolving.
- Relevant features must be selected appropriately.

Map
- Related areas (theory)
- Application (practice)
- Problem formulation
- Detection effect

Aspects of Anomaly Detection Problem
- Nature of input data: what are the characteristics of the input data?
- Availability of supervision: how many labels are available?
- Type of anomaly: point, contextual, structural
- Output of anomaly detection: score vs. label
- Evaluation of anomaly detection techniques: what counts as good detection?

Input Data
- The most common form of data handled by anomaly detection techniques is record data:
  - Univariate
  - Multivariate

Input Data: Nature of Attributes
- Binary
- Categorical
- Continuous
- Hybrid
(Example table on the slide: columns annotated as categorical, continuous, continuous, categorical, binary.)
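
To make "univariate record data" and "score vs. label" concrete, here is a minimal sketch in Python. It is not taken from the slides: the function names, the sample readings, and the choice of a robust z-score (median and median absolute deviation, with the commonly used cutoff of 3.5) are all assumptions made for illustration. It scores each value of a single continuous attribute and then thresholds the score into a label.

```python
from statistics import median

# Hypothetical helpers for illustration only; not from the slides.

def robust_scores(values):
    """Return one anomaly score per value (modified z-score using median/MAD)."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # All values (or more than half) are identical: this score is undefined.
        raise ValueError("MAD is zero; the modified z-score is undefined")
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score
    # when the data are roughly normal.
    return [0.6745 * abs(v - med) / mad for v in values]

def labels_from_scores(scores, threshold=3.5):
    """Turn continuous scores into binary labels (True = flagged as anomaly)."""
    return [s > threshold for s in scores]

# Assumed sample: one continuous attribute with a single suspicious reading.
readings = [10.1, 9.8, 10.3, 10.0, 9.9, 25.0, 10.2]
scores = robust_scores(readings)
labels = labels_from_scores(scores)

for value, score, flagged in zip(readings, scores, labels):
    print(f"{value:6.1f}  score={score:7.2f}  anomaly={flagged}")
```

A score output lets the analyst rank records and choose a cutoff later, while a label output commits to one decision per record. The threshold here is an assumed convention and would normally be tuned to the application domain.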