Data Mining: Data
Lecture Notes for Chapter 2
Introduction to Data Mining
by Tan, Steinbach, Kumar
(Tan, Steinbach, Kumar, Introduction to Data Mining, 4/18/2004)

What is Data?
- Collection of data objects and their attributes
- An attribute is a property or characteristic of an object
  - Examples: eye color of a person, temperature, etc.
  - An attribute is also known as a variable, field, characteristic, or feature
- A collection of attributes describes an object
  - An object is also known as a record, point, case, sample, entity, or instance
[Figure: example data table labeled "Attributes" and "Objects"]

Attribute Values
- Attribute values are numbers or symbols assigned to an attribute
- Distinction between attributes and attribute values
  - The same attribute can be mapped to different attribute values
    - Example: height can be measured in feet or meters
  - Different attributes can be mapped to the same set of values
    - Example: attribute values for ID and age are integers
    - But the properties of the attribute values can differ: ID has no limit, but age has a maximum and minimum value

Measurement of Length
- The way you measure an attribute may not match the attribute's properties.

Types of Attributes
- There are different types of attributes
  - Nominal
    - Examples: ID numbers, eye color, zip codes
  - Ordinal
    - Examples: rankings (e.g., taste of potato chips on a scale from 1-10), grades, height in {tall, medium, short}
  - Interval
    - Examples: calendar dates, temperatures in Celsius or Fahrenheit
  - Ratio
    - Examples: temperature in Kelvin, length, time, counts

Properties of Attribute Values
- The type of an attribute depends on which of the following properties it possesses:
  - Distinctness: = ≠
  - Order: < >
  - Addition: + -
  - Multiplication: * /
- Nominal attribute: distinctness
- Ordinal attribute: distinctness and order
- Interval attribute: distinctness, order, and addition
- Ratio attribute: all 4 properties

| Attribute Type | Description | Examples | Operations |
|---|---|---|---|
| Nominal | The values of a nominal attribute are just different names, i.e., nominal attributes provide only enough information to distinguish one object from another. (=, ≠) | zip codes, employee ID numbers, eye color, sex: {male, female} | mode, entropy, contingency correlation, χ² test |
| Ordinal | The values of an ordinal attribute provide enough information to order objects. (<, >) | hardness of minerals, {good, better, best}, grades, street numbers | median, percentiles, rank correlation, run tests, sign tests |
| Interval | For interval attributes, the differences between values are meaningful, i.e., a unit of measurement exists. (+, -) | calendar dates, temperature in Celsius or Fahrenheit | mean, standard deviation, Pearson's correlation, t and F tests |
| Ratio | For ratio variables, both differences and ratios are meaningful. (*, /) | temperature in Kelvin, monetary quantities, counts, age, mass, length, electrical current | geometric mean, harmonic mean, percent variation |

| Attribute Level | Transformation | Comments |
|---|---|---|
| Nominal | Any permutation of values | If all employee ID numbers were reassigned, would it make any difference? |
| Ordinal | An order-preserving change of values, i.e., new_value = f(old_value), where f is a monotonic function. | An attribute encompassing the notion of good, better, best can be represented equally well by the values {1, 2, 3} or by {0.5, 1, 10}. |
| Interval | new_value = a * old_value + b, where a and b are constants | Thus, the Fahrenheit and Celsius temperature scales differ in terms of where their zero value is and the size of a unit (degree). |
| Ratio | new_value = a * old_value | Length can be measured in meters or feet. |

Discrete and Continuous Attributes
- Discrete Attribute
  - Has only a finite or countably infinite set of values
  - Examples: zip codes, counts, or the set of words in a collection of documents
  - Often represented as integer variables
  - Note: binary attributes are a special case of discrete attributes
- Continuous Attribute
  - Has real numbers as attribute values
  - Examples: temperature, height, or weight
  - Practically, real values can only be measured and represented using a finite number of digits
  - Continuous attributes are typically represented as floating-point variables

Types of data sets
- Record
  - Data Matrix
  - Document Data
  - Transaction Data
- Graph
  - World Wide Web
  - Molecular Structures
- Ordered
  - Spatial Data
  - Temporal Data
  - Sequential Data
  - Genetic Sequence Data
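Returning to the four attribute types: the permissible transformations listed for each level (permutation, monotonic function, a * old_value + b, a * old_value) can be checked numerically. The sketch below is only an illustration under stated assumptions. The sample values, the ID reassignment, and the helper names (`equality_pattern`, `sort_order`, `celsius_to_fahrenheit`) are hypothetical and do not come from the lecture notes.

```python
import math

# Nominal: only = and != are meaningful, so any one-to-one relabeling is allowed.
employee_ids = [101, 102, 101, 103]
relabel = {101: 903, 102: 901, 103: 902}  # hypothetical reassignment of IDs
new_ids = [relabel[i] for i in employee_ids]

def equality_pattern(values):
    """For every pair (i, j) with i < j, record whether the two values are equal."""
    n = len(values)
    return [values[i] == values[j] for i in range(n) for j in range(i + 1, n)]

nominal_ok = equality_pattern(employee_ids) == equality_pattern(new_ids)

# Ordinal: a monotonic recoding ({1, 2, 3} vs. {0.5, 1, 10}) keeps the ordering.
codes_a = {"good": 1, "better": 2, "best": 3}
codes_b = {"good": 0.5, "better": 1, "best": 10}
ratings = ["better", "good", "best", "good", "better"]

def sort_order(values):
    """Indices of the values in ascending order (ties keep their input order)."""
    return sorted(range(len(values)), key=lambda i: values[i])

ordinal_ok = (sort_order([codes_a[r] for r in ratings])
              == sort_order([codes_b[r] for r in ratings]))

# Interval: new_value = a * old_value + b keeps ratios of differences,
# but not ratios of the values themselves.
def celsius_to_fahrenheit(c):
    return 9.0 / 5.0 * c + 32.0

celsius = [10.0, 20.0, 40.0]
fahrenheit = [celsius_to_fahrenheit(c) for c in celsius]
diff_ratio_c = (celsius[2] - celsius[1]) / (celsius[1] - celsius[0])
diff_ratio_f = (fahrenheit[2] - fahrenheit[1]) / (fahrenheit[1] - fahrenheit[0])
interval_ok = math.isclose(diff_ratio_c, diff_ratio_f)
value_ratio_changes = not math.isclose(celsius[1] / celsius[0],
                                       fahrenheit[1] / fahrenheit[0])

# Ratio: new_value = a * old_value keeps ratios of the values themselves.
FEET_PER_METER = 1 / 0.3048
meters = [2.0, 5.0]
feet = [m * FEET_PER_METER for m in meters]
ratio_ok = math.isclose(meters[1] / meters[0], feet[1] / feet[0])

print(nominal_ok, ordinal_ok, interval_ok, value_ratio_changes, ratio_ok)
```

Each flag checks that the property a level relies on survives the transformation allowed at that level. The `value_ratio_changes` flag shows the converse for interval data: 20 degrees Celsius is not "twice as warm" as 10 degrees Celsius once the same temperatures are expressed in Fahrenheit.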