
Presentation on theme: "CS690L Data Mining: Classification" — Presentation transcript:

1 CS690L Data Mining: Classification
Reference: J. Han and M. Kamber, Data Mining: Concepts and Techniques
Yong Fu

2 Classification
Classification: determine the class or category of an object based on its properties.
Two stages:
Learning stage: construction of a classification function or model.
Classification stage: prediction of classes of objects using the function or model.
Tools for classification: decision trees, Bayesian networks, neural networks, regression.
Problem: given a set of objects whose classes are known, called the training set, derive a classification model which can correctly classify future objects.

3 Classification: Decision Tree
Classification model: decision tree.
Method: Top-Down Induction of Decision Trees.
Data representation: every object is represented by a vector of values on a fixed set of attributes. If a relation is defined on the attributes, an object is a tuple in the relation. A special attribute, called the class attribute, tells the group/category the object belongs to; it is the dependent attribute to be predicted.
Learning stage: induction of a decision tree that classifies the training set.
Classification stage: the decision tree classifies new objects.

4 An Example: Definitions
A decision tree is a tree in which each non-leaf node corresponds to an attribute of objects, and each branch from a non-leaf node to its children represents a value of that attribute. Each leaf node is labeled by a class of the objects.
Classification using decision trees: starting from the root, an object follows a path to a leaf node, taking branches according to its attribute values along the way; the leaf gives the class of the object.
Alternative view of a decision tree:
Node/branch: discrimination test.
Node: subset of objects satisfying the test.

5 Decision Tree Induction
Induction of decision trees: starting from a training set, recursively select attributes to split nodes, thus partitioning the objects.
Termination condition: when to stop splitting a node.
Selection of the attribute for the splitting test: what is the best split, and by what measure?
ID3 algorithm:
Selection: the attribute with the highest information gain.
Termination condition: all objects in a node belong to a single class.

6 ID3 Algorithm

7 ID3 Algorithm (Cont)

8 Example

9 Example: Decision Tree Building
Information content of C (expected information for the classification):
I(P) = Ent(C) = -((9/14) log2(9/14) + (5/14) log2(5/14)) = 0.940
For each attribute Ai:
Step 1: Compute the entropy of each subset induced by Ai:
Ent(Sunny) = -((2/5) log2(2/5) + (3/5) log2(3/5)) = 0.97
Ent(Rainy) = 0.97
Ent(Overcast) = 0
Step 2: Compute the expected information based on the partitioning into subsets by Ai:
Ent(C, Outlook) = (5/14) Ent(Sunny) + (5/14) Ent(Rainy) + (4/14) Ent(Overcast) = (5/14)(0.97) + (5/14)(0.97) + (4/14)(0) = 0.69
Step 3: Gain(C, Outlook) = Ent(C) - Ent(C, Outlook) = 0.940 - 0.69 = 0.25
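A minimal Python sketch reproducing slide 9's figures. The per-value class counts (Sunny: 2 yes / 3 no, Rainy: 3 yes / 2 no, Overcast: 4 yes / 0 no; 9 yes / 5 no overall) are an assumption: the transcript does not reproduce slide 8's data table, so the counts below are taken from the standard weather ("play tennis") dataset, which is consistent with the slide's numbers.

```python
import math

def entropy(counts):
    """Entropy of a class distribution given as a list of class counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

# Whole training set C: 9 positive / 5 negative examples (from slide 9).
ent_C = entropy([9, 5])                                   # 0.940

# Partition of C induced by Outlook (counts assumed; see lead-in above).
partition = {"Sunny": [2, 3], "Rainy": [3, 2], "Overcast": [4, 0]}
total = sum(sum(c) for c in partition.values())           # 14

# Expected information after splitting on Outlook: weighted subset entropies.
ent_C_outlook = sum(sum(c) / total * entropy(c) for c in partition.values())

gain = ent_C - ent_C_outlook
print(f"Ent(C)           = {ent_C:.3f}")          # 0.940
print(f"Ent(C, Outlook)  = {ent_C_outlook:.3f}")  # 0.694
print(f"Gain(C, Outlook) = {gain:.3f}")           # 0.247 (slide rounds to 0.25)
```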

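The transcript omits the bodies of slides 6-7 (the ID3 pseudocode), so the following is a sketch of the algorithm as slides 5 and 9 describe it: pick the attribute with the highest information gain, split, and recurse until a node holds a single class. The tree encoding (a class label at a leaf, an (attribute, branches) pair at an internal node) and the helper names are assumptions of this sketch, not the slides'.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy of a list of class labels."""
    total = len(labels)
    return -sum(n / total * math.log2(n / total)
                for n in Counter(labels).values())

def info_gain(rows, labels, attr):
    """Gain(C, attr) = Ent(C) - weighted entropy of the subsets by attr."""
    total = len(labels)
    remainder = 0.0
    for value in {row[attr] for row in rows}:
        subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder

def id3(rows, labels, attrs):
    """Induce a tree: a leaf is a class label, an internal node is
    (attribute, {value: subtree})."""
    if len(set(labels)) == 1:                 # termination: single class
        return labels[0]
    if not attrs:                             # no attributes left: majority class
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: info_gain(rows, labels, a))
    branches = {}
    for value in {row[best] for row in rows}:
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        branches[value] = id3([rows[i] for i in idx],
                              [labels[i] for i in idx],
                              [a for a in attrs if a != best])
    return (best, branches)

def classify(tree, obj):
    """Classification stage: follow branches from the root to a leaf."""
    while isinstance(tree, tuple):
        attr, subtrees = tree
        tree = subtrees[obj[attr]]
    return tree
```

On the standard weather dataset assumed in the previous sketch, Outlook has the largest gain of all four attributes, so id3 would place it at the root, which is consistent with slide 9 working through Gain(C, Outlook).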