Saturday, October 15, 2016

Pattern Recognition


A flow chart of the process of classifier design


Probabilistic distance measures


Dissimilarity measures for numeric variables (between x and y)
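
The figure for this caption is not reproduced here, so which measures it listed is an assumption. As a minimal sketch, the following shows four dissimilarity measures commonly used for numeric variables: Euclidean, city-block, Chebyshev and Minkowski. The function names and the example vectors x and y are illustrative only.

```python
import math


def euclidean(x, y):
    """Square root of the summed squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def city_block(x, y):
    """Sum of absolute coordinate differences (Manhattan distance)."""
    return sum(abs(a - b) for a, b in zip(x, y))


def chebyshev(x, y):
    """Largest absolute coordinate difference."""
    return max(abs(a - b) for a, b in zip(x, y))


def minkowski(x, y, m):
    """Minkowski distance of order m; m=1 is city-block, m=2 is Euclidean."""
    return sum(abs(a - b) ** m for a, b in zip(x, y)) ** (1.0 / m)


# Hypothetical example vectors.
x = [0.0, 3.0]
y = [4.0, 0.0]

for name, d in [("euclidean", euclidean(x, y)),
                ("city-block", city_block(x, y)),
                ("chebyshev", chebyshev(x, y)),
                ("minkowski (m=3)", minkowski(x, y, 3))]:
    print(f"{name}: {d:.4f}")
```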


Linear regression


Statistical Pattern Recognition


Stages in a pattern recognition problem

1. Formulation of the problem: gaining a clear understanding of the aims of the investigation and planning the remaining stages.

2. Data collection: making measurements on appropriate variables and recording details of the data collection procedure (ground truth).

3. Initial examination of the data: checking the data, calculating summary statistics and producing plots in order to get a feel for the structure.

4. Feature selection or feature extraction: selecting variables from the measured set that are appropriate for the task (feature selection), or deriving new variables by a linear or nonlinear transformation of the original set (feature extraction). To some extent, the division between feature extraction and classification is artificial.

5. Unsupervised pattern classification or clustering. This may be viewed as exploratory data analysis and it may provide a successful conclusion to a study. On the other hand, it may be a means of preprocessing the data for a supervised classification procedure.

6. Apply discrimination or regression procedures as appropriate. The classifier is designed using a training set of exemplar patterns.

7. Assessment of results. This may involve applying the trained classifier to an independent test set of labelled patterns.

8. Interpretation.
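
Several of the stages above can be sketched in a few lines of code. This is a minimal illustration rather than a recommended method: the data are invented, and the nearest-class-mean classifier is an assumption chosen for brevity, since the list does not prescribe any particular classifier. The names data, select, fit_centroids and predict are hypothetical.

```python
from statistics import mean

# Stage 2 (data collection): invented labelled measurements, three variables each.
data = [
    ((1.0, 1.2, 9.0), "a"), ((0.8, 1.0, 2.0), "a"),
    ((1.2, 0.9, 5.0), "a"), ((0.9, 1.1, 7.0), "a"),
    ((3.0, 3.2, 8.0), "b"), ((3.1, 2.9, 1.0), "b"),
    ((2.8, 3.0, 6.0), "b"), ((3.2, 3.1, 4.0), "b"),
]

# Stage 3 (initial examination): a summary statistic per variable.
column_means = [mean(x[j] for x, _ in data) for j in range(3)]
print("column means:", column_means)


# Stage 4 (feature selection): keep the first two variables only.
def select(x, keep=(0, 1)):
    return tuple(x[j] for j in keep)


# Stage 6 (discrimination): design the classifier on a training set.
train = data[0:3] + data[4:7]
test = [data[3], data[7]]  # held out, never used for training


def fit_centroids(samples):
    """Return the mean feature vector of each class."""
    by_class = {}
    for x, label in samples:
        by_class.setdefault(label, []).append(select(x))
    return {label: tuple(mean(col) for col in zip(*rows))
            for label, rows in by_class.items()}


def predict(centroids, x):
    """Assign x to the class whose mean is nearest (squared Euclidean)."""
    z = select(x)
    return min(centroids,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(z, centroids[c])))


centroids = fit_centroids(train)

# Stage 7 (assessment): error rate on the independent test set.
test_error = sum(predict(centroids, x) != label for x, label in test) / len(test)
print("test error:", test_error)
```

The test patterns are kept out of fit_centroids so that the error estimate in stage 7 comes from data the classifier has not seen.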

Monday, October 03, 2016

Tree replication problem. The same subtree can appear at different branches.

This makes the decision tree more complex than necessary and perhaps more difficult to interpret. Such a situation can arise from decision tree implementations that rely on a single attribute test condition at each internal node. Since most of the decision tree algorithms use a divide-and-conquer partitioning strategy, the same test condition can be applied to different parts of the attribute space, thus leading to the subtree replication problem.
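
A small example may make the replication concrete. The sketch below assumes a Boolean concept (A and B) or (C and D), which is an illustrative choice and not taken from the text above. With one attribute tested per node, the subtree for "C and D" has to appear twice: once where A is false and once where A is true but B is false. The tuple representation and the helper names are hypothetical.

```python
from collections import Counter
from itertools import product


def node(attr, if_false, if_true):
    """An internal node testing a single Boolean attribute."""
    return (attr, if_false, if_true)


# Subtree that decides "C and D"; leaves are plain booleans.
c_and_d = node("C", False, node("D", False, True))

# Tree for (A and B) or (C and D): the same subtree is needed on two branches.
tree = node("A", c_and_d, node("B", c_and_d, True))


def classify(t, record):
    """Follow one attribute test per node until a leaf is reached."""
    while not isinstance(t, bool):
        attr, if_false, if_true = t
        t = if_true if record[attr] else if_false
    return t


def subtree_counts(t, counts=None):
    """Count how many times each distinct subtree occurs in the tree."""
    counts = Counter() if counts is None else counts
    if not isinstance(t, bool):
        counts[t] += 1
        subtree_counts(t[1], counts)
        subtree_counts(t[2], counts)
    return counts


replicated = {t: n for t, n in subtree_counts(tree).items() if n > 1}
for t, n in replicated.items():
    print(n, "copies of", t)
```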


Comparison among the impurity measures for binary classification problems
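
The figure is not reproduced here. Assuming it compares the three impurity measures usually discussed for decision trees (entropy, Gini index and classification error), the sketch below evaluates each as a function of p, the fraction of records in one of the two classes. All three are zero for a pure node (p = 0 or p = 1) and largest at p = 0.5.

```python
import math


def entropy(p):
    """Binary entropy in bits; 0 * log2(0) is taken to be 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def gini(p):
    """Gini index for two classes with proportions p and 1 - p."""
    return 1 - p ** 2 - (1 - p) ** 2


def classification_error(p):
    """Error made by always predicting the majority class."""
    return 1 - max(p, 1 - p)


# Tabulate the measures over a grid of class proportions.
print("   p  entropy   gini  error")
for i in range(11):
    p = i / 10
    print(f"{p:4.1f}  {entropy(p):7.3f}  {gini(p):5.3f}  {classification_error(p):5.3f}")
```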


Test condition for continuous attributes
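
For a continuous attribute, a binary test condition typically has the form "value <= v", and the threshold v has to be chosen. One common approach, sketched here under the assumption that the Gini index is the impurity measure, is to sort the records, try the midpoint between each pair of adjacent distinct values, and keep the threshold with the lowest weighted impurity. The data and function names are hypothetical.

```python
def gini(labels):
    """Gini index of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_threshold(values, labels):
    """Return (threshold, weighted Gini) of the best split value <= threshold."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, float("inf"))
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold can separate equal values
        v = (lo + hi) / 2
        left = [c for _, c in pairs[:i]]
        right = [c for _, c in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (v, score)
    return best


# Invented records: attribute value and class label.
values = [1, 2, 3, 10, 11, 12]
labels = ["no", "no", "no", "yes", "yes", "yes"]
print(best_threshold(values, labels))
```

This version recomputes the impurity from scratch for every candidate, which is simple but quadratic in the number of records; updating class counts incrementally during a single pass over the sorted values avoids that.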


Test conditions for nominal attributes


Classifying an unlabeled vertebrate. The dashed lines represent the outcomes of applying various attribute test conditions to the unlabeled vertebrate. The vertebrate is eventually assigned to the Non-mammal class.


Training and test error rates


Example of a decision tree and its decision boundaries for a two-dimensional data set