- 9.1. Introduction
- 9.2. Using sdroc objects
- 9.2.1. Setting the current operating point
- 9.2.2. Performing decisions based on ROC
- 9.2.3. Interactive visualization of ROC decisions
- 9.2.4. Interactive visualization of confusion matrices
- 9.2.5. Accessing operating points
- 9.2.6. Accessing the estimated performances
- 9.2.7. Using different performance measures
- 9.3. Multi-class ROC Analysis
- 9.4. ROC Analysis using target thresholding (detection)
- 9.5. Selecting application-specific operating point
- 9.5.1. Applying performance constraints
- 9.5.2. Constraints using the low-level methods
- 9.5.3. Cost-sensitive optimization
- 9.5.4. Applying multiple performance constraints
- 9.6. Rejection
9.1. Introduction ↩
In Chapter 6, we saw that a trained classifier may provide decisions at different operating points. Now we will learn to use a powerful tool that helps us find desirable operating points in our applications: ROC analysis.
ROC stands for Receiver Operating Characteristic.
The basic idea of ROC analysis is very simple: for a given trained classifier and a labeled test set, define a set of possible operating points and estimate different types of errors at these points.
To optimize our classifier, we will need the following:
- a trained classifier capable of returning soft outputs
- knowledge of the type of soft outputs (similarity or distance)
- a labeled test set
ROC analysis works in three steps:
- define admissible operating points
- measure classifier performance at these points
- select an operating point of interest based on application requirements
Let us consider a two-class problem with apple and banana classes. We
will select the two classes of interest (the data set a also contains a
third class of outliers called stone).
>> load fruit; b=a(:,:,[1 2]);
>> [tr,ts]=randsubset(b,0.5) % split the data into training and test set
'Fruit set' 100 by 2 sddata, 2 classes: 'apple'(50) 'banana'(50)
'Fruit set' 100 by 2 sddata, 2 classes: 'apple'(50) 'banana'(50)
We train our classifier on the training set. We use a mixture of Gaussians with 5 components per class:
>> p=sdmixture(tr,'comp',5,'iter',10)
[class 'apple' EM:.......... 5 comp] [class 'banana' EM:.......... 5 comp]
Mixture of Gaussians pipeline 2x2 2 classes, 10 components (sdp_normal)
Now we can estimate soft outputs of our mixture model on the test set. For a mixture model, the soft outputs are class-conditional probability densities:
>> out=ts*p
'Fruit set' 100 by 2 sddata, 2 classes: 'apple'(50) 'banana'(50)
The two outputs in out represent our two classes:
>> out.lab.list
sdlist (2 entries)
ind name
1 apple
2 banana
On the soft outputs, we perform ROC analysis using the sdroc command:
>> r=sdroc(out)
ROC (2001 w-based op.points, 3 measures), curop: 1042
est: 1:err(apple)=0.01, 2:err(banana)=0.04, 3:mean-error [0.50,0.50]=0.02
sdroc defined a set of operating points, estimated three error measures
(error on each class and the mean error), and fixed the "current" operating
point to minimize the mean error.
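The three steps above can be sketched outside the toolbox. The following Python snippet is only an illustration under stated assumptions, not the sdroc implementation: the soft outputs, the labels, the coarse weight grid, and the function names `roc_by_weighting` and `best_point` are all made up for this example. It assumes similarity-type outputs (higher means more similar) and a weight pair (w, 1-w) per operating point.

```python
# Toolbox-independent sketch of weight-based ROC analysis for two classes.
# All data and names below are hypothetical illustration values.

def roc_by_weighting(outputs, labels, n_weights=11):
    """Return a list of (w, err_class0, err_class1, mean_error) tuples.

    outputs: list of (out0, out1) soft outputs (higher = more similar)
    labels:  list of true class indices (0 or 1)
    """
    n0 = labels.count(0)
    n1 = labels.count(1)
    points = []
    for k in range(n_weights):
        w = k / (n_weights - 1)  # weight on class 0; class 1 gets 1 - w
        wrong0 = wrong1 = 0
        for (o0, o1), y in zip(outputs, labels):
            # Decision: the class with the larger weighted output wins.
            decision = 0 if w * o0 >= (1 - w) * o1 else 1
            if decision != y:
                if y == 0:
                    wrong0 += 1
                else:
                    wrong1 += 1
        e0, e1 = wrong0 / n0, wrong1 / n1
        points.append((w, e0, e1, (e0 + e1) / 2))
    return points


def best_point(points):
    """Index of the first operating point with the smallest mean error."""
    return min(range(len(points)), key=lambda i: points[i][3])


outputs = [(0.9, 0.1), (0.7, 0.2), (0.4, 0.5), (0.2, 0.8), (0.3, 0.6), (0.6, 0.5)]
labels = [0, 0, 0, 1, 1, 1]
points = roc_by_weighting(outputs, labels)
current = best_point(points)
print(points[current])
```

Each tuple plays the role of one row of the ROC: an operating point together with its per-class errors and the mean error. The extreme weights assign every sample to a single class, which is why one class error is zero and the other is one at both ends of the curve.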
We may visualize the ROC using the sddrawroc command:
>> sddrawroc(r)

The ROC plot shows the first two measures in r, namely the error on apple
class on the horizontal axis and the error on banana on the vertical axis.
In the ROC plot, each blue marker represents one operating point. The
current operating point is denoted by the thick black marker. When moving
over the plot, the gray cursor marker follows the closest operating
point. The figure title then shows the number of the cursor operating point
and the values of the errors.
Note that selecting an operating point is a matter of trade-off. When we try to minimize the error on one class, the error on the other inevitably increases at some point. Only in a situation without class overlap could we select an optimal solution. In real-world pattern recognition projects, we need to accept a certain level of errors. ROC analysis allows us to carefully choose the acceptable trade-off.
9.2. Using sdroc objects ↩
9.2.1. Setting the current operating point ↩
The current operating point may be set interactively in the sddrawroc
figure by clicking the left mouse button. By pressing the s key (save),
we may store the current operating point index back to the sdroc object
in the Matlab workspace. Simply put the name of the sdroc variable in the
dialog box and press OK.
In the same way, we can also set the current operating point in other PRSD
Studio objects, such as sdops sets of operating points, pipelines,
or custom algorithms discussed in Chapter 9, or even PRTools sddecide mappings.
Alternatively, we may set the current operating point manually using the
setcurop function at the Matlab prompt:
>> r2=setcurop(r,208)
ROC (2001 w-based op.points, 3 measures), curop: 208
est: 1:err(apple)=0.12, 2:err(banana)=0.02, 3:mean-error [0.50,0.50]=0.07
9.2.2. Performing decisions based on ROC ↩
The sdroc object may be directly concatenated with the model
pipeline via the * operator. This will add the decision action with all the ROC operating points:
>> pd=p*r
sequential pipeline 2x1 ''
1 sdp_normal 2x2 2 classes, 10 components
2 sdp_decide 2x1 Weight-based decision (2 classes, 2001 ops) at op 1042
The pd pipeline returns decisions at the current operating point:
>> sdconfmat(ts.lab,ts*pd)
ans =
True | Decisions
Labels | apple banana | Totals
-------------------------------------
apple | 165 1 | 166
banana | 7 159 | 166
-------------------------------------
Totals | 172 160 | 332
To relate the confusion matrix to the error measures in the ROC object, it is better to normalize the confusion matrix by the number of samples in each true class:
>> sdconfmat(ts.lab,ts*pd,'norm')
ans =
True | Decisions
Labels | apple banana | Totals
-------------------------------------
apple | 0.994 0.006 | 1.00
banana | 0.042 0.958 | 1.00
-------------------------------------
>> r
ROC (2001 w-based op.points, 3 measures), curop: 1042
est: 1:err(apple)=0.01, 2:err(banana)=0.04, 3:mean-error [0.50,0.50]=0.02
You can see that the error on apple class was 0.6% (rounded to 1% in the
sdroc display above) and the error on the banana class 4.2%.
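The relation between the confusion matrix counts and the error measures can be checked by hand. The sketch below is plain Python, independent of the toolbox; the helper names `normalize_rows` and `class_errors` are invented for this illustration, and the counts are copied from the confusion matrix shown above (rows are true classes, columns are decisions).

```python
# Per-class errors derived from the confusion matrix counts shown above.
# Helper names are hypothetical; only the counts come from the text.

def normalize_rows(confmat):
    """Divide each row by its total so that every row sums to one."""
    return [[c / sum(row) for c in row] for row in confmat]


def class_errors(confmat):
    """Error of class i = 1 - (correct decisions for i) / (samples of i)."""
    return [1 - row[i] / sum(row) for i, row in enumerate(confmat)]


counts = [[165, 1],   # true apple:  165 decided apple, 1 decided banana
          [7, 159]]   # true banana: 7 decided apple, 159 decided banana

norm = normalize_rows(counts)
errors = class_errors(counts)
mean_error = sum(errors) / len(errors)  # equal class priors

print([round(e, 3) for e in errors], round(mean_error, 2))
```

The off-diagonal entries of the normalized matrix are the per-class errors, and their average under equal priors is the mean error reported in the sdroc display.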
The confusion matrix at the ROC r2 with manually selected op.point 208:
>> sdconfmat(ts.lab,ts*p*r2,'norm')
ans =
True | Decisions
Labels | apple banana | Totals
-------------------------------------
apple | 0.880 0.120 | 1.00
banana | 0.024 0.976 | 1.00
-------------------------------------
9.2.3. Interactive visualization of ROC decisions ↩
In 2D feature spaces, we may visualize the ROC decisions at different
operating points using the sdscatter. We can connect the sdscatter to
an open ROC figure. We only need to supply the data, the pipeline
including the ROC operating points, and the number of the open ROC figure:
>> sdscatter(tr,p*r2,2)

We can now change the operating points in the ROC figure and directly observe the changes to the classifier decisions.
We can also open both plots in one step by simply providing the ROC object to
sdscatter:
>> sdscatter(tr,p*r2,'roc',r2)
9.2.4. Interactive visualization of confusion matrices ↩
sddrawroc is able to interactively visualize confusion matrices at
different operating points. The advantage of this approach is that it is
applicable to arbitrary problems, while the visualization of decisions is
valid only for 2D feature spaces.
However, sdroc does not store confusion matrices by default. It only
stores the desired performance measures. We must, therefore, add the
confmat option when estimating the ROC:
>> r=sdroc(out,'confmat')
ROC (2001 w-based op.points, 3 measures), curop: 1
est: 1:err(apple)=0.01, 2:err(banana)=0.01, 3:mean-error [0.50,0.50]=0.01
>> sddrawroc(r)
To open the confusion matrix window, press c in the ROC figure.

Two confusion matrices are shown: The top one for the current operating point (black marker) and the lower one for the cursor operating point closest to the mouse pointer (gray marker).
In this view, we can compare different operating points. This approach is especially handy to understand trade-offs in multi-class ROC analysis.
9.2.5. Accessing operating points ↩
Any sdroc object stores its set of operating points in an
sdops object:
>> ops=getops(r2)
Weight-based operating set (2001 ops, 2 classes) at op 208
>> getdata(ops(1:5))
ans =
0.5000 0.5000
0 1.0000
0.0005 0.9995
0.0010 0.9990
0.0015 0.9985
Note that the order of operating points may be arbitrary. Although in
certain special cases (thresholding-based ROC) we may
preserve ordering of operating points, this is not true in general. For
example, in our simple two-class ROC example using output weighting, the
sdroc returns the equal-weight solution as the first one.
9.2.6. Accessing the estimated performances ↩
An sdroc object behaves like a matrix with rows representing the operating
points and columns the estimated performance measures. The order of
columns is shown in the sdroc display string. We can extract performance
estimates simply by addressing the sdroc as a matrix:
>> r2
ROC (2001 w-based op.points, 3 measures), curop: 208
est: 1:err(apple)=0.12, 2:err(banana)=0.02, 3:mean-error [0.50,0.50]=0.07
>> r2(206:208,:)
ans =
0.1205 0.0120 0.0663
0.1205 0.0181 0.0693
0.1205 0.0241 0.0723
The performance measures may be requested also by name:
>> r2(1:5,'err(apple)')
ans =
0.0120
1.0000
0.4759
0.4398
0.4157
The access to performance estimates is useful for custom selection of an operating point using application constraints.
9.2.7. Using different performance measures ↩
We may specify the performance measures used by the sdroc command using the
measures option. It takes a cell array with the list of desired
measures. In this example, we will estimate the commonly used ROC based on
true positive and false positive rates:
>> load fruit; a=a(:,:,[1 2])
>> [tr,ts]=randsubset(a,0.5)
'Fruit set' 100 by 2 sddata, 2 classes: 'apple'(50) 'banana'(50)
'Fruit set' 100 by 2 sddata, 2 classes: 'apple'(50) 'banana'(50)
>> p=sdnmean(tr)
Nearest mean pipeline 2x2 2 classes, 2 components (sdp_normal)
>> out=ts*p
'Fruit set' 100 by 2 sddata, 2 classes: 'apple'(50) 'banana'(50)
>> r=sdroc(out,'measures',{'FPr','apple','TPr','apple'})
ROC (2001 w-based op.points, 3 measures), curop: 1192
est: 1:FPr(apple)=0.14, 2:TPr(apple)=0.79, 3:mean-error [0.50,0.50]=0.17
>> sddrawroc(r)

Note that in the measures option, we specify the measure name (FPr)
followed by the class name. This is needed because the false positive rate
depends on the definition of the target class. Because we specify that the
target is apple, the FPr is the fraction of banana samples misclassified as apple.
The following performance measures are supported:
- mean-error: mean error over classes (using equal class priors)
- class-errors: per-class errors for each class
- err: error of a specific class, parameter: name of class
- TP, FP, FN, TN: e.g. TP = number of true positives, parameter: name of target class
- TPr, FPr, FNr, TNr: e.g. TPr = true positive rate, parameter: name of target class
- sensitivity, specificity, parameter: name of target class
- precision: ratio of true positives in all positives TP/(TP+FP), parameter: name of target class
- positive-fraction (posfrac): ratio of all positive decisions from all decisions (TP+FP)/N, parameter: name of target class
- confmat: arbitrary entry of the confusion matrix, parameters: true class, decision, example: {'confmat','apple','banana'}
- frac: fraction of given decisions from all samples (used for reject curves), parameter: decision of interest
All measures are also available for multi-class ROC. Non-targets are then defined as the sum of all remaining classes.
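The target-based measures listed above follow directly from the four counts TP, FP, FN and TN. The Python sketch below is not toolbox code: the function names `one_vs_rest` and `rates` are hypothetical, and it assumes the usual textbook definitions (TPr = TP/(TP+FN), FPr = FP/(FP+TN)) and nonzero denominators. As an example it reuses the apple/banana counts from the confusion matrix in section 9.2.2 with apple as the target class.

```python
# Target-based measures computed from a confusion matrix
# (rows = true class, columns = decision). Names are hypothetical.

def one_vs_rest(confmat, target):
    """Collapse a confusion matrix into TP, FP, FN, TN for one target class.

    All remaining classes are summed into the non-target class.
    """
    n = sum(sum(row) for row in confmat)
    tp = confmat[target][target]
    fn = sum(confmat[target]) - tp
    fp = sum(row[target] for row in confmat) - tp
    tn = n - tp - fn - fp
    return tp, fp, fn, tn


def rates(tp, fp, fn, tn):
    """Rates from the four counts; assumes every denominator is nonzero."""
    n = tp + fp + fn + tn
    return {
        "TPr": tp / (tp + fn),        # also called sensitivity
        "FNr": fn / (tp + fn),
        "FPr": fp / (fp + tn),
        "TNr": tn / (fp + tn),        # also called specificity
        "precision": tp / (tp + fp),
        "posfrac": (tp + fp) / n,
    }


counts = [[165, 1], [7, 159]]          # apple/banana counts from section 9.2.2
tp, fp, fn, tn = one_vs_rest(counts, target=0)   # target class: apple
m = rates(tp, fp, fn, tn)
print({k: round(v, 3) for k, v in m.items()})
```

With apple as the target, FPr is the fraction of bananas decided as apple, which is the same number as the banana error in the two-class case.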
9.3. Multi-class ROC Analysis ↩
In full generality, multi-class ROC has exponential complexity with respect
to the number of classes. However, practical sub-optimal solutions may be
found using different search strategies. PRSD Studio allows us to perform
multi-class ROC analysis using a greedy optimizer. Similarly to the
two-class case, we simply pass the data with the model soft outputs to
the sdroc command:
>> a=sddata(gendatm(1000))
Multi-Class Problem, 1000 by 2 sddata, 8 classes: [130 110 111 145 116 119 137 132]
>> [tr,ts]=randsubset(a,0.5)
Multi-Class Problem, 502 by 2 sddata, 8 classes: [65 55 56 73 58 60 69 66]
Multi-Class Problem, 498 by 2 sddata, 8 classes: [65 55 55 72 58 59 68 66]
>> p=sdmixture(tr,'comp',1,'iter',10)
[class 'a' EM:.......... 1 comp] [class 'b' EM:.......... 1 comp]
[class 'c' EM:.......... 1 comp] [class 'd' EM:.......... 1 comp]
[class 'e' EM:.......... 1 comp] [class 'f' EM:.......... 1 comp]
[class 'g' EM:.......... 1 comp] [class 'h' EM:.......... 1 comp]
Mixture of Gaussians pipeline 2x8 8 classes, 8 components (sdp_normal)
>> out=ts*p
Multi-Class Problem, 498 by 8 dataset with 8 classes: [65 55 55 72 58 59 68 66]
>> r=sdroc(out)
..........
ROC (2000 w-based op.points, 9 measures), curop: 318
est: 1:err(a)=0.12, 2:err(b)=0.04, 3:err(c)=0.02, 4:err(d)=0.42, 5:err(e)=0.02, 6:err(f)=0.00, 7:err(g)=0.03, 8:err(h)=0.03, 9:mean-error [0.12,0.12,...]=0.08
>> sdscatter(ts,p*r,'roc',r) % visualize the scatter and ROC plot

Note that you can switch between the class errors shown in the ROC plot using the cursor keys (left/right for the horizontal and up/down for the vertical axis).
9.4. ROC Analysis using target thresholding (detection) ↩
In this section, we illustrate how to build an ROC in a target detection setting. This is achieved by thresholding the output of a model trained on a single (target) class.
Let us, for example, consider a problem where we want to detect all fruit in our fruit data set, which contains apple and banana fruit examples and some stone (outlier) examples.
>> a=gendatf(1000)
'Fruit set' 1000 by 2 sddata, 3 classes: 'apple'(333) 'banana'(333) 'stone'(334)
We will first create a two-class data set labeling all non-stone classes as fruit.
>> b=sdrelab(a,{'~stone','fruit'})
new lablist:
1: apple -> fruit
2: banana -> fruit
3: stone -> stone
'Fruit set', 1000 by 2 sddata, 2 classes: 'stone'(334) 'fruit'(666)
>> [tr,ts]=randsubset(b,0.5) % split the data into training and validation
'Fruit set', 500 by 2 sddata, 2 classes: 'stone'(167) 'fruit' (333)
'Fruit set', 500 by 2 sddata, 2 classes: 'stone'(167) 'fruit' (333)
We will train a mixture model on the fruit class only:
>> p=sdmixture(subset(tr,'fruit'),'comp',10,'iter',30)
[class 'fruit' EM:.............................. 10 comp]
Mixture of Gaussians pipeline 2x1 one class, 10 components (sdp_normal)
We will estimate soft outputs on our test set:
>> out=ts*p
'Fruit set' 500 by 1 sddata, 2 classes: 'stone'(167) 'fruit'(333)
The out data contains one column representing the fruit class and
contains labels of two classes (we need non-target examples to perform
ROC):
>> out.featlab.list
sdlist (1 entries)
ind name
1 fruit
>> out.lab.list
sdlist (2 entries)
ind name
1 stone
2 fruit
The ROC analysis is straightforward:
>> r=sdroc(out)
1: stone -> non-fruit
2: fruit -> fruit
ROC (500 thr-based op.points, 3 measures), curop: 170
est: 1:err(fruit)=0.10, 2:err(non-fruit)=0.16, 3:mean-error=0.13
We visualize the detector decisions plotting the data a with all three
classes:
>> sdscatter(a,p*r,'roc',r)

By thresholding the fruit model output, we effectively reject outliers.
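The thresholding idea can be sketched in a few lines of Python. This is an illustration only, not what sdroc does internally: the scores are made-up density values, the function name `threshold_roc` is hypothetical, and the candidate thresholds are simply assumed to be the distinct observed scores. A sample is accepted as target when its score is at or above the threshold.

```python
# Threshold-based ROC for a one-class (detection) model.
# Scores and names are hypothetical illustration values.

def threshold_roc(scores, is_target):
    """Return (threshold, err_target, err_nontarget, mean_error) tuples."""
    n_target = sum(is_target)
    n_nontarget = len(is_target) - n_target
    points = []
    for thr in sorted(set(scores)):
        # Targets below the threshold are rejected (missed detections).
        missed = sum(1 for s, t in zip(scores, is_target) if t and s < thr)
        # Non-targets at or above the threshold are accepted (false alarms).
        accepted = sum(1 for s, t in zip(scores, is_target) if not t and s >= thr)
        e_t = missed / n_target
        e_n = accepted / n_nontarget
        points.append((thr, e_t, e_n, (e_t + e_n) / 2))
    return points


scores = [0.30, 0.25, 0.20, 0.05, 0.01, 0.02, 0.10]
is_target = [True, True, True, True, False, False, False]

points = threshold_roc(scores, is_target)
best = min(points, key=lambda p: p[3])   # operating point with minimum mean error
print(best)
```

Raising the threshold trades missed targets for fewer accepted outliers, which is the same trade-off the ROC plot shows between err(fruit) and err(non-fruit).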
9.5. Selecting application-specific operating point ↩
In our projects, we usually need to fix the operating point based on specific performance requirements. The two most common techniques are applying performance constraints and cost-sensitive optimization.
9.5.1. Applying performance constraints ↩
Sometimes, we know specific constraints for our problem. For example, the maximum error on fruit may not exceed 10%.
We may apply performance constraints using the constrain method.
It takes an existing ROC object, the specification of a performance measure (by
its index or by name), and the constraint value.
Let us, for example, select the subset of operating points with error on fruit lower than 20%:
>> r2=constrain(r,'err(fruit)',0.2)
ROC (216 thr-based op.points, 3 measures), curop: 170
est: 1:err(fruit)=0.10, 2:err(non-fruit)=0.16, 3:mean-error=0.13
This results in a subset of 216 operating points from the original 500. In
this subset, the constrain method sets the current operating
point to the one minimizing the mean error over classes.
We may be, however, interested in a different point, simply minimizing the error
on the non-fruit. To do that, we can use the setcurop method:
>> r2=setcurop(r2,'min','err(non-fruit)')
ROC (216 thr-based op.points, 3 measures), curop: 212
est: 1:err(fruit)=0.19, 2:err(non-fruit)=0.11, 3:mean-error=0.15
Note that we may reach a 5% lower error on non-fruit (0.11 instead of 0.16) at the expense of a higher mean error.
9.5.2. Constraints using the low-level methods ↩
We may also apply constraints simply by querying the estimated
performances in the ROC object using standard Matlab commands such as
find or min:
>> r
ROC (500 thr-based op.points, 3 measures), curop: 170
est: 1:err(fruit)=0.10, 2:err(non-fruit)=0.16, 3:mean-error=0.13
We will first find indices of all operating points with error on fruit smaller than 10%:
>> ind=find( r(:,'err(fruit)')<0.10 );
Now we can find the minimum error on non-fruit in this subset:
>> [m,ind2]=min( r(ind,'err(non-fruit)') );
And set the resulting operating point directly by the index:
>> r2=setcurop(r,ind(ind2))
ROC (500 thr-based op.points, 3 measures), curop: 170
est: 1:err(fruit)=0.10, 2:err(non-fruit)=0.16, 3:mean-error=0.13
>> sddrawroc(r2)

If the criteria are too strict, the performance constraints may not be met. For example, if we wish not to exceed 10% error on fruit and 1% error on stones (non-fruit), our classifier cannot provide a solution:
>> ind=find( r(:,'err(fruit)')<0.10 & r(:,'err(non-fruit)')<0.01 )
ind =
Empty matrix: 0-by-1
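The find/min pattern above amounts to filtering a table of performances and minimizing over what remains. The Python sketch below mirrors that logic under stated assumptions: the table values are invented, the function name `constrain_and_minimize` is hypothetical, and all constraints are treated as strict upper bounds, as in the Matlab expressions above.

```python
# Constraint-based selection of an operating point from a table of
# estimated performances. Values and names are hypothetical.

def constrain_and_minimize(table, limits, minimize):
    """Return the index of the best feasible operating point, or None.

    table:    list of dicts mapping measure name -> estimated value
    limits:   dict mapping measure name -> strict upper bound
    minimize: name of the measure to minimize among feasible points
    """
    feasible = [i for i, row in enumerate(table)
                if all(row[name] < bound for name, bound in limits.items())]
    if not feasible:
        return None  # constraints too strict: no operating point qualifies
    return min(feasible, key=lambda i: table[i][minimize])


table = [
    {"err(fruit)": 0.02, "err(non-fruit)": 0.40},
    {"err(fruit)": 0.08, "err(non-fruit)": 0.18},
    {"err(fruit)": 0.10, "err(non-fruit)": 0.16},
    {"err(fruit)": 0.19, "err(non-fruit)": 0.11},
]

loose = constrain_and_minimize(table, {"err(fruit)": 0.20}, "err(non-fruit)")
tight = constrain_and_minimize(table, {"err(fruit)": 0.10}, "err(non-fruit)")
none = constrain_and_minimize(
    table, {"err(fruit)": 0.10, "err(non-fruit)": 0.01}, "err(non-fruit)")
print(loose, tight, none)
```

Returning None for an empty feasible set corresponds to the empty index vector in the Matlab session: the caller has to relax a constraint or improve the classifier.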
9.5.3. Cost-sensitive optimization ↩
The second way of selecting an operating point is based on the idea of misclassification costs. We can penalize different types of errors in the confusion matrix and minimize the classifier loss.
To perform cost-sensitive optimization, we need to fix our cost specification. This may come in a form of a cost matrix corresponding to the confusion matrix.
In this example, we will build a three-class mixture model for the fruit problem:
>> a
Fruit set, 1000 by 2 sddata, 3 classes: 'apple'(333) 'banana'(333) 'stone'(334)
>> [tr,ts]=randsubset(a,0.5)
'Fruit set' 499 by 2 sddata, 3 classes: 'apple'(166) 'banana'(166) 'stone'(167)
'Fruit set' 501 by 2 sddata, 3 classes: 'apple'(167) 'banana'(167) 'stone'(167)
>> p=sdmixture(tr,'comp',[3 3 1],'iter',10)
[class 'apple' EM:.......... 3 comp] [class 'banana' EM:.......... 3 comp]
[class 'stone' EM:.......... 1 comp]
sequential pipeline 2x3 ''
1 sdp_normal 2x3 3 classes, 7 components
We can estimate the normalized confusion matrix at the default operating point:
>> sdconfmat(ts.lab,ts*sddecide(p),'norm')
ans =
True | Decisions
Labels | apple banana stone | Totals
--------------------------------------------
apple | 0.892 0.108 0.000 | 1.00
banana | 0.036 0.922 0.042 | 1.00
stone | 0.006 0.132 0.862 | 1.00
--------------------------------------------
We might be interested in lowering the fraction of apples misclassified as bananas. We will therefore specify the cost matrix in the following way:
>> m=ones(3); m(1,2)=5
m =
1 5 1
1 1 1
1 1 1
Now we can perform ROC analysis using the cost-based optimization:
>> out=ts*p
Fruit set, 499 by 3 sddata, 3 classes: 'apple'(166) 'banana'(166) 'stone'(167)
>> r=sdroc(out,'cost',m)
..........
ROC (100 w-based op.points, 4 measures), curop: 1
est: 1:err(apple)=0.01, 2:err(banana)=0.14, 3:err(stone)=0.14, 4:mean-error [0.33,0.33,...]=0.10
>> sdconfmat(ts.lab,ts*p*r,'norm')
ans =
True | Decisions
Labels | apple banana stone | Totals
--------------------------------------------
apple | 0.994 0.006 0.000 | 1.00
banana | 0.133 0.855 0.012 | 1.00
stone | 0.012 0.126 0.862 | 1.00
--------------------------------------------
The confusion matrix for the operating point found shows that we reduced the fraction of apples misclassified as bananas (from 10.8% to 0.6%). Of course, we pay with more bananas misclassified as apples (from 3.6% to 13.3%), as each solution is a specific trade-off.
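To see why the second operating point is preferred under this cost specification, we can compare the loss of both confusion matrices. The Python sketch below assumes that the loss is the sum of the element-wise product of the cost matrix and the normalized confusion matrix; the text does not state how sdroc computes it internally, so this formula and the function name `loss` are assumptions. The matrices are copied from the two sdconfmat outputs above.

```python
# Loss of a confusion matrix under a cost matrix, assuming
# loss = sum over all entries of cost[i][j] * confmat[i][j].

def loss(confmat, cost):
    return sum(c * m
               for conf_row, cost_row in zip(confmat, cost)
               for c, m in zip(conf_row, cost_row))


cost = [[1, 5, 1],      # apples decided as bananas cost 5
        [1, 1, 1],
        [1, 1, 1]]

default_point = [[0.892, 0.108, 0.000],
                 [0.036, 0.922, 0.042],
                 [0.006, 0.132, 0.862]]

cost_optimized = [[0.994, 0.006, 0.000],
                  [0.133, 0.855, 0.012],
                  [0.012, 0.126, 0.862]]

loss_default = loss(default_point, cost)
loss_optimized = loss(cost_optimized, cost)
print(round(loss_default, 3), round(loss_optimized, 3))
```

Because only the apple-as-banana entry carries an extra cost, the loss difference between the two operating points comes entirely from that entry.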
9.5.4. Applying multiple performance constraints ↩
Multiple performance constraints may be combined in one constrain call.
Let us consider a multi-class ROC example where we try to optimize
precision and per-class errors:
>> a=gendatf(10000)
'Fruit set' 10000 by 2 sddata, 3 classes: 'apple'(3333) 'banana'(3333) 'stone'(3334)
>> [tr,ts]=randsubset(a,0.5)
'Fruit set' 4999 by 2 sddata, 3 classes: 'apple'(1666) 'banana'(1666) 'stone'(1667)
'Fruit set' 5001 by 2 sddata, 3 classes: 'apple'(1667) 'banana'(1667) 'stone'(1667)
>> p=sdlinear(tr)
sequential pipeline 2x3 'Linear discriminant'
1 Gauss eq.cov. 2x3 3 classes, 3 components (sdp_normal)
2 Output normalization 3x3 (sdp_norm)
>> out=ts*p
'Fruit set' 5001 by 3 sddata, 3 classes: 'apple'(1667) 'banana'(1667) 'stone'(1667)
>> r=sdroc(out,'confmat','measures',{'precision','apple','class-errors'})
..........
ROC (2000 w-based op.points, 5 measures), curop: 1
est: 1:precision(apple)=0.84, 2:err(apple)=0.18, 3:err(banana)=0.23, 4:err(stone)=0.11, 5:mean-error=0.17
>> sddrawroc(r)

We are interested in precisions above 75% and banana errors under 30%:
>> r2=constrain(r,'precision(apple)',0.75,'err(banana)',0.3)
ROC (260 w-based op.points, 5 measures), curop: 1
est: 1:precision(apple)=0.84, 2:err(apple)=0.18, 3:err(banana)=0.23, 4:err(stone)=0.11, 5:mean-error=0.17
Remember that the operating point is selected in this subset to minimize the mean error. This default solution yields the following confusion matrix:
>> sdconfmat(ts.lab,out*r2)
ans =
True | Decisions
Labels | apple banana stone | Totals
--------------------------------------------
apple | 1373 268 26 | 1667
banana | 236 1289 142 | 1667
stone | 27 153 1487 | 1667
--------------------------------------------
Totals | 1636 1710 1655 | 5001
We are, however, interested in lowering the number of apples misclassified as bananas, as apples are more costly in our problem. Therefore, we set the following cost matrix, penalizing this type of error with a high cost:
>> m=ones(3); m(1,2)=10
m =
1 10 1
1 1 1
1 1 1
We may now select the operating point in our subset minimizing the loss function considering our cost specification:
>> r2=setcurop(r2,'cost',m)
ROC (260 w-based op.points, 5 measures), curop: 4
est: 1:precision(apple)=0.75, 2:err(apple)=0.07, 3:err(banana)=0.29, 4:err(stone)=0.17, 5:mean-error=0.18
>> sdconfmat(ts.lab,out*r2)
ans =
True | Decisions
Labels | apple banana stone | Totals
--------------------------------------------
apple | 1546 119 2 | 1667
banana | 413 1185 69 | 1667
stone | 101 184 1382 | 1667
--------------------------------------------
Totals | 2060 1488 1453 | 5001
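We can verify from the counts that the cost-based choice still respects both constraints. The Python sketch below is independent of the toolbox; the helper names are hypothetical, the two matrices are copied from the sdconfmat outputs above, and the loss is assumed to be the sum of cost-weighted counts.

```python
# Checking the constraints and the loss on the two confusion matrices above.
# Rows = true class, columns = decision; order: apple, banana, stone.

def precision(confmat, target):
    """Correct target decisions divided by all decisions for the target."""
    decided = sum(row[target] for row in confmat)
    return confmat[target][target] / decided


def class_error(confmat, cls):
    """Fraction of samples of class cls that were not decided as cls."""
    return 1 - confmat[cls][cls] / sum(confmat[cls])


def loss(confmat, cost):
    """Assumed loss: sum of counts weighted by the cost matrix."""
    return sum(c * m
               for conf_row, cost_row in zip(confmat, cost)
               for c, m in zip(conf_row, cost_row))


min_mean_error = [[1373, 268, 26], [236, 1289, 142], [27, 153, 1487]]
min_cost = [[1546, 119, 2], [413, 1185, 69], [101, 184, 1382]]
cost = [[1, 10, 1], [1, 1, 1], [1, 1, 1]]

for name, cm in (("min mean error", min_mean_error), ("min cost", min_cost)):
    print(name,
          round(precision(cm, 0), 2),      # precision(apple)
          round(class_error(cm, 1), 2),    # err(banana)
          loss(cm, cost))
```

The cost-optimal point sits right at the precision limit of 75%: it lowers the costly apple-as-banana errors as far as the constrained subset allows.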
The final classifier is composed of a model p with the operating point in r2:
>> pd=p*r2
sequential pipeline 2x1 'Linear discriminant+Decision'
1 Gauss eq.cov. 2x3 3 classes, 3 components (sdp_normal)
2 Output normalization 3x3 (sdp_norm)
3 Decision 3x1 weighting, 3 classes, 260 ops at op 4 (sdp_decide)
9.6. Rejection ↩
Rejection refers to the choice we make not to assign the data sample to any of the learned classes. The sample may be either discarded or passed for further processing by a different system or human expert.
PRSD Studio supports both types of rejection:
- Distance-based rejection where we discard data samples far away from the learned class distributions. This rejection scheme helps us to protect against outliers.
- Rejection close to the decision boundary where we reject data samples that fall into the area of class overlap and hence are likely to be misclassified.
Technically, PRSD Studio implements both rejection types by adding a rejection threshold to the weighting-based operating point. The only difference between the two reject options is in the type of the classifier soft output used. If the soft output was normalized with respect to all classes, the rejection operates close to the boundary. Without normalization, the rejection discards outliers.
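The effect of normalization on the reject decision can be illustrated with a small Python sketch. It is not the toolbox implementation: the decision rule (reject when the largest weighted output falls below the threshold), the sample values, the thresholds, and the names `decide_with_reject` and `normalize` are assumptions made for this example.

```python
# One reject rule, two kinds of soft output (hypothetical values).

def decide_with_reject(outputs, weights, threshold):
    """Return the winning class index, or 'reject' if its weighted
    output is below the rejection threshold."""
    weighted = [w * o for w, o in zip(weights, outputs)]
    best = max(range(len(weighted)), key=lambda i: weighted[i])
    return "reject" if weighted[best] < threshold else best


def normalize(outputs):
    """Scale the outputs so that they sum to one."""
    total = sum(outputs)
    return [o / total for o in outputs]


weights = (0.5, 0.5)            # equal class weights
outlier = (0.001, 0.0005)       # low density under both classes
overlap = (0.11, 0.10)          # similar density under both classes

# Unnormalized densities: the threshold removes samples far from all classes.
print(decide_with_reject(outlier, weights, 0.01),
      decide_with_reject(overlap, weights, 0.01))

# Normalized outputs: the threshold removes samples near the boundary.
print(decide_with_reject(normalize(outlier), weights, 0.3),
      decide_with_reject(normalize(overlap), weights, 0.3))
```

After normalization the absolute density is lost, so the outlier looks like a confident class 0 sample, while the sample in the overlap area no longer has a dominant output and is rejected.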
9.6.1. Distance-based rejection ↩
To illustrate distance-based rejection, we will use the two-class Highleyman
dataset gendath and train a quadratic discriminant:
>> a=sddata(gendath(300))
'Highleyman Dataset' 300 by 2 sddata, 2 classes: '1'(154) '2'(146)
>> [tr,ts]=randsubset(a,0.5);
>> p=sdgauss(tr)
Gaussian model pipeline 2x2 2 classes, 2 components (sdp_normal)
We estimate soft outputs on the test set:
>> out=ts*p
'Highleyman Dataset' 150 by 2 sddata, 2 classes: '1'(77) '2'(73)
>> +out(1:5,:)
ans =
0.0747 0.0000
0.1233 0.0036
0.0777 0.0393
0.1121 0.0000
0.1099 0.1760
Note that the soft outputs do not sum to one. That is because sdgauss returns
class conditional densities.
We will now use the sdroc command to construct a reject curve at the default
operating point. It will add the rejection capability to the operating
point and derive a set of feasible rejection thresholds from the data. In
this example, the default operating point (equal class weights) will be used as
a basis for adding the reject option:
>> r=sdroc(out,'reject')
ROC (1001 wr-based op.points, 4 measures), curop: 1
est: 1:frac(reject)=0.00, 2:TPr(1)=0.86, 3:TPr(2)=0.97, 4:TPr(reject)=0.00
We can visualize the decisions using sdscatter.
>> sdscatter(ts,p*r,'roc',r)

The horizontal axis represents the fraction of rejected objects; the vertical axis shows the true positive ratio for the first class. By moving away from the default operating point (where rejection is not performed), we can observe how rejection takes place far from both trained distributions.
We can select a specific point by left mouse click and store it back to the
r object by pressing s key. We can now estimate the confusion matrix on
the test set:
>> r=setcurop(r,6); % Setting the operating point 6 in sdroc object r
ROC (1001 wr-based op.points, 4 measures), curop: 6
est: 1:frac(reject)=0.03, 2:TPr(1)=0.82, 3:TPr(2)=0.95, 4:TPr(reject)=0.00
>> sdconfmat(ts.lab,ts*p*r)
ans =
True | Decisions
Labels | 1 2 reject | Totals
--------------------------------------------
1 | 62 11 3 | 76
2 | 2 69 2 | 73
--------------------------------------------
Totals | 64 80 5 | 149
We can see that the confusion matrix contains the added reject decision.
9.6.2. Rejection close to the decision boundary ↩
To illustrate the rejection close to the decision boundary, we will continue with the example above. We make a single change in the procedure: we normalize the model soft outputs to sum to one.
>> p2=sdquadratic(tr)
sequential pipeline 2x2 'Quadratic discr.'
1 Gauss full cov. 2x2 2 classes, 2 components (sdp_normal)
2 Output normalization 2x2 (sdp_norm)
>> out2=ts*p2
'Highleyman Dataset' 150 by 2 sddata, 2 classes: '1'(77) '2'(73)
>> +out2(1:5,:)
ans =
1.0000 0.0000
0.9717 0.0283
0.6639 0.3361
1.0000 0.0000
0.3843 0.6157
Note that the classifier outputs are now posterior probabilities, not densities.
>> r2=sdroc(out2,'reject')
ROC (1001 wr-based op.points, 4 measures), curop: 1
est: 1:frac(reject)=0.00, 2:TPr(1)=0.87, 3:TPr(2)=0.97, 4:TPr(reject)=0.00
>> sdscatter(ts,p2*r2,'roc',r2)

The red reject decision now occupies the area where errors would be highly probable.
9.6.3. Adding reject option to a specific operating point ↩
By default, the reject option will add rejection to the default operating
point (with equal class weights). In order to add the reject option to a
different operating point, we may pass an sdops operating set or an sdroc
ROC object:
>> a
'Fruit Set', 2000 by 2 sddata with 2 classes: 'apple' (983) 'banana' (1017)
>> p % trained Parzen classifier
sequential pipeline 2x2 ''
1 sdp_parzen 2x2 2 classes, 1601 prototypes
>> out=a*p % soft outputs
'Fruit Set', 2000 by 2 sddata with 2 classes: 'apple' (983) 'banana' (1017)
>> r=sdroc(out) % standard ROC
ROC (2001 w-based op.points, 3 measures), curop: 1014
est: 1:err(1)=0.02, 2:err(2)=0.02, 3:mean-error [0.50,0.50]=0.02
We will choose a different operating point, e.g. the point 100, where we are not losing any samples of class 2:
>> r=setcurop(r,100)
ROC (2001 w-based op.points, 3 measures), curop: 100
est: 1:err(1)=0.25, 2:err(2)=0.00, 3:mean-error [0.50,0.50]=0.13
To create a rejection curve starting from this operating point, just pass
the r to the reject option:
>> r2=sdroc(out,'reject',r)
ROC (1001 wr-based op.points, 4 measures), curop: 1
est: 1:frac(reject)=0.00, 2:TPr(1)=0.75, 3:TPr(2)=1.00, 4:TPr(reject)=0.00
The eventual pipeline would be:
>> p2=p*r2
sequential pipeline 2x1 'Parzen+Decision'
1 Parzen 2x2 2 classes, 200 prototypes (sdp_parzen)
2 Decision 2x1 weight+reject, 3 decisions, 1001 ops at op 1 (sdp_decide)
>> p2.list
sdlist (3 entries)
ind name
1 1
2 2
3 reject
9.6.4. Setting the rejection manually by discarding fraction of data ↩
Instead of constructing the full reject curve, we may also directly add a
reject option to our multi-class classifier by specifying the fraction of
objects to be discarded. We need to pass this reject fraction to the
reject option. In this example, we reject 1% of all data:
>> r=sdroc(a*p,'reject',0.01)
ROC (wr-based op.point, 4 measures)
est: 1:frac(reject)=0.01, 2:TPr(1)=0.97, 3:TPr(2)=0.97, 4:TPr(reject)=0.00
>> sdscatter(a,p*r)
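One simple way to turn a reject fraction into a threshold is to take a quantile of the winning weighted outputs. The Python sketch below shows this idea only as an assumption about how such a threshold could be derived; the text does not describe the method sdroc uses. The data and the names `reject_threshold` and `count_rejected` are hypothetical.

```python
# Deriving a rejection threshold from a desired reject fraction
# (quantile of the winning weighted output; an assumed approach).

def winning_score(row, weights):
    """Largest weighted soft output of one sample."""
    return max(w * o for w, o in zip(weights, row))


def reject_threshold(outputs, weights, fraction):
    """Threshold below which roughly `fraction` of the samples fall."""
    scores = sorted(winning_score(row, weights) for row in outputs)
    k = round(fraction * len(scores))     # number of samples to reject
    if k >= len(scores):
        return float("inf")               # reject everything
    # Samples with a score strictly below scores[k] are rejected;
    # with tied scores, fewer than k samples may be rejected.
    return scores[k]


def count_rejected(outputs, weights, threshold):
    return sum(1 for row in outputs if winning_score(row, weights) < threshold)


# 100 hypothetical samples with distinct winning outputs.
outputs = [(0.01 * i, 0.0) for i in range(1, 101)]
weights = (0.5, 0.5)

thr = reject_threshold(outputs, weights, 0.01)
print(thr, count_rejected(outputs, weights, thr))
```

A threshold chosen this way is tied to the data it was estimated on; on new data, the rejected fraction will only be close to the requested one.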

In case we need to both specify the reject fraction and use a specific operating
point, we may supply the operating point as a parameter to an additional reject option:
>> r=sdroc(a*p)
ROC (2001 w-based op.points, 3 measures), curop: 1014
est: 1:err(1)=0.02, 2:err(2)=0.02, 3:mean-error [0.50,0.50]=0.02
>> r=setcurop(r,100)
ROC (2001 w-based op.points, 3 measures), curop: 100
est: 1:err(1)=0.25, 2:err(2)=0.00, 3:mean-error [0.50,0.50]=0.13
>> r2=sdroc(a*p,'reject',0.01,'reject',r) % operating point in r is used
ROC (wr-based op.point, 4 measures)
est: 1:frac(reject)=0.01, 2:TPr(1)=0.73, 3:TPr(2)=1.00, 4:TPr(reject)=0.00
