PRSD Studio Documentation development version 2.0.9 (8-Mar-2010)

Chapter 9: ROC Analysis

9.1. Introduction

In Chapter 6, we saw that a trained classifier may provide decisions at different operating points. Now we will learn to use a powerful tool that helps us find desirable operating points in our applications: ROC analysis.

ROC stands for Receiver Operating Characteristic.

The basic idea of ROC analysis is very simple: for a given trained classifier and a labeled test set, define a set of possible operating points and estimate the different types of errors at these points.

To optimize our classifier, we will need the following:

ROC analysis works in three steps:

Let us consider a two-class problem with apple and banana classes. We will select the two classes of interest (the data set a also contains a third class of outliers called stone).

>> load fruit; b=a(:,:,[1 2]);
>> [tr,ts]=randsubset(b,0.5) %  split the data into training and test set
'Fruit set' 100 by 2 sddata, 2 classes: 'apple'(50) 'banana'(50) 
'Fruit set' 100 by 2 sddata, 2 classes: 'apple'(50) 'banana'(50)

We train our classifier on the training set. We use a mixture of Gaussians with 5 components per class:

>> p=sdmixture(tr,'comp',5,'iter',10)
[class 'apple' EM:.......... 5 comp] [class 'banana' EM:.......... 5 comp]            
Mixture of Gaussians pipeline 2x2  2 classes, 10 components (sdp_normal)

Now we can estimate soft outputs of our mixture model on the test set. For a mixture model, the soft outputs are class-conditional probability densities:

>> out=ts*p
'Fruit set' 100 by 2 sddata, 2 classes: 'apple'(50) 'banana'(50)

The two outputs in out represent our two classes:

>> out.lab.list
sdlist (2 entries)
 ind name
   1 apple 
   2 banana

On the soft outputs, we perform ROC analysis using the sdroc command:

>> r=sdroc(out)
ROC (2001 w-based op.points, 3 measures), curop: 1042
est: 1:err(apple)=0.01, 2:err(banana)=0.04, 3:mean-error [0.50,0.50]=0.02

sdroc defined a set of operating points, estimated three error measures (error on each class and the mean error), and fixed the "current" operating point to minimize the mean error.
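
The three things sdroc does here (define operating points, estimate errors, pick the point with the lowest mean error) can be sketched in plain Python, independent of PRSD Studio. This is only an illustration under assumptions: the soft outputs are made-up values, all function names are hypothetical, and the weight grid is simply 2001 evenly spaced weights, which need not match how sdroc builds or orders its operating points.

```python
# Sketch of weight-based ROC estimation for two classes (not PRSD Studio code).
# All names are hypothetical; the soft outputs are made-up illustration values.

def weighted_decisions(outputs, w):
    """Decide class 0 when w*out0 >= (1-w)*out1, otherwise class 1."""
    return [0 if w * o0 >= (1.0 - w) * o1 else 1 for o0, o1 in outputs]

def class_errors(labels, decisions):
    """Fraction of misclassified samples for each true class (0 and 1)."""
    errs = []
    for c in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == c]
        wrong = sum(1 for i in idx if decisions[i] != c)
        errs.append(wrong / len(idx))
    return errs

def roc(outputs, labels, steps=2000):
    """One operating point per weight: (weight, err0, err1, mean_error)."""
    points = []
    for k in range(steps + 1):
        w = k / steps
        e0, e1 = class_errors(labels, weighted_decisions(outputs, w))
        points.append((w, e0, e1, 0.5 * (e0 + e1)))
    return points

outputs = [(0.9, 0.1), (0.6, 0.3), (0.2, 0.4), (0.3, 0.8), (0.1, 0.7), (0.5, 0.4)]
labels = [0, 0, 0, 1, 1, 1]

points = roc(outputs, labels)
# "Current" operating point: the first one that minimizes the mean error.
curop = min(range(len(points)), key=lambda i: points[i][3])
print(len(points), points[curop])
```

At weight 0 every sample is assigned to the second class and at weight 1 to the first, so the two ends of the curve always have one error equal to 1 and the other equal to 0.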

We may visualize the ROC using the sddrawroc command:

>> sddrawroc(r)

The ROC plot shows the first two measures in r, namely the error on the apple class on the horizontal axis and the error on the banana class on the vertical axis. In the ROC plot, each blue marker represents one operating point. The current operating point is denoted by the thick black marker. When moving the mouse over the plot, the gray cursor marker follows the closest operating point. The figure title then shows the index of the cursor operating point and the values of the errors.

Note that selecting an operating point is a matter of trade-off. When we try to minimize the error on one class, the error on the other class inevitably increases at some point. Only when the classes do not overlap can we select a solution that is optimal for both. In real-world pattern recognition projects, we need to accept a certain level of error. ROC analysis allows us to choose the acceptable trade-off carefully.

9.2. Using sdroc objects

9.2.1. Setting the current operating point

The current operating point may be set interactively in the sddrawroc figure by clicking the left mouse button. By pressing the s key (save), we may store the current operating point index back to the sdroc object in the Matlab workspace. Simply put the name of the sdroc variable in the dialog box and press OK.

In this way, we can also set the current operating point in other PRSD Studio objects, such as sdops sets of operating points, pipelines or custom algorithms discussed in Chapter 9, or even PRTools sddecide mappings.

Alternatively, we may set the current operating point manually using the setcurop function at the Matlab prompt:

>> r2=setcurop(r,208)
ROC (2001 w-based op.points, 3 measures), curop: 208
est: 1:err(apple)=0.12, 2:err(banana)=0.02, 3:mean-error [0.50,0.50]=0.07

9.2.2. Performing decisions based on ROC

The sdroc object may be directly concatenated with the model pipeline via the * operator. This will add the decision action with all the ROC operating points:

>> pd=p*r
sequential pipeline     2x1 ''
 1  sdp_normal          2x2  2 classes, 10 components
 2  sdp_decide          2x1  Weight-based decision (2 classes, 2001 ops) at op 1042

The pd pipeline returns decisions at the current operating point:

>> sdconfmat(ts.lab,ts*pd)

ans =

 True      | Decisions
 Labels    | apple  banana  | Totals
-------------------------------------
 apple     |   165      1   |   166
 banana    |     7    159   |   166
-------------------------------------
 Totals    |   172    160   |   332

To relate the confusion matrix to the error measures in the ROC object, it is better to normalize it so that each row sums to one:

>> sdconfmat(ts.lab,ts*pd,'norm')

ans =

 True      | Decisions
 Labels    | apple  banana  | Totals
-------------------------------------
 apple     | 0.994  0.006   | 1.00
 banana    | 0.042  0.958   | 1.00
-------------------------------------

>> r
ROC (2001 w-based op.points, 3 measures), curop: 1042
est: 1:err(apple)=0.01, 2:err(banana)=0.04, 3:mean-error [0.50,0.50]=0.02

You can see that the error on the apple class is 0.6% (rounded to 1% in the sdroc display above) and the error on the banana class is 4.2%.
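
The link between the counts and the normalized values can be checked by hand. The following plain-Python sketch (not PRSD Studio code; the function name is hypothetical) divides each row of the count matrix shown above by its row total and reads the per-class error as one minus the diagonal entry.

```python
# Row-normalize a confusion matrix of counts (rows: true labels, columns: decisions).
# The counts are copied from the sdconfmat output above.
counts = {
    "apple":  {"apple": 165, "banana": 1},
    "banana": {"apple": 7,   "banana": 159},
}

def normalize_rows(cm):
    """Divide every entry by the total of its row (the true-class count)."""
    out = {}
    for true_label, row in cm.items():
        total = sum(row.values())
        out[true_label] = {dec: n / total for dec, n in row.items()}
    return out

norm = normalize_rows(counts)
# Per-class error = 1 - fraction of correctly decided samples of that class.
errors = {c: 1.0 - norm[c][c] for c in norm}
for c, e in errors.items():
    print(f"err({c}) = {e:.3f}")
```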

The confusion matrix for the ROC r2 with the manually selected operating point 208:

>> sdconfmat(ts.lab,ts*p*r2,'norm')

ans =

 True      | Decisions
 Labels    | apple  banana  | Totals
-------------------------------------
 apple     | 0.880  0.120   | 1.00
 banana    | 0.024  0.976   | 1.00
-------------------------------------

9.2.3. Interactive visualization of ROC decisions

In 2D feature spaces, we may visualize the ROC decisions at different operating points using sdscatter. We can connect sdscatter to an open ROC figure. We only need to supply the data, the pipeline including the ROC operating points, and the number of the open ROC figure:

>> sdscatter(tr,p*r2,2)

We can now change the operating points in the ROC figure and directly observe the changes to the classifier decisions.

We can also open both plots in one step by simply providing the ROC object to sdscatter:

>> sdscatter(tr,p*r2,'roc',r2)

9.2.4. Interactive visualization of confusion matrices

sddrawroc is able to interactively visualize confusion matrices at different operating points. The advantage of this approach is that it is applicable to arbitrary problems, while the visualization of decisions is valid only for 2D feature spaces.

However, sdroc does not store confusion matrices by default. It only stores the desired performance measures. We must, therefore, add the confmat option when estimating the ROC:

>> r=sdroc(out,'confmat')
ROC (2001 w-based op.points, 3 measures), curop: 1
est: 1:err(apple)=0.01, 2:err(banana)=0.01, 3:mean-error [0.50,0.50]=0.01
>> sddrawroc(r)

To open the confusion matrix window, press c in the ROC figure.

Two confusion matrices are shown: The top one for the current operating point (black marker) and the lower one for the cursor operating point closest to the mouse pointer (gray marker).

In this view, we can compare different operating points. This approach is especially handy to understand trade-offs in multi-class ROC analysis.

9.2.5. Accessing operating points

Any sdroc object stores its set of operating points in an sdops object.

>> ops=getops(r2)
Weight-based operating set (2001 ops, 2 classes) at op 208
>> getdata(ops(1:5))

ans =

0.5000    0.5000
     0    1.0000
0.0005    0.9995
0.0010    0.9990
0.0015    0.9985

Note that the order of operating points may be arbitrary. Although in certain special cases (thresholding-based ROC) the ordering of operating points may be preserved, this is not true in general. For example, in our simple two-class ROC example using output weighting, sdroc returns the equal-weight solution as the first operating point.

9.2.6. Accessing the estimated performances

An sdroc object behaves like a matrix with rows representing the operating points and columns representing the estimated performance measures. The order of the columns is shown in the sdroc display string. We can extract performance estimates simply by addressing the sdroc object as a matrix:

>> r2
ROC (2001 w-based op.points, 3 measures), curop: 208
est: 1:err(apple)=0.12, 2:err(banana)=0.02, 3:mean-error [0.50,0.50]=0.07
>> r2(206:208,:)

ans =

0.1205    0.0120    0.0663
0.1205    0.0181    0.0693
0.1205    0.0241    0.0723

The performance measures may be requested also by name:

>> r2(1:5,'err(apple)')

ans =

0.0120
1.0000
0.4759
0.4398
0.4157

The access to performance estimates is useful for custom selection of an operating point using application constraints.

9.2.7. Using different performance measures

We may specify the performance measures used by the sdroc command with the measures option. It takes a cell array with the list of desired measures. In this example, we will estimate the commonly used ROC based on true positive and false positive ratios:

>> load fruit; a=a(:,:,[1 2])
>> [tr,ts]=randsubset(a,0.5)
'Fruit set' 100 by 2 sddata, 2 classes: 'apple'(50) 'banana'(50) 
'Fruit set' 100 by 2 sddata, 2 classes: 'apple'(50) 'banana'(50)
>> p=sdnmean(tr) 
Nearest mean pipeline   2x2  2 classes, 2 components (sdp_normal)
>> out=ts*p
'Fruit set' 100 by 2 sddata, 2 classes: 'apple'(50) 'banana'(50)
>> r=sdroc(out,'measures',{'FPr','apple','TPr','apple'})
ROC (2001 w-based op.points, 3 measures), curop: 1192
est: 1:FPr(apple)=0.14, 2:TPr(apple)=0.79, 3:mean-error [0.50,0.50]=0.17
>> sddrawroc(r)

Note that in the measures option, we specify the measure name (FPr) followed by the class name. This is needed because the false positive rate depends on the definition of the target class. Because we specify apple as the target, the FPr is the fraction of banana samples misclassified as apple.
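
To see why the target class has to be named, the following plain-Python sketch (not PRSD Studio code; the function name and the toy labels are hypothetical) computes the true and false positive rates for the same decisions twice, once with apple and once with banana as the target.

```python
def tpr_fpr(labels, decisions, target):
    """True and false positive rates with respect to a chosen target class.

    TPr: fraction of target samples decided as target.
    FPr: fraction of non-target samples decided as target.
    """
    targets = [d for y, d in zip(labels, decisions) if y == target]
    non_targets = [d for y, d in zip(labels, decisions) if y != target]
    tpr = sum(1 for d in targets if d == target) / len(targets)
    fpr = sum(1 for d in non_targets if d == target) / len(non_targets)
    return tpr, fpr

# Made-up example: four apples and four bananas with their decisions.
labels    = ["apple", "apple", "apple", "apple", "banana", "banana", "banana", "banana"]
decisions = ["apple", "apple", "apple", "banana", "apple", "apple", "banana", "banana"]

print(tpr_fpr(labels, decisions, "apple"))   # target = apple
print(tpr_fpr(labels, decisions, "banana"))  # target = banana
```

The same set of decisions gives different rates depending on which class is treated as the target, which is why the measure name alone is not enough.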

The following performance measures are supported:

All measures are also available for multi-class ROC. Non-targets are then defined as the sum of all remaining classes.

9.3. Multi-class ROC Analysis

In full generality, multi-class ROC has exponential complexity with respect to the number of classes. However, practical sub-optimal solutions may be found using different search strategies. PRSD Studio allows us to perform multi-class ROC analysis using a greedy optimizer. As in the two-class case, we simply pass the data with the model soft outputs to the sdroc command:

>> a=sddata(gendatm(1000))
Multi-Class Problem, 1000 by 2 sddata, 8 classes: [130  110  111  145  116  119  137  132]
>> [tr,ts]=randsubset(a,0.5) 
Multi-Class Problem, 502 by 2 sddata, 8 classes: [65  55  56  73  58  60  69  66]
Multi-Class Problem, 498 by 2 sddata, 8 classes: [65  55  55  72  58  59  68  66]
>> p=sdmixture(tr,'comp',1,'iter',10)
[class 'a' EM:.......... 1 comp] [class 'b' EM:.......... 1 comp] 
[class 'c' EM:.......... 1 comp] [class 'd' EM:.......... 1 comp] 
[class 'e' EM:.......... 1 comp] [class 'f' EM:.......... 1 comp] 
[class 'g' EM:.......... 1 comp] [class 'h' EM:.......... 1 comp] 
Mixture of Gaussians pipeline 2x8  8 classes, 8 components (sdp_normal)          
>> out=ts*p
Multi-Class Problem, 498 by 8 dataset with 8 classes: [65  55  55  72  58  59  68  66]
>> r=sdroc(out)
..........
ROC (2000 w-based op.points, 9 measures), curop: 318
est: 1:err(a)=0.12, 2:err(b)=0.04, 3:err(c)=0.02, 4:err(d)=0.42, 5:err(e)=0.02, 6:err(f)=0.00, 7:err(g)=0.03, 8:err(h)=0.03, 9:mean-error [0.12,0.12,...]=0.08
>> sdscatter(ts,p*r,'roc',r) %  visualize the scatter and ROC plot

Note that you can switch between the class errors shown in the ROC plot using the cursor keys (left/right for the horizontal axis and up/down for the vertical axis).

9.4. ROC Analysis using target thresholding (detection)

In this section, we illustrate how to build a ROC in a target detection setting. This is achieved by thresholding the output of a model trained on a single (target) class.

Let us, for example, consider the problem where we want to detect all fruit in our fruit data set, which contains apple and banana examples and some stone (outlier) examples.

>> a=gendatf(1000)
'Fruit set' 1000 by 2 sddata, 3 classes: 'apple'(333) 'banana'(333) 'stone'(334) 

We will first create a two-class data set labeling all non-stone classes as fruit.

>> b=sdrelab(a,{'~stone','fruit'})
new lablist:
1: apple  -> fruit 
2: banana -> fruit 
3: stone  -> stone 
'Fruit set', 1000 by 2 sddata, 2 classes:  'stone'(334) 'fruit'(666) 

>> [tr,ts]=randsubset(b,0.5) %  split the data into training and test sets
'Fruit set', 500 by 2 sddata, 2 classes: 'stone'(167)  'fruit' (333)
'Fruit set', 500 by 2 sddata, 2 classes: 'stone'(167)  'fruit' (333)

We will train a mixture model on the fruit class only:

>> p=sdmixture(subset(tr,'fruit'),'comp',10,'iter',30)
[class 'fruit' EM:.............................. 10 comp] 
Mixture of Gaussians pipeline 2x1  one class, 10 components (sdp_normal)

We will estimate soft outputs on our test set:

>> out=ts*p
'Fruit set' 500 by 1 sddata, 2 classes: 'stone'(167) 'fruit'(333)

The out data contains one column representing the fruit class and contains labels of two classes (we need non-target examples to perform ROC):

>> out.featlab.list
sdlist (1 entries)
 ind name
   1 fruit
>> out.lab.list
sdlist (2 entries)
 ind name
   1 stone 
   2 fruit 

The ROC analysis is straightforward:

>> r=sdroc(out)
  1: stone  -> non-fruit
  2: fruit  -> fruit
ROC (500 thr-based op.points, 3 measures), curop: 170
est: 1:err(fruit)=0.10, 2:err(non-fruit)=0.16, 3:mean-error=0.13

We visualize the detector decisions by plotting the data set a with all three classes:

>> sdscatter(a,p*r,'roc',r)

By thresholding the fruit model output, we effectively reject outliers.
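
The idea of a thresholding-based ROC can be sketched in a few lines of plain Python (not PRSD Studio code). The sketch assumes that every distinct soft output observed on the test set serves as a candidate threshold and that a sample is accepted as target when its output is at or above the threshold. The scores and the function name are hypothetical, and sdroc may derive its thresholds differently.

```python
def threshold_roc(scores, is_target):
    """One operating point per distinct score used as a threshold.

    A sample is accepted as target when its score >= threshold.
    Returns (threshold, err_target, err_non_target) tuples.
    """
    n_t = sum(is_target)
    n_n = len(is_target) - n_t
    points = []
    for thr in sorted(set(scores)):
        # Targets rejected by the threshold (missed detections).
        missed = sum(1 for s, t in zip(scores, is_target) if t and s < thr)
        # Non-targets accepted by the threshold (false acceptances).
        false_acc = sum(1 for s, t in zip(scores, is_target) if not t and s >= thr)
        points.append((thr, missed / n_t, false_acc / n_n))
    return points

# Made-up model outputs: targets (fruit) tend to score higher than non-targets.
scores    = [0.90, 0.80, 0.75, 0.40, 0.35, 0.30, 0.10, 0.05]
is_target = [True, True, True, False, True, False, False, False]

pts = threshold_roc(scores, is_target)
for thr, err_t, err_n in pts:
    print(f"thr={thr:.2f} err(target)={err_t:.2f} err(non-target)={err_n:.2f}")
```

Raising the threshold trades false acceptances of non-targets for missed targets, which is the same trade-off seen in the ROC plot.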

9.5. Selecting application-specific operating point

In our projects, we usually need to fix the operating point based on specific performance requirements. The two most common techniques are applying performance constraints and cost-sensitive optimization.

9.5.1. Applying performance constraints

Sometimes, we know specific constraints for our problem. For example, the maximum error on fruit may not exceed 10%.

We may apply performance constraints using the constrain method. It takes an existing ROC object, the specification of a performance measure (by its index or by name), and the constraint value.

Let us, for example, select the subset of operating points with an error on fruit lower than 20%:

>> r2=constrain(r,'err(fruit)',0.2)
ROC (216 thr-based op.points, 3 measures), curop: 170
est: 1:err(fruit)=0.10, 2:err(non-fruit)=0.16, 3:mean-error=0.13

This results in a subset of 216 operating points from the original 500. In this subset, the constrain method sets the current operating point to the one minimizing the mean error over classes. We may, however, be interested in a different point, one that simply minimizes the error on non-fruit. To select it, we can use the setcurop method:

>> r2=setcurop(r2,'min','err(non-fruit)')
ROC (216 thr-based op.points, 3 measures), curop: 212
est: 1:err(fruit)=0.19, 2:err(non-fruit)=0.11, 3:mean-error=0.15

Note that we may reach an error on non-fruit that is 5 percentage points lower (0.11 instead of 0.16) at the expense of a higher mean error (0.15 instead of 0.13).

9.5.2. Constraints using the low-level methods

We may also apply constraints simply by querying the estimated performances in the ROC object using standard Matlab commands such as find or min:

>> r
ROC (500 thr-based op.points, 3 measures), curop: 170
est: 1:err(fruit)=0.10, 2:err(non-fruit)=0.16, 3:mean-error=0.13

We will first find indices of all operating points with error on fruit smaller than 10%:

>> ind=find( r(:,'err(fruit)')<0.10 );

Now we can find the minimum error on non-fruit in this subset:

>> [m,ind2]=min( r(ind,'err(non-fruit)') );

And set the resulting operating point directly by the index:

>> r2=setcurop(r,ind(ind2))
ROC (500 thr-based op.points, 3 measures), curop: 170
est: 1:err(fruit)=0.10, 2:err(non-fruit)=0.16, 3:mean-error=0.13
>> sddrawroc(r2)

If the criteria are too strict, the performance constraints may not be met. For example, if we wish not to exceed 10% error on fruit and 1% error on stones, our classifier cannot provide a solution:

>> ind=find( r(:,'err(fruit)')<0.10 & r(:,'err(non-fruit)')<0.01 )

ind =

   Empty matrix: 0-by-1
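
The same find-then-min logic, including the case where no operating point qualifies, can be written as a small plain-Python sketch (not PRSD Studio code). The table of measures is made up and the function name is hypothetical. Note that the indices here are zero-based, unlike the one-based indices in Matlab.

```python
# Each row is one operating point: (err_fruit, err_nonfruit). Values are made up.
measures = [
    (0.02, 0.40), (0.05, 0.25), (0.08, 0.18),
    (0.12, 0.12), (0.20, 0.08), (0.35, 0.03),
]

def best_under_constraint(rows, max_err_fruit, max_err_nonfruit=1.0):
    """Index of the point with minimal err_nonfruit among the points that
    satisfy both upper bounds, or None when no operating point qualifies."""
    feasible = [i for i, (ef, en) in enumerate(rows)
                if ef < max_err_fruit and en < max_err_nonfruit]
    if not feasible:
        return None
    return min(feasible, key=lambda i: rows[i][1])

print(best_under_constraint(measures, 0.10))        # one constraint
print(best_under_constraint(measures, 0.10, 0.01))  # constraints too strict
```

Returning None plays the role of the empty matrix above: the caller has to relax one of the constraints before an operating point can be set.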

9.5.3. Cost-sensitive optimization

The second way of selecting an operating point is based on misclassification costs. We can penalize different types of errors in the confusion matrix and minimize the classifier loss.

To perform cost-sensitive optimization, we need to fix our cost specification. This may come in the form of a cost matrix corresponding to the confusion matrix.

In this example, we will build a three-class mixture model for the fruit problem:

>> a
Fruit set, 1000 by 2 sddata, 3 classes: 'apple'(333) 'banana'(333) 'stone'(334) 
>> [tr,ts]=randsubset(a,0.5)
'Fruit set' 499 by 2 sddata, 3 classes: 'apple'(166) 'banana'(166) 'stone'(167) 
'Fruit set' 501 by 2 sddata, 3 classes: 'apple'(167) 'banana'(167) 'stone'(167) 
>> p=sdmixture(tr,'comp',[3 3 1],'iter',10)
[class 'apple' EM:.......... 3 comp] [class 'banana' EM:.......... 3 comp]
[class 'stone' EM:.......... 1 comp] 
sequential pipeline     2x3 ''
 1  sdp_normal          2x3  3 classes, 7 components

We can estimate the normalized confusion matrix at the default operating point:

>> sdconfmat(ts.lab,ts*sddecide(p),'norm')

ans =

 True      | Decisions
 Labels    | apple  banana stone   | Totals
--------------------------------------------
 apple     | 0.892  0.108  0.000   | 1.00
 banana    | 0.036  0.922  0.042   | 1.00
 stone     | 0.006  0.132  0.862   | 1.00
--------------------------------------------

We might be interested in lowering the fraction of apples misclassified as bananas. We will therefore specify the cost matrix in the following way:

>> m=ones(3); m(1,2)=5

m =

 1     5     1
 1     1     1
 1     1     1

Now we can perform ROC analysis using the cost-based optimization:

>> out=ts*p
Fruit set, 499 by 3 sddata, 3 classes: 'apple'(166)  'banana'(166)  'stone'(167)
>> r=sdroc(out,'cost',m)
..........
ROC (100 w-based op.points, 4 measures), curop: 1
est: 1:err(apple)=0.01, 2:err(banana)=0.14, 3:err(stone)=0.14, 4:mean-error [0.33,0.33,...]=0.10

>> sdconfmat(ts.lab,ts*p*r,'norm')

ans =

 True      | Decisions
 Labels    | apple  banana stone   | Totals
--------------------------------------------
 apple     | 0.994  0.006  0.000   | 1.00
 banana    | 0.133  0.855  0.012   | 1.00
 stone     | 0.012  0.126  0.862   | 1.00
--------------------------------------------

The confusion matrix for the operating point found shows that we reduced the apple / banana error (apples misclassified as bananas) from 0.108 to 0.006. Of course, we pay for this with a higher banana / apple error (0.133 instead of 0.036), as each solution is a specific trade-off.
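
One way to see why the second operating point is preferred under this cost matrix is to compute a loss for both confusion matrices. The plain-Python sketch below (not PRSD Studio code) assumes that the loss is the sum of the element-wise products of the normalized confusion matrix and the cost matrix. How sdroc actually computes its loss, for example whether class priors are included, is not stated in this chapter, so this definition is an assumption.

```python
def loss(conf, cost):
    """Sum of element-wise products of a confusion matrix and a cost matrix."""
    return sum(c * k for crow, krow in zip(conf, cost) for c, k in zip(crow, krow))

# Cost matrix from the example: apples decided as bananas cost 5, all else 1.
cost = [[1, 5, 1],
        [1, 1, 1],
        [1, 1, 1]]

# Normalized confusion matrices copied from the two sdconfmat outputs above.
default_op = [[0.892, 0.108, 0.000],
              [0.036, 0.922, 0.042],
              [0.006, 0.132, 0.862]]
cost_op    = [[0.994, 0.006, 0.000],
              [0.133, 0.855, 0.012],
              [0.012, 0.126, 0.862]]

print(round(loss(default_op, cost), 3))  # 3.432
print(round(loss(cost_op, cost), 3))     # 3.024
```

Because every entry of this cost matrix is at least 1, each normalized matrix contributes a constant 3.0, and the difference between the two losses comes entirely from the penalized apple / banana entry.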

9.5.4. Applying multiple performance constraints

Multiple performance constraints may be combined in one constrain call. Let us consider a multi-class ROC example where we try to optimize precision and per-class errors:

>> a=gendatf(10000)
'Fruit set' 10000 by 2 sddata, 3 classes: 'apple'(3333) 'banana'(3333) 'stone'(3334) 
>> [tr,ts]=randsubset(a,0.5)
'Fruit set' 4999 by 2 sddata, 3 classes: 'apple'(1666) 'banana'(1666) 'stone'(1667) 
'Fruit set' 5001 by 2 sddata, 3 classes: 'apple'(1667) 'banana'(1667) 'stone'(1667) 

>> p=sdlinear(tr)
sequential pipeline     2x3 'Linear discriminant'
 1  Gauss eq.cov.           2x3  3 classes, 3 components (sdp_normal)
 2  Output normalization    3x3  (sdp_norm)

>> out=ts*p
'Fruit set' 5001 by 3 sddata, 3 classes: 'apple'(1667) 'banana'(1667) 'stone'(1667) 

>> r=sdroc(out,'confmat','measures',{'precision','apple','class-errors'})
..........
ROC (2000 w-based op.points, 5 measures), curop: 1
est: 1:precision(apple)=0.84, 2:err(apple)=0.18, 3:err(banana)=0.23, 4:err(stone)=0.11, 5:mean-error=0.17

>> sddrawroc(r)

We are interested in an apple precision above 75% and a banana error under 30%:

>> r2=constrain(r,'precision(apple)',0.75,'err(banana)',0.3)
ROC (260 w-based op.points, 5 measures), curop: 1
est: 1:precision(apple)=0.84, 2:err(apple)=0.18, 3:err(banana)=0.23, 4:err(stone)=0.11, 5:mean-error=0.17

Remember that the operating point is selected in this subset to minimize the mean error. This default solution yields the following confusion matrix:

>> sdconfmat(ts.lab,out*r2)

ans =

 True      | Decisions
 Labels    |  apple banana  stone  | Totals
--------------------------------------------
 apple     |  1373    268     26   |  1667
 banana    |   236   1289    142   |  1667
 stone     |    27    153   1487   |  1667
--------------------------------------------
 Totals    |  1636   1710   1655   |  5001

We are, however, interested in lowering the number of apples misclassified as bananas, as apples are more costly in our problem. Therefore, we set the following cost matrix, penalizing this type of error with a high cost:

>> m=ones(3); m(1,2)=10

m =

 1    10     1
 1     1     1
 1     1     1

We may now select the operating point in our subset minimizing the loss function considering our cost specification:

>> r2=setcurop(r2,'cost',m)
ROC (260 w-based op.points, 5 measures), curop: 4
est: 1:precision(apple)=0.75, 2:err(apple)=0.07, 3:err(banana)=0.29, 4:err(stone)=0.17, 5:mean-error=0.18

>> sdconfmat(ts.lab,out*r2)

ans =

 True      | Decisions
 Labels    |  apple banana  stone  | Totals
--------------------------------------------
 apple     |  1546    119      2   |  1667
 banana    |   413   1185     69   |  1667
 stone     |   101    184   1382   |  1667
--------------------------------------------
 Totals    |  2060   1488   1453   |  5001

The final classifier is composed of a model p with the operating point in r2:

>> pd=p*r2
sequential pipeline     2x1 'Linear discriminant+Decision'
 1  Gauss eq.cov.           2x3  3 classes, 3 components (sdp_normal)
 2  Output normalization    3x3  (sdp_norm)
 3  Decision                3x1  weighting, 3 classes, 260 ops at op 4 (sdp_decide)
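
The measures reported for the two operating points in this section can be recomputed from the confusion matrices shown above. The plain-Python sketch below (not PRSD Studio code; the function names are hypothetical) takes precision as the diagonal entry divided by its column total and the class error as one minus the diagonal entry divided by its row total.

```python
def precision(cm, k):
    """Correct decisions for class k divided by all decisions for class k (column sum)."""
    return cm[k][k] / sum(row[k] for row in cm)

def class_error(cm, k):
    """Fraction of samples of true class k that were not decided as class k."""
    return 1.0 - cm[k][k] / sum(cm[k])

# Rows: true apple, banana, stone. Columns: decided apple, banana, stone.
# Counts copied from the two sdconfmat outputs in this section.
min_mean_error_op = [[1373, 268, 26], [236, 1289, 142], [27, 153, 1487]]
cost_op           = [[1546, 119, 2], [413, 1185, 69], [101, 184, 1382]]

for name, cm in (("min mean error", min_mean_error_op), ("cost-based", cost_op)):
    print(name,
          f"precision(apple)={precision(cm, 0):.2f}",
          f"err(apple)={class_error(cm, 0):.2f}",
          f"err(banana)={class_error(cm, 1):.2f}")
```

Both operating points satisfy the two constraints. The cost-based point sits close to the precision limit of 0.75 in exchange for a much lower error on apple.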

9.6. Rejection

Rejection refers to the choice we make not to assign the data sample to any of the learned classes. The sample may be either discarded or passed for further processing by a different system or human expert.

PRSD Studio supports two types of rejection: distance-based rejection, which discards outliers, and rejection close to the decision boundary.

Technically, PRSD Studio implements both rejection types by adding a rejection threshold to the weighting-based operating point. The only difference between the two reject options is the type of classifier soft output used. If the soft output is normalized with respect to all classes, the rejection operates close to the decision boundary. Without normalization, the rejection discards outliers.

9.6.1. Distance-based rejection

To illustrate distance-based rejection, we will use the two-class Highleyman data set gendath and train a quadratic discriminant:

>> a=sddata(gendath(300))
'Highleyman Dataset' 300 by 2 sddata, 2 classes: '1'(154) '2'(146)
>> [tr,ts]=randsubset(a,0.5);
>> p=sdgauss(tr)
Gaussian model pipeline 2x2  2 classes, 2 components (sdp_normal)

We estimate soft outputs on the test set:

>> out=ts*p
'Highleyman Dataset' 150 by 2 sddata, 2 classes: '1'(77) '2'(73) 
>> +out(1:5,:)

ans =

0.0747    0.0000
0.1233    0.0036
0.0777    0.0393
0.1121    0.0000
0.1099    0.1760

Note that the soft outputs do not sum to one. That is because sdgauss returns class-conditional densities.

We will now use the sdroc command to construct a reject curve at the default operating point. It adds the rejection capability to the operating point and derives a set of feasible rejection thresholds from the data. In this example, the default operating point (equal class weights) is used as the basis for adding the reject option:

>> r=sdroc(out,'reject')
ROC (1001 wr-based op.points, 4 measures), curop: 1
est: 1:frac(reject)=0.00, 2:TPr(1)=0.86, 3:TPr(2)=0.97, 4:TPr(reject)=0.00

We can visualize the decisions using sdscatter.

>> sdscatter(ts,p*r,'roc',r)

The horizontal axis represents the fraction of rejected objects; the vertical axis shows the true positive ratio for the first class. By moving away from the default operating point (where no rejection is performed), we can observe how rejection takes place far from both trained distributions.

We can select a specific point with a left mouse click and store it back to the r object by pressing the s key. We can now estimate the confusion matrix on the test set:

>> r=setcurop(r,6) %  set operating point 6 in the sdroc object r
ROC (1001 wr-based op.points, 4 measures), curop: 6
est: 1:frac(reject)=0.03, 2:TPr(1)=0.82, 3:TPr(2)=0.95, 4:TPr(reject)=0.00
>> sdconfmat(ts.lab,ts*p*r)

ans =

 True      | Decisions
 Labels    | 1      2      reject  | Totals
--------------------------------------------
 1         |    62     11      3   |    76
 2         |     2     69      2   |    73
--------------------------------------------
 Totals    |    64     80      5   |   149

We can see that the confusion matrix contains the added reject decision.

9.6.2. Rejection close to the decision boundary

To illustrate rejection close to the decision boundary, we will continue with the example above. We make a single change to the procedure: we normalize the model soft outputs to sum to one.

>> p2=sdquadratic(tr) 
sequential pipeline     2x2 'Quadratic discr.'
 1  Gauss full cov.         2x2  2 classes, 2 components (sdp_normal)
 2  Output normalization    2x2  (sdp_norm)
>> out2=ts*p2
'Highleyman Dataset' 150 by 2 sddata, 2 classes: '1'(77) '2'(73)
>> +out2(1:5,:)

ans =

1.0000    0.0000
0.9717    0.0283
0.6639    0.3361
1.0000    0.0000
0.3843    0.6157

Note that the classifier outputs are now posterior probabilities, not densities.

>> r2=sdroc(out2,'reject')
ROC (1001 wr-based op.points, 4 measures), curop: 1
est: 1:frac(reject)=0.00, 2:TPr(1)=0.87, 3:TPr(2)=0.97, 4:TPr(reject)=0.00

>> sdscatter(ts,p2*r2,'roc',r2)

The red reject decision now occupies the area where errors would be highly probable.
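
The difference between the two reject types can be made concrete with the first five soft outputs listed in sections 9.6.1 and 9.6.2. The plain-Python sketch below (not PRSD Studio code) assumes equal class weights and a simple rule: reject a sample when its largest output falls below the threshold. The function name and both threshold values are hypothetical, and the exact decision rule used by PRSD Studio may differ.

```python
def decide_with_reject(outputs, threshold, names=("1", "2")):
    """Pick the class with the largest output; reject when that output is below threshold."""
    decisions = []
    for row in outputs:
        best = max(range(len(row)), key=lambda k: row[k])
        decisions.append(names[best] if row[best] >= threshold else "reject")
    return decisions

# First five test samples: class-conditional densities (9.6.1) and
# the corresponding normalized outputs, i.e. posteriors (9.6.2).
densities  = [(0.0747, 0.0000), (0.1233, 0.0036), (0.0777, 0.0393),
              (0.1121, 0.0000), (0.1099, 0.1760)]
posteriors = [(1.0000, 0.0000), (0.9717, 0.0283), (0.6639, 0.3361),
              (1.0000, 0.0000), (0.3843, 0.6157)]

by_density   = decide_with_reject(densities, 0.08)   # hypothetical density threshold
by_posterior = decide_with_reject(posteriors, 0.70)  # hypothetical posterior threshold
print(by_density)
print(by_posterior)
```

With these thresholds, the first sample is rejected on the densities (it lies in a low-density region) although its posterior is fully confident, while the fifth sample is rejected on the posteriors (the two classes are close) although its density is the highest of the five.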

9.6.3. Adding reject option to a specific operating point

By default, the reject option adds rejection to the default operating point (with equal class weights). In order to add the reject option to a different operating point, we may pass an sdops operating set or an sdroc ROC object:

>> a 
'Fruit Set', 2000 by 2 sddata with 2 classes: 'apple' (983) 'banana' (1017)
>> p %  trained Parzen classifier
sequential pipeline     2x2 ''
 1  sdp_parzen          2x2  2 classes, 1601 prototypes
>> out=a*p %  soft outputs
'Fruit Set', 2000 by 2 sddata with 2 classes: 'apple' (983) 'banana' (1017)
>> r=sdroc(out) %  standard ROC
ROC (2001 w-based op.points, 3 measures), curop: 1014
est: 1:err(1)=0.02, 2:err(2)=0.02, 3:mean-error [0.50,0.50]=0.02

We will choose a different operating point, e.g. point 100, where we are not losing any samples of class 2:

>> r=setcurop(r,100)
ROC (2001 w-based op.points, 3 measures), curop: 100
est: 1:err(1)=0.25, 2:err(2)=0.00, 3:mean-error [0.50,0.50]=0.13

To create a rejection curve starting from this operating point, just pass the r to the reject option:

>> r2=sdroc(out,'reject',r)
ROC (1001 wr-based op.points, 4 measures), curop: 1
est: 1:frac(reject)=0.00, 2:TPr(1)=0.75, 3:TPr(2)=1.00, 4:TPr(reject)=0.00

The eventual pipeline would be:

>> p2=p*r2
sequential pipeline     2x1 'Parzen+Decision'
 1  Parzen                  2x2  2 classes, 200 prototypes (sdp_parzen)
 2  Decision                2x1  weight+reject, 3 decisions, 1001 ops at op 1 (sdp_decide)
>> p2.list
sdlist (3 entries)
 ind name
   1 1 
   2 2
   3 reject

9.6.4. Setting the rejection manually by discarding fraction of data

Instead of constructing the full reject curve, we may also directly add a reject option to our multi-class classifier by specifying the fraction of objects to be discarded. We need to pass this reject fraction to the reject option. In this example, we reject 1% of all data:

>> r=sdroc(a*p,'reject',0.01)
ROC (wr-based op.point, 4 measures)
est: 1:frac(reject)=0.01, 2:TPr(1)=0.97, 3:TPr(2)=0.97, 4:TPr(reject)=0.00
>> sdscatter(a,p*r)
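
A reject fraction can be turned into a threshold by looking at the sorted soft outputs. The plain-Python sketch below (not PRSD Studio code) shows one simple way to do this, assuming that rejection is applied to the largest soft output of each sample and that the output values are distinct. The function name and the data are hypothetical, and sdroc may choose its threshold differently.

```python
import math

def threshold_for_fraction(max_outputs, fraction):
    """Threshold such that `fraction` of the samples fall strictly below it.

    Assumes 0 <= fraction < 1 and distinct output values.
    """
    ordered = sorted(max_outputs)
    n_reject = math.floor(fraction * len(ordered))
    # The (n_reject+1)-th smallest value: exactly n_reject values lie below it.
    return ordered[n_reject]

# 200 made-up, distinct "largest soft output" values, one per sample.
max_outputs = [(i + 1) / 200 for i in range(200)]

thr = threshold_for_fraction(max_outputs, 0.01)
rejected = [v for v in max_outputs if v < thr]
print(thr, len(rejected) / len(max_outputs))
```

With tied output values the achieved fraction can deviate from the requested one, so the reported frac(reject) is worth checking after the threshold is set.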

If we need to specify the reject fraction and also use a specific operating point, we may supply the operating point through an additional reject option:

>> r=sdroc(a*p)
ROC (2001 w-based op.points, 3 measures), curop: 1014
est: 1:err(1)=0.02, 2:err(2)=0.02, 3:mean-error [0.50,0.50]=0.02
>> r=setcurop(r,100)
ROC (2001 w-based op.points, 3 measures), curop: 100
est: 1:err(1)=0.25, 2:err(2)=0.00, 3:mean-error [0.50,0.50]=0.13

>> r2=sdroc(a*p,'reject',0.01,'reject',r) %  operating point in r is used
ROC (wr-based op.point, 4 measures)
est: 1:frac(reject)=0.01, 2:TPr(1)=0.73, 3:TPr(2)=1.00, 4:TPr(reject)=0.00