PRSD Studio Documentation development version 2.0.9 (8-Mar-2010)

Chapter 11: Custom algorithms

11.1. Introduction

Designing a pattern recognition system requires handling many low-level components, such as feature extractors, models and decision functions. PRSD Studio provides a tool to model the entire pattern recognition system: the sdalg algorithm.

Major features of the sdalg algorithm:

11.1.1. Algorithm function

Let us illustrate the use of the sdalg algorithm with a simple example. We consider a classifier trained in two steps: first reducing the dimensionality using Principal Component Analysis (PCA), and then training a classifier in the resulting subspace. This classification system is fully defined in the function sda_pca_clf.

  1: function out = sda_pca_clf(alg,data)

  3:    if nargin==0 

  5:        alg=sdalg(mfilename);               

  7:        alg.frac=0.9;                       
  8:        alg.clf=sdlinear;                   

 10:        out=alg;

 12:    elseif totrain(alg) 

 14:        alg.pca=sdpca(data,alg.frac);
 15:        data=data*alg.pca;

 17:        alg.trclf=sddecide(data*alg.clf);   

 19:        out=setstate(alg,'trained');

 21:    elseif toexecute(alg) 

 23:        data=data*alg.pca;                  
 24:        out=data*alg.trclf;                 

 26:    end

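The algorithm function takes two input parameters, the algorithm object alg and the data set data, and returns the output out. The type of out depends on which section runs: initialization and training return an sdalg object, while execution returns the result of applying the trained algorithm to data.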

The function contains three sections. The first describes algorithm initialization (lines 3-10), the second training (lines 12-19), and the third execution (lines 21-24).

In the initialization section, we instantiate the algorithm object and attach it to the function (in our case to sda_pca_clf). We may then add arbitrary parameters that make our algorithm more general. In our case, we expose the fraction of variance preserved by the PCA dimensionality reduction (frac) and the classifier model (clf).

Algorithm parameters behave analogously to Matlab structure fields, because sdalg is nothing more than a structure connected to a function.
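As a minimal sketch of this structure-like behaviour, the commands below override a default parameter on an untrained algorithm and read it back. The variable name alg2 is only illustrative, and we assume that field assignment works at the command prompt the same way it does inside the algorithm function.

```matlab
% Construct an untrained algorithm with the default parameters
alg2=sda_pca_clf;

% Assumption: parameters can be changed like structure fields.
% Here we ask the PCA step to preserve 95% of the variance.
alg2.frac=0.95;

% Read the parameter back, again using structure-field syntax
alg2.frac
```

Any later training of alg2 would then pass the new fraction to sdpca through alg.frac.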

The final statement in the initialization section makes sure the algorithm object alg is returned in the variable out.

The training section describes training of the initialized algorithm on the sddata object data. First we train the PCA projection and store it as alg.pca (line 14). On line 15, we project the input data using the trained PCA. Line 17 trains the model alg.clf on the projected data, adds a default operating point, and stores the result as alg.trclf. Finally, we set the algorithm state to 'trained' and return it on line 19.

The execution section is invoked on a trained algorithm and an sddata object data. In our algorithm, it merely projects the input data using the trained PCA projection and then applies the alg.trclf pipeline, which executes the trained classifier model and performs decisions. The output out is therefore an sdlab object with a decision for each sample in data.

11.1.2. Algorithm training and execution

We can construct an untrained algorithm by simply calling its definition function without parameters:

>> alg=sda_pca_clf
untrained sdalg 'sda_pca_clf'
 frac            1x1          8  double   0.9
 clf             0x0       1034  sdppl    untrained sdlinear

We will use the medical problem in our example:

>> load medical
>> a
'medical all' 259783 by 11 sddata, 3 classes: 'cancer'(56652) 'non-cancer'(168467) 'shadow'(34664) 

>> b=a(:,:,1:2) %  only cancer and non-cancer classes
'medical all' 225119 by 11 sddata, 2 classes: 'cancer'(56652) 'non-cancer'(168467) 

We will use the first 8 patients as a test set and the remaining ones for training:

>> [ts,tr]=subset(b,'patient',1:8)
'medical all' 112053 by 11 sddata, 2 classes: 'cancer'(33785) 'non-cancer'(78268) 
'medical all' 113066 by 11 sddata, 2 classes: 'cancer'(22867) 'non-cancer'(90199) 

To train the algorithm, we call the algorithm function directly:

>> tralg=sda_pca_clf(alg,tr)
trained sdalg 'sda_pca_clf'
 frac            1x1          8  double   0.9
 clf             0x0       1034  sdppl    untrained sdlinear
 pca            11x2       5272  sdppl    trained sdp_affine
 trclf           2x1      17440  sdppl    trained sdp_decide

A simpler alternative is to use the multiplication operator:

>> tralg=tr*alg
trained sdalg 'sda_pca_clf'
 frac            1x1          8  double   0.9
 clf             0x0       1034  sdppl    untrained sdlinear
 pca            11x2       5272  sdppl    trained sdp_affine
 trclf           2x1      17440  sdppl    trained sdp_decide

Training added the pca and trclf fields. We may display the trained classifier trclf:

>> tralg.trclf
sequential pipeline     2x1 'Gauss eq.cov.+Output normalization+Decision'
 1  Gauss eq.cov.           2x2  2 classes, 2 components (sdp_normal)
 2  Output normalization    2x2  (sdp_norm)
 3  Decision                2x1  weighting, 2 classes, 1 ops at op 1 (sdp_decide)

To execute the algorithm, apply it to the test set:

>> dec=ts*tralg
sdlab with 112053 entries, 2 groups: 'cancer'(15134) 'non-cancer'(96919) 

Algorithms may be executed only on sddata objects. The execution output may be either decisions (an sdlab object) or soft outputs (an sddata object).
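To illustrate the second case, here is a sketch of a variant that returns soft outputs instead of decisions. The function name sda_pca_soft is hypothetical. The only change from sda_pca_clf is that the training section stores the trained model without wrapping it in sddecide. We assume that applying such a trained model to data yields an sddata object with soft outputs.

```matlab
function out = sda_pca_soft(alg,data)
% Hypothetical variant of sda_pca_clf returning soft outputs

if nargin==0
    alg=sdalg(mfilename);           % attach the object to this function
    alg.frac=0.9;                   % fraction of preserved variance
    alg.clf=sdlinear;               % untrained classifier model
    out=alg;

elseif totrain(alg)
    alg.pca=sdpca(data,alg.frac);   % train the PCA projection
    data=data*alg.pca;              % project the training data
    alg.trclf=data*alg.clf;         % train the model only, no sddecide step
    out=setstate(alg,'trained');

elseif toexecute(alg)
    data=data*alg.pca;              % project the input data
    out=data*alg.trclf;             % assumed: sddata with soft outputs
end
```

Keeping the decision step out of the algorithm leaves the choice of operating point to the caller, at the cost of an extra step before decisions are available.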

11.1.3. Converting algorithms into pipelines for out-of-Matlab execution

To execute a complex algorithm outside of Matlab, we need to convert it into an sdppl pipeline. We may add an extra toconvert section to the algorithm function to describe this conversion.

For our example algorithm, the pipeline construction is simple. We only need to return the concatenation of the PCA projection and the trained classifier.

function out = sda_pca_clf(alg,data)

if nargin==0                            

    alg=sdalg(mfilename);               

    alg.frac=0.9;                       
    alg.clf=sdlinear;                   

    out=alg;

elseif totrain(alg)                     

    alg.pca=sdpca(data,alg.frac);
    data=data*alg.pca;

    alg.trclf=sddecide(data*alg.clf);   

    out=setstate(alg,'trained');

elseif toexecute(alg)                   

    data=data*alg.pca;                  
    out=data*alg.trclf;                 

elseif toconvert(alg)

    out=alg.pca*alg.trclf;

end

The trained algorithm may now be converted using the sdconvert function:

>> p=sdconvert(tralg)
sequential pipeline     11x1 'PCA+Gauss eq.cov.+Output normalization+Decision'
 1  PCA                    11x2  93% of variance (sdp_affine)
 2  Gauss eq.cov.           2x2  2 classes, 2 components (sdp_normal)
 3  Output normalization    2x2  (sdp_norm)
 4  Decision                2x1  weighting, 2 classes, 1 ops at op 1 (sdp_decide)
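The converted pipeline may also be tried inside Matlab before it is deployed. The sketch below reuses tralg and ts from the previous section. The variable name dec2 is illustrative, and we assume that a pipeline is applied to data with the same multiplication operator used inside the algorithm function.

```matlab
% Convert the trained algorithm using its toconvert section
p=sdconvert(tralg);

% Apply the pipeline directly to the test set.
% Assumption: this yields the same decisions as ts*tralg,
% because p concatenates alg.pca and alg.trclf.
dec2=ts*p
```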