11.1. Introduction
Designing a pattern recognition system means handling many low-level
components, such as feature extractors, models, and decision
functions. PRSD Studio provides a tool for modeling the entire pattern
recognition system: the sdalg algorithm.
Major features of the sdalg algorithm:
- Simple to define: Algorithm training and execution are defined in a single Matlab function.
- Parametrized: The user may define named parameters so that one algorithm can describe a whole class of methods. For example, the classifier model may be a parameter.
- Self-contained: All algorithm parameters are kept inside the algorithm object (including the operating point) so untrained or trained algorithms may be stored and reused.
- Easy to deploy: The user controls the conversion of a trained algorithm into a pipeline for fast execution using libPRSD.
11.1.1. Algorithm function
Let us illustrate the use of the sdalg algorithm with a simple example. We
consider a classifier trained in two steps: first reducing the
dimensionality using Principal Component Analysis (PCA), and then
training a classifier in the resulting subspace. This classification
system is fully defined in the function sda_pca_clf.
 1: function out = sda_pca_clf(alg,data)
 3: if nargin==0
 5:     alg=sdalg(mfilename);
 7:     alg.frac=0.9;
 8:     alg.clf=sdlinear;
10:     out=alg;
12: elseif totrain(alg)
14:     alg.pca=sdpca(data,alg.frac);
15:     data=data*alg.pca;
17:     alg.trclf=sddecide(data*alg.clf);
19:     out=setstate(alg,'trained');
21: elseif toexecute(alg)
23:     data=data*alg.pca;
24:     out=data*alg.trclf;
26: end
The algorithm function takes two input parameters, the algorithm object
alg and the data set data, and returns the output out. The type of
out depends on whether the algorithm is being trained or executed.
The function contains three sections. The first one describes algorithm initialization (lines 3-10), the second training (lines 12-19) and the third the algorithm execution (lines 21-24).
In the initialization section, we instantiate the algorithm object
and attach it to the function (in our case to sda_pca_clf). We may then
add arbitrary parameters useful for making our algorithm more general. In
our case, we want to adjust the fraction of preserved variance of the PCA
dimensionality reduction and the classifier model.
Algorithm parameters behave analogously to Matlab structure fields. This is
because sdalg is nothing more than a structure connected to a
function.
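Because parameters behave like structure fields, they can presumably be changed on an untrained algorithm before training. The following is a minimal sketch, assuming PRSD Studio is on the Matlab path, that sda_pca_clf.m contains the function listed above, and that field assignment works outside the algorithm function the same way it does inside it. The value 0.95 is only an illustration.

```matlab
% Sketch only: requires PRSD Studio and the sda_pca_clf function shown above.
alg=sda_pca_clf;   % untrained algorithm with the default parameters
alg.frac=0.95;     % assumed: field assignment also works outside the function
alg                % display the object to check the updated parameter
```

Training this modified object would then use the new fraction of preserved variance, because the training section reads alg.frac when it calls sdpca.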
The final statement in the initialization section ensures that the algorithm
object alg is returned in the variable out.
The training section describes the training of an initialized algorithm on
the sddata object data. First, we train the PCA projection and store it
as alg.pca. On line 15, we project the input data using the trained PCA.
Line 17 trains the model alg.clf on the projected data and adds a default
operating point. Finally, we set the algorithm state to 'trained' and
return the algorithm on line 19.
The execution section is invoked on a trained algorithm and an
sddata object data. In our algorithm, it merely projects the input
data using the trained PCA projection and then applies the alg.trclf pipeline,
which executes the trained classifier model and makes decisions. The
output out is therefore an sdlab object with a decision for each
sample in data.
11.1.2. Algorithm training and execution
We can construct an untrained algorithm by simply calling its definition function without parameters:
>> alg=sda_pca_clf
untrained sdalg 'sda_pca_clf'
frac 1x1 8 double 0.9
clf 0x0 1034 sdppl untrained sdlinear
We will use the medical problem in our example:
>> load medical
>> a
'medical all' 259783 by 11 sddata, 3 classes: 'cancer'(56652) 'non-cancer'(168467) 'shadow'(34664)
>> b=a(:,:,1:2) % only cancer and non-cancer classes
'medical all' 225119 by 11 sddata, 2 classes: 'cancer'(56652) 'non-cancer'(168467)
We will use the first 8 patients as a test set and the remaining ones for training:
>> [ts,tr]=subset(b,'patient',1:8)
'medical all' 112053 by 11 sddata, 2 classes: 'cancer'(33785) 'non-cancer'(78268)
'medical all' 113066 by 11 sddata, 2 classes: 'cancer'(22867) 'non-cancer'(90199)
To train the algorithm, we call the algorithm function directly:
>> tralg=sda_pca_clf(alg,tr)
trained sdalg 'sda_pca_clf'
frac 1x1 8 double 0.9
clf 0x0 1034 sdppl untrained sdlinear
pca 11x2 5272 sdppl trained sdp_affine
trclf 2x1 17440 sdppl trained sdp_decide
A simpler alternative is to use the multiplication operator:
>> tralg=tr*alg
trained sdalg 'sda_pca_clf'
frac 1x1 8 double 0.9
clf 0x0 1034 sdppl untrained sdlinear
pca 11x2 5272 sdppl trained sdp_affine
trclf 2x1 17440 sdppl trained sdp_decide
Training added pca and trclf fields. We may display the trained classifier trclf:
>> tralg.trclf
sequential pipeline 2x1 'Gauss eq.cov.+Output normalization+Decision'
1 Gauss eq.cov. 2x2 2 classes, 2 components (sdp_normal)
2 Output normalization 2x2 (sdp_norm)
3 Decision 2x1 weighting, 2 classes, 1 ops at op 1 (sdp_decide)
To execute the algorithm, apply it to the test set:
>> dec=ts*tralg
sdlab with 112053 entries, 2 groups: 'cancer'(15134) 'non-cancer'(96919)
Algorithms may be executed only on sddata objects and the
execution output may be either decisions (sdlab object) or soft
output (sddata object).
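The soft-output case can be sketched with a variant of the example algorithm that trains the model but does not wrap it in sddecide. The function name sda_pca_soft and the field name trmodel are hypothetical and introduced only for illustration. The sketch assumes that executing a trained model without a decision step yields an sddata object with soft outputs.

```matlab
function out = sda_pca_soft(alg,data)
% SDA_PCA_SOFT  Hypothetical variant of sda_pca_clf returning soft outputs.
% Sketch only: requires PRSD Studio.

if nargin==0
    % initialization: same parameters as sda_pca_clf
    alg=sdalg(mfilename);
    alg.frac=0.9;
    alg.clf=sdlinear;
    out=alg;
elseif totrain(alg)
    alg.pca=sdpca(data,alg.frac);
    data=data*alg.pca;
    % train the model without sddecide, so no decision step is appended
    alg.trmodel=data*alg.clf;
    out=setstate(alg,'trained');
elseif toexecute(alg)
    data=data*alg.pca;
    % assumed to return soft outputs (sddata), not decisions (sdlab)
    out=data*alg.trmodel;
end
```

It would be trained and executed in the same way as sda_pca_clf, for example with tralg=tr*sda_pca_soft followed by ts*tralg.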
11.1.3. Converting algorithms into pipelines for out-of-Matlab execution
To execute a complex algorithm outside of Matlab, we need to
convert it into an sdppl pipeline. We may add an extra toconvert
section to the algorithm function to describe this conversion.
For our example algorithm, the pipeline construction is simple. We only need to return the concatenation of the PCA projection and the trained classifier.
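function out = sda_pca_clf(alg,data)
if nargin==0
    % initialization: create the algorithm and set default parameters
    alg=sdalg(mfilename);
    alg.frac=0.9;
    alg.clf=sdlinear;
    out=alg;
elseif totrain(alg)
    % training: PCA projection, then classifier with default operating point
    alg.pca=sdpca(data,alg.frac);
    data=data*alg.pca;
    alg.trclf=sddecide(data*alg.clf);
    out=setstate(alg,'trained');
elseif toexecute(alg)
    % execution: project the data and apply the trained classifier
    data=data*alg.pca;
    out=data*alg.trclf;
elseif toconvert(alg)
    % conversion: concatenate the projection and classifier into one pipeline
    out=alg.pca*alg.trclf;
end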
The trained algorithm may now be converted using the sdconvert function:
>> p=sdconvert(tralg)
sequential pipeline 11x1 'PCA+Gauss eq.cov.+Output normalization+Decision'
1 PCA 11x2 93%% of variance (sdp_affine)
2 Gauss eq.cov. 2x2 2 classes, 2 components (sdp_normal)
3 Output normalization 2x2 (sdp_norm)
4 Decision 2x1 weighting, 2 classes, 1 ops at op 1 (sdp_decide)
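The converted pipeline can also be tried inside Matlab before deployment. The following is a minimal sketch, assuming the variables tralg and ts from the session above and assuming that a converted pipeline is applied to data with the multiplication operator, as alg.pca and alg.trclf are in the algorithm function. The variable name dec2 is introduced here for illustration.

```matlab
% Sketch only: requires PRSD Studio and the tralg and ts variables from above.
p=sdconvert(tralg);   % 11x1 pipeline: PCA followed by the trained classifier
dec2=ts*p             % apply the pipeline to the 11-dimensional test data
```

Since the pipeline concatenates the same two steps that the execution section applies, dec2 would be expected to match the decisions returned by ts*tralg. The text above does not verify this.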
