PRSD Studio Documentation development version 2.0.9 (8-Mar-2010)

Chapter 6: Data visualization

6.1. Interactive scatter plot

PRSD Studio provides an interactive scatter plot, sdscatter. We can launch it on any data set. Here we create a data set with three features computed from road sign images. We compute the mean, standard deviation and median of each data set row (each row is an image reshaped to a vector):

>> a
381 by 1024 sddata, 17 classes: [31  28  24  33  19  21  57  26  21   9  13  15  14   1  14  29  26]

>> a2=setdata(a,[mean(+a,2) std(+a,0,2) median(+a,2)])
Warning: Feature names reset to 'Feature X' format.
> In sddata.setdata at 31
381 by 3 sddata, 17 classes: [31  28  24  33  19  21  57  26  21   9  13  15  14   1  14  29  26]

Note the warning message stating that feature labels of the new data set were set automatically.

>> getfeatlab(a2)
sdlab with 3 entries, 3 groups: 'Feature 1'(1) 'Feature 2'(1) 'Feature 3'(1) 

We may set the feature labels to more descriptive names using setfeatlab:

>> a2=setfeatlab(a2,sdlab('mean','std','median'))
381 by 3 sddata, 17 classes: [31  28  24  33  19  21  57  26  21   9  13  15  14   1  14  29  26]

Alternatively, we may provide the feature labels directly in the setdata call:

>> a2=setdata(a,[mean(+a,2) std(+a,0,2) median(+a,2)],sdlab('mean','std','median'))
381 by 3 sddata, 17 classes: [31  28  24  33  19  21  57  26  21   9  13  15  14   1  14  29  26]
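
The three features are ordinary row-wise statistics. As a minimal sketch that needs only plain Matlab (no PRSD Studio objects), the same computation on a small matrix looks as follows. The matrix X and its values are assumptions made up for illustration; X stands in for the raw data matrix that +a supplies in the example above.

```matlab
% Hypothetical 2-by-4 matrix standing in for +a (one row per image).
X = [1 2 3 10;
     4 4 4  4];

% Row-wise statistics: the dimension argument 2 operates along each row.
m  = mean(X, 2);      % mean of each row
s  = std(X, 0, 2);    % standard deviation, normalized by N-1 (flag 0)
md = median(X, 2);    % median of each row

F = [m s md];         % one row per sample, three feature columns
disp(F)
```

Each row of F corresponds to one sample, which is the layout setdata receives in the calls above.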

In order to visualize the scatter plot, we invoke the sdscatter command:

>> sdscatter(a2)
ans =
 1

sdscatter opens a new figure and returns its handle (1 in this session).

The figure shows a scatter plot of the first two features in the data set. Each point represents one data sample (here a road sign). The colors and marker styles correspond to the different classes.

Moving the mouse over the plot shifts the focus to the closest data sample, which is highlighted by a black marker. The figure title provides details about the highlighted sample, such as its index in the data set and its class.
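
The idea of focusing on the closest sample can be sketched in plain Matlab as a nearest-neighbor search. This is only an illustration under assumptions: the points X and the pointer position p are made up, and the source does not state whether sdscatter measures distance in data units or in screen units.

```matlab
% Hypothetical 2-D feature values (one row per sample) and pointer position.
X = [0.0 0.0;
     1.0 2.0;
     3.0 1.0];
p = [2.6 1.2];

% Squared Euclidean distance from the pointer to every sample.
d2 = sum(bsxfun(@minus, X, p).^2, 2);

% Index of the closest sample, i.e. the one that would be highlighted.
[dmin, idx] = min(d2);
fprintf('closest sample: %d\n', idx);
```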

6.1.1. Legend

The legend may be switched on either by pressing the l key (as in legend) or by using the Show legend command in the Scatter menu.

Note that pressing the legend toolbar button does not show the correct class names in the legend; this is a known issue.

6.1.2. Changing features

We can change the features shown in sdscatter using the cursor keys. The "Left" and "Right" arrows flip through the features on the horizontal axis, and the "Up" and "Down" arrows flip through the features on the vertical axis.

To select a feature of interest directly, right-click on the axis legend. A pop-up menu will appear listing the available features.

If more than 25 features are present in the data set, a dialog will appear allowing us to select a feature by its index.

6.1.3. Sample inspector

The sample inspector shows a detailed view of the current sample. It is especially useful if the samples in the data set represent images (as in our road sign example).

We select the Show sample inspector command from the Scatter menu. A dialog opens asking for the name of the data set that contains the image data. In our example this is the original data set a (381 by 1024), not the three-feature data set a2, so we type a and click OK. A separate window opens showing the road sign image of the currently highlighted sample.

You can use the sample inspector to identify outliers or to understand which objects fall in the area of overlap.

6.1.4. Switching between different sets of labels

It is often beneficial to use multiple sets of labels. For example, in a medical problem, we may be interested not only in the top-level class such as 'cancer'/'non-cancer' but also in the specific type of tissue or in the patient the sample originates from.

sdscatter can visualize any sample labeling available in the data set. Any sdlab object stored as a sample property is available.

Let's use a medical data set from a cancer detection problem in this example. It contains information on pixels in scans of multiple patients. For each pixel, we know the high-level label ('cancer' or 'non-cancer'), the more precise tissue type, and the patient:

>> load medical;
>> a'
'medical all' 225119 by 11 sddata, 2 classes: 'cancer'(56652) 'non-cancer'(168467) 
sample props: 'lab'->'class' 'class'(L) 'pixel'(N) 'patient'(L) 'tissue'(L)
feature props: 'featlab'->'featname' 'featname'(L)
data props:  'data'(N)

>> sdscatter(a)

We may switch between different labels via the Use property command in the Scatter menu.

Switching to patient labeling:

We may switch quickly to a specific property using the 1-9 shortcut keys. In our example, the tissue property is accessible by pressing '3':

6.1.5. Visualizing subsets of samples

sdscatter allows us to show only a subset of samples defined by label values. This feature is accessible via the Sample filter command in the Scatter menu.

For example, we may be interested only in non-cancer tissues. We can select only the non-cancer samples in Scatter/Sample filter/class.

We may combine multiple filters. For example, we might be interested only in the non-cancer samples of patient 'Dick':

Note that sdscatter keeps the axes limits of the full data set when showing sample subsets. This gives us important clues about the position of the subset within the total data distribution. If we are interested in a detailed view of the subset, we may enter the automatic mode by pressing the 'a' key. The limits are then set according to the subset. Pressing 'a' again returns us to the full data set limits.
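
The difference between the two limit modes can be sketched with plain Matlab matrices. The data X and the mask visible are made-up assumptions, and the sketch ignores any margins sdscatter may add around the data.

```matlab
% Hypothetical 2-D data and a logical mask selecting the visible subset.
X = [0 0; 10 5; 2 1; 3 2; 8 9];
visible = logical([0 0 1 1 0])';

% Default behaviour: limits come from the full data set.
fullLim = [min(X(:,1)) max(X(:,1)) min(X(:,2)) max(X(:,2))];

% Automatic mode: limits come from the visible subset only.
S = X(visible, :);
subLim = [min(S(:,1)) max(S(:,1)) min(S(:,2)) max(S(:,2))];

disp(fullLim)
disp(subLim)
```

With the full limits, the subset occupies only a small corner of the axes, which is what reveals its position within the whole distribution.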

When visualizing sample subsets, we may freely move between different sets of labels. For example, by pressing '3' we switch to the 'tissue' property, which shows us the specific non-cancer tissues of Dick:

To quickly return to the previous filter, use the 'f' key or the Sample filter/Apply previous filter command. This allows us to understand differences between distributions.

The visible subset of samples may be stored in a new data set in the Matlab workspace using the Create data set with visible samples menu command.
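
Outside the GUI, a combined filter such as "non-cancer samples of patient 'Dick'" amounts to a logical AND of two masks. The following is a minimal sketch on plain cell arrays of label names, not on sddata objects; the label values (including the patient name 'Tom') are made up for illustration.

```matlab
% Hypothetical label sets, one entry per sample.
cls     = {'cancer'; 'non-cancer'; 'non-cancer'; 'cancer'; 'non-cancer'};
patient = {'Dick';   'Dick';       'Tom';        'Tom';    'Dick'};

% Each filter is a logical mask; combining filters is a logical AND.
mask = strcmp(cls, 'non-cancer') & strcmp(patient, 'Dick');

idx = find(mask);   % indices of the samples that stay visible
disp(idx')
```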

6.1.6. Bringing class to top, z-order of classes

In scatter plots of large data sets, overlapping classes may easily obscure one another. sdscatter provides the Class to top command in the Scatter menu, which allows us to bring a desired class to the top. In this way, we can better understand what happens in the area of overlap.

We will demonstrate this function on the artificially-generated three-class data set created by the gendatf function:

>> a=gendatf(10000)
'Fruit set' 10000 by 2 sddata, 3 classes: 'apple'(3333) 'banana'(3333) 'stone'(3334) 
>> sdscatter(a)

The stone class obscures the banana distribution. By selecting Class to top and banana, we change the order in which the classes are plotted, so that banana appears on top.

sdscatter also offers two keys, + and -, for flipping through the plotting order (z-order) of the classes. To make things simpler, = works as +, so there is no need to hold SHIFT.
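
The notion of plotting order can be sketched in plain Matlab: classes are drawn one after another, and whatever is drawn last ends up on top. The variable names below are assumptions for illustration and say nothing about how sdscatter stores the order internally.

```matlab
% Classes are drawn in this order; the last one ends up on top.
classes = {'apple', 'banana', 'stone'};
order   = [1 2 3];

% "Class to top" for banana: move its index to the end of the drawing order.
top   = find(strcmp(classes, 'banana'));
order = [order(order ~= top) top];

disp(classes(order))
```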

6.1.7. Hand-painting class labels

sdscatter allows us to define class labels directly by painting. In this way, we can interactively label interesting groups of samples such as outliers, areas of overlap or class modes.

Painting is accessible both from the Scatter menu and from the context-sensitive menu.

We need to specify which class to paint. It can be either one of the existing classes or we can create a new class. In our example, we are interested in the area of overlap and will, therefore, create a new class called overlap.

In painting mode, a square is added to the scatter plot axes. By holding the left mouse button, we assign the samples inside the square to the desired class.

Note that while painting, you can freely switch between features to find the best views for your problem. You can also hide some of the classes using the Class visibility command. Painting assigns labels only to visible data samples.
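
The effect of one painting step can be sketched in plain Matlab: samples that fall inside the square and are currently visible receive the painted class. All values below (data, square centre c, half-width h, visibility flags) are assumptions for illustration.

```matlab
% Hypothetical 2-D samples, current labels and visibility flags.
X       = [1 1; 2 2; 2.2 1.9; 5 5];
lab     = {'apple'; 'banana'; 'stone'; 'stone'};
visible = logical([1 1 0 1])';     % third sample belongs to a hidden class

% Square centre c and half-width h (both assumed values).
c = [2 2];
h = 0.5;

% Samples inside the square, restricted to the visible ones.
inside = abs(X(:,1) - c(1)) <= h & abs(X(:,2) - c(2)) <= h;
hit    = inside & visible;

lab(hit) = {'overlap'};            % assign the painted class
disp(lab')
```

The third sample lies inside the square but keeps its label because its class is hidden.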

When finished, choose Stop painting from the context menu or from the Scatter menu.

6.1.8. Renaming classes

sdscatter provides a simple way to rename classes. This facility is helpful to re-arrange the data set or to assign meaning to labels generated by cluster analysis.

The function is accessible through the Rename class command in the context menu or in the Scatter menu.

We can, for example, rename both the apple and banana classes to fruit. Using the Create data set in workspace command from the Scatter menu, we can save this data set into the Matlab workspace. The resulting data set will have only two classes, namely stone and fruit.

>> b  %  Created sddata b with all label sets.
Fruit set, 10000 by 2 sddata, 2 classes: 'stone'(3334) 'fruit'(6666)
>> b.lab.list
sdlist (2 entries)
 ind name
   1 stone 
   2 fruit 

Note that interactive renaming of classes makes most sense when used with interactively defined classes. For existing classes in the data set, it is simpler to use the sdrelab function, which is discussed elsewhere in this documentation.
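
Renaming two classes to the same name merges them. A minimal sketch of this effect on a plain cell array of class names follows; it deliberately uses neither sdrelab nor any PRSD Studio object, and the label values are made up.

```matlab
% Hypothetical per-sample class names.
lab = {'apple'; 'stone'; 'banana'; 'apple'; 'stone'};

% Rename both apple and banana to fruit.
isFruit = ismember(lab, {'apple', 'banana'});
lab(isFruit) = {'fruit'};

% Only two distinct class names remain.
classes = unique(lab);
disp(classes')
```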

6.2. Interactive image view

PRSD Studio provides the sdimage command for interactive image visualization. Let's inspect its capabilities with the following example image.

>> im = imread('roadsign09.bmp');
>> sdimage(im);

The image im has three color channels (RGB), visualized as three separate bands. We can move between the bands with the 'up' and 'down' cursor keys. Each pixel is a data sample. The figure title shows the pixel's value and class label ('unknown' by default).
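
The view of an image as "one sample per pixel, one feature per band" can be sketched in plain Matlab with reshape. The tiny image below is made up, and the column-major pixel order is simply what reshape produces; the source does not state how sdimage orders pixels internally.

```matlab
% Hypothetical 2-by-3 RGB image (rows x columns x bands).
im = zeros(2, 3, 3, 'uint8');
im(:,:,1) = [10 20 30; 40 50 60];   % red band
im(:,:,2) = 100;                    % green band
im(:,:,3) = 200;                    % blue band

[nrows, ncols, nbands] = size(im);

% One row per pixel, one column per band (pixels in column-major order).
X = reshape(double(im), nrows*ncols, nbands);

band1 = im(:,:,1);                  % a single band of the image
disp(size(X))
```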

Using the space bar, we may toggle the label layer off and on. We may also adjust the label transparency, from very transparent to opaque, in the Image menu.

6.2.1. Hand-painting class labels

sdimage allows us to paint class labels for image regions. To enter the 'paint' mode, open Paint mode in the Image menu and select the Create new class command:

A dialog window will ask for the name of the class. Let's say we are interested in labeling samples from the road. We provide the name of the class and paint in the image region. Further painting commands are available via the Image menu or by clicking the right mouse button.

6.2.2. Saving hand-painted labels

The Create data set in workspace command in the Image menu lets us store the image data together with the painted labels in a new sddata object in the Matlab workspace. We are asked to provide the variable name for this new data set.

6.2.3. Connecting sdimage and sdscatter

It is often useful to inspect the connection between image neighborhoods and the scatter plot. To visualize this connection, the sdscatter and sdimage commands can be used together.

We simply show the data set containing the image data using sdscatter and then connect the sdimage plot to the scatter figure using the returned figure handle.

>> data2    %  Created data set data2 in the workspace.
412160 by 3 sddata, 2 classes: 'unknown'(399848) 'road'(12312) 

>> h=sdscatter(data2)
h =
     2

>> sdimage(data2,h); 

By moving the mouse pointer over the image, we can see where the image pixel appears in the feature space. Similarly, moving over the scatter plot shows us the corresponding pixel.
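
The link between a pixel position and a sample index can be sketched with sub2ind and ind2sub. This assumes that pixels are stored column by column (Matlab's native order); the source does not confirm how sdimage maps pixels to samples, and the image size used here is made up.

```matlab
% Assumed image size; pixels are assumed to be stored column by column.
nrows = 4;
ncols = 5;

% Pixel under the pointer -> index of the corresponding sample.
r = 3; c = 2;
idx = sub2ind([nrows ncols], r, c);

% Sample highlighted in the scatter plot -> pixel position in the image.
[r2, c2] = ind2sub([nrows ncols], idx);

fprintf('pixel (%d,%d) <-> sample %d\n', r2, c2, idx);
```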

When we paint in the scatter plot, the linked image plot also updates. This helps us analyze the position of specific feature-space clusters in the image domain.