As we show later in this chapter, the algorithm based on the dynamic time warping and the normalised cut sometimes groups pathlets that look different. This results in the creation of a poor pathlet models. This problem is due to the fact that we are comparing pathlets in pairs using a similarity measure that is not necessarily compatible with our grouping model. A global similarity measure for several pathlets at a time would be much more appropriate since it can ensure that a group is coherent, a property that is not guaranteed with a similarity measure for only two pathlets. A global similarity measure leads to a groupwise clustering rather than the pairwise clustering of the dynamic time warping and normalised cut algorithm.
Furthermore, we want the similarity measure to be compatible with our grouping model. One logical choice for the similarity measure is the variance of the group of pathlets. Indeed several pathlets are properly grouped if the variance of the group is small. A large variance denotes a group containing pathlets that look different to each other.
Unfortunately, we cannot use the same clustering algorithm. The normalised cut algorithm is based on the use of a similarity matrix, which cannot easily be created for a groupwise similarity measure. Instead, we use a greedy algorithm to group the pathlets. At each step, the algorithm makes the best decision for the grouping.
The algorithm first assumes that each pathlet given by the trajectory segmentation initially forms a group by itself. Let be this initial set of groups.
For every pair of elements of
, we compute the variance of the group
that is built using the pathlets from both
and
. We select the pair of groups
that gives the lowest variance and merge those two groups to form only one group. We delete
and
from
and insert
instead.
We iterate the process until we reach a given number of clusters .
We can also stop the algorithm if we reach a given variance so that we can control the quality of the model. However, if we set this variance too low, we stop the algorithm too quickly and most of the resulting groups will only contain one pathlet, making the use of pathlet models superfluous in our framework. We prefer to control the number of pathlet models so that we can control the computational complexity of the remaining steps in the framework.
Figure 5.11 summarises the greedy algorithm.