Given a long sequence of points in the parameter space, we want to divide it into sub-trajectories. These sub-trajectories correspond to actions or visual units in the video sequence.
The aim of the segmentation is to extract nodes that split the trajectory into several sub-trajectories; the nodes form the beginnings and ends of the sub-trajectories. Since the sub-trajectories are to be grouped later in the process, similar sub-trajectories should also have similar beginnings and similar ends. Furthermore, we would like to find the points where different behaviours split apart or converge. Finding points of high density in the appearance parameter space is a good way of meeting these requirements.
To find these high-density points, we use the sample mean shift described by Comaniciu and Meer [3]. We iteratively update the current estimate of a local density maximum by moving it to the mean of the points closest to it. The process converges to the position of the local density maximum.
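As an illustration, the iteration can be sketched with a flat (uniform) kernel: move the estimate to the mean of all points within a fixed bandwidth, and repeat until it stops moving. The function name, the flat kernel, and the bandwidth value are our assumptions for this sketch; Comaniciu and Meer's formulation covers general kernels.

```python
import numpy as np

def mean_shift(points, start, bandwidth=1.0, tol=1e-6, max_iter=100):
    """Flat-kernel mean shift: repeatedly replace the current estimate
    with the mean of all points within `bandwidth` of it. The estimate
    climbs the density and stops at a local maximum."""
    x = np.asarray(start, dtype=float)
    for _ in range(max_iter):
        # points inside the window centred on the current estimate
        dists = np.linalg.norm(points - x, axis=1)
        window = points[dists <= bandwidth]
        if window.size == 0:  # no neighbours: nowhere to move
            break
        new_x = window.mean(axis=0)
        if np.linalg.norm(new_x - x) < tol:  # converged
            return new_x
        x = new_x
    return x
```

Started near either of two well-separated clusters, the estimate converges to that cluster's mean, which is the mode of the (flat-kernel) density estimate there.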
We initialise the mean shift algorithm at each point of the trajectory in turn. Running the algorithm to convergence finds nearly all the local maxima of the density estimate. The trajectory point nearest to each local maximum is defined to be a node, and the nodes split the full path into sub-trajectories. In practice, we only do this for a subset of the points of the training sequence; this improves efficiency with a negligible effect on the result. Figure 2 shows an example of nodes extracted from a hand-drawn trajectory.
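The node-extraction procedure could be sketched end to end as follows. This is a hypothetical, self-contained illustration, not the paper's implementation: the flat kernel, the bandwidth, the mode-merging threshold of half the bandwidth, and all names are our assumptions.

```python
import numpy as np

def extract_nodes(traj, bandwidth=1.0, tol=1e-6, max_iter=100):
    """Run flat-kernel mean shift from every trajectory point, merge the
    modes it converges to, and return the indices of the trajectory
    points nearest each mode; these indices are the segmentation nodes."""
    traj = np.asarray(traj, dtype=float)

    def shift(x):
        # climb the density estimate from x to a local maximum
        for _ in range(max_iter):
            window = traj[np.linalg.norm(traj - x, axis=1) <= bandwidth]
            new_x = window.mean(axis=0)  # window always contains x itself
            if np.linalg.norm(new_x - x) < tol:
                break
            x = new_x
        return new_x

    modes = []
    for p in traj:
        m = shift(p)
        # keep only modes that are not essentially the same maximum
        if not any(np.linalg.norm(m - q) < bandwidth / 2 for q in modes):
            modes.append(m)

    # node = index of the trajectory point nearest each surviving mode
    nodes = {int(np.argmin(np.linalg.norm(traj - m, axis=1))) for m in modes}
    return sorted(nodes)
```

On a trajectory with two dense clusters joined by sparse transit points, the dense regions each contribute one node, so the path is cut where points of many passes accumulate, as the segmentation intends.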