3D Head Tracking in Video
How a simple model can go a long way.
In this post, I will describe my own implementation of a head tracker. 3D Head Tracking (HT) consists of inferring the 3D orientation and displacement of the head, often from a (single) video source. Here, the video source will be a Logitech C910 webcam. Of course, any webcam will do. Video grabbing and image processing will be done using the OpenCV library.
The outline of the algorithm is as follows:
1. Grab a frame and detect 2D features.
2. Initialize the head pose.
3. Compute 3D features → FTold.
4. Grab a frame and detect 2D features.
5. Compute 3D features → FTnew.
6. Compute the motion that registers FTnew → FTold.
7. Update the head pose.
8. FTold = FTnew; go to 4.
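In code, the loop might look like the following skeleton. Every helper here (grabFrame, detectFeatures, initializeHeadPose, unprojectToCylinder, registerMotion, compose) is a hypothetical placeholder for a step detailed in the rest of this post:

// Hypothetical skeleton of the tracker; each helper stands
// for one of the numbered steps above.
Mat frame = grabFrame();                      // Step 1.
vector<Point2f> pts = detectFeatures(frame);  // Step 1.
Pose pose = initializeHeadPose();             // Step 2.
vector<Point3f> FTold =
    unprojectToCylinder(pts, pose);           // Step 3.
for (;;) {
    frame = grabFrame();                      // Step 4.
    pts = detectFeatures(frame);              // Step 4.
    vector<Point3f> FTnew =
        unprojectToCylinder(pts, pose);       // Step 5.
    Pose delta = registerMotion(FTnew, FTold);// Step 6.
    pose = compose(delta, pose);              // Step 7.
    FTold = FTnew;                            // Step 8.
}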
At first glance, the toughest step in this outline seems to be the 2D→3D feature conversion. It turns out this is among the easiest tasks, thanks to a simple idea: a cylindrical head model. In a nutshell, 2D features are unprojected from the camera reference frame onto a virtual cylinder. This intersection provides the sought 3D positions of the image features. But first things first…
Grabbing an image is easy using OpenCV. Boilerplate code for that is a loop that looks like:
Mat frame, image;
VideoCapture capture;
int dev_id = 1; // Device number.
capture.open(dev_id);
if (!capture.isOpened()) {
    cerr << "Failed to open video device "
         << dev_id << endl;
    return 1;
}
const char* window_name = "Head Tracker"; // Any title will do.
for (;;) {
    capture >> frame;
    if (frame.empty())
        continue;
    frame.copyTo(image);
    imshow(window_name, image);
    char key = (char) waitKey(5);
    if (key == ' ')
        break;
}
In each input frame, 2D features are detected. Among the myriad of feature types, KLT features are probably the best suited to our real-time needs. Indeed, KLT features are easy and fast to compute because no descriptor computation and no scale-space analysis are involved (at least not as in SIFT). Using OpenCV, KLT features are retrieved as follows:
int MAX_COUNT = 100;
TermCriteria termcrit(CV_TERMCRIT_ITER |
                      CV_TERMCRIT_EPS,
                      20, 0.3);
// We use two sets of points in order to swap
// pointers.
vector<Point2f> points[2];
Size subPixWinSize(10,10), winSize(21,21);
Mat gray;
// Convert image to gray scale (captured frames are BGR).
cvtColor(image, gray, CV_BGR2GRAY);
// Feature detection is performed here...
goodFeaturesToTrack(gray, points[1], MAX_COUNT,
                    0.01, 10, Mat(), 3, 0, 0.04);
// Refine corner locations to sub-pixel accuracy.
cornerSubPix(gray, points[1], subPixWinSize,
             Size(-1,-1), termcrit);
Now that features are detected, they are unprojected and intersected with the virtual cylinder. An exact solution to this ray-cylinder intersection can easily be found on the net.
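Below is a minimal sketch of that unprojection, assuming a pinhole camera with focal length f (in pixels) and principal point (cx, cy), and a head cylinder of radius r whose axis is vertical and passes through the point (ax, az) in the camera X-Z plane (all names are illustrative):

// Cast the viewing ray through pixel p and intersect it with
// the vertical cylinder (x-ax)^2 + (z-az)^2 = r^2.
bool unprojectToCylinder(const Point2f& p,
                         double f, double cx, double cy,
                         double ax, double az, double r,
                         Point3f& out)
{
    // Ray through the pixel: s * (dx, dy, 1), s > 0.
    double dx = (p.x - cx) / f;
    double dy = (p.y - cy) / f;
    // Substituting the ray into the cylinder equation gives
    // the quadratic a*s^2 + b*s + c = 0.
    double a = dx*dx + 1.0;
    double b = -2.0 * (dx*ax + az);
    double c = ax*ax + az*az - r*r;
    double disc = b*b - 4.0*a*c;
    if (disc < 0.0) return false;  // Ray misses the cylinder.
    double s = (-b - sqrt(disc)) / (2.0*a); // Nearest hit.
    if (s <= 0.0) return false;    // Intersection behind camera.
    out = Point3f(s*dx, s*dy, s);
    return true;
}

With the 3D positions of the features at time t-1 in hand, the same features are tracked in the upcoming frame using the optical flow routine from OpenCV: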
vector<uchar> status;
vector<float> err;
// Track points[0] from the previous frame into the
// current one; the matches end up in points[1].
calcOpticalFlowPyrLK(prev_gray, gray,
                     points[0], points[1],
                     status, err, winSize,
                     3, termcrit);
The result of this tracking is a set of features at time t. To get the change in head pose, we register the 3D features at time t-1 with the 2D features at time t. This is performed using a PnP algorithm. Because the virtual cylinder represents the head (a rough estimate!), it must be updated with the incremental pose just computed. In a sense, the cylinder is a state object of the tracked head.
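Here is a minimal sketch of this registration and update step, assuming the camera matrix K is known, ft_old holds the cylinder intersections from the previous frame, and points lost by the tracker (status == 0) have already been removed from both lists (all names are illustrative):

// Recover the incremental pose that maps the 3D features at
// time t-1 onto the 2D features at time t.
Mat rvec, tvec;
solvePnP(ft_old,     // 3D points on the cylinder at t-1.
         points[1],  // Tracked 2D points at t.
         K, Mat(),   // Intrinsics, no lens distortion.
         rvec, tvec);
// Compose the increment with the running head pose, where
// R_head (3x3) and t_head (3x1) are the cylinder's state;
// this is what moves the cylinder along with the head.
Mat R_inc;
Rodrigues(rvec, R_inc);
R_head = R_inc * R_head;
t_head = R_inc * t_head + tvec;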
The head pose algorithm runs comfortably on a 2.4 GHz laptop using a Logitech C910 webcam, as the following video depicts:
[Video: live head-tracking demo]