Computer Vision and other Fun

Jamil Draréni

3D Head Tracking in Video

How a simple model can go a long way.

In this post, I will describe my own implementation of a head tracker. 3D Head Tracking (HT) consists of inferring the 3D orientation and displacement of the head, often from a (single) video source. Here, the video source will be a Logitech C910 webcam; of course, any webcam will do. Video grabbing and image processing will be done using the OpenCV library.

The outline of the algorithm is as follows:

  1. Grab a frame and detect 2D features.
  2. Initialize the head pose.
  3. Compute 3D features → FTold.
  4. Grab a frame and detect 2D features.
  5. Compute 3D features → FTnew.
  6. Compute motion that registers FTnew → FTold.
  7. Update head pose.
  8. FTold = FTnew and go to 4.

At first glance, the toughest step in this outline seems to be the 2D→3D feature conversion. It turns out to be among the easiest tasks thanks to a simple idea: the cylindrical head model. In a nutshell, 2D features are unprojected from the camera reference frame onto a virtual cylinder; the intersection points provide the sought 3D positions of the image features. But first things first…

Grabbing an image is easy with OpenCV. Boilerplate code for that is a loop that looks like:

#include <opencv2/opencv.hpp>
#include <iostream>

using namespace cv;
using namespace std;

int main()
{
    Mat frame, image;
    VideoCapture capture;
    int dev_id = 1; //Device number.
    const char* window_name = "Head Tracker";

    capture.open(dev_id);
    if (!capture.isOpened()){
        cerr << "Failed to open video device "
             << dev_id << endl;
        return 1;
    }

    for (;;){
        capture >> frame; //Grab the next frame.
        if (frame.empty())
            continue;

        frame.copyTo(image);
        imshow(window_name, image);
        char key = (char) waitKey(5);

        if (key == ' ') //Quit on space bar.
            break;
    }
    return 0;
}

In each input frame, 2D features are detected. Among the myriad of feature types, KLT features are probably the best suited to our real-time needs. Indeed, KLT features are easy and fast to compute because no descriptor computation is needed and no scale-space analysis is involved (at least not as in SIFT). Using OpenCV, KLT features are retrieved as follows:

const int MAX_COUNT = 100;
TermCriteria termcrit(CV_TERMCRIT_ITER |
                      CV_TERMCRIT_EPS,
                      20, 0.3);
// We use two sets of points in order to swap
// pointers.
vector<Point2f> points[2];
Size subPixWinSize(10,10), winSize(21,21);
Mat gray;

//Convert the image to gray scale (OpenCV stores frames as BGR).
cvtColor(image, gray, CV_BGR2GRAY);

//Feature detection is performed here...
goodFeaturesToTrack(gray, points[1], MAX_COUNT,
                    0.01, 10, Mat(), 3, 0, 0.04);
//Refine corner locations to sub-pixel accuracy.
cornerSubPix(gray, points[1], subPixWinSize,
             Size(-1,-1), termcrit);

Now that features are detected, they are unprojected and intersected with the virtual cylinder. An exact solution to this ray-cylinder intersection can easily be found on the net.
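For illustration, here is one way this intersection could be computed. This is only a sketch under my own assumptions: the cylinder axis is taken parallel to the camera y-axis and passes through a point center (in camera coordinates) at radius r, K is assumed to be a CV_64F camera matrix, and the function name and parameters are hypothetical, not part of the original code:

//Sketch: unproject a pixel and intersect the resulting ray with a
//vertical cylinder of radius r whose axis passes through 'center'.
bool intersectCylinder(const Point2f& px, const Mat& K,
                       const Point3f& center, double r,
                       Point3f& p3d)
{
    //Ray direction d = K^-1 * (u, v, 1)^T, camera center as origin.
    Mat uv = (Mat_<double>(3,1) << px.x, px.y, 1.0);
    Mat d = K.inv() * uv;
    double dx = d.at<double>(0);
    double dy = d.at<double>(1);
    double dz = d.at<double>(2);

    //For a vertical cylinder only x and z matter:
    //(t*dx - cx)^2 + (t*dz - cz)^2 = r^2  =>  a*t^2 + b*t + c = 0
    double a = dx*dx + dz*dz;
    double b = -2.0*(dx*center.x + dz*center.z);
    double c = center.x*center.x + center.z*center.z - r*r;

    double disc = b*b - 4.0*a*c;
    if (disc < 0.0) //The ray misses the cylinder.
        return false;

    //The smaller positive root is the surface facing the camera.
    double t = (-b - std::sqrt(disc))/(2.0*a);
    if (t <= 0.0)
        return false;

    p3d = Point3f((float)(t*dx), (float)(t*dy), (float)(t*dz));
    return true;
}

Now that we have the 3D positions of the features at time Tt-1, the same features are tracked in the upcoming frame using the optical flow routine from OpenCV: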

vector<uchar> status;
vector<float> err;
//Track the previous features in the current frame.
calcOpticalFlowPyrLK(prev_gray, gray,
                     points[0], points[1],
                     status, err);
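
Not every feature survives tracking: calcOpticalFlowPyrLK sets status to zero for points it lost. Before the registration step, both the 2D points and their 3D counterparts should be pruned accordingly. A minimal sketch, where obj_pts is a hypothetical vector<Point3f> holding the cylinder intersections in sync with the 2D features:

//Discard features that failed to track, keeping the 2D
//points and their 3D cylinder points in sync.
size_t k = 0;
for (size_t i = 0; i < points[1].size(); i++){
    if (!status[i])
        continue;
    points[1][k] = points[1][i];
    obj_pts[k] = obj_pts[i]; //Hypothetical 3D container.
    k++;
}
points[1].resize(k);
obj_pts.resize(k);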

The result of this tracking is a set of features at time Tt. To get the change in head pose, we register the 3D features at time Tt-1 with the 2D features at time Tt. This is performed using a PnP algorithm. Because the virtual cylinder represents the head (a rough estimate!), it must be updated with the incremental pose just computed. In a sense, the cylinder is a state object of the tracked head.
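
In OpenCV this registration can be done with solvePnP. Below is a minimal sketch of the pose update, assuming obj_pts holds the 3D points from time Tt-1, K is the camera matrix, and (R_head, t_head) is the accumulated head pose stored as CV_64F matrices; these names are mine, not part of the original code:

//Estimate the incremental motion that maps the 3D points (time Tt-1)
//to their tracked 2D projections (time Tt).
Mat rvec, tvec;
solvePnP(obj_pts, points[1], K, Mat(), rvec, tvec);

Mat R_inc;
Rodrigues(rvec, R_inc); //Axis-angle vector -> 3x3 rotation matrix.

//Fold the increment into the running head pose (the cylinder state).
R_head = R_inc * R_head;
t_head = R_inc * t_head + tvec;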

The head pose algorithm runs comfortably on a 2.4 GHz laptop using a Logitech C910 webcam, as the following video shows:

[Video: real-time head tracking demo]