On Jul 2, 2015, at 12:24 AM, Maneesh Agrawala wrote:
That would be great! I looked through the 3D puppet code/approach and it seems like it will be a useful reference. The project is so beautiful! I see what you mean about blurring physical/virtual boundaries and about creating open-ended mental models. Technically speaking, I had tried to do something similar when I wanted to make the book-page-turning installation, but (as you probably would have anticipated before writing an implementation) SIFT homographies are completely useless on typographic elements. (Turns out the features/corners of digital typography are rather uniform—ha!)
I like using SIFT for feature detection, but its parameter settings can be a little finicky. Still, I think it should work well on text elements unless there are resolution issues.
I'd be interested in hearing more about what you tried with SIFT. SIFT can be a little slow, so for tracking you may need an optical flow-based technique like KLT.
I dug up some of the screens from when I had tried SIFT for identification—it was actually for my “video binder” experiment, not for typography as I had mistakenly recalled. Here you can see the working homography:
It seemed to work on the first try, but once I had different printouts (in the same format), it wasn’t helpful for the n-way classification because the SIFT keypoint detector was picking up so many image corners that it could always find a correspondence. Here it is failing (with a false positive), warping all manner of identical corners to the same dark corner of the sheet I’m holding:
Anyway, in this particular case, using the thumbnail grid as a fiducial and matching within each thumbnail works very well—once you can isolate the individual images it becomes trivial to classify the page—but I show this just to demonstrate that the implicit assumptions of some computer vision techniques that make perfect sense in the natural world (e.g. that corners are unique and reproducible discriminants) may not apply in a designed environment (e.g. where the harshest corners may be the least meaningful discriminants).
I’m trying to figure out a generic approach to classification/tracking for very different types of objects (3-D, natural, designed, templated, non-opaque, &c) but suspect I will simply need to take a hybrid approach.