On Jul 2, 2015, at 12:24 AM, Maneesh Agrawala wrote:
That would be great! I looked through the 3D puppet code/approach and it seems like it will be a useful reference. The project is so beautiful! I see what you mean about blurring physical/virtual boundaries and about creating open-ended mental models. Technically speaking, I had tried to do something similar when I wanted to make the book-page-turning installation, but (as you probably would have anticipated before writing an implementation) SIFT homographies are completely useless on typographic elements. (Turns out the features/corners of digital typography are rather uniform—ha!)
I like using SIFT for feature detection, but its parameter settings can be a little finicky. Still, I think it should work well on text elements unless there are resolution issues.
I'd be interested in hearing more about what you tried with SIFT. SIFT can be a little slow, so for tracking you may need an optical flow-based technique like KLT.
I dug up some of the screens from when I had tried SIFT for identification—it was actually for my “video binder” experiment, not for typography as I had mistakenly recalled. Here you can see the working homography:
It seemed to work on the first try, but once I had different printouts (in the same format), it wasn’t helpful for the n-way classification because the SIFT keypoint detector was picking up so many image corners that it could always find a correspondence. Here it is failing (with a false positive), warping all manner of identical corners to the same dark corner of the sheet I’m holding:
Anyway, in this particular case, using the thumbnail grid as a fiducial and matching within each thumbnail works very well—once you can isolate the individual images it becomes trivial to classify the page—but I show this just to demonstrate that the implicit assumptions of some computer vision techniques that make perfect sense in the natural world (e.g. that corners are unique and reproducible discriminants) may not apply in a designed environment (e.g. where the harshest corners may be the least meaningful discriminants).
I’m trying to figure out a generic approach to classification/tracking for very different types of objects (3-D, natural, designed, templated, non-opaque, &c) but suspect I will simply need to take a hybrid approach.