Date: Mon, 11 May 2020 17:49:28 -0700
From: Luke Iannini
Subject: Re: The Story of Marks Kit
Bonus footage of Alex playing with the score toy and replacing eighth notes with peanuts and quarter notes with 2 peanuts which Just Worked
[video.webm]

(which suggests we could potentially achieve "so let's say this pen is the wind direction..." *right now*) 

On May 11, 2020, at 5:42 PM, Luke Iannini wrote:

The Story of Marks Kit

A few weeks ago I mentioned that the size of our objects was the biggest obstacle to working on DSP kit with limited table space. I'd weighed playing with a few other fiducial-style experiments, but I've had the "writing language" dream for so long — and it has so much potential and possibility beyond just solving "smaller objects" — that I decided to just dive all the way into that.

I've implemented various pieces over the years — in La Tabla, in shaders that got up to the contours stage, and in my iPad language prototype — so I felt like I had a good handle on those. But the one thing I hadn't really tried to solve yet was mark matching. We had something based on SIFT in the later days of La Tabla but it was only moderately reliable.

Finding a matching algorithm

It seemed wise to start with a survey of the field. SIFT wasn't ideal as it's optimized for photographs of complex objects, or finding photos-in-photos like a book cover or a box of macaroni and cheese — and the demos only showed a few objects, whereas I wanted to have a whole alphabet. Most importantly, I wanted to be able to recognize an instance from a *single example*, which seemed to rule out machine-learning style classifiers that need training on piles of examples.

This is a good overview of the SIFT-like approaches — which look really useful for things like assigning meaning to larger complex objects!

I found a different class of techniques called "Shape Matching & Retrieval" algorithms and started to survey those instead. The canonical dataset for these seems to be "MPEG-7", which looks like just what I'm looking for http://www.dabi.temple.edu/~shape/MPEG7/index.html — 1400 shapes. An algorithm called "Shape Context" came up as a venerated exemplar in this space (out of Berkeley in 2000). It's reasonably fast and is available in OpenCV. My test of it was pretty promising vs. the alternatives (e.g. Hu moments and Hausdorff distance, neither of which performed very well) — it nailed all the matches on my little test rig where I handed it a dozen symbols drawn twice.

This gives a good overview of the Shape Matching space that illustrates the basics of how Shape Context works:
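To give a flavor of the core idea — each sampled boundary point gets a log-polar histogram of where all the other sampled points sit relative to it, and two shapes match when their per-point histograms can be put into cheap correspondence — here's a rough numpy sketch of the descriptor. This is just my illustration of the idea, not the OpenCV implementation:

```python
import numpy as np

def shape_context(points, n_radial=5, n_angular=12):
    """Log-polar histogram descriptor for each boundary sample point."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    # Pairwise offsets between all sample points
    diff = pts[None, :, :] - pts[:, None, :]          # (n, n, 2)
    dist = np.linalg.norm(diff, axis=-1)
    angle = np.arctan2(diff[..., 1], diff[..., 0])    # in (-pi, pi]
    # Normalize distances by the mean pairwise distance for scale invariance
    mean_dist = dist[dist > 0].mean()
    with np.errstate(divide="ignore"):
        log_r = np.log(dist / mean_dist)
    # Bin edges in log-radius and angle
    r_edges = np.linspace(log_r[dist > 0].min(), log_r.max(), n_radial + 1)
    a_edges = np.linspace(-np.pi, np.pi, n_angular + 1)
    descriptors = np.zeros((n, n_radial, n_angular))
    for i in range(n):
        mask = dist[i] > 0                            # skip the point itself
        r_bin = np.clip(np.digitize(log_r[i, mask], r_edges) - 1, 0, n_radial - 1)
        a_bin = np.clip(np.digitize(angle[i, mask], a_edges) - 1, 0, n_angular - 1)
        np.add.at(descriptors[i], (r_bin, a_bin), 1)
    return descriptors  # (n, n_radial, n_angular): one histogram per point
```

Matching then reduces to finding a low-cost correspondence between the two shapes' histograms, which OpenCV handles internally.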

OpenCV Kit

OK, let's try that! The implementation was hairy enough that I didn't want to spend weeks in that rabbit hole, so.... time to add OpenCV to Realtalk. We can skip a lot of the gory details here, but the end result is that you get something like DSP Kit where you can just write the relevant block of OpenCV code and an OpenCV translator will take care of generating a Lua-callable function for you that bridges all the images, polygons, arrays etc back and forth.

It also exposes arguments and types like DSP Kit does, so I built up the requisite visible pipeline so you can chain the blocks together, see the whole stream, and expose the parameters for tuning with dials, etc. (I can do a separate video on this). It's a lot of fun, and I used it to prototype the image-to-contours pipeline I'll describe next — ideally, there'd be a way to "compile" your pipeline once you've built it (and it should be pretty straightforward to build that), but I was itching to get to the main event : ).

So that's our first new capability: OpenCV kit. It should make exploring OpenCV's bag of techniques easier as we work towards a diverse set of approaches for recognition kit!

Polygons

Using the contours pipeline I built with OpenCV Kit, I added a To Know When for asking about polygons in a region.

The pipeline is Threshold->Contours->Simplify->Transform to Inches.

Thresholding is done with Otsu's method (courtesy of OpenCV), which works really well.
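For the curious, Otsu's method just picks the threshold that maximizes the between-class variance of the gray-level histogram. We call OpenCV's built-in; this pure-numpy sketch is only to illustrate the idea:

```python
import numpy as np

def otsu_threshold(gray):
    """Pick the gray level that maximizes between-class variance."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()                  # probability of each gray level
    omega = np.cumsum(p)                   # class-0 probability up to each level
    mu = np.cumsum(p * np.arange(256))     # cumulative mean
    mu_total = mu[-1]
    # Between-class variance for every candidate threshold
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_total * omega - mu) ** 2 / (omega * (1 - omega))
    return int(np.nanargmax(sigma_b))
```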

Polygons are simplified using the Douglas-Peucker algorithm (built into OpenCV as "approxPolyDP") to lighten the load of drawing and matching them.
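Douglas-Peucker keeps the point farthest from the chord between the endpoints (if it's farther than some epsilon) and recurses on either side of it. A pure-Python sketch, just to illustrate what approxPolyDP is doing under the hood:

```python
import math

def simplify(points, epsilon):
    """Douglas-Peucker: drop points within epsilon of the chord, recurse on the rest."""
    def perp_dist(p, a, b):
        ax, ay = a; bx, by = b; px, py = p
        dx, dy = bx - ax, by - ay
        if dx == dy == 0:
            return math.hypot(px - ax, py - ay)
        return abs(dy * px - dx * py + bx * ay - by * ax) / math.hypot(dx, dy)

    if len(points) < 3:
        return list(points)
    # Find the point farthest from the chord between the endpoints
    dists = [perp_dist(p, points[0], points[-1]) for p in points[1:-1]]
    i = max(range(len(dists)), key=dists.__getitem__) + 1
    if dists[i - 1] <= epsilon:
        return [points[0], points[-1]]     # everything is close: keep the chord
    left = simplify(points[:i + 1], epsilon)
    right = simplify(points[i:], epsilon)
    return left[:-1] + right               # drop the duplicated split point
```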

The polygons themselves are delivered in a "packed" format, since you don't always want to use them from Lua (i.e. you might just be handing them off to the matching system, which would have to convert them back into C-compatible structures again), but there's a function to easily convert them to illumination-compatible lists of points so you can manipulate and re-draw them however you like. I compute bounding boxes, areas and polygon colors during this pass so those are all available "for free". We also get the full "polygon hierarchy", allowing us to understand nesting of contours at arbitrary depths (e.g. circling a mark twice will produce a hierarchy of Circle->Circle->Mark).
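As a hypothetical sketch of what "packed" means here — a single flat array with per-polygon length headers, unpacked into point lists only when you ask for them (the real layout differs):

```python
def pack(polygons):
    """Hypothetical packed layout: [count, len0, x, y, ..., len1, x, y, ...]."""
    flat = [len(polygons)]
    for poly in polygons:
        flat.append(len(poly))
        for (x, y) in poly:
            flat.extend((x, y))
    return flat

def unpack(flat):
    """Convert the packed array back into per-polygon lists of (x, y) points."""
    polys, i = [], 1
    for _ in range(flat[0]):
        n, i = flat[i], i + 1
        polys.append([(flat[i + 2 * j], flat[i + 2 * j + 1]) for j in range(n)])
        i += 2 * n
    return polys
```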

When /page/ has polygons /polygons/, 
     /page/ has polygons bounding boxes /bboxes/, 
     /page/ has polygons areas /areas/, 
     /page/ has polygons colors /colors/,
     /page/ has polygons hierarchy /hierarchy/:
    ...
End

I built a small library of polygon-sampling algorithms on top of this, like finding similar points between polygons for the shape morphing demo, finding lines and re-sampling polygons for DSP kit waveforms, and so on.

Marks
The next stage is converting Polygons into Marks, which are fully-fledged Realtalk regions with identity. This lets us use all our spatial relations as well as associate statements with them. I use the hierarchy information to establish parent relationships between marks ("_ has parent mark _").

Wish (page) recognizes marks.
When /object/ is a "mark", /object/ has parent (page), /object/ has mark /mark_polygon/:
    ...
End

The most interesting part here is identity, which is currently a simple frame-to-frame bounding box similarity check. It works well enough, but it could use improvement (which will yield benefits for the next stage).
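For concreteness, here's one plausible version of such a check — greedy matching on intersection-over-union of bounding boxes. This is a sketch of the general approach, not the actual code:

```python
def iou(a, b):
    """Intersection-over-union of two (x, y, w, h) bounding boxes."""
    ax, ay, aw, ah = a; bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0

def assign_identities(prev, curr, threshold=0.5):
    """Greedy frame-to-frame matching: reuse an id when boxes overlap enough.

    prev maps last frame's bounding boxes to mark ids; curr is this frame's boxes.
    """
    next_id = max(prev.values(), default=-1) + 1
    ids = {}
    unused = dict(prev)  # boxes from the previous frame not yet claimed
    for box in curr:
        best = max(unused, key=lambda p: iou(p, box), default=None)
        if best is not None and iou(best, box) >= threshold:
            ids[box] = unused.pop(best)   # same mark, moved a little
        else:
            ids[box] = next_id            # no good overlap: fresh identity
            next_id += 1
    return ids
```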

Symbols & Alphabets
To assign meaning to a mark, you can define it as a Symbol that represents any other Realtalk object, like a page of code. You can then match any shape against the Symbol, and run the matched Symbol's code on that shape.

Symbols can optionally be placed into different Alphabets, which are useful both for making shape matching faster and more robust and for giving symbols different meanings depending on their context.

To define a symbol, you make a small square that says:

Claim (you) defines symbol. -- placed in the "universal" alphabet
Claim (you) defines symbol in the "music score" alphabet.

These will project a box on their right wherein you can draw a symbol, and a whisker on their left where you can attach a Page.

To match a symbol to some mark, you ask:
When /mark/ matches /symbol_page/ in the /alphabet_name/ alphabet:
    ...
End

Where "symbol_page" is the page that was associated with the mark via "define symbol". You can then get the code on that page or do whatever else you want with it.

This is a To Know When that kicks off a continuous matching operation, which bundles up all the symbols in the given alphabet and hands them to the Shape Context algorithm along with the symbol you want to match.

Manuscripts

A page that runs Marks is called a Manuscript, as in "a book, document, or piece of music written by hand rather than typed or printed".

You can make a page or box into a manuscript via
Claim (you) is a "manuscript".
which will be noticed by "Manuscripts", which uses all of the marks + symbols + alphabets layers above to find polygons, turn them into marks objects, and match them with symbols in the alphabets that are on the table.

Your symbol definitions are just regular Realtalk code that says things like
When (you) points "right" at /m/, /m/ is a "mark", /m/ has polygon /p/:
    ...
End

And now you're having a Real Good Time!

As mentioned in the video, the fun is of course that your Symbols can now interpret and actuate the world of Marks however they please, and a symbol can even override the Symbol system by wishing that some nearby mark is *not* recognized — a good idea when you know you want to treat it as data, though there are other creative uses as well.
Wish (mark) is not recognized.

To Illuminate your Manuscripts, your symbols can of course just wish for illuminations on themselves as usual (see addendum for details).

Once this is all running a bit quicker it's of course very exciting to imagine creating languages for defining new symbols directly in Marks!
(It should be pretty straightforward to port everything my iPad prototype was doing straight over)

Technical Details and Future Directions


Illuminating Manuscripts

I was surprised to achieve direct projection on marks! I figured it would cause too much interference, and at the beginning of the project I had it set up so that the illuminations for marks appeared on a separate region — that sucked, so I worked on making direct projection work. I found that certain colors like #0000FF blue can be mostly filtered out by the thresholding, so I added options to all the spatial relations I was using to customize the color of their illuminations.

But the most general thing was to just project a high-priority illumination on top of the manuscript. The obvious choice was transparent black, dimming all illuminations, but I discovered transparent white works really well too, lighting up the whole manuscript and making it easier to separate out the dark marks from the light illuminations and background! It also makes the illuminations much easier to see, at the slight expense of the kind of uncanny lit-up page. The light technique worked best at night, when I tend to work : ).

I've maybe mentioned it before — a really interesting route to interference-free illumination would be to build a tightly synchronized camera-projector rig. The idea would be to black out the illumination, capture a frame of the un-illuminated table, project a frame, etc. 60 times a second. This is similar to the "low-persistence" technique used by VR HMDs to ensure they're only sending light to your brain at the exact head position the world was rendered for. Persistence of vision would make it invisible to us. With this in hand we could use whatever brightness of illuminations we like, which would be pretty sweet.
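A toy simulation of that interleaving, with stand-in hardware objects — everything here is hypothetical, just to make the timing idea concrete:

```python
class FakeProjector:
    """Stand-in for real hardware; tracks how much light it's emitting."""
    def __init__(self):
        self.light = 0

    def blank(self):
        self.light = 0

    def show(self, brightness):
        self.light = brightness


class FakeCamera:
    """Sees the table plus whatever the projector is currently emitting."""
    def __init__(self, table_brightness, projector):
        self.table = table_brightness
        self.projector = projector

    def capture(self):
        return self.table + self.projector.light


def synced_loop(camera, projector, frames):
    """Per 60 Hz frame: blank the projector, capture, then project."""
    captures = []
    for frame in frames:
        projector.blank()                   # black out the illumination
        captures.append(camera.capture())   # grab the un-illuminated table
        projector.show(frame)               # project for the rest of the frame
    return captures
```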

Threads

OpenCV's Shape Context algorithm is pretty slow — it takes about 300ms on my laptop to match 12 marks to 12 alphabet symbols.

For a future implementation, I've found some very promising recent GPU implementations of similar styles of algorithms that are able to match across the entire 1400-symbol MPEG-7 dataset in single-digit milliseconds, which should be more in line with what we'd want for a really deep use of Marks kit! They also happen to be very robust to occlusion. ("An Efficient Shape Feature Extraction, Description and Matching Method Using GPU" by Chang et al)

But for this implementation, OpenCV's Shape Context was what I had. So I thought it would be a good opportunity to try out some of the multi-rate stuff we've been talking about in Realtalk 2020. My first hope was to prototype this with Effil, but those hopes were dashed when it turned out you can't pass C functions to Effil threads. So I turned to C++ "futures" instead. I use them in both the polygon finding and the shape matching.

When you ask for polygons, for example, the first frame that wish exists kicks off a thread. Each frame we check if the thread has returned a result yet, and if so, the resulting polygons are sampled into State and Claimed persistently, while simultaneously kicking off another thread to run the polygon finding on the latest available image.
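In Python terms the shape of that loop is roughly this (the real code drives C++ futures from Lua; this is a hypothetical sketch):

```python
from concurrent.futures import ThreadPoolExecutor

class AsyncPolygonFinder:
    """Keep one find-polygons job in flight; each frame, harvest it if done and resubmit."""
    def __init__(self, find_polygons):
        self.find_polygons = find_polygons
        self.pool = ThreadPoolExecutor(max_workers=1)
        self.future = None
        self.latest = []          # last completed result, claimed persistently

    def tick(self, image):
        """Called once per frame with the latest available camera image."""
        if self.future is not None and self.future.done():
            self.latest = self.future.result()   # sample the finished polygons
            self.future = None
        if self.future is None:
            # Kick off another run on the newest image
            self.future = self.pool.submit(self.find_polygons, image)
        return self.latest
```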

It's fun to imagine using this on the 128-thread AMD CPU, where we could be matching 128 symbols in parallel!

In the symbol matcher's case, we also keep a memory of the last 10 matches and pick the most frequent member of that set, to smooth out any momentary mismatches due to lighting or movement or whatever. So that's also a nice case study for some of the History stuff we've talked about.
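That smoothing is tiny but effective; its shape is roughly this (a sketch, not the actual code):

```python
from collections import Counter, deque

class MatchSmoother:
    """Remember the last N raw matches and report the most frequent one."""
    def __init__(self, window=10):
        self.history = deque(maxlen=window)  # old matches fall off automatically

    def update(self, raw_match):
        self.history.append(raw_match)
        # Majority vote over the window smooths out momentary mismatches
        return Counter(self.history).most_common(1)[0][0]
```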

There should be some easy robustness wins from playing with this scoring algorithm a bit and making the frame-to-frame bounding box persistence as reliable as rtengine's.

The threading approach got us to a screamin' 10-20 FPS when using Marks kit, which I was thrilled to achieve! But it would be amazing to get to smooth rates and seems pretty possible with all the work we've been building up to.

Storing Appearances

It would be great to add a way to store an image of a manuscript the same way we do with code — this would allow for "ghost symbols", "manuscript binders",  "marks snapshots", and all the other fun stuff we do.

Bret's new "Print with Traced Appearance" will of course be awesome to play with in Marks Kit, thanks for that!! 

Compound Symbols

This is an easy one; for the first pass I only supported contiguous symbols but Shape Context supports handing it however many polygons you want (since it actually just samples a bunch of boundary points and has no idea about connectivity). So for a symbol like "=", Define Symbol would just claim all the polygons in its defined region, and we'd merge bounding boxes of close-together marks to create compound polygons to match with.
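The merge step might look something like this — a hypothetical sketch with a made-up gap parameter:

```python
def merge_close_boxes(boxes, gap=5):
    """Group (x, y, w, h) boxes whose gap-expanded extents touch, for compound symbols."""
    def expand(b):
        x, y, w, h = b
        return (x - gap, y - gap, w + 2 * gap, h + 2 * gap)

    def near(a, b):
        ax, ay, aw, ah = expand(a); bx, by, bw, bh = b
        return ax <= bx + bw and bx <= ax + aw and ay <= by + bh and by <= ay + ah

    groups = []  # each group: list of boxes forming one compound symbol
    for box in boxes:
        hits = [g for g in groups if any(near(box, b) for b in g)]
        merged = [box] + [b for g in hits for b in g]
        groups = [g for g in groups if all(g is not h for h in hits)] + [merged]
    return groups
```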

The End
Here's the little scribble I drew once all the pieces were starting to come together enough that I could see the light at the end of the tunnel
<Image 5-11-20 at 5.36 PM.jpeg>