Date: Wed, 7 Sep 2022 20:23:17 -0700
From: Bret Victor
Subject: Re: Reciting the alphabet
If you haven't already checked out the Potrace technical description, you might find something helpful in how he does tracing, smoothing, corner analysis, etc...

http://potrace.sourceforge.net/potrace.pdf



On Sep 5, 2022, at 2:15 PM, Luke Iannini wrote:

Sketching out the analysis of the outline of a letter "a" using the subdivided/parallel circle idea I mused might be a better approach overall in the postscript...

It discards the special-cases for finding "lines" and "corners" and simply considers circles all the way down — a line is a "big circle" (infinitely big for a straight line), and a corner is a "small circle" standing on its own.

It's parallel-by-default, along with the merge pass, taking pairs of neighboring circles and producing one circle if they could be merged (i.e. their centers are close together, and their radii are similar), or returning both unchanged if not, and then we'd just keep collapsing pairs until we converged on a summary.

<Screen Shot 2022-09-05 at 5.13.22 PM.png>


On Aug 31, 2022, at 3:48 PM, Luke Iannini wrote:

The final frontier in my journey into shape recognition is to recognize a whole handwritten alphabet.

The basic idea of the first matcher was "triangle walking" — for each point along the exterior of the shape, walk a set distance to the points on its left and right (to smooth over wiggles), measure the angle of the triangle that forms, and then take the sharpest triangles in each neighborhood as the descriptors of the shape, along with the distance to the next keypoint. (There are of course a few more details but this is the gist). 

With this we now have a description of the shape as a list of of {angle-at-point, distance-to-next} pairs, so the next step is matching. To do this we use the Longest Common Subsequence algorithm, modified to allow for fuzzy equality (small variations in angle and distance) along with accumulating the error in each fuzzy match to choose the best of multiple matches. LCS is commonly used in string diffing and, appropriately, DNA comparison, and is handy here as it's robust to any missing or extraneous items in the two sequences which the description process might produce.

This is reasonably speedy, gives a nice shape summary, and performs OK on angular shapes (as it's best at finding corners), but doesn't do as well with curves, as choosing the sharpest triangle in a continuous curve is pretty arbitrary and variable, especially with hand-drawn symbols.

Angular shapes cover a healthy landscape of possible shapes, so I'm sure we can find lots of use for this approach! But it doesn't yet slay the alphabetical dragon I'm pursuing, so I must return to the forge!

~time passes~

With some quality hammock time I set upon idea of chasing curves directly. If the approach above was "triangle-walking", this one is "circle-walking", wherein I use the circle-from-three-points construction to recognize continuous curves along the shape (calculating their arc-length and radius), such that we can reduce a shape to a series of curves and lines that can then be fed into the same fuzzy-common-subsequence matcher. This is particularly appealing since the resulting summary feels like a plain-english description of a given letterform — "an uppercase B is two half-circles followed by a perpendicular line of twice their diameter".

There are many permutations and improvements to try, but even the quick untuned proof of concept version of this looks very promising, so stay tuned while I try hardening this one in the flames of reality!

<IMG_9392.jpeg>

A few extraneous comments and details...

~ Interactive algorithm design in Realtalk remains one of computing's greatest pleasures

~ There's a whole lot of low-hanging fruit just begging to be plucked in Recognition Kit, but fast dynamic shape matching feels like the thorniest problem that also brings the widest possibility space in expressivity! (Most tantalizingly, keyboard-less and dot-less programming)

~  Both "walker" prototypes proceed linearly following the contour, but could be easily parallelized by first subdividing the contour into many pieces followed by one or more merge phases — this will possibly yield better results anyway (even single-threaded) since we'll get an iterative understanding of the shape at multiple levels of detail.

~ One wrinkle in the matching phase, particularly for rotated copies of a shape, is that we don't have a canonical point that the description-string starts from. I'm currently solving this by just brute-forcing every possible phase-shifted version of a shape and picking the best one — I think doubling one of the shape-strings would accomplish the same thing? — but there might be a better way

~ I'm ignoring interior shapes at the moment to focus on the contour algorithms, but recursively applying the contour algorithm to interior shapes once the outer shell is analyzed should be no big deal, and a big help to matching

~ There are some image-based techniques I still haven't tried that may be promising, but I'm drawn to the geometric approach as it feels more human-understandable than convolutions on bags of pixels!