Today I made some initial progress parsing drawn state diagrams. The problem is to parse an image of a hand-drawn state diagram into a list of states (drawn circles) and arrows (each with a from state and a to state). The parser should return the elements of the diagram and their connectivity, as well as the pixel locations of each element, so you can project onto them, find images/tokens/writing within the circles, etc.
First I extract the "ink" image using the same technique I used for cat carrier.
Then to find the states (the drawn circles), I look for internal contours ("holes") on the ink image.
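To make the idea of a "hole" concrete, here is a minimal sketch in plain Python. It is not the contour-finding routine I actually used; it swaps in a flood fill, treating any background region that never reaches the image border as a hole. The function name `find_holes` and the 0/1 nested-list image format are assumptions made for illustration.

```python
from collections import deque


def find_holes(ink):
    """Return the enclosed background regions of a binary ink image.

    `ink` is a list of rows of 0/1 values (1 = ink). Each hole is returned
    as a set of (row, col) background pixels that are fully surrounded by
    ink, i.e. the region never touches the image border.
    """
    rows, cols = len(ink), len(ink[0])
    seen = [[False] * cols for _ in range(rows)]

    def flood(start_r, start_c):
        # Breadth-first fill over 4-connected background pixels.
        region = set()
        touches_border = False
        queue = deque([(start_r, start_c)])
        seen[start_r][start_c] = True
        while queue:
            r, c = queue.popleft()
            region.add((r, c))
            if r in (0, rows - 1) or c in (0, cols - 1):
                touches_border = True
            for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if (0 <= nr < rows and 0 <= nc < cols
                        and not seen[nr][nc] and ink[nr][nc] == 0):
                    seen[nr][nc] = True
                    queue.append((nr, nc))
        return region, touches_border

    holes = []
    for r in range(rows):
        for c in range(cols):
            if ink[r][c] == 0 and not seen[r][c]:
                region, touches_border = flood(r, c)
                if not touches_border:
                    holes.append(region)
    return holes
```

A closed ring of ink around a single background pixel yields one hole containing just that pixel; an image with no closed shapes yields an empty list.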
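I then filter these to keep the ones that look like circles, scoring each by its roundness (the ratio of perimeter^2 to area, which reaches its minimum of 4π ≈ 12.57 for a perfect circle) and its convexity (the ratio of the convex hull's area to the contour's own area, which is 1 for a convex shape and larger otherwise).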
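The two scores can be sketched in plain Python, treating a contour as a list of (x, y) vertices. This is only an illustration under assumptions: the helper names and the cutoffs `max_roundness` and `max_convexity` are hypothetical, not the values I used, and the hull is computed with Andrew's monotone chain rather than whatever a CV library does internally.

```python
import math


def polygon_area(points):
    # Shoelace formula for a simple polygon given as (x, y) vertices.
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_perimeter(points):
    n = len(points)
    return sum(
        math.hypot(points[(i + 1) % n][0] - points[i][0],
                   points[(i + 1) % n][1] - points[i][1])
        for i in range(n)
    )


def convex_hull(points):
    # Andrew's monotone chain; returns hull vertices in counter-clockwise order.
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def roundness(points):
    # perimeter^2 / area: 4*pi for a circle, larger for anything else.
    return polygon_perimeter(points) ** 2 / polygon_area(points)


def convexity(points):
    # hull area / contour area: 1 for convex shapes, larger for dented ones.
    return polygon_area(convex_hull(points)) / polygon_area(points)


def looks_like_circle(points, max_roundness=14.0, max_convexity=1.1):
    # Both thresholds are illustrative guesses and would need tuning.
    return (roundness(points) <= max_roundness
            and convexity(points) <= max_convexity)
```

For reference, a unit square scores a roundness of 16, so a cutoff somewhere between 4π and 16 separates squarish blobs from round ones. Hand-drawn circles are wobbly, so the cutoffs probably need some slack.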
This gives me all the states as contours.
Next I need to find the arrows and, for each one, its from state and its to state.
To do this, I take the states I found and use them as a mask that I subtract from the original ink image, which leaves an image containing just the arrows (arrowsImage).
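The subtraction itself is just "ink AND NOT mask" per pixel. A minimal sketch, assuming both images are same-sized nested lists of 0/1 values and that the states mask already covers the circle strokes themselves (not only the holes inside them); `subtract_mask` is a made-up name:

```python
def subtract_mask(ink, states_mask):
    """Keep only the ink pixels that are not covered by the states mask."""
    return [
        [1 if ink_px and not mask_px else 0
         for ink_px, mask_px in zip(ink_row, mask_row)]
        for ink_row, mask_row in zip(ink, states_mask)
    ]
```

If the mask only covers the holes, it would likely need to be dilated by roughly the stroke width first so that the circle outlines are removed too.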
Now I can find the contours in this arrowsImage and evaluate which state contours they're "touching". (You can perform a touch test by taking two masks, dilating one slightly, and seeing if the intersection image has any white pixels.)
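Here is a small sketch of that touch test. For simplicity it represents each mask as a set of (row, col) pixel coordinates rather than an image, and it dilates with a square neighborhood; the function names and the default `radius` are assumptions for illustration.

```python
def dilate(pixels, radius=1):
    """Grow a set of (row, col) pixels by `radius` using a square neighborhood."""
    grown = set()
    for r, c in pixels:
        for dr in range(-radius, radius + 1):
            for dc in range(-radius, radius + 1):
                grown.add((r + dr, c + dc))
    return grown


def touches(mask_a, mask_b, radius=1):
    """True if mask_a, dilated by `radius`, shares any pixel with mask_b."""
    return bool(dilate(mask_a, radius) & mask_b)
```

Usage would look like `touches(arrow_pixels, state_pixels, radius=2)`. A larger radius tolerates arrows that stop a little short of the circle, at the cost of false positives when states sit close together.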
I haven't gotten around to this yet, but I think everything is straightforward from here on out. (To be continued...)
I wanted to document this to show that it's possible and to give a sense for the kinds of CV operations we can use to parse hand-drawn diagrams. Shape analysis of contours gets you pretty far, as do erosions, dilations, and binary operators on binary (black-and-white) images.
The skeletonize operation will also be helpful for normalizing stroke thickness.