Date: Thu, 31 Mar 2016 21:29:26 -0700
From: Bret Victor
Subject: Re: Realtalk object model sketch
This is a sketch of an object and communication model for our system, as described at lunch on Tuesday.  It's probably more useful as a companion to discussion than a standalone description.

It's a first draft and likely to change quite a bit, but it's a concrete thing to think about, and should serve as a good "spine" to allow us to sketch out a system architecture, as well as for modeling the applications that we've been imagining.

Influences include Clojure, Erlang, Unix, Smalltalk, and (in one way or another) many of the media authoring systems we studied.


Summary of Interesting Things

recognizers:  An object does not intrinsically belong to a class.  An object adopts the behavior of a class when it is recognized as a member.

behaviors:  Any observable can be interpreted as a program and continuously executed (whether it's text, an arrangement of objects, an image, whatever), so long as you provide an appropriate interpreter.

observables:  When an object looks at the world, it sees the observables exposed by other objects.  An object may project observables onto other objects.

mailboxes:  Objects have an inbox and outbox through which they receive and send messages.

history:  Observables are streams, and objects can "see" (query) observables through time.  At a meta-level, the entire history of observables, messages sent and received, and private variables are available for inspection.


Rulebook

"Applications" are implemented as a set of class definitions, or a rulebook, analogous to the rules of a board game.  The rulebook "gives meaning" to a set of physical objects.  A rulebook specifies a spatial scope or some other criteria that bounds its effect.

The rulebook is itself a physical object, and perhaps can be displayed proudly:



Class Definition

All activity happens through objects participating as members of some class.  As in Stagecast, many classes will probably be intended for one specific object, in which case, defining the "class" will just feel like defining the object itself.

A class comprises a number of things:

 - recognizers:  what determines that an object is a member of this class?
 - behaviors:  what do these objects do?
 - observables:  what do these objects look like to their fellow objects?
 - mailboxes:  what have the objects said and heard from their fellow objects?

(There might be a component system, where a class definition is an assemblage of components, each supplying a set of behaviors, observables, etc.)
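As a companion to the list above, here is a rough Python sketch of a class definition and a rulebook as plain data.  Every name here (ClassDefinition, Rulebook, in_scope) is made up for illustration, the rectangular scope is just one possible "spatial scope", and mailboxes are left out because they hold per-object message history rather than anything declared up front.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Point = Tuple[float, float]


@dataclass
class ClassDefinition:
    """Hypothetical class definition: what to look for, what to run, what to expose."""
    name: str
    recognizers: List[Callable[[dict], bool]] = field(default_factory=list)
    behaviors: List[str] = field(default_factory=list)    # placeholders for behavior specs
    observables: List[str] = field(default_factory=list)  # names of exposed observables


@dataclass
class Rulebook:
    """Hypothetical rulebook: a set of class definitions plus a scope bounding its effect."""
    name: str
    scope: Tuple[Point, Point]  # assumed: axis-aligned rectangle (min corner, max corner)
    classes: List[ClassDefinition] = field(default_factory=list)

    def in_scope(self, location: Point) -> bool:
        """True if the location falls inside this rulebook's rectangle."""
        (x0, y0), (x1, y1) = self.scope
        x, y = location
        return x0 <= x <= x1 and y0 <= y <= y1


go = Rulebook(
    name="Go",
    scope=((0.0, 0.0), (100.0, 100.0)),
    classes=[ClassDefinition("GoStone", observables=["location", "color"])],
)
inside = go.in_scope((50.0, 50.0))
outside = go.in_scope((150.0, 50.0))
```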


Recognizers

Unlike traditional OOP, a class is not an inherent property of an object.  The system cannot instantiate physical objects from scratch with "new Class"; it can only try to make sense of the existing physical situation.

A go stone was fabricated at the factory, and independently exists regardless of what anyone thinks of it.  What we can do is recognize it as a piece in a go game (or as an explorer navigating a hand-drawn maze, or as a paperweight), and in that act of recognition, it takes on meaning for us.

In this system, class is in the eye of the beholder.  The same physical object may be recognized as different classes by different rulebooks at different times, just like the go stone is recognized as different things in different contexts.

When an object is recognized as a member of a class, it takes on the behaviors and observables expected of a member of that class.  But remove the rulebook that is recognizing the object, and the object ceases to behave that way, because no one is around to see it as that.


A recognizer identifies an object as a member of a class.  A recognizer might specify a particular location in space (as in v2), or an RFID, or a set of retroreflective corners near a particular desk, or it might do some fancy image processing.

A Go Board, for example, might be recognized by its corners and tag.  Go Stones could then be recognized by processing the camera image within the bounds of the board, looking for black or white circles.  Go Cells might also be recognized, not by processing an image, but by simply asserting that they exist as a 19 x 19 grid over the board.

Once an object is recognized, the system will start the processes associated with the class behaviors (for example, the Go Cell might simulate a cellular automaton cell), and the class's observables can be observed on the object by anyone that cares to notice them (for example, the Go Cell might have an observable color, "black" or "white").

When the object is no longer recognized, its behaviors stop and its observables vanish.  (Although their history is still observable -- there used to be a Go Stone here.)
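The recognition lifecycle above might be sketched in Python roughly like this.  It is only an illustration: the names (PhysicalObject, ClassDef, Rulebook.update) are invented, and a recognizer is assumed to be a simple predicate over an object's physical features, standing in for tags, corners, or image processing.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class PhysicalObject:
    """Something that exists in the world whether or not anyone recognizes it."""
    name: str
    features: Dict[str, object] = field(default_factory=dict)


@dataclass
class ClassDef:
    """Hypothetical class definition: a name plus a recognizer predicate."""
    name: str
    recognizer: Callable[[PhysicalObject], bool]


class Rulebook:
    """Hypothetical rulebook: recognizes objects and tracks current membership."""

    def __init__(self, classes: List[ClassDef]) -> None:
        self.classes = classes
        self.members: Dict[str, Set[str]] = {c.name: set() for c in classes}
        self.history: List[tuple] = []  # (event, class name, object name)

    def update(self, world: List[PhysicalObject]) -> None:
        """Re-run every recognizer against the current physical situation."""
        for cls in self.classes:
            now = {obj.name for obj in world if cls.recognizer(obj)}
            before = self.members[cls.name]
            for name in sorted(now - before):
                # Behaviors would start and observables would appear here.
                self.history.append(("recognized", cls.name, name))
            for name in sorted(before - now):
                # Behaviors would stop and observables would vanish here.
                self.history.append(("unrecognized", cls.name, name))
            self.members[cls.name] = now


go_stone = ClassDef(
    "GoStone",
    lambda o: o.features.get("shape") == "circle"
    and o.features.get("color") in ("black", "white"),
)
rulebook = Rulebook([go_stone])

stone = PhysicalObject("stone-1", {"shape": "circle", "color": "black"})
rulebook.update([stone])  # the stone is on the board
rulebook.update([])       # the stone has been taken away
```

After the second update, the stone is no longer a member, but the history still records that there used to be a Go Stone here.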


Behaviors

A behavior is an independent process with its own state.  The state is encapsulated, although its history is recorded, and visible from the meta-system.  The process interacts with the world by:

 - exposing observables for other objects to notice
 - querying for other objects in terms of observables
 - sending messages to objects discovered through queries
 - receiving messages.

A behavior specifies:

 - a set of variables (state)
 - an interpreter
 - a program to be interpreted

A program can be in any language.  A program can be any observable at all.  Javascript text, Python text, an arrangement of objects which forms a state diagram, an arrangement of go stones, an image of a hand-drawn diagram, an image of a hand-written math equation, a Stagecast-like set of before/after rules, whatever.  You provide the interpreter, and the system will provide the CPU cycles to run the interpreter and feed it the program.  (This interpreter/program pairing is inspired by the Unix "shebang".)

Observables are streams, and thus, programs are streams.  Whenever the Javascript text changes, the system will communicate that change to the interpreter, so it can update the process.  If the program is a camera image, the process will be updating continuously.  (The interpreter is literally "interpreting" the image.)
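To make the interpreter/program pairing concrete, here is a small Python sketch.  It assumes a hypothetical Behavior object whose on_program_change method the system calls whenever the program observable changes; the two interpreters are toy stand-ins, one for code text and one for an arrangement of go stones.

```python
from typing import Any, Callable, Dict


class Behavior:
    """Hypothetical behavior: private state, an interpreter, and a program stream."""

    def __init__(self, interpreter: Callable[[Any, Dict[str, Any]], None]) -> None:
        self.state: Dict[str, Any] = {}  # encapsulated variables
        self.interpreter = interpreter
        self.program: Any = None
        self.runs = 0

    def on_program_change(self, program: Any) -> None:
        """Called by the (assumed) system whenever the program observable changes."""
        self.program = program
        self.interpreter(program, self.state)
        self.runs += 1


def python_text_interpreter(program: str, state: Dict[str, Any]) -> None:
    """Interprets Python source text, using the behavior's state as its namespace."""
    exec(program, {}, state)


def stone_count_interpreter(program: list, state: Dict[str, Any]) -> None:
    """Interprets an arrangement of go stones: the 'program' is a list of colors."""
    state["black"] = program.count("black")
    state["white"] = program.count("white")


text_behavior = Behavior(python_text_interpreter)
text_behavior.on_program_change("count = 1")
text_behavior.on_program_change("count = 2")  # the text was edited

stone_behavior = Behavior(stone_count_interpreter)
stone_behavior.on_program_change(["black", "white", "black"])
```

The Behavior class does not care what kind of thing the program is; only the interpreter does.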

Interpreters are objects and can be built within the system, although at some point, the stack of interpreters has to "bottom out" in an executable native to a host platform, such as Node or Python.  (There can be a variety of host platforms available to the system, though.  If we put a Windows machine or some GPU beast or an FPGA machine on the network, any object can then make use of interpreters native to those platforms.  Like sensors and actuators, computation platforms are resources.)

We will probably start out programming mostly in code, but we aspire to eventually be programming in the world.  The intention behind this flexible behavior model is to try to reduce the friction in that transition, and encourage experimentation with alternative languages and non-languages in a way that makes them "equally first-class citizens" with the established languages.  We don't know what "programming in the world" will mean yet, so we need a system that's willing to consider anything a program.


Observables

When a person looks at a stone on a go board, they will notice its location and its color.  These are the properties that are relevant to the activity of playing go.


When a person looks at a mole tank, they will notice its location, its color, and its count.  These are the properties relevant to playing the mole tank game.


In the same way that recognizers model the human process of seeing an object as meaningful in a context, observables model the properties that are relevant and worth noticing in a context.

Observables are not stateful.  They are "reactive" -- live functions of other observables and/or a behavior's internal state.  But the history of each observable is maintained by the system, and objects can observe objects as of particular points in time, and can query across time.
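One way to picture "reactive, but with history" is the Python sketch below.  The Observable class, its method names (emit, as_of, between), and the derived helper are all assumptions made for illustration; timestamps are assumed to arrive in increasing order.

```python
import bisect
from typing import Any, Callable, List, Optional, Tuple


class Observable:
    """Hypothetical observable: a timestamped stream of values kept by the system."""

    def __init__(self) -> None:
        self._times: List[float] = []
        self._values: List[Any] = []

    def emit(self, time: float, value: Any) -> None:
        """Record a new value; times are assumed to arrive in increasing order."""
        self._times.append(time)
        self._values.append(value)

    def as_of(self, time: float) -> Optional[Any]:
        """Return the value that was current at `time`, or None if none existed yet."""
        i = bisect.bisect_right(self._times, time)
        return self._values[i - 1] if i else None

    def between(self, start: float, end: float) -> List[Tuple[float, Any]]:
        """Query across time: all (time, value) pairs with start <= time <= end."""
        lo = bisect.bisect_left(self._times, start)
        hi = bisect.bisect_right(self._times, end)
        return list(zip(self._times[lo:hi], self._values[lo:hi]))


def derived(fn: Callable[..., Any], *sources: Observable) -> Callable[[float], Any]:
    """A derived observable stores nothing; it is recomputed from its sources on demand."""
    return lambda time: fn(*(s.as_of(time) for s in sources))


color = Observable()
color.emit(1.0, "black")
color.emit(5.0, "white")

count = Observable()
count.emit(0.0, 1)
count.emit(2.0, 2)
label = derived(lambda c: f"count is {c}", count)
```

Here color.as_of(3.0) looks at the stone as of an earlier moment, and label(t) is a live function of count with no state of its own.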

The concept of observation is loosely inspired by Clojure.


Projection

Some observables are maintained by the object itself.  The mole tank, for example, knows its own count, and has behavior for updating the count, so it will expose the count as an observable.

But the mole tank most likely doesn't have behavior to figure out its own location.  There's some other object which is, say, looking for retroreflective corners in camera images, and it's this tracker object which determines the mole tank's location.  Yet we want to be able to look at the mole tank itself and observe its location, without caring what trackers are involved.  How does the location observable appear on the mole tank?

The tracker projects a location observable onto the mole tank.  (This is analogous to how a wall with no writing on it can be made to appear as if it has writing on it, if there is a projector projecting an image onto the wall.  You observe the writing when looking at the wall, but it's actually coming from the projector.)  (Projection is inspired by a useful feature of v2 -- illuminations can specify a target object, so an illumination process that's attached to one object can be illuminating a different object.  This is, for example, how the buttons in the library illuminate the wall above the bookshelf.)


Above, a Laser Tracker has noticed that the mole tank has been lasered, and has projected a "lasered by dot D" observable onto the mole tank.  Presumably, the tracker has also "recognized" a Laser Dot object, and projected the observable "on object X" onto it.  Thus, you can be observing the mole tank and notice that it got lasered, or you can be observing the laser dot and notice that it's pointing to a mole tank, and in neither case does anyone need to know about the third-party Laser Tracker.  (There may in fact be various objects willing to project laser observables, not just one Laser Tracker.)

Third parties might use projection to convert observables to other observables.  Consider an object with an "image" observable, which is a view of the object from a camera.  There might be an "OCR" object which looks for objects with "image" observables, tries to convert them to text, and if successful, projects a "text" observable back onto the object.  Another "Translation" object might be looking for objects with "text" observables, and so on.
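The OCR and Translation chain could be sketched in Python as follows.  Everything here is hypothetical: the Obj class, the run_projector helper, and the two lookup tables that stand in for real OCR and translation.  The point is only that an observer calls observe() on the object and never learns which third party supplied the value.

```python
from typing import Callable, Dict, List, Optional, Tuple


class Obj:
    """Hypothetical object: each observable is stored with the name of its supplier."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._observables: Dict[str, Tuple[str, object]] = {}  # key -> (source, value)

    def expose(self, key: str, value: object) -> None:
        """The object maintains this observable itself."""
        self._observables[key] = (self.name, value)

    def receive_projection(self, source: str, key: str, value: object) -> None:
        """Some other object projects this observable onto us."""
        self._observables[key] = (source, value)

    def observe(self, key: str) -> Optional[object]:
        """Observers see the value only, not which object supplied it."""
        entry = self._observables.get(key)
        return entry[1] if entry else None


def run_projector(world: List[Obj], source: str, needs: str, gives: str,
                  convert: Callable[[object], Optional[object]]) -> None:
    """Project `gives` onto every object that has `needs`, if conversion succeeds."""
    for obj in world:
        value = obj.observe(needs)
        if value is None:
            continue
        result = convert(value)
        if result is not None:
            obj.receive_projection(source, gives, result)


# Stand-ins for real OCR and translation (assumptions for this sketch).
FAKE_OCR = {"img-42": "hello"}
FAKE_FRENCH = {"hello": "bonjour"}

card = Obj("card")
card.expose("image", "img-42")
stone = Obj("stone")
stone.expose("image", "img-7")  # no recognizable text in this image
world = [card, stone]

run_projector(world, "OCR", "image", "text", FAKE_OCR.get)
run_projector(world, "Translation", "text", "translation", FAKE_FRENCH.get)
```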


Sensed observables and actuated observables

Some observables will be derived from sensors, such as the location.  Some might be derived from internal state, such as the count, or from other observables, such as the OCR'd text.  

But some observables might express a desired physical state, and the system will attempt to "make it so".  For example, the mole tank might expose an "illumination" observable, which is the image that it wants (physically) projected on itself.  (A big "1", in this case.)  Some "Illumination Manager" object will notice this observable, and will arrange for a projector to project that image.  Likewise, the mole tank might expose an "audio" observable, which is the sound that it wants played, and the "Audio Manager" will notice it and route it to a nearby speaker.
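A very small Python sketch of the "make it so" side, under heavy assumptions: objects are modeled as plain dicts of observables, and the hypothetical illumination_manager just returns the draw commands it would hand to a projector rather than driving any hardware.

```python
from typing import Dict, List, Tuple

# Each object is modeled as a plain dict of observables (an assumption for brevity).
mole_tank = {"name": "mole tank", "location": (120, 80), "count": 1, "illumination": "1"}
go_board = {"name": "go board", "location": (300, 200)}


def illumination_manager(world: List[Dict]) -> List[Tuple[Tuple[int, int], str]]:
    """Return a draw command for every object that wants an illumination and has a location."""
    commands = []
    for obj in world:
        if "illumination" in obj and "location" in obj:
            commands.append((obj["location"], obj["illumination"]))
    return commands


commands = illumination_manager([mole_tank, go_board])
```

The mole tank only states what it wants to look like; the manager is the one that notices and acts.  An Audio Manager would presumably have the same shape, keyed on an "audio" observable.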

Regardless of whether sensing or actuation is used to "close the loop", the virtual observable matches what a human would physically observe about the object.


Mailboxes

Every object has an inbox and an outbox.

The inbox is an incoming message queue, modeled after Erlang.  Behaviors can examine the queue and handle the messages they care about.  Unhandled messages are discarded.

The outbox represents an object's ability to send messages to other objects, including broadcasting to multiple objects.

The inbox and outbox are named explicitly because the history of all messages an object has received and sent is maintained by the system, and (through the meta-system) a person can browse this history.
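Here is a rough Python sketch of the mailbox idea.  The Mailboxes class, the message shape (a dict with "from", "kind", and "body"), and dispatching handlers by message kind are all assumptions; the sketch follows the description above in discarding unhandled messages from the queue while keeping them in the recorded history.

```python
from collections import deque
from typing import Any, Callable, Deque, Dict, List, Tuple


class Mailboxes:
    """Hypothetical per-object inbox and outbox with full message history."""

    def __init__(self, name: str, handlers: Dict[str, Callable[[dict], None]]) -> None:
        self.name = name
        self.handlers = handlers              # message kind -> handler
        self.inbox: Deque[dict] = deque()     # pending messages
        self.received: List[dict] = []        # history of everything received
        self.sent: List[Tuple[str, dict]] = []  # history of (recipient, message)

    def send(self, recipients: List["Mailboxes"], kind: str, body: Any) -> None:
        """Send one message to one or more recipients (a broadcast is just a longer list)."""
        message = {"from": self.name, "kind": kind, "body": body}
        for recipient in recipients:
            self.sent.append((recipient.name, message))
            recipient.inbox.append(message)
            recipient.received.append(message)

    def process_inbox(self) -> int:
        """Handle the messages we care about; drop the rest from the queue."""
        handled = 0
        while self.inbox:
            message = self.inbox.popleft()
            handler = self.handlers.get(message["kind"])
            if handler is not None:
                handler(message)
                handled += 1
            # Unhandled messages are discarded here but remain in self.received.
        return handled


laser_hits: List[Any] = []
tank = Mailboxes("mole tank", {"lasered": lambda m: laser_hits.append(m["body"])})
tracker = Mailboxes("laser tracker", {})

tracker.send([tank], "lasered", "dot D")
tracker.send([tank], "ping", None)  # the tank has no handler for this
handled = tank.process_inbox()
```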


-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 


Notes

This description is just a sketch, and there's a lot to be designed and figured out.  I was going to list here all of the unknowns, dark corners, and obviously wrong things, but I'll save that for some other time.  (It's probably better for that sort of knowledge to be maintained through discussion, in any case.)

One thing I wanted to mention.  Parts of this are shallowly modeled off of Clojure and Erlang, which is to say, I'm overlooking the deep, hard stuff that makes those languages reliable.  Clojure is adamant about safe concurrency, which means using STM transactions when coordinating multiple agents.  Erlang is adamant about fault-tolerance, so real Erlang systems run on the OTP platform, using supervision trees and other patterns for building systems that cannot fail.

I'm inclined to omit these complications for now, not least because I don't understand them well, which means that our systems almost certainly will have race conditions and almost certainly will have failures that require manual intervention.  I think that's okay -- it's a research system, and it's a means to the end of inventing new dynamic representations for thinking in.  The hard engineering can happen much later, and it doesn't need to happen in our lab.