Date: Mon, 1 Mar 2021 19:54:58 -0800
From: Bret Victor
Subject: Re: networking

SYNCED MEMORIES

This builds on the "persistent visible memories" email from 10/21/2020.


Memories: history and rationale

In Hypercard in the World, all data about a physical object was virtually "attached" to it.  This included what and where it was ("identify" and "location" attachments), the code it ran ("daemon" and "illumination" attachments), its state ("collection" and "variable" attachments), and other kinds of data ("link", "value", "file").  It was conceptually encouraging to know that every bit of state in the system was attached to some physical object, and could be found in that object's inspector.  But again, HitW didn't go very deep, so there wasn't that much state to attach.

Realtalk tries to be more about recognizing the current state of the world rather than working with lots of virtual state.  But you can't recognize everything, or represent everything physically.  Realtalk '17 kept page texts and patch texts as files on a server, which broadcast them as claims when requested.  Other kinds of data (video, audio, datasets) were placed in a "shared" folder using sshfs.  Other than patching pages or writing to the shared folder, there weren't good ways of persisting state, and those ways weren't very resonant with the Realtalk concepts of statements and spatiality.

In Realtalk 2020, memories are statements pinned to physical objects.  When the object is present, these statements are made.  Memories can be about large things like video and audio files, so instead of mucking around in shared folders, we think of these files as "remembered on" the physical object.  You can remember a video on a page, carry the video to another area, and there it is, still on that page.

A page's text and patched text are now memories on the page itself.  This means, like in HitW, an object is self-sufficient -- everything that needs to be remembered about an object is remembered on the object, instead of floating in the filesystem.  Spatially-localized memories now encompass all statements that can't be derived by looking at the real world.

This change also addresses another goal of Realtalk 2020, which is to work with objects beyond dot-framed sequentially-allocated pages.  Because text is now just a memory, and any object can have memories (not just pages), pages are no longer a special kind of object.  Any persistent object, recognized by any recognizer, can behave just like a page.


Persistent and transient object ids

There are two kinds of objects.  A persistent object has something unique about it that lets it be recognized as the "same" object each time it shows up, such as a dot frame or other fiducial.  It could also be unique within a particular context, such as the "shoe" piece on a particular Monopoly board, or a miniature in a snapshot.

Persistent objects have ids with pointy brackets like "<This>".  For example, a dot-framed page has an id like

    <Page 1234>

If the Monopoly board is "<Page 1234>", then it might recognize its shoe as

    <Page 1234><Shoe>

Or if the snapshot is "<Page 1234>", it might recognize its miniatures as

    <Page 1234><1>

(That is, the miniatures don't need their own allocated page numbers.)

If a doohickey recognizer is able to recognize unique doohickeys, it might use ids such as

    <Doohickey xyz>

Persistent objects are expected to be unique within a site, because persistent objects persist their memories.  When the object shows up (in any area in the site), its memories are just the same as when it went away (wherever in the site it went away).

Transient objects are physical objects that are not uniquely distinguishable (like checkers on a checker board) or transient virtual objects (like keyboard bullets).  Their id can currently be any string without pointy brackets.  I've been using ids like

    *Object 12345*

Transient objects remember as long as they're present, but when they go away, their memories are discarded.

Thus, instead of being a claim, an object's persistent/transient status is expressed via the choice of id.  This avoids some subtle difficulties with memories, and allows, for example, a box to be either persistent or transient depending on which id you choose for it.  I think this makes sense -- it's a pretty fundamental distinction that is made by the recognizer, and recognizers are the ones making up object ids.
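
As a rough illustration of that convention (in Python, not Realtalk), a recognizer-side check might look like the sketch below.  The exact id grammar is an assumption: it treats a persistent id as one or more back-to-back "<...>" segments, and anything else as transient.  The names is_persistent and id_segments are hypothetical.

```python
import re

# Assumption: a persistent id is one or more "<...>" segments back to back,
# e.g. "<Page 1234>" or "<Page 1234><Shoe>". The real grammar may differ.
_PERSISTENT_ID = re.compile(r"(?:<[^<>]+>)+")


def is_persistent(object_id: str) -> bool:
    """Return True if the id consists entirely of pointy-bracket segments."""
    return _PERSISTENT_ID.fullmatch(object_id) is not None


def id_segments(object_id: str) -> list[str]:
    """Split a persistent id into its segments; transient ids yield []."""
    if not is_persistent(object_id):
        return []
    return re.findall(r"<([^<>]+)>", object_id)
```

Under these assumptions, "<Page 1234><Shoe>" splits into the segments "Page 1234" and "Shoe", while "*Object 12345*" is classified as transient.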


Presence

It's also the recognizer's job to say when the object has shown up and gone away.  The claim for that is:

    Claim (p) is present.

When an object is present, so are its memories.  When an object goes away, its memories go away.  Persistent objects save their memories, transient objects don't.

Wishing an object does not run also makes its memories go away.

You can remember on any object that is present.  It is an error to remember on an absent object, since that probably indicates a mistake -- either you're remembering on the wrong thing, or a recognizer isn't claiming an object is present.  (Deep memories are different, see below.)
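
The presence rules so far can be modeled with a small toy store, sketched here in Python.  This is not how Realtalk implements memories; the class and method names are hypothetical, and the persistence check is a simplified stand-in for "the id has pointy brackets".  Deep memories are left out.

```python
class MemoryStore:
    """Toy model of memories appearing and disappearing with presence."""

    def __init__(self):
        self.saved = {}  # object id -> statements saved for persistent objects
        self.live = {}   # object id -> statements of objects currently present

    @staticmethod
    def is_persistent(object_id):
        # Assumption: pointy brackets mark a persistent id.
        return object_id.startswith("<") and object_id.endswith(">")

    def shows_up(self, object_id):
        # A persistent object gets back whatever it had when it went away;
        # a transient object starts with nothing.
        self.live[object_id] = set(self.saved.get(object_id, ()))

    def remember(self, object_id, statement):
        # Remembering on an absent object is treated as an error.
        if object_id not in self.live:
            raise LookupError(f"{object_id} is not present")
        self.live[object_id].add(statement)

    def goes_away(self, object_id):
        memories = self.live.pop(object_id)
        if self.is_persistent(object_id):
            self.saved[object_id] = memories
        # Transient objects: the memories are simply dropped.
```

In this model, a statement remembered on "<Page 1234>" is still there after the page goes away and shows up again, whereas a statement remembered on "*Object 1*" is gone.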

The 10/21/2020 email described claims "(p) can remember" and "(p) persists memories", which have been removed in favor of presence and persistent ids.


As remembered on

You can query a particular object's memories directly, regardless of whether the object is present:

    When (p) has text /text/ [as remembered on (p)]:

This lets you access the memory database kind of like an actual database.  It could be used for page search, for example.  Even if the object isn't present, the query is properly reactive (for example, if the memory is changed in a different area).


Deep memories

Objects now remember their own page text.  But every page has text, and it would be distracting to see a blue memory box at the bottom of every page, instead of just on pages with interesting state.  Worse, you don't want to type "Forget." and accidentally wipe out a page's text, or accidentally modify the page's original text when editing memories in the editor.  If we want to tell people that they can't break anything by messing around in Realtalk, we need a way to protect certain memories from accidental clobbering.

Page and patched text are stored as deep memories:

    Remember [deeply] (p) has patched text (text).

This shows up in the editor as

    Remembering [deeply] (you) has patched text "...".

You can think of [deeply] as a kind of "sudo".  Deep memories cannot be affected by any non-deep "Remember" or "Forget".  (Of course, anyone is free to use [deeply] intentionally, and clobber whatever they want.)  Other differences:

 - Deep memories save and sync immediately when remembered, so saving a page in a shared rulebook takes effect immediately throughout the site, like old times.  (Normal memories don't save and sync until the object goes away, so remembering is cheap.)

 - Because of that, you are allowed to remember deeply on an absent object.  That's how fabricating a new object works.  Instead of writing a file, "Printing from page" makes up a new page id and simply remembers [deeply] the text of the new page.

 - Deep memories remain even when you wish the object does not run.

 - Deep memories don't show up in the blue memory box, unless you ask for them:

    Wish (p) shows all memories.
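
The protection that [deeply] provides can be sketched as two separate sets per object, shown here in Python.  This is a toy model with hypothetical names, not the actual memory database.  One assumption to flag: a deep forget here clears only the deep memories, which the description above does not specify.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectMemories:
    """Toy model: normal and deep memories are kept apart."""
    normal: set = field(default_factory=set)
    deep: set = field(default_factory=set)

    def remember(self, statement, deeply=False):
        (self.deep if deeply else self.normal).add(statement)

    def forget(self, deeply=False):
        # A plain "Forget." never touches deep memories.
        # Assumption: a deep forget clears only the deep ones.
        if deeply:
            self.deep.clear()
        else:
            self.normal.clear()

    def shown_in_memory_box(self, show_all=False):
        # Deep memories are hidden unless explicitly asked for.
        if show_all:
            return self.normal | self.deep
        return set(self.normal)
```

With this split, a stray forget() wipes application state but leaves the page's text alone.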

While normal memories are like "application state", deep memories are more like "metadata".  What other metadata might we want to associate with an object?  We could, for example, remember the date and time the object was created, or the image of the page we printed it from.  In a Marks Kit world, we might even create an object that has no text, but only an image, and that image is the program.

In this way, moving text into memories is actually a step towards non-textual programs.  A "program" can now be any set of statements you want to associate with any recognizable object.


Local memories

Objects in rulebooks are peculiar, in that the same object id is "present" simultaneously in many areas.  The memories of such objects normally just pertain to the local area.  For example, "Print previewing" wants to remember the most recent print job in the area, not across the site.  Likewise with projection configuration, sync and networking state, etc.

Additionally, it's very useful to be able to supersede these objects -- to make a variant of "Previewing print jobs" on a Blank, which automatically doesn't run until it's swapped in with "Supersede duplicates".  We'd like to be able to swap these variants in and out while keeping their memories intact, and that's not possible if the memories are associated with the object id.

Instead, you can remember [locally]:

    Remember [locally] (processor) previewed print job (ps).

These memories appear on the object like normal memories, but instead of being associated with the object id, they are associated with the area and title.  (It actually uses the "base title" minus any parentheticals, which is also how superseding works.)  Because these memories are local to the area, they are never synced.
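
To make the keying concrete, here is a small Python sketch of how a local memory's key could be derived.  The parenthetical-stripping rule is a guess at what "minus any parentheticals" means, and the function names and example titles are hypothetical.

```python
import re


def base_title(title: str) -> str:
    """Drop parenthesized parts of a title (assumed meaning of 'base title')."""
    without = re.sub(r"\s*\([^()]*\)", "", title)
    return " ".join(without.split())  # normalize leftover whitespace


def local_memory_key(area: str, title: str) -> tuple[str, str]:
    """Local memories are keyed by area and base title, not by object id."""
    return (area, base_title(title))
```

Under this rule, "Previewing print jobs" and a variant titled "Previewing print jobs (wide margins)" share a key within the same area, so swapping one for the other keeps the memories, while the same title in another area gets a different key.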

I'm not 100% enthusiastic about how local memories work yet.  They do serve their purpose, if you remember to use them.  And they're only useful for rulebook objects.  I wish I could make the editor's undo stack a local memory, but there are a lot of pages titled "Code editor" and they certainly don't supersede each other.


Transient memories

Some memories just shouldn't save.  For example, "Keyboards" remembers the key state of each keyboard.  If you shut down Realtalk with the control key down, it shouldn't assume the key is still down when it restarts.  So:

    Remember [transiently] keyboard at physical address (key address) has key states (states).

Like all memories on transient objects, transient memories on persistent objects are discarded when the object stops being "present".


Booting

Realtalk can still boot itself from nothing more than the memory database and a short "boot.sh" shell script.  The shell script does include a couple frightening "sed" commands to underhandedly extract page text from the memory files.  But you still can

    rm -rf cache && ./boot.sh

and Realtalk will build the Lua binary and all our modules automatically.  You'll be up and running in about seven seconds.


Sites and site leaders

Statements are scoped to the area.  Memories are scoped to the site.  When you carry an object from one area to another within the site, you carry its memories with it.

Currently, a site is specified simply with:

    Claim machine "<Machine dynabulb4.dynamic.land>" is the site leader.

Whatever machines claim the same site leader are thereby the same site, and will share memories.
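
Put another way, site membership is just a grouping of machines by the leader they claim.  A minimal Python sketch, where the function name and the dictionary representation of the claims are assumptions for illustration:

```python
from collections import defaultdict


def group_into_sites(leader_claims):
    """Map each claimed site leader to the set of machines claiming it.

    leader_claims: dict of machine id -> the site leader that machine claims.
    """
    sites = defaultdict(set)
    for machine, leader in leader_claims.items():
        sites[leader].add(machine)
    return dict(sites)
```

Two machines end up in the same site exactly when their claims name the same leader.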

Unlike Realtalk '17 with its special server, there's nothing special about the site leader -- it's just an ordinary Realtalk machine that's been given an additional role.  The site leader's post office syncs memories between machines, in exactly the same way that other post offices sync memories between local areas.

It's probably good for the site leader to be on most of the time other machines are on.  Unlike Realtalk '17, every machine is self-sufficient -- you can turn it on, use it, and change memories, even if all other machines are off.  (Although you can't allocate pages without the internet-based page allocator.)  However, while the site leader is off, memories remembered on one machine won't show up on other machines.  When the site leader is turned back on, all those pent-up memories will sync and everything will catch up.

It is almost the case that you can change the site leader simply by changing the claim.  (I still need to do a bit more work to make that seamless.)

Syncing happens in realtime, through ordinary directed statements.  All memories are stored on all local machines, and no data leaves the Realtalk network.  I did set up a GitHub repo as periodic offsite backup, but that's not related to syncing.

The 10/21/2020 email described "Wish (processor) syncs", which has been removed.


Refs

As described in the 10/21/2020 email, if you remember a value that isn't well represented in text, or remember the realtalk: URL of a file, the resulting memory will be saved with a "ref":

    Remembering (you) claims (you) has huge table (Ref "table-2989213782").
    Remembering (you) claims (you) has video (Ref "128937918379128312.mov").

When a post office receives a memory with refs while syncing, it holds off on acknowledging the memory until it has the ref'd value.  These values aren't transferred through Realtalk's networking; instead, each post office runs a lightweight web server (!) and the receiving post office simply "curls" it from the sender.
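
The hold-off-until-fetched behavior can be sketched as follows, in Python.  This is a toy model: the fetch callable stands in for curling the sender's web server, the Ref pattern is inferred from the two examples above, and the function names are hypothetical.  It fetches every ref'd value eagerly and does not model lazily downloaded files.

```python
import re

# Assumption: refs appear in memory text as (Ref "name"), as in the examples.
REF_PATTERN = re.compile(r'\(Ref "([^"]+)"\)')


def refs_in(memory_text):
    """Return the ref names mentioned in a memory's text."""
    return REF_PATTERN.findall(memory_text)


def receive_memory(memory_text, have_refs, fetch):
    """Fetch any missing ref'd values; return True when it is safe to acknowledge.

    have_refs: dict of ref name -> value already held by this post office.
    fetch: callable(ref name) -> value, or None if it could not be fetched.
    """
    for ref in refs_in(memory_text):
        if ref not in have_refs:
            value = fetch(ref)
            if value is None:
                return False  # hold off on acknowledging the memory
            have_refs[ref] = value
    return True
```

A memory with no refs is acknowledged right away; one whose ref can't be fetched stays unacknowledged.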

Values like the huge table are always downloaded along with their memories, but files like the video file only sync "upstream" to the site leader.  Downstream machines download files only when they need them. (For example, if you try to play the video.)  This works because all files are represented as URLs (in this case, a realtalk: URL), so the only way to do anything with them is to download them.  (With a bit more work, we should be able to stream the video instead of downloading it.)

This means that the site leader has a full set of files, whereas other machines might only have files that they have created or used.

Garbage refs can accumulate.  (At the moment, the editor's undo stack is usually a "huge table", so it gets ref'd and synced every time an editor goes away.)  Garbage collecting refs is straightforward -- scan all memories for Ref, and delete the unreferenced files in the "refs/" directory.  I currently put down a "Garbage-collect refs" page (33325) by hand occasionally, but I'd like to make it automatic.
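
The scan-and-delete idea is simple enough to sketch in a few lines of Python.  This is not the "Garbage-collect refs" page itself.  It assumes memories are available as strings, that each file in "refs/" is named exactly by its ref string, and it defaults to a dry run; the function name is hypothetical.

```python
import re
from pathlib import Path

# Assumption: a ref appears in memory text as Ref "name".
REF_PATTERN = re.compile(r'Ref "([^"]+)"')


def collect_garbage_refs(memories, refs_dir, dry_run=True):
    """Return names of ref files no memory mentions; delete them unless dry_run."""
    referenced = set()
    for text in memories:
        referenced.update(REF_PATTERN.findall(text))

    garbage = sorted(
        path for path in Path(refs_dir).iterdir()
        if path.is_file() and path.name not in referenced
    )
    if not dry_run:
        for path in garbage:
            path.unlink()
    return [path.name for path in garbage]
```

Running it once with dry_run=True to eyeball the list before deleting anything seems prudent, since a ref that is missed by the scan would lose its file.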


Syncing on shutdown

When an area shuts down, either because of a signal or a wish

    Wish [in area (area)] area (area) shuts down because "it's time for bed".

it tries to sync before exiting.  Local areas sync unsaved memories to the post office, and post offices sync to the site leader's post office, if it's connected.  Even if the site leader isn't connected, the memories remain safely stored on the local machine, and will be synced whenever the site leader is available.
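
The "store locally, sync when the leader is reachable" behavior can be modeled with a pending queue.  A minimal Python sketch, in which the class name is hypothetical, a plain list stands in for the site leader's post office, and None means the leader is not connected:

```python
class PostOffice:
    """Toy model of a post office that queues memories for the site leader."""

    def __init__(self):
        self.stored = []    # everything remembered locally; always kept
        self.unsynced = []  # memories the leader has not received yet

    def remember(self, memory):
        self.stored.append(memory)
        self.unsynced.append(memory)

    def sync(self, leader):
        """Send unsynced memories to the leader; return how many were sent."""
        if leader is None:
            return 0  # leader unavailable: keep the memories queued
        sent = len(self.unsynced)
        leader.extend(self.unsynced)
        self.unsynced.clear()
        return sent
```

A shutdown would call sync once; if it sends nothing because the leader is away, the queued memories simply go out on a later sync.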

When I hit the power button on my remote control, I see the banner "Shutting down." for about a second on all pages before the machine turns off.


Syncing/sharing between sites

Memories sync throughout a site immediately, which suggests that sites should be within "shouting scope".  If you screw up a rulebook, you can shout to let everyone know; if something unexpected happens suddenly, you can shout to ask if someone made a change.  The model here is 469 9th St, which is nicely shoutable.  In the early days of Pandemictalk, patches I made at my house synced immediately (through GitHub) to your house, which was super uncomfortable because we were not sharing a physical context.

So your house and my house shouldn't sync automatically.  But we should be able to exchange our work, somehow.  I don't have a good solution yet.  

Luke has talked about sharing being a more deliberate act, where we can see what's new at other sites, and choose to adopt what we want.  In early pandemic days, I was emailing around little packets that updated your ghost pages.

Cross-site sharing could be asynchronous (push to GitHub, pull from GitHub), or it could be a realtime act where I bring your area into mine (using "included regions") and we talk about what we're sharing (either in person or through video call).

In any case, I'll probably cook up an interim solution via GitHub, but I haven't yet.


Page numbers

Remembering page text on the object is one step towards making pages no longer special.  Another step will be fixing the rules that expect page numbers (rulebooks, info maps, etc.) so they use object ids instead.  I haven't done this yet.

DSP-Kit-style replicas can be made to work, but they will use a different technique than claiming a page number.