Date: Tue, 9 Dec 2014 11:21:03 -0800
From: Bret Victor
Subject: Re: timelines/terrains
in the same way an author chooses when to break for paragraphs and sections for text. I could imagine "paragraphs" of video working fairly well—indented or with a line break/margin—despite being a simple-minded extension of text to video

Nothing wrong with such analogies.  The early greek scribes didn't put spaces between their words -- the writing was an unbroken stream of letters, not unlike these video grids.  Things sure got much easier to read when authors started visually segmenting into words, sentences, and paragraphs.

The video grids we've been looking at also remind me a bit of disassembling a binary -- the author's intent is lost, and it's one unbroken stream of nearly-identical little things.  I think breaking things up (word/sentence/paragraph-like) and annotating (like comments in code) will ultimately prove critical.  This structure might be brought in via any combination of three sources --

 - the author provides a structure while authoring (like in text or source code today)
 - the readers generate structures while reading (like rmo's interlace) and those structures are possibly inherited by other readers (like google earth layers)
 - algorithms infer structure (automatic scene detection, etc)

The disassembly analogy is useful, I think, because we might start to expect that in our new dynamic future, properly "open-source" video will not just be delivered as a raw sequence of frames, but in a form where a human-readable overview of the whole can be presented and explored.  This might be something like my "Media for Thinking the Unthinkable" page thing, or more like a well-structured and annotated video grid, or somewhere between the two.



On Dec 9, 2014, at 2:33 AM, Glen Chiacchieri wrote:

Dave—yeah! I'm really curious about making this even more physical. I'm hoping to do some experiments with etching frames into wood using the mill we've acquired. Ideally, this will become a control interface for a video, but we'll see.

To me, columns seem really promising (which I suppose is what I was driving at with my yellow scribbles), but those of course introduce arbitrary cuts based on the horizontal and vertical space you've given to the columns. I wonder if the decision to cut can be given to the author in the same way an author chooses when to break for paragraphs and sections for text. I could imagine "paragraphs" of video working fairly well—indented or with a line break/margin—despite being a simple-minded extension of text to video.

Something I was curious about: have you or Rob done experiments in diffing frames? I only remember Rob's "big limbs" pictures. Bret made an observation that in these timelines we tend to see the blue background John Berger stands in front of, but the blue background is not the subject of the frame—John Berger is. I don't think diffing is the answer, but I'm curious about experiments in that area.

On Tue, Dec 9, 2014 at 1:56 AM, Dave Cerf wrote:
It would be cool if you could drag your finger around on that paper to control playback on a separate screen. 

<unknown.jpeg>

Here are some talking head shots to start that conversation…

Higgs Boson scientists…

Bleak: a single, non-moving camera for 51 minutes (with the leader singer from KISS—I actually had the Higgs audio playing underneath for a moment and was baffled in fascination)!



On Dec 9, 2014, at 1:30 AM, Glen Chiacchieri wrote:

Teenage mutant ninja research

On Dec 8, 2014 11:58 PM, "Dave Cerf" wrote:
This is the Grilbert Times. It is a very good newspaper.

A number of years ago, I saw a cool exhibit and didn’t take any photos, assuming I’d find examples of the artist’s work on the Internet when I needed it, and now I can’t find it. The artist had people look at several images and recorded their eye movements. He then framed the eye traces with a caption describing the original image, such as…

A chair




Pornography



I just doodled these in OmniGraffle, but you get the idea. The best thing at the exhibit was a newspaper. Upon closer inspection, the newspaper was really a bunch of pages of printed eye traces, which formed familiar-looking columns. From a distance, it looked pretty much like a newspaper.

This led me to an article about the science of word recognition, and I wondered if anyone had done studies about thumbnail recognition in video filmstrips? I guess that’s what we’re doing right now.

I wondered what a movie might look like, in mid-process, if organized as though a newspaper, using thumbnail grids organized in columns [scrub video back and forth—don’t have animated GIF skills functioning at the moment]. It doesn’t actually seem practical, but it’s interesting to push this idea as far as it can go.





Dave



On Dec 8, 2014, at 5:00 PM, Bret Victor wrote:

what is this

<all-the-frames-thats-fit-to-grid.jpg>


On Dec 8, 2014, at 4:05 PM, Chaim Gingold wrote:

I agree with you about the difficult of a fractal reading order. This morning we discussed the power of columns. The thumbnail poster makes a mistake good designers don’t do with words, which is super extreme long lines. Columns as found in newspapers and magazines are very suggestive. e.g. ...
<10page-cityroom-popup.jpg>
On Dec 8, 2014, at 3:09 PM, Glen Chiacchieri wrote:

Something I was pondering. I really like the z-order, but I think being perfectly fractal doesn't make sense. Our western eyes really like scanning left-to-right (an eye-width) and top-to-bottom and I think that makes sense for the small rectangles of video. But I think the fractal nature makes sense for much longer segments of video. What I think would be an interesting experiment that my padding implies: linear in the small and fractal in the large.


<Screenshot 2014-12-07 19.43.24 copy.png>

​​

Newspapers and to a lesser extent books do this well. You scan to the right as far as your eye can go (due to margins and other blocks of text) and then drop down to the next line. This happens at the letter-level, the sentence-level, the paragraph-level, and the page-level. Why couldn't it happen at the room-level?

Dave, I'm really happy you pointed out the case of interview footage. That seems like a particularly good thing to think about for any representation of video. Are there any good experiments along those lines?

On Mon, Dec 8, 2014 at 1:47 PM, Dave Cerf wrote:
The Morton grid is very nice. In particular, I found the video showing the order in which the grid is drawn to be helpful.

I’ve been trying to figure out if there is some way to indicate order implicitly, via the borders between cells or something. I think “Western” eyes will prioritize left-to-right over top-to-bottom, so what I did here was try to show minimal borders to indicate order, except for the last cell, which is a kind of “folder page corner,” the departure point from one quad to the next. But I don’t think it’s enough.

<MortonGrid1.jpg>

This is the order I was trying to imply… chances are the layout doesn’t guarantee someone would read it this way.

<MortonGrid2_Order.jpg>

It may not be necessary for people to be able to “read” the order, especially if, once a zone of interest has been found, the interface can morph into a linear grid.

Due to the nature of the content (relative short, visually distinct shots), we see the rectangular land masses fairly well here. But try raw content, like an hourlong interview, and it may be harder to distinguish. In which case a Hilbert grid may still have some advantage.

Perhaps raising the “head shot” (interview footage) card here is unfair. The truth is that I think interview footage needs its own dedicated grid form—if even a grid at all. Lots and lots of thumbnails for head shots is a navigational disaster, and it is this dogmatic approach to representation (“All shots must be represented by all their thumbnails in the same manner”) that gets software like FCP X into trouble. I’m digressing into a topic that needs to be addressed, but not by this wonderful Zilberton Grillbot thread.


On Dec 7, 2014, at 6:07 PM, Glen Chiacchieri wrote:

Here's a z-order, or Morton, curve that I'm calling Grilton because I mistakenly thought Grilbert came from "grill" and "hilbert":

<demo.png>
And the video is attached.

I thought this would be a more natural-feeling order because it sort of preserves the left-to-right, top-to-bottom thing we're used to while adding that nice fractal sauce hilbert has.


On Sat, Dec 6, 2014 at 2:12 PM, Dave Cerf wrote:
This is officially fantastic. What I’m seeing—regardless of what you’re going for—is something we discussed a while back coming to fruition: a means of overcoming the abstract patterns formed in linear thumbnail grids.


———————————————————————————
1) We know that a weakness of the wrapping thumbnail grid is that abstract visual patterns emerge. In the visual parsing war within our brains, those patterns win out over the actual content of the thumbnails, which defeats some of the purpose of the grid in the first place. Despite its Tufte-like small multiple honesty (“Just plot all the data and people’s eyes will do the rest”), the linear thumbnail grid has failed to deliver (e.g. “Hey Dave, cool art project, but what is it for?”).

<AbstractPatterns.jpeg>


———————————————————————————
2) Changing the thumbnail size in a linear grid, which helps us identify the content of individual frames, is massively disorienting in the spatiotemporal dimension. Now we might be able to recognize what this shot is (though my guess is the “pattern abstraction" may still be overwhelming for most of you, not having seen the shot below playing normally), but we are really lost in time because we are so zoomed in.

<FCPX_Grid1.jpeg>

<FCPX_Grid2.jpeg>



———————————————————————————
3A) Like the linear grid, Grilbert forms abstract visual patterns, but the uniquely shaped, semi-rectangular land masses provide a “zoom zone” allowing for larger, easier-to-parse thumbnails! In terms of recognizing content while remaining oriented in time, this is a big improvement over the linear grid; the size and shape of the map never changes (maintains orientation) and we can see larger thumbnail images. Excellent!

Small thumbnails
<Grilbert_Small.png>



Large thumbnails
<Grilbert_Large.jpeg>


————
3B) Grilbert’s current shortcoming, in theory, is its lack of linearly mapped time. Without the yellow maze-like outline RMO showed and hid in the example, it isn’t obvious what the path to follow is for linear time, and a person “skimming” over the thumbnails with a cursor would not be able to reliably trace this shape.

The value of easily navigable time cannot be underestimated. Here is a mundane comparison between QuickTime 7 and QuickTime X, where the former has playback controls the full width of the window and underneath the video frame, while the latter imposes “shy” playback controls on top of the video frame and less than the full width.


Quicktime Player 7
<LinearTime_QT7.jpeg>

QuickTime Player X
<LinearTime_QTX.jpeg>


In professional circles, QuickTime Player X is universally hated (I can’t speak for non-professional use). Arguably, the shy controls in QuickTime Player X are the most egregious issue, but the narrower navigation track imposes a frustratingly unnecessary limit.


————
3C) Perhaps Grilbert could use a linear navigation track in addition to a skimming line (see the yellow rectangle around “nine seconds” into the Grilbert space—But where is 9 seconds in Grilbert space? you may ask. Yes, my point exactly). Here, I’ve cheaply pasted the QuickTime Player 7 navigation track underneath the Grilbert.

<GrilbertSkimming.jpeg>


———————————————————————————
4) Grilbert appears to provide a better overview of a video clip than a linear grid. Navigation-wise, it is also useful up to a point: you can get to “roughly around here” with Grilbert, but you still need a linear mapping for more precise navigation. How can one easily morph in and out of linear time from Grilbert as navigation necessitates? Or is that too much to ask (of the person using the tool)?


———————————————————————————
5) There is definitely a video-game-esque quality of this map. And probably the most important thing to get right in video games is navigation. Sometimes that’s all there is to a video game! Perhaps this may seem naive (or intriguing) to the current CDG team, but I’ve often had vague fantasies about building a video editing system into a playable video game: here is your footage, let the game begin! “Winning” the game might require some kind of MacGuffin (high score, save the princess), since the actual goal is simply to finish your movie.

Here are some video game maps in different dimensions and perspectives that remind me vaguely of the Hilbert grid, or interesting Hilbert variations to explore.

<the-legend-of-zelda-hyrule-map.jpeg>



<s1.jpg>


<game_final.png>

<google-maps-nesx-wide-community.jpg>

<figure3-full.jpeg>

<58630-210515-multi1png-noscale.jpg.png>

<pixel-box-screenshot-001.png>


On Dec 6, 2014, at 2:52 AM, Robert M Ochshorn wrote:

Grilbert!

I think I’m starting to figure out the hierarchical possibilities, though this isn’t quite right yet:

<teenage-hilbert-sm.mov>

For some reason I always forget that assembling big frames out of parts of little frames almost never a good idea.

I think I want zoomed-in levels to only show one of the containing frames, but to move along the hilbert track (in time & space). Likely I’ll have to implement this before that makes any sense.

This starts to show that the “land masses” can really start giving new possibilities, in terms of different sorts of zoom. Amazingly, this (sort of) works in all the desktop browsers.

Yr corresp,
r.m.o.

ps - dave, I’ve already had to serialize-and-destroy our “what’s good about griddle” whiteboard to even begin to understand the subtleties of hilbert indexing:

<IMG_0428.jpeg>

pps - chaim, i love your whitespace idea, but i don’t understand hilbert indexing well enough to know exactly how to implement it correctly, yet :-)


On Dec 5, 2014, at 9:12 PM, Bret Victor wrote:

I think we need to explore something that's somewhere between Griddle and Hilbert (possibly Griddle at micro scale but Hilbert at macro scale), mostly just to use the name "Grilbert".

On Dec 5, 2014, at 3:22 PM, Dave Cerf wrote:

I had similar thoughts, wanting a connect-the-dots line drawn from the center point of each frame to show their order in the grid of thumbnails. But then I thought, is this really what I want, or have we taken Hilbert too far in this case? For example, in a traditional linear thumbnail grid (Griddle, per RMO), we already know the order based on our reading skills: left to right, top to bottom. So we don’t need indicators at all—order is “intuitive” in the linear case.

RMO, maybe now is the time to photograph the whiteboard listing the benefits of the thumbnail grid, and then make a second list of benefits for the Hilbert Curve approach. When needed, a person can quickly switch between Hilbert and linear. For instance, in Hilbert, I like the recognizability of the “land masses” we see when zooming in and out, which is something totally absent from a linear grid.

Is one benefit of a dynamic medium the fact that representations can quickly transform from one to the other, smoothly enough to maintain orientation between the two? Maybe we can have multiple representations, of which Hilbert is one, and the new interaction isn’t so much learning about a new representation as it is learning to quickly and comfortably switch between representations as needed.



On Dec 5, 2014, at 12:58 PM, Chaim Gingold wrote:

That’s really nice, and helps explain the curve.

But now I want even more cues. What if it charted out with light outlined empty rectangles, or a path, the near future of where the video will pour into? (Also, by the end, I want the source video to get out of the way again and situate itself below the thumbs.) After the video sits down the order is harder to see. What about something like this, and while I’m sure I’ve got the curve wrong hopefully the idea is clear, where adjacent neighborhoods slightly pull away from one another. As if it was all on paper and you cut along these neighborhood boundaries and pulled a little...

<PastedGraphic-3.tiff>

On Dec 5, 2014, at 12:35 PM, Robert M Ochshorn wrote:

I was trying to keep it simple at first, but the pressures of the CDG prototyping environment escalated me towards the “spiral of death” design. Here’s another take:

<simple-hilbert-sm.mov>


On Dec 5, 2014, at 11:14 AM, Chaim Gingold wrote:

lol

(i think?)

On Dec 5, 2014, at 11:06 AM, Bret Victor wrote:

I hadn't realized there was a Poe's Law of prototyping.


On Dec 5, 2014, at 10:36 AM, Chaim Gingold wrote:


On Dec 5, 2014, at 10:06 AM, Dave Cerf wrote:

It may or may not help, but I am impressed by the real-time Hilbertizing you pulled off. What I struggled to see here was the Hilbert pattern forming in the background—but even when the video is near full-size, is it moving in a Hilbert trajectory? Maybe if the images were more transparent when they were near full-size, I could see the formation of the small cells in the background.

...Or if the foreground video sat still, didn’t go off the page, or otherwise hide itself or what is happening underneath! Its motion distracts from the background activity. It mostly just needs to sit still and let us see the video unfolding while getting out of the way of the background activity.


I am reminded of our other discussion regarding “peeking” underneath something, both in regards to quoted text and the hand-flipping animation technique. Your video made me wish I could peek under the layers obscuring the final Hilbert grid underneath.

I was just thinking about rack focus as a method of changing interface “focus.” Perhaps this is just skeuomorphism and I should be fired along with Scott Forstall. Is the desire to bring familiar and comforting interfaces from the “real world” to screens and keyboards inherently regressive?


On Dec 4, 2014, at 7:07 PM, Robert M Ochshorn wrote:

On Dec 3, 2014, at 9:57 PM, Dave Cerf wrote:

to better track with my eye what is going on

This may not help:

<hilbert-sm.mov>










<demo_lowquality.m4v>






<IMG_20141209_012806.jpg>