Hi Robert—
The thumbnail grids are up to almost 7000 out of 10K, so I've started exploring ways to browse them quickly. Quick Look was too slow, so I switched to Mac OS X's Cover Flow view, which looks dated but is an easy way to fly through graphical content.
[GridScrolling.mov]
My first thought, which could easily be prototyped (but not easily sent to you, since it would reveal movie content), was to pair this Cover Flow view of the grids with a Cover Flow view of a single thumbnail: a representative poster frame.
I think the poster frame would naturally be the user's focus, but the full grid can provide useful peripheral information: duration (the size of the thumbnails in the grid currently indicates it, and tiny thumbnails mean a longer clip) and rough patterns in the shot, such as where the slate appears, where the "usable content" is, weather or lighting changes, and transients (a few frames of light or dark).
One thing that always troubled me at Apple was that we did not do significant research into the best ways to use filmstrips, grids, and so on. There was simply a mandate to use filmstrips because iMovie uses them and "they reveal your content better than abstract timeline rectangles," a principle that most professionals balk at: they find filmstrips noisy and distracting, especially in an editing timeline (in a browser view, filmstrips are a bit less controversial). A few basic ideas were tossed around, such as "always show the first, middle, and last frame of a clip," but that's about it. Now, as I look through many of these grids, a few ideas are occurring to me that I wouldn't have had without this exploration…
• How about filmstrips composed only of frames that differ significantly from the frames before them? For a full movie, you would expect such a thumbnail algorithm to show you roughly one frame per cut. But it would also help you catch transient moments without having to weed through all the continuity.
• What does it mean to visually compress continuity? In other words, if every frame in a range is sufficiently similar to the others, how might we represent that? Literal representation is noisy: we see all these frames that are essentially the same, and our brains have to do the work of filtering them out.
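Both bullets can be sketched with one small routine: walk the frames, start a new "run" whenever a frame differs enough from the run's first frame, and treat each run's first frame as a thumbnail and its length as the continuity it stands in for. This is a minimal sketch, assuming frames are already decoded to equal-sized grayscale images (represented here as flat lists of values from 0 to 1); the mean-absolute-difference metric, the threshold, and all function names are hypothetical stand-ins, not a tested choice.

```python
from dataclasses import dataclass
from typing import List, Sequence

# Hypothetical representation: a frame is a flat list of grayscale values in 0..1.
Frame = Sequence[float]


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Average per-pixel absolute difference between two equal-sized frames."""
    if not a or len(a) != len(b):
        raise ValueError("frames must be non-empty and the same size")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


@dataclass
class Run:
    start: int   # index of the run's first frame (its representative thumbnail)
    length: int  # number of consecutive frames the run stands in for


def segment(frames: List[Frame], threshold: float) -> List[Run]:
    """Group frames into runs. A new run starts when a frame differs from the
    current run's first frame by more than `threshold` (hypothetical tuning knob)."""
    runs: List[Run] = []
    for i, frame in enumerate(frames):
        if runs and mean_abs_diff(frames[runs[-1].start], frame) <= threshold:
            runs[-1].length += 1  # similar enough: fold into the current run
        else:
            runs.append(Run(start=i, length=1))  # significant change: new thumbnail
    return runs


def keyframes(runs: List[Run]) -> List[int]:
    """Indices of each run's first frame, i.e. the 'significantly different' frames."""
    return [r.start for r in runs]


if __name__ == "__main__":
    dark = [0.1] * 4
    flash = [0.9] * 4
    # A static shot interrupted by a one-frame flash of light.
    frames = [dark, dark, dark, flash, dark, dark]
    runs = segment(frames, threshold=0.2)
    print([(r.start, r.length) for r in runs])  # [(0, 3), (3, 1), (4, 2)]
    print(keyframes(runs))                      # [0, 3, 4]
```

Comparing each frame to the first frame of the current run, rather than only to its immediate predecessor, means a slow lighting drift eventually starts a new run instead of being absorbed one small step at a time. The transient flash survives as a one-frame run, while the run lengths are one possible handle on compressing continuity (for example, drawing a long run as a single thumbnail with a length indicator).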
Dave