Date: Tue, 1 Aug 2023 01:30:40 -0700
From: Shawn Douglas
Subject: Re: CCC talk
I didn't visually mock up specific scenarios for what we might do with papers and pdfs, but this can serve as a starting point for thinking about what to prioritize.


What should scientific paper reading/annotation/authoring tools look like in the future?

Bret's 2011 "Scientific Communication As Sequential Art" reimagined a network theory paper by Watts & Strogatz. It would be fun to plan a future imagination jam where we explore what it looks like to go back into existing literature, extract data and models, and "port" them into a communal computing environment. For now...


What tools do we need for the CCC talk?

We only have a few days left to add capabilities and integrate them with the papers we've selected, so it's probably best to focus on features we'd like to demonstrate during the presentation. We can start by enumerating the types of information typically associated with each paper. Then we can think about which aspects of conventional papers are the slowest, clunkiest, and most frustrating, and imagine ways to flip and remix those into something more understandable, seamlessly accessible, and joyful.

What can be found directly in the papers? (i.e., a printed copy of the PDF)

- Title
- Author(s)
- Abstract
- Materials
- Methods
- Discussion
- Figures & Tables
  - Schematics and conceptual drawings
  - Data plots and visualizations
  - Representative images (micrographs, photos)
- References
- Funding sources


What additional information might accompany the paper but is not typically visible in the PDF? (e.g., materials downloadable on the journal website)

- Figures & Tables
  - Images (high-res, supplemental)
  - Raw experimental data
  - Analyzed data (e.g., cryo-EM 2D image classes)
  - Custom design files (Cadnano, Nanobricks Software)
  - Lists of DNA origami/DNA brick staples
- Models, Algorithms, and Code
  - Biophysical models and equations (e.g. Dunn'15)
  - GitHub repositories
  - Computational notebooks (IPython/Jupyter)


What additional information can be retrieved from external sources? (e.g., using accession numbers for records in databases and repositories)

- Structural models (PDB files, EMDB entries)
- DNA/RNA/Protein sequences and maps (GenBank files, SnapGene files)
- Raw EM micrographs on EMPIAR
- Author photos, lab websites, YouTube talks
- Materials: vendor product catalog items, pricing
- Subsequently published papers that cite the original paper
- Grant information (e.g., patents, other papers funded by the same grant in NIH RePORTER)


Focus areas

How does each paper appear locally? What does the paper look like when it's face up on the table? Is there a custom cover page? When casually flipping through each page, what appears on the page in addition to the printed text? Do we overlay boxes directly on the page? In what cases do we fan out boxes or other information nearby? How do we indicate something can be "picked up" and remixed elsewhere? Do animations/movies show playback controls?

How do tools interact with the paper? Should the "mock test tubes" be able to pick up other types of information beyond DNA and protein structures? Can we overlap an empty tube with any image to pick it up and then scale it on the table? Can we pick up multiple sequences directly off the paper in a single tube?

How does the paper interact with the environment? Do papers "see" each other on the table, for example, drawing a line when one cites the other? When a paper is on the table, it probably should show up on the timeline. Should its appearance on the timeline change in different contexts? If we turn to a paper's references page, do all the references appear on the timeline? Another thought: We could gradually build up a thumbnail page gallery of all the papers we open throughout the talk, reminiscent of Bret's library experiments at CDG that showed you a whole book at once. Maybe "show-all-page-thumbnails" is just a special view of the timeline we toggle by flipping over a card on the table. Lasering a thumbnail could bring up a phantom version of the page for quick retrieval of a DNA sequence or figure.
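
One way to think about papers "seeing" each other: if each paper on the table carries a list of what it cites, the lines to draw are just the citing/cited pairs where both papers are present. A minimal sketch, assuming hypothetical paper IDs and a made-up `cites` field (none of this is an existing API):

```python
from itertools import permutations

# Hypothetical records: each paper on the table lists the IDs it cites.
papers_on_table = {
    "paper_a": {"cites": {"paper_b", "paper_x"}},  # paper_x is not on the table
    "paper_b": {"cites": set()},
    "paper_c": {"cites": {"paper_a"}},
}

def citation_links(papers):
    """Return sorted (citing, cited) pairs where both papers are on the table."""
    return sorted(
        (src, dst)
        for src, dst in permutations(papers, 2)
        if dst in papers[src]["cites"]
    )

# Each pair is one line to draw between two papers.
for citing, cited in citation_links(papers_on_table):
    print(f"draw line: {citing} -> {cited}")
```

Citations to papers that are not on the table are simply ignored here; they could instead feed the timeline when we turn to a references page.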


Concepts and scenarios

Unified dynamic document: It's possible to see everything in one place!
  - We flip a paper from Orion's lab, and we play movies of cells swarming around right there on the table without needing to visit the publisher's website.
  - Plots and visualizations "offer" their raw underlying data sets. We pick up some tabular data and rescale it, choosing a different representation.

Moving data around: Physicalized papers make it easier to pick up information, move it around, explore, compare, and remix it.
  - We pick up an image and scale it up on the table or wall.
  - We pick up one or several DNA sequences (all at once or one at a time?) from a PDF page and use them elsewhere.
  - We put molecular models and microscopy data from several papers on the table with a single scale bar. Everything scales automatically, and we immediately grasp the relative scales of cells and nanostructures from different papers.
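
The shared scale bar in the last scenario comes down to simple arithmetic, provided each item knows the physical size of its source pixels. A rough sketch, assuming a hypothetical `TableItem` record and an arbitrary table scale; the example values are illustrative, not taken from any of the papers:

```python
from dataclasses import dataclass

@dataclass
class TableItem:
    name: str
    width_px: int      # width of the source image, in pixels
    nm_per_px: float   # physical size of one source pixel, in nanometers

def display_width_mm(item: TableItem, table_nm_per_mm: float) -> float:
    """Width the item should occupy on the table (mm) at the shared scale."""
    physical_width_nm = item.width_px * item.nm_per_px
    return physical_width_nm / table_nm_per_mm

# Illustrative values only.
items = [
    TableItem("origami micrograph", width_px=500, nm_per_px=0.2),
    TableItem("cell image", width_px=1000, nm_per_px=20.0),
]
table_nm_per_mm = 100.0  # assumption: 1 mm on the table represents 100 nm

for item in items:
    print(f"{item.name}: {display_width_mm(item, table_nm_per_mm):.1f} mm wide")
```

Changing `table_nm_per_mm` rescales everything together, which is the behavior we'd want from a single scale bar control.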

Reproducibility: Figures offer everything necessary to recreate the experimental conditions that were used to generate them.
  - We pick up a DNA brick or DNA origami design from a PDF page and immediately see the "compilation" program (synthesis order as 96-well plates, materials list, protocol) for that shape.
  - A gel figure "offers" its full protocol that we can pick up, put on the table, and then modify. For example, when reviewing a magnesium-concentration screen for optimizing DNA brick/origami folding, we grab the protocol from the figure, put it down on the table, and then swap out the input design for a modified design.
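
For the "compilation" scenario above, the plate-layout part could be as simple as walking the staple list and assigning wells in order. A minimal sketch, assuming row-major filling (A1, A2, ..., H12) and placeholder staple names and sequences; `assign_wells` is a hypothetical helper, and a real synthesis order might group staples differently:

```python
import string

ROWS = string.ascii_uppercase[:8]   # plate rows A-H
COLS = range(1, 13)                 # plate columns 1-12
WELLS = [f"{r}{c}" for r in ROWS for c in COLS]  # 96 wells, row-major

def assign_wells(staples):
    """Map (name, sequence) pairs to (plate_number, well, name, sequence) rows."""
    rows = []
    for i, (name, seq) in enumerate(staples):
        plate, pos = divmod(i, len(WELLS))  # start a new plate every 96 staples
        rows.append((plate + 1, WELLS[pos], name, seq))
    return rows

# Placeholder staples, not real design sequences.
staples = [(f"staple_{i:03d}", "ACGT" * 8) for i in range(100)]
order = assign_wells(staples)

for plate, well, name, _seq in (order[0], order[95], order[96]):
    print(plate, well, name)
```

Swapping in a modified design would then just mean re-running the same step on a different staple list.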