Re: [journal] organizing state

On Jan 27, 2016, at 11:27 PM, Chaim Gingold wrote:

Not off the top of my head from the text itself, and I don’t have time right now to dig through Augmenting Human Intellect, but I’m almost certain it’s in there.

Hmmm…. not enough time to dig through Augmenting Human Intellect—since we’re talking about search, perhaps it would help if you had a search interface to AHI? (video)

Obviously, text search is commonplace by now. What this does is combines the in-browser text search on Engelbart’s AHI with a bidirectional link to a semantically-spatialized mapping of all of the words in the piece.

Talking with his daughter brings this point home: she points out that Doug’s tools offered more powerful ways of working, but people had to work hard to learn how. The chording keyboard is a great example. She and her siblings were drilled by their dad on the chording keyboard. And in AHI he talks about about the tools as well as craft knowledge (but I don’t remember the term he uses).

Training is an essential part of Engelbart’s vision; it’s part of the H-LAM/T acronym:

I think he’s right that there are some things that take a good deal of time to learn but that pay off. I’m happy, for instance, that I can type very quickly on a QWERTY keyboard. And it’s a valid point of argument that I might have been better off if I had learned a chorded keyboard. I can see benefits, as well as drawbacks.

But there’s a big difference between training to learn a system, and training to understand each individual document in the system. I was referring to Engelbart’s fixation on acronyms (he calls them “abbreviations”):

This is coming from a reading of Whorf which I don’t believe has held up into the 21st century:

This very interface captures my suspicion of extending-language-and-thereby-extending-thought-through-abbreviation (ELATETTA): while nearly all of the words that Engelbart uses have a rich set of connotations that are captured by the scatterplot—even many of the proper nouns like Korzybski and Whorf—H/LAM-T is a string of characters without any semantic connotation. Something might, eventually, be gained by adding “H/LAM-T” into common usage, but my claim is that there is a severe cognitive cost associated with the use of characters outside of known semantic understanding. Engelbart thinks that “quick look-up” will overcome the strangeness of these characters. I disagree.

I’ll include a little more about this technique, below. I haven’t really gotten it to a satisfactory state yet, but it’s starting to point some directions forward.

More broadly, I’m interested in “seeing inside” of search. I’m blown away by the amount of complexity that Google is able to hide with their ten blue results. It’s an incredible achievement, but perhaps we’d expand from our own “personalized” perspective if we were able to better understand the “space” in which we are looking.

Your correspondent,

R.M.O.

• • •

I’ve been transfixed lately by the open source Google-initiated word2vec library. I trained it on the first billion characters of Wikipedia (of course), and it has “learned” to assign a 200-dimensional vector to each word contained therein. The simplest implication of this is its function as a misbehaving thesaurus: you give it a word, and it will output the “closest” other words. Some examples will help.

In NYC I gave a talk at a center formerly known as Hacker School (now Recurse Center), and I started with the computational connotations (hereafter “CC”) of “hacker”:

Enter word or sentence (EXIT to break): hacker

Word: hacker  Position in vocabulary: 6799

                                              Word       Cosine distance

------------------------------------------------------------------------

                                           hackers		0.641573

                                           hacking		0.634869

                                              hack		0.519038

                                            jargon		0.496075

                                              geek		0.480616

                                              japh		0.473775

                                             hacks		0.472019

                                             cyber		0.461035

                                            newbie		0.434005

                                         leetspeak		0.426675

                                            ursine		0.424415

                                         cyberpunk		0.419892

                                     securityfocus		0.419757

                                        subculture		0.419074

                                              leet		0.415718

                                           cracker		0.407582

                                              lamo		0.404582

                                            steele		0.402015

                                         developer		0.401020

                                           sussman		0.398030

                                         malicious		0.392435

                                           mitnick		0.391394

                                         newsgroup		0.391120

                                         hackerdom		0.387594

                                          intercal		0.386764

                                              grep		0.377167

                                           fannish		0.374869

                                              perl		0.373532

                                          tridgell		0.369218

                                            flamer		0.366661

                                            shrdlu		0.364778

                                               fic		0.363257

                                         demoscene		0.363033

                                            usenet		0.360683

                                       obfuscation		0.360228

                                        programmer		0.357891

                                             warez		0.357649

                                          webcomic		0.356827

                                       neuromancer		0.356489

                                             slang		0.354669

Does this not explain why the organization might have decided to change their name? I’m very much familiar with the conscious attempts that humans take to shape our language. I know the dogma about how “hackers” are ethically motivated clever explorers of the digital unknown. But what does it mean to insist on a meaning that’s quantitatively different from the words it co-notes? Connotation, by the way, is a very good way to think about these results: the vectors are trained on pairs of words sharing a sentence.

At a Tactical Media Files workshop that I participated in on Saturday, I attempted a similar trick, this time opening with “tactical”:

Enter word or sentence (EXIT to break): tactical

Word: tactical  Position in vocabulary: 6478

                                              Word       Cosine distance

------------------------------------------------------------------------

                                            combat		0.549381

                                         strategic		0.534384

                                           tactics		0.526885

                                   countermeasures		0.514575

                                          warheads		0.486942

                                      interceptors		0.485585

                                          strategy		0.475635

                                    reconnaissance		0.472701

                                               abm		0.467699

                                          deployed		0.467696

                                           targets		0.466220

                                           stealth		0.465901

                                               alq		0.465775

                                         firepower		0.463735

                                        blitzkrieg		0.460129

                                            amraam		0.459038

                                          infantry		0.457187

                                         logistics		0.451000

                                              cvbg		0.450855

                                            aiming		0.446344

                                          maneuver		0.445396

                                         manoeuvre		0.443323

                                         aggressor		0.441265

                                         defensive		0.440341

                                     operationally		0.438317

                                         airlifter		0.436964

                                            tactic		0.435498

                                           defense		0.433977

                                        offensives		0.433798

                                        sidewinder		0.428010

                                          fighters		0.427416

                                              icbm		0.425953

                                        deployable		0.425019

                                            weapon		0.423242

                                           fighter		0.422070

                                         grappling		0.421749

                                             icbms		0.418057

                                       interceptor		0.414731

                                          weaponry		0.412819

                                           agility		0.411716

Look at how militant the vector of tactical co-notes! I turned these CC’s inward at my own work and values, and was tickled to see the very concept of “visibility” laced with the aerial perspective of modern warfare:

Enter word or sentence (EXIT to break): visibility

Word: visibility  Position in vocabulary: 12642

                                              Word       Cosine distance

------------------------------------------------------------------------

                                         altitudes		0.540396

                                          altitude		0.529357

                                   maneuverability		0.488706

                                          airspeed		0.488132

                                     survivability		0.467124

                                             speed		0.438493

                                           cockpit		0.436485

                                       sensitivity		0.428580

                                             glare		0.427989

                                          humidity		0.427278

                                       situational		0.415761

                                           fairing		0.412770

                                          subsonic		0.407297

                                               vfr		0.405353

                                     significantly		0.404337

                                           intakes		0.396664

                                        capability		0.395556

                                   manoeuvrability		0.395410

                                           leakage		0.391842

                                             level		0.390953

                                         airspeeds		0.389908

                                            acuity		0.389621

                                         transonic		0.387485

                                         intensity		0.387420

                                           quality		0.387222

                                          thermals		0.384842

                                        throughput		0.383414

                                        brightness		0.382153

                                            levels		0.381913

                                      dramatically		0.381580

                                       readability		0.378554

                                        congestion		0.378261

                                          ceilings		0.376643

                                         vigilance		0.374827

                                       performance		0.374289

                                         lethality		0.373615

                                          sunlight		0.373360

                                           greatly		0.373039

                                           agility		0.371887

                                       reliability		0.371846

Note the contrast with our aural sense. To “listen” is to acknowledge the other, to learn, to ask:

Enter word or sentence (EXIT to break): listen

Word: listen  Position in vocabulary: 7642

                                              Word       Cosine distance

------------------------------------------------------------------------

                                              hear		0.594319

                                               you		0.508852

                                             learn		0.507548

                                         listening		0.507300

                                          listened		0.498478

                                         realaudio		0.485728

                                           listens		0.484657

                                              sing		0.484529

                                        headphones		0.476096

                                             heard		0.472872

                                           podcast		0.466580

                                              want		0.460877

                                              your		0.460069

                                          instruct		0.459724

                                              talk		0.450591

                                          listener		0.449981

                                              tell		0.444445

                                              read		0.439060

                                        exclaiming		0.432227

                                           sampled		0.431869

                                          watching		0.431578

                                      conversation		0.428230

                                          playlist		0.421875

                                         listeners		0.419649

                                            loudly		0.417167

                                            forget		0.416092

                                           audible		0.414083

                                          remember		0.413376

                                           outtake		0.413345

                                           jingles		0.411440

                                             speak		0.409109

                                             gotta		0.405322

                                                me		0.403930

                                               say		0.399365

                                          whistles		0.398721

                                             laugh		0.397030

                                               ask		0.396598

                                             audio		0.396549

                                             pause		0.396406

                                              know		0.395876

Obviously there are drawbacks, and I was much tickled to see “forgetting” so closely involved with the act of listening.

The Minsky vectors were from this same technique. Minsky is dead, but “Minsky” still evokes a complex network of concepts and peers. What greater compliment for an intellectual than to enter into our language? But the query-response logic of traditional computer programs is not what I am after. For me, the point of a “symbiosis” between human and machine is help us find the questions to ask, not to presume to have an answer.

Date:	Mon, 8 Feb 2016 21:55:39 +0100
From:	Robert Ochshorn
Subject:	Re: [journal] organizing state