Date: Fri, 18 Mar 2016 11:29:34 -0400
From: Robert Ochshorn
Subject: Re: What are people trying to do when they're programming? (was: "[journal] organizing state")
A non-graphic update:


I did a lot of infrastructure work but haven’t yet managed to get the word vectors to a stable place beyond the Augmented Augmented Human Intellect sketch that I sent Chaim et al. They’re a little backgrounded now, but I was working in three directions simultaneously:

a - training semantic vectors on different datasets. In particular, I’ve trained different vectors on: the entire English Wikipedia (not just the first billion characters); all of Stack Overflow (not just the titles); and an (apparently-standard) “crawl” of news articles from 2013 and 2014. It’s fascinating to see how the “computational connotations” of certain words change in context (and how astute this algorithm can be). Just to give a simple example, in the news “coffee” is a stimulant that people drink that’s often produced by a large retailer:

Word: coffee  Position in vocabulary: 3067

                                              Word       Cosine distance
------------------------------------------------------------------------
                                           coffees 0.762932
                                               tea 0.747759
                                          espresso 0.742584
                                       cappuccinos 0.716952
                                    instant_coffee 0.715701
                                            lattes 0.707537
                                             latte 0.707210
                                        cappuccino 0.687730
                                             decaf 0.682195
                                  starbucks_coffee 0.680670

In contrast, Stack Overflow knows “coffee” as somewhere between a JavaScript preprocessor and a commodity to be modeled in the DOM:

Word: coffee  Position in vocabulary: 5988

                                              Word       Cosine distance
------------------------------------------------------------------------
                                     coffee_coffee 0.625219
                              coffeescriptcompiler 0.569834
                                             coffe 0.561315
                                               tea 0.556719
                                        milk_li_ul 0.552825
                                    where_coffeeid 0.549666
                                         li_li_tea 0.546969
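
For anyone who wants to poke at the lookup behind these two listings, here is a minimal sketch of a nearest-neighbor query over word vectors. The toy three-dimensional vectors and the names `cosine_similarity` and `nearest_words` are made up for illustration; real trained vectors have hundreds of dimensions, and this is not the tool that produced the tables above. The column is labeled “Cosine distance,” but the scores appear to behave like cosine similarities (higher means closer), which is what the sketch computes.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_u * norm_v)

def nearest_words(query, vectors, k=10):
    """Return the k words most similar to `query`, best first."""
    q = vectors[query]
    scored = [(word, cosine_similarity(q, vec))
              for word, vec in vectors.items() if word != query]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical toy vectors standing in for one trained corpus.
toy_news = {
    "coffee":   [0.9, 0.1, 0.0],
    "tea":      [0.8, 0.2, 0.1],
    "espresso": [0.9, 0.0, 0.1],
    "compiler": [0.0, 0.1, 0.9],
}

for word, score in nearest_words("coffee", toy_news, k=3):
    print(f"{word:>12} {score:.6f}")
```

Running the same query against vectors trained on a different corpus is what surfaces the shift in connotation: the function stays the same and only the `vectors` dictionary changes.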


b - I was hoping to use these trainings to “augment,” connect, and extend arbitrary sets of PDFs, for example the research papers we’ve been collecting at CDG. I’m currently working through some performance concerns that have left this effort in a high-complexity state that I haven’t yet had a chance to finish up. I’ve been working out the abstractions in Seatbelt to allow for dependency-based, lazily evaluated computation of derivatives.
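
To illustrate the general idea of dependency-based lazy evaluation (not Seatbelt’s actual abstractions, which aren’t described here), a minimal sketch follows. It reads “derivatives” as derived artifacts such as extracted text or token counts; the `LazyNode` class and the node names are hypothetical. Each node computes its value only when first asked, pulls in its dependencies on demand, and caches the result so shared upstream work runs once.

```python
class LazyNode:
    """A derived value that is computed on first request and then cached."""

    def __init__(self, name, compute, deps=()):
        self.name = name
        self.compute = compute      # function taking the dependency values
        self.deps = tuple(deps)     # upstream LazyNode objects
        self._value = None
        self._done = False

    def get(self):
        if not self._done:
            inputs = [dep.get() for dep in self.deps]  # evaluate upstream lazily
            self._value = self.compute(*inputs)
            self._done = True
        return self._value

calls = []  # records the order in which computations actually run

def tracked(label, fn):
    """Wrap fn so that each real invocation is logged in `calls`."""
    def wrapper(*args):
        calls.append(label)
        return fn(*args)
    return wrapper

# Hypothetical pipeline: document text -> tokens -> two different derivatives.
text = LazyNode("text", tracked("text", lambda: "coffee tea coffee"))
tokens = LazyNode("tokens", tracked("tokens", lambda t: t.split()), [text])
counts = LazyNode("counts",
                  tracked("counts", lambda ws: {w: ws.count(w) for w in set(ws)}),
                  [tokens])
vocab = LazyNode("vocab", tracked("vocab", lambda ws: sorted(set(ws))), [tokens])

print(counts.get())  # triggers text, tokens, then counts
print(vocab.get())   # reuses the cached tokens; only vocab runs
print(calls)
```

Nothing runs until a derivative is requested, and the second request does not repeat the extraction or tokenizing steps.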

c - Another “output” I would like to develop is a fully-functioning search engine (e.g. for Stack Overflow) built on top of semantic vectors, but (perhaps through t-SNE) explorable and associative beyond traditional “query/response” search engines. I would like to “see inside” of a service like Google, to learn from the relations and associations that capture so much meaning even in the fairly “dead” query/response form we know. I find search inherently interesting, and it is increasingly essential to my understanding of what “computing” offers to the limitations of feeble human minds, but I also need to undertake a deeper semantic exploration to reach next steps in my speech research.
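
As a rough picture of what search over semantic vectors could look like, here is a minimal sketch that represents each document and each query as the average of its word vectors and ranks documents by cosine similarity. Averaging is just one simple choice assumed for illustration, and the toy vectors, document ids, and function names are all hypothetical.

```python
import math

def mean_vector(words, vectors):
    """Average the vectors of the known words; None if no word is known."""
    known = [vectors[w] for w in words if w in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms if norms else 0.0

def search(query, docs, vectors, k=3):
    """Rank documents by similarity between query and document vectors."""
    q = mean_vector(query.lower().split(), vectors)
    if q is None:
        return []
    results = []
    for doc_id, text in docs.items():
        d = mean_vector(text.lower().split(), vectors)
        if d is not None:
            results.append((doc_id, cosine(q, d)))
    results.sort(key=lambda pair: pair[1], reverse=True)
    return results[:k]

# Hypothetical two-dimensional vectors and documents.
vectors = {
    "coffee":   [1.0, 0.0],
    "espresso": [0.9, 0.1],
    "compiler": [0.0, 1.0],
    "script":   [0.1, 0.9],
}
docs = {
    "a": "espresso",
    "b": "compiler script",
}

print(search("coffee", docs, vectors))
```

The query “coffee” ranks document “a” first even though the two share no literal word, which is the associative behavior that plain keyword matching lacks.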


R.M.O.

On Feb 18, 2016, at 10:53 AM, Virginia McArthur wrote:

This is indeed amazing :)
 
From: Dave Cerf [****************] 
Sent: Wednesday, February 17, 2016 5:46 PM
To: Dynamic Medium
Subject: Re: What are people trying to do when they're programming? (was: "[journal] organizing state")
 
Walk-in word-cloud?
 
After making this, it occurred to me that I’ve experienced a sensation like this when reading certain legal documents…
 
<image002.jpg>
 
<image004.jpg>