Date: Tue, 2 Jun 2015 17:51:17 -0700
From: Robert M Ochshorn
Subject: Re: Edlund on analog versus digital

On Jun 2, 2015, at 3:21 PM, Dave Cerf wrote:

I also wondered (but didn’t look around much) how to easily transcribe the interview: is there an equivalent to OpenCV for speech? I know the Mac has built-in dictation, so maybe I could wrangle that somehow.

I wish…

There’s an insidious, awful project called “pocketsphinx” that shows up everywhere you go looking for open source speech recognition. Here’s what it produces with the sound-mail you edited:

000000000: oh
000000001: and he had found
000000002: what you know who had he and at a a a a her that you her he could i are you the the on her to or are her out are you the a big our own hundred
000000003: what did but iraq and and to big car or and for that
000000004: i in an a and he
000000005: do you and and out her
000000006: yeah it
000000007: yeah you knew who's and mary i yeah you
000000008: we if you how all the new you a hit
000000009: her yeah it didn't news
000000010: good he you he's
000000011: two yeah there in a are in a are the you in our
000000012: yeah it hey you're in to how to run room who it what and who the and her to i'm a and you get in the there how it in a will to to a sudden
000000013: good
000000014: can he and at her moon
000000015: a new not and won't
000000016: yeah in but her you're he hey no own
000000017: yeah i'm i'm her that i'm to our at e. for or the it who are or who as a
000000018: for the that he
000000019: how you and he to he who are or who the the new one who
000000020: we're and new car
000000021: for lou in her he grew you and i'm a e. that the in her in it don't you have i'm a e. he to have to that i'm
000000022: her or he is or the
000000023: her her a hurt out that a to that a the good a the and or go on our very good or you're out and her her league
000000024: put are i out a for the up on to u. can how to heard
000000025: out to what her our high and that an that are that a not only we that at you that you are heard you or the a the wrong in our new use our people to has who are on an a and are u. out and her our parent who are you a new high
000000026: he's got an did
000000027: when he c. a. and new yeah
000000028: i mean yeah i'm her and narrow do you
000000029: u. are he's new patient very big a head who's in a is a big high and he and
000000030: who
000000031: it hit
000000032: yeah newsroom
000000033: yeah her yeah you knew you really am did
000000034: you're a i her you you his yeah
000000035: the i'm a and i who loved her
000000036: in a had a room you to you and big i
000000037: i'm and you his and he is a to the a are or who
000000038: it it other or her is a to her
000000039: there he will or her for a a the in out to our to not over on the the and
000000040: to be in i'm the you're high see our government to he he you for for or the a a at you and are who the as to how are how to to have had
000000041: you have heard in hit
000000042: yeah you read you a you is a other in heard
000000043: it was a her or or our for it it it mean he e.
000000044: u. and who do you at for or that you
000000045: our or he can who his who sought had
000000046: you yeah it hit
000000047: yeah
000000048: there or car or for gore heard how the has hurt it are on the you at the that you to the lose his her out her c. who are of her i don't her out
000000049: he in the who murder or from he know are her she to
000000050: yeah in some the it
000000051: we is it know all a new pair u. hit
000000052: the and how soon you're so
000000053: you're in both to see you the a a two to new how he is who the the
000000054: the on is who the u. were to do you
000000055: is the as it's how you how good of i'm it has didn't
000000056: the is it as the two you the you you you a it is her a bit
000000057: it did his has two yeah and we do you say the it ahead and me go go on the is in going up to to a to use it back
000000058: yeah this he he in the it and so a a a a a a a who i'm there or his home the how who couldn't have that her
000000059: we're in over at
000000060: it is he is whose isn't what it's
000000061: i'm i the a yeah how are you
000000062: yeah
000000063: and he
000000064: her new run it our are at her
000000065: yeah you're who is as do the who are the two to do to that it it it
000000066: it has to or hurt
000000067: it he yeah are used to he how i mean it didn't know who to know the and keep have it are up the and on and in our out heard
000000068: u. he who you some yeah
000000069: what i'm
000000070: he but at her room you're on the in it o. the that is good good it a who yeah yeah i'm he he
000000071: she i'm i i i i i i i i i i do are i'm lou and he knew her
000000072: in her he who had and with whom has to or wasn't
000000073: it's what we're
000000074: hit
000000075: yeah you do is a home and at for that it to the you for for are and you who on going up are you do is who will
000000076: how
000000077: i do
000000078: perhaps yeah who is he in what
000000079: is a the is is is and a who are to see he get her
000000080: are you a you who are in the you're
000000081: he
000000082: do you oh who i the a of a when it yeah

CMU has a blog post that starts with grand claims about the privacy implications of your TV sending all the audio in your home to their server, before withering down to their offering:

So if you really need a system which recognizes a dozen commands without the internet, pocketsphinx is a solution to consider.

Meanwhile, CMU does actually tend a state-of-the-art, open-source speech recognition system: Papa Sphinx. Unfortunately, it’s a Java beast, and it is exceedingly difficult to get it to do anything. I am slowly acclimatizing myself to the vocabulary and available datasets for speech recognition, but am discouraged by the “scene” so far.

Your e-mail came as I was installing HTK, which is the secret sauce of Maneesh’s lab (paired with $1/minute transcription services) for all of their magic text-audio-video interfaces. As for what HTK might be able to recognize out of the box:

The HTK software distribution also contains an example of constructing a recognition system for the 1000 word ARPA Naval Resource Management Task.

Probably you’re best off wrangling Apple’s or Google’s dictation, for now…

-RMO

ps - Also, I listened to and enjoyed the contents of your mail.