My friend (and former student) Tyler Schnoebelen now blogs regularly on the site of the company he works for, idibon (in San Francisco), where he’s Senior Data Scientist. These postings look at matters from an NLP (natural language processing) angle, but always with an engaging take on the material and often with an unexpected choice of topic. Four recent postings of this sort:

“Justice Kennedy’s favorite phrases” (link) 6/12/13: comparing Kennedy with Ginsburg and Scalia

“The street where you live” (link) 6/7/13: on street names

“Back the right horse (name)” (link) 5/2/13: on names for racehorses (posted just before the Kentucky Derby)

“We’ve lost that lovin’ feelin’ ” (link) 4/22/13: on song titles (and love-words)

On the company:

idibon: Language technologies for a connected world

Idibon helps companies understand their language data. Using cutting-edge natural language processing, Idibon takes unstructured data like emails, instant messages and social media, and provides structured answers to key business intelligence questions.

The company is heavily Stanford-connected: the CEO, Rob Munro, is, like Tyler, a Stanford Ph.D. in linguistics, and three of its advisors — Dan Jurafsky, Chris Manning, and Chris Potts — are Stanford faculty members in linguistics (and other departments as well).

2 Responses to “Schnoebelen at idibon”

  1. G.A. Perez Jimenez Says:

    Linguistic Bullying
    Gabina Aurora Pérez Jiménez (Leiden University)

    This year ‘Chalcatongo Mixtec’ (Sahin Sau) has been proclaimed the “weirdest language of the world” by linguists connected to the organization Idibon in the U.S.A. In normal English usage “weird” can hardly be understood as a neutral term (as would be “exceptional” or “original”); it connotes meanings such as “strange, freaky and bizarre”. As a speaker, teacher and researcher of the Sahin Sau language, I find this qualification inappropriate and scientifically unsound.
    Sahin Sau is spoken in the village of Chalcatongo, in the State of Oaxaca, southern Mexico. It is not a stand-alone language but a local variant of Mixtec, a Mesoamerican language spoken by approximately half a million people in Mexico and the U.S.A. The Mixtec language has an interesting history and literature. Before the Spanish colonization of Mexico (1521) the Mixtecs made use of a sophisticated pictographic script: in folding books of deerskin (codices) they registered sacred narratives, royal dramas and the genealogies of the different dynasties who ruled in their area. After colonization, Mixtec was written with the alphabet introduced by the Spanish monks: the first alphabetical book in Mixtec was printed in 1567, followed by a first grammar and a first comprehensive Spanish-Mixtec dictionary (published in 1593). Mexican colonial archives contain numerous writings in the Mixtec language. Even today grammars, dictionaries and texts are produced in different variants of Mixtec by Mixtec authors and by linguists.
    Idibon relies on the World Atlas of Language Structures, an inventory of several abstract features of languages. The whole investigation is based on a selection of only 21 features, compared across 2,676 languages. The more than 4,000 other languages spoken in the world are not (yet) considered, and the arguments for the selection of the features are not clear. In this case, the relevant data were extracted exclusively from a dissertation on Chalcatongo Mixtec, written by Monica Macaulay (1996). This linguist from the U.S.A. is not a speaker of Mixtec – many linguists do not even have an elementary speaking knowledge of the indigenous languages they claim to be experts in. She worked with Mixtec-speaking immigrants in the U.S.A. and did additional interviews in Chalcatongo. Macaulay’s study was not done to serve the World Atlas, but it is now used in that Atlas as the reference publication to identify certain characteristics of Mixtec. The Atlas does not pay attention to the language as a whole, with all its variations, nor to its history, literature and social or cultural context. Specific fieldwork to check the information provided by Macaulay’s thesis was not conducted.
    An important reason for qualifying Chalcatongo Mixtec as exceptional (“weird”) is the supposed feature of not distinguishing between a question and an affirmative sentence. In my view this is a misunderstanding: that distinction is clearly made by using intonation, stress, or question words, while there are also words to express or stress the affirmative character of a sentence.

    Reading and checking the Idibon statement, I get the strong impression that it is based on very shallow scholarship, which makes the conclusions premature and unreliable, to say the least.
    Young doctor Tyler Schnoebelen (Stanford University) himself admits that this research is not only incomplete and superficial, but also arbitrary, i.e. based on a selection of abstract “features” that follows a specific (“Western”) scientific tradition: “the linguists who developed and annotated the features were mostly speakers of European languages. What features might a person from Papua New Guinea or Ethiopia or the Amazon have come up with instead?” Right, but why then even start talking about “the weirdest language”?
    For these linguists the identification of the “weirdest language of the world” seems to be mainly a matter of having fun. Schnoebelen joked about his “discovery” that Chalcatongo Mixtec would not distinguish questions from affirmative sentences: “I have spent part of the day imagining a game show in this language”. Monica Macaulay cheered on the internet: “I’m #1”. I wonder how such linguistic researchers come to think they have the authority to label a language “weird”. That term itself already belongs to a less serious context, and evokes associations with prejudiced and narrow-minded individuals who find everything beyond their limited horizon crazy or frightening: the familiar small world is standard, everything else a strange deviation. Essentially the same attitude is found in social, cultural or racial discrimination.
    In any case, such linguists show little sensitivity for the social context of their profession. Many languages in the world are still discriminated against and stigmatized, not taught but prohibited in schools, and not supported by governments, but rather publicly treated as inferior (indeed “weird”) dialects. As a result of this attitude (and the corresponding negative education policy), most languages in the world will become extinct in the next hundred years. As for Mexico, it may be said that the majority of the approximately 60 indigenous languages spoken there (with their corresponding literatures and knowledge systems) are dead already – only the last speakers are still alive.
    In this tragic context, it would be fitting for linguists not to play silly games about which language they wish to call the “weirdest”. Instead they should do their utmost to contribute solid research to the documentation, preservation and development of the linguistic and literary heritage of our society. And above all they should promote, through concrete measures, scholarships and personal engagement, the access of native speakers to university education, to participation in research on an equal footing, and to leading positions in academia.
    All over the world we see indigenous peoples still being preyed upon by researchers from “Western countries” (linguists, anthropologists, archaeologists, historians, sociologists etc.), who often are only interested in their own careers. To them our peoples are mere “objects of study” or “informants”. In their comfortable and privileged institutions they enjoy having fun with all those “interesting” and “weird” topics, and thoughtlessly publish all kinds of speculative “theories” and useless “knowledge” to get their moment of fame and their next grant. They really do not care if those statements are discriminatory or offensive to anybody. Very rare are those who protest against the bad living conditions and oppression of indigenous peoples; even fewer try to contribute to their emancipation. Very few academics are collaborating with indigenous communities or actively creating opportunities for talented indigenous students. The social injustice, exploitation, exclusion and racism that indigenous peoples are still facing today are an unpleasant – but real – context that many self-proclaimed “intellectuals” do not want to bother with: no concrete solidarity, no practical help, not even moral support. And then talking about “the weirdest language”: that is not a scientific contribution but merely a form of linguistic bullying.

    G. A. Pérez Jiménez is a researcher in the framework of a project on ‘Time in Intercultural Context’, financed by the European Research Council at the Faculty of Archaeology, Leiden University, The Netherlands. She published a course on Chalcatongo Mixtec (Sahin Sau, Curso de la Lengua Mixteca, Oaxaca 2008) and is co-author of several publications on Mixtec language, culture and history.

    • arnold zwicky Says:

      This is entirely inappropriate as a comment on my posting, since it has nothing to do with the content of that posting. I’ve let it through, but I will delete other off-topic comments from you. There are plenty of places to post, but using comments on unrelated postings as a vehicle for expressing opinions and arguments is just wrong; comments are not free spaces for posting on whatever interests or concerns you.

