Archive for the ‘Language processing’ Category

News for penguins: the misread petrel

May 15, 2019

Passed along on Facebook recently, a BBC One clip from 12/13/18, with this header:


I read the header before I looked down at the scene. And what I read was:

Emperor penguin chicks take on a giant pretzel

I found this mightily puzzling. The Giant Pretzels of the Antarctic? Then I saw the petrel.


3 for 15

November 15, 2017

Three recent cartoons, on different themes: a One Big Happy in which Ruthie misparses an expression; a Rhymes With Orange that requires considerable cultural knowledge for understanding; and a Prickly City that takes us once more into the territory of pumpkin spice ‘high quality’, now in a political context:


Why is this so hard to process?

April 21, 2014

From Chris Waigl, passed on by Chris Hansen:


The problem begins with the subject, a longboat full of Vikings. The (syntactic) head of this phrase is certainly longboat (and that’s what determines agreement on the verb), but it’s functioning here semantically / pragmatically as an expression of measure, much like a collective noun. So the question is whether the subject is “about” a longboat or “about” Vikings. (Animate beings, especially humans, are especially favored as topics, ceteris paribus, so we should probably look to the Vikings.)

At the same time, the first sentence introduces the British Museum and the Palace of Westminster, implicitly (but quite subtly) introducing the Members of Parliament as entities in the discourse, though probably not as the topic.

Then we get the second sentence, which is clearly about Vikings (uncivilized, destructive, and rapacious), not boats (or the Members of Parliament, for that matter).


Word divisions

July 12, 2013

Today’s Pearls Before Swine, in which Pig continues to have language problems:

So Pig gets the word division wrong. But the sign-maker isn’t blameless here: the sign is printed solid, rather than divided — and (like so many sign-makers these days) eschews apostrophes, so that the sign as printed is ambiguous. Goat gets it right: MEN’S WEAR.

Schnoebelen at idibon

June 14, 2013

My friend (and former student) Tyler Schnoebelen now blogs regularly on the site of the company he works for, idibon (in San Francisco), where he’s Senior Data Scientist. These postings look at matters with an NLP (natural language processing) angle to them, but always with an engaging take on the material and often with an unexpected choice of topic. Four recent postings of this sort:


Dance with the one that’s nearest?

November 6, 2012

On today’s Morning Edition on NPR, in the story “Without Heat, Sandy Victims [‘victims of the storm Sandy’, not ‘victims who are covered with sand’] Guard Their Homes”:

He’s living in a house that was partially flooded so it doesn’t get robbed – for a second time.

The sentence adverbial so it doesn’t get robbed … is clearly intended to modify the main clause (he’s living in a house …) — it offers a reason for this man to live in a house that was partially flooded — but some listeners probably had a moment of wondering about partially flooding the house so it doesn’t get robbed. The intended interpretation involves “high attachment” (HA), to the main clause preceding the so-adverbial, rather than “low attachment” (LA), to the relative clause within the main clause. It’s been noted again and again that LA is preferred in syntactic processing, but also noted (see here, for example) that this is only a default, with context, real-world knowledge, and discourse organization often favoring HA instead.

In the cases that people have looked at in terms of LA vs. HA, the issue is how some constituent C is parsed with respect to preceding material: is it parsed with a lower, smaller predecessor constituent B or with a higher, more inclusive predecessor A (ending in B)? Since the head word of B (was (flooded) in the hurricane example above) will of necessity be nearer to C (the so-adverbial in this example) than the head of A (is (living) in this example) is, this preference is often thought of as a preference for attachment to the nearest, but it’s the structural relationships that are key here.
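To make the two parses concrete, here is a minimal Python sketch that writes out both bracketings for the Sandy sentence. The labels A, B, and C follow the usage above; the extra S node, the flat tree shapes, and the helper names `bracket` and `attach` are simplifying assumptions made for illustration, not a serious syntactic analysis.

```python
# Toy constituent trees: a node is (label, child, child, ...); a leaf is a string.
# The tree shapes are deliberately simplified (an assumption, not a full analysis).

MAIN = "he's living in a house"       # start of A, the main clause
REL = "that was partially flooded"    # B, the relative clause that ends A
ADV = "so it doesn't get robbed"      # C, the so-adverbial to be attached


def bracket(node):
    """Render a tree as a labeled bracketing string."""
    if isinstance(node, str):
        return node
    label, *children = node
    return "[" + label + " " + " ".join(bracket(c) for c in children) + "]"


def attach(site):
    """Build the tree with C attached high (to A) or low (inside B)."""
    if site == "high":
        # C combines with the whole main clause A.
        return ("S", ("A", MAIN, ("B", REL)), ("C", ADV))
    if site == "low":
        # C combines with the relative clause B, inside A.
        return ("A", MAIN, ("B", REL, ("C", ADV)))
    raise ValueError("site must be 'high' or 'low'")


for site in ("high", "low"):
    print(site + ":", bracket(attach(site)))
```

In both trees the words come in exactly the same order, and C is linearly adjacent to the end of B either way. Only the bracketing differs, which is the sense in which the structural relationships, rather than simple nearness, are what matter.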


Redundancy vs. simplicity

April 4, 2012

From David Parkinson on Facebook, an expression of his frustration in his German class:

If your language (like English) doesn’t have much inflectional morphology, then learning a language with a respectable amount of it (like German) can be a chore: you have to learn to mark all sorts of distinctions in grammatical categories that don’t come naturally to you.

Many of these inflectional marks are, at least in part, redundant (in a technical sense); they reinforce category distinctions that are marked in other ways. Marks of agreement are like this. So, in German, the definite article agrees in case, gender, and number with its head noun.

Speaking very crudely, these redundant marks are helpful to the hearer, by giving extra cues to relationships among the parts of phrases and clauses. They aid comprehension.

On the other hand, these redundant marks require effort on the part of the speaker, in planning language production and accessing the appropriate inflectional forms. They work against simplicity.

There are trade-offs here. Redundancy is good. But simplicity is good too.


The context of danglers

October 1, 2011

What follows is an abstract for an academic conference (explanation to come) on “dangling modifiers” in context. This is only an abstract, with a 200-word limit and no space for a bibliography (though I’ll add two items below).


Premodifier, postmodifier

July 29, 2011

On Facebook today, Jeff Shaumeyer unloaded a variety of linguistic oddities that had come past him recently. Including this challenge to language processing:

True Blood Actor Denis O’Hare Marries Partner Hugo Redwood

Former Vampire King of Mississippi Russell Edgington portrayer Denis O’Hare married his partner, interior designer Hugo Redwood yesterday in New York. (link)

Thank goodness for the headline. Otherwise, as Shaumeyer observes, the sentence approaches crash blossom proportions.


The nanosecond of uncertainty

October 31, 2009

A couple of years ago, Neal Whitman and Mark Liberman scrutinized a claim by James J. Kilpatrick. From Mark’s summary, here:

James Kilpatrick complained in print about the “horrid” headline “Mass Transit Not An Option for All Drivers”, on the grounds that “if mass transit is not an option for ‘all’ drivers, it cannot be an option for even one driver”. He added, “Even a little ambiguity is a dangerous thing. The problem with this Horrid Example is that it creates a nanosecond of uncertainty.”

Neal Whitman and I ignored the “nanosecond of uncertainty” business, since a literal application of this idea would put pretty much all of the English language off limits.

Mark and Neal focused instead on Kilpatrick’s treatment of negation and quantification (and Jan Freeman joined in with a discussion of another example from this point of view). Here I’m going to go a bit further with the “nanosecond of uncertainty” matter and the dangers of “even a little ambiguity”.