Archive for the ‘Language processing’ Category

Why is this so hard to process?

April 21, 2014

From Chris Waigl, passed on by Chris Hansen:

 

The problem begins with the subject, a longboat full of Vikings. The (syntactic) head of this phrase is certainly longboat (and that’s what determines agreement on the verb), but it’s functioning here semantically / pragmatically as an expression of measure, much like a collective noun. So the question is whether the subject is “about” a longboat or “about” Vikings. (Animate beings, particularly humans, are especially favored as topics, ceteris paribus, so we should probably look to the Vikings.)

At the same time, the first sentence introduces the British Museum and the Palace of Westminster, implicitly (but quite subtly) introducing the Members of Parliament as entities in the discourse, though probably not as the topic.

Then we get the second sentence, which is clearly about Vikings (uncivilized, destructive, and rapacious), not boats (or the Members of Parliament, for that matter).

(more…)

Word divisions

July 12, 2013

Today’s Pearls Before Swine, in which Pig continues to have language problems:

So Pig gets the word division wrong. But the sign-maker isn’t blameless here: the sign is printed solid, rather than divided — and (like so many sign-makers these days) eschews apostrophes, so that the sign as printed is ambiguous. Goat gets it right: MEN’S WEAR.

Schnoebelen at idibon

June 14, 2013

My friend (and former student) Tyler Schnoebelen now blogs regularly on the site of the company he works for, idibon (in San Francisco), where he’s Senior Data Scientist. These postings look at matters with an NLP (natural language processing) angle to them, but always with an engaging take on the material and often with an unexpected choice of topic. Four recent postings of this sort:

(more…)

Dance with the one that’s nearest?

November 6, 2012

On today’s Morning Edition on NPR, in the story “Without Heat, Sandy Victims ['victims of the storm Sandy', not 'victims who are covered with sand'] Guard Their Homes”:

He’s living in a house that was partially flooded so it doesn’t get robbed – for a second time.

The sentence adverbial so it doesn’t get robbed … is clearly intended to modify the main clause (he’s living in a house …) — it offers a reason for this man to live in a house that was partially flooded — but some listeners probably had a moment of wondering about partially flooding the house so it doesn’t get robbed. The intended interpretation involves “high attachment” (HA), to the main clause preceding the so-adverbial, rather than “low attachment” (LA), to the relative clause within the main clause. It’s been noted again and again that LA is preferred in syntactic processing, but also noted (see here, for example) that this is only a default, with context, real-world knowledge, and discourse organization often favoring HA instead.

In the cases that people have looked at in terms of LA vs. HA, the issue is how some constituent C is parsed with respect to preceding material: is it parsed with a lower, smaller predecessor constituent B or with a higher, more inclusive predecessor A (ending in B)? Since the head word of B (was (flooded) in the hurricane example above) will of necessity be nearer to C (the so-adverbial in this example) than the head of A (is (living) in this example) is, this preference is often thought of as a preference for attachment to the nearest, but it’s the structural relationships that are key here.
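To make the geometry concrete, here is a toy sketch in Python of the NPR sentence. It is only an illustration, not a parser: the Site record, the hand-assigned depth values, and the distance_to_c helper are all hypothetical names invented for this example, and the heads are picked by hand (living for A, flooded for B), following the discussion above.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """A candidate attachment site for constituent C (hypothetical toy record)."""
    label: str
    head: str
    head_index: int  # position of the head word in the sentence
    depth: int       # hand-assigned embedding depth: 0 = main clause


# The NPR example, split on whitespace (a crude tokenization, assumed for the sketch).
sentence = "He's living in a house that was partially flooded so it doesn't get robbed".split()

# C is the so-adverbial; it starts at the word "so".
c_start = sentence.index("so")

sites = [
    Site("A (main clause)", "living", sentence.index("living"), depth=0),
    Site("B (relative clause)", "flooded", sentence.index("flooded"), depth=1),
]


def distance_to_c(site):
    """Number of words between the site's head and the start of C."""
    return c_start - site.head_index


# Low attachment picks the structurally lowest site; "nearest" picks by word distance.
low = max(sites, key=lambda s: s.depth)
nearest = min(sites, key=distance_to_c)

for site in sites:
    print(site.label, "head:", site.head, "distance to C:", distance_to_c(site))
print("lowest site:", low.label, "| nearest head:", nearest.label)
```

In this example the structurally lowest site (B) is also the one whose head is nearest to C, which is why the preference is so easily described in terms of nearness. The speaker's intended reading, though, attaches C to A.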

(more…)

Redundancy vs. simplicity

April 4, 2012

From David Parkinson on Facebook, an expression of his frustration in his German class:

If your language (like English) doesn’t have much inflectional morphology, then learning a language with a respectable amount of it (like German) can be a chore: you have to learn to mark all sorts of distinctions in grammatical categories that don’t come naturally to you.

Many of these inflectional marks are, at least in part, redundant (in a technical sense); they reinforce category distinctions that are marked in other ways. Marks of agreement are like this. So, in German, the definite article agrees in case, gender, and number with its head noun.

Speaking very crudely, these redundant marks are helpful to the hearer, by giving extra cues to relationships among the parts of phrases and clauses. They aid comprehension.

On the other hand, these redundant marks require effort on the part of the speaker, in planning language production and accessing the appropriate inflectional forms. They work against simplicity.

There are trade-offs here. Redundancy is good. But simplicity is good too.

 

The context of danglers

October 1, 2011

What follows is an abstract for an academic conference (explanation to come) on “dangling modifiers” in context. This is only an abstract, with a 200-word limit and no space for a bibliography (though I’ll add two items below).

(more…)

Premodifier, postmodifier

July 29, 2011

On Facebook today, Jeff Shaumeyer unloaded a variety of linguistic oddities that had come past him recently. Including this challenge to language processing:

True Blood Actor Denis O’Hare Marries Partner Hugo Redwood

Former Vampire King of Mississippi Russell Edgington portrayer Denis O’Hare married his partner, interior designer Hugo Redwood yesterday in New York. (link)

Thank goodness for the headline. Otherwise, as Shaumeyer observes, the sentence approaches crash blossom proportions.

(more…)

The nanosecond of uncertainty

October 31, 2009

A couple of years ago, Neal Whitman and Mark Liberman scrutinized a claim by James J. Kilpatrick. From Mark’s summary, here:

James Kilpatrick complained in print about the “horrid” headline “Mass Transit Not An Option for All Drivers”, on the grounds that “if mass transit is not an option for ‘all’ drivers, it cannot be an option for even one driver”. He added, “Even a little ambiguity is a dangerous thing. The problem with this Horrid Example is that it creates a nanosecond of uncertainty.”

Neal Whitman and I ignored the “nanosecond of uncertainty” business, since a literal application of this idea would put pretty much all of the English language off limits.

Mark and Neal focused instead on Kilpatrick’s treatment of negation and quantification (and Jan Freeman joined in with a discussion of another example from this point of view). Here I’m going to go a bit further with the “nanosecond of uncertainty” matter and the dangers of “even a little ambiguity”.

(more…)

Slifted allegations

October 28, 2009

In the letters section of the New York Times on October 25 (in the Week in Review section), readers commented on conflicts between the public’s right to know and the rights of those involved in legal proceedings. The last letter accused the Times (and other news media) of subverting the presumption of innocence, via the syntax of the sentences the paper uses to report charges (involving a construction known in the syntactic literature as Slifting).

Clark Hoyt, the public editor of the Times, then explained why journalists sometimes chose Slifting, but conceded that the letter-writer had a point.

(more…)

A momentary compound problem

May 9, 2009

The editorial “Toward Fair Lending” in today’s New York Times begins:

The predatory lending bill that passed the House on Thursday is less than what is needed …

I had a moment of right-branching parsing of predatory lending bill, as

predatory + [ lending + bill ]

(saying that some bill about lending is predatory), though I quickly realized that this interpretation was absurd and that the intended parsing was left-branching:

[ predatory + lending ] + bill

(referring to some bill about predatory lending, that is to say, about lending in a predatory fashion).

I’m not faulting the editorial writer for producing a potentially ambiguous expression (though “the bill about predatory lending” would have been clearer, at the expense of an extra word); potential ambiguities are everywhere, after all. Probably most readers moved right over “predatory lending bill” without a twinge.

Right-branching vs. left-branching has been in the literature for about 50 years, at first in connection with the idea that right-branching structures were easier to process than left-branching ones, at least for English speakers, though the topic was quickly complicated by the observation that some languages are rich in right-branching constructions (and were consequently labeled “right-branching languages”), while others are rich in left-branching constructions (and were consequently labeled “left-branching languages”).

In English, some NP examples can go either way out of context: small children’s school ‘school for small children’ (left-branching) or ‘children’s school that is small’ (right-branching). But even out of context, many examples massively favor one or the other parsing (because of real-world plausibility): young children’s school ‘school for young children’ (left-branching), new children’s school ‘children’s school that is new’ (right-branching).
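For readers who like to see the possibilities spelled out, here is a minimal Python sketch that lists every binary bracketing of a string of words. The function names bracketings and show are made up for this illustration, and the sketch only enumerates structures; it does nothing to choose between them, which is where real-world plausibility comes in.

```python
def bracketings(words):
    """Return every binary bracketing of a list of words, as nested tuples."""
    if len(words) == 1:
        return [words[0]]
    trees = []
    # Try each place to split the sequence into a left part and a right part.
    for split in range(1, len(words)):
        for left in bracketings(words[:split]):
            for right in bracketings(words[split:]):
                trees.append((left, right))
    return trees


def show(tree):
    """Render a nested tuple in bracket-and-plus notation."""
    if isinstance(tree, str):
        return tree
    left, right = tree
    return f"[ {show(left)} + {show(right)} ]"


for tree in bracketings(["predatory", "lending", "bill"]):
    print(show(tree))
```

For a three-word expression there are just two bracketings, the right-branching one and the left-branching one. The same holds for small children’s school if children’s is treated as a single word. The number of bracketings grows quickly with longer strings (five for four words).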

