Back on the 8th, Charlie Doyle posted plaintively to ADS-L about a puzzle in alphabetization:

Yesterday my daughter-in-law called me with a question about my third-grader grandson’s homework. The assignment was to alphabetize a list of words, and the list included the four items girl/girl’s/girls/girls’. (My daughter-in-law made clear that both the academic career of my grandson and the family’s standing in the community were at stake, since the parents of the other third-graders were also depending on my answer.)

I failed. I could tell her that there exist various styles of alphabetizing, that certain traditional “rules” obtain, one of which is “Ignore apostrophes” — but the rules I am aware of don’t fully address the case at hand. I could tell her that if the Microsoft Corporation is asked to “sort” the words alphabetically, they will appear in the order in which I have listed them above, which seems reasonable — but not, as far as I can determine, “authoritative.”

Any suggestions?  (I don’t recall that third grade used to be this hard!)

Two issues here: one, why is the question being asked? and two, what’s the answer?

Why ask? It struck me as absurd that anyone, much less third-graders, should be expected to know the answer to the question. In fact, I challenged the assumption that there is one “right” answer to it, and I wondered why the question was being asked in the first place.

Let’s grant that alphabetization — the sorting of written expressions into a linear order based on the conventional sequence of letters of the alphabet — is a useful skill, and that understanding how expressions are alphabetized is useful for hand-searching of alphabetized lists (such as entries in a dictionary, indices for books, and bibliographies).

The core principle of alphabetization arranges expressions by first letter, then by second letter, and so on (A… before B…, AA… before AB…, ABA… before ABB…, etc.). That’s easy enough to learn (and teach), but all sorts of puzzles and arcana arise when you look at the details.

Here’s the compact version of the style sheet of the American Psychological Association on alphabetization:

1. Alphabetize letter by letter.

2. Ignore spaces, capitalization, hyphens, apostrophes, periods, and accent marks.

3. When alphabetizing titles or group names as authors, go by the first significant word (disregard a, an, the, etc.).

Each rule is a convention — a convention that is by no means universally followed. Rule 1 prescribes “letter by letter” alphabetization, but if you look at the Chicago Manual of Style, you’ll see that there’s a competing “word by word” scheme (more on this in a moment). And rules 2 and 3 tell you to disregard or ignore various items, with the result that expressions that are visually distinct are treated as equivalent for the purposes of alphabetization, so that the rules don’t in fact determine a unique ordering of expressions (at least not without subsidiary rules for imposing a sequencing of equivalent expressions). In particular, these “simple” rules don’t order girls, girl’s, and girls’. They don’t even order girl before these three, as they should in anyone’s style of alphabetization; there’s a suppressed rule 0 in operation here: Nothing comes before something (so that A is ordered before AA, etc.).
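To see how the "ignore" rules collapse distinct expressions, here is a minimal sketch in Python. The function name apa_key and the set IGNORED are hypothetical, invented for illustration; they are one reading of rules 1 and 2, not an official APA algorithm. The code uses the plain ASCII apostrophe (') rather than the typographic one.

```python
import unicodedata

# Characters that rule 2 tells us to ignore (an assumed, minimal set).
IGNORED = set(" -'.")

def apa_key(expression: str) -> str:
    """Hypothetical sort key: lowercase, strip accents, drop ignored marks."""
    decomposed = unicodedata.normalize("NFD", expression)
    return "".join(
        ch.lower()
        for ch in decomposed
        if ch not in IGNORED and not unicodedata.combining(ch)
    )

words = ["girls'", "girl", "girls", "girl's"]
keys = {w: apa_key(w) for w in words}

# girls, girl's and girls' all collapse to the same key, "girls",
# so the key alone cannot order them.
tied = [w for w in words if keys[w] == "girls"]

# Python compares strings so that a proper prefix sorts first, which
# plays the role of "rule 0" (nothing before something): "girl" < "girls".
ordered = sorted(words, key=apa_key)
print(keys)
print(ordered)
```

Because Python's sort is stable, the three tied words simply stay in whatever order they had in the input; some subsidiary rule would be needed to do better.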

Before going into these details, let’s step back and ask whether it’s useful for people in general to know about them, and in particular whether it makes sense to be teaching them to schoolchildren, especially in light of the fact that there are many different styles of alphabetization.

The ADS-Lers wondered if the kids in Charlie Doyle’s grandson’s class had in fact been taught rules that would apply to words with apostrophes (or hyphens) in them. If so, then they were being asked to apply these rules to the case at hand (and Charlie’s counsel would count for nought). Garson O’Toole brought up the possibility that, instead, the question was a “stunt” question,

where the teacher does not expect a specific answer, and the goal is to teach about conflicting conventions and answer justification.

If so, the teacher should have made that clear and not expected the students to divine the teacher’s intent. (Victor Steinbok then suggested entertaining ways for the students to offer their own explicit rules for imposing a unique ordering.)

But what’s the answer? I went on to list some of the choices of alphabetization style that are out there:

Letter-by-letter vs. word-by-word (alphabet before alpha decay, going letter by letter, or after it, going word by word)? Disregard capitalization or order upper case before lower case or order lower case before upper case? Treat numerals as coming before alphabetic characters, or after them, or as if they were spelled out in letters? Disregard punctuation or order punctuation marks before alphanumerics (or after them)? Treat the prefixes Mc and Mac as equivalent or as ordered letter-by-letter? Disregard internal spaces, or extend “nothing before something” to the case of internal spaces? And so on.
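The first of these choices can be sketched as two different sort keys. The function names letter_by_letter and word_by_word are hypothetical, and the sketch assumes that word-by-word ordering means comparing whole words in sequence, with a shorter word sorting before any longer word it begins.

```python
def letter_by_letter(expression: str) -> str:
    """Key that ignores spaces and punctuation, keeping only letters and digits."""
    return "".join(ch for ch in expression.lower() if ch.isalnum())

def word_by_word(expression: str) -> tuple:
    """Key that compares the first word, then the second word, and so on."""
    return tuple(letter_by_letter(word) for word in expression.split())

entries = ["alphabet", "alpha decay"]

# Letter by letter: "alphabet" vs "alphadecay", and b comes before d.
by_letter = sorted(entries, key=letter_by_letter)
# → ['alphabet', 'alpha decay']

# Word by word: "alpha" vs "alphabet", and the shorter word comes first.
by_word = sorted(entries, key=word_by_word)
# → ['alpha decay', 'alphabet']

print(by_letter, by_word)
```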

And I noted that the Microsoft ordering is character-by-character, following ASCII order; as a result, nothing comes before something, and punctuation marks (like the apostrophe, ASCII 39) come before alphabetic characters (which start at ASCII 65, with upper case before lower case). This gives the ordering of the girl words that Charlie Doyle reported.
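A character-by-character ordering of this kind is easy to reproduce. The sketch below uses Python's default string comparison, which goes by code point and so agrees with ASCII order for ASCII characters. It assumes the plain ASCII apostrophe; the typographic apostrophe (’) lies outside ASCII and would sort after the letters instead.

```python
words = ["girls'", "girls", "girl's", "girl"]

# Default string sorting compares code points, one character at a time.
code_point_order = sorted(words)
# → ['girl', "girl's", 'girls', "girls'"]

# The code points behind that result:
apostrophe, upper_a, lower_a, lower_s = ord("'"), ord("A"), ord("a"), ord("s")
# → 39, 65, 97, 115

print(code_point_order)
print(apostrophe, upper_a, lower_a, lower_s)
```

Here girl comes first because it is a prefix of the others (nothing before something), and girl's precedes girls because 39 is less than 115.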

This scheme is definitely not traditional, but it has the virtue of always giving a clear answer without human judgment; it’s eminently automatizable. The results are not always attractive; for instance, the algorithm doesn’t disregard initial the, a, or an in titles, since these are just character sequences. (But this ordering is amply attested on the net, for instance in the ordering of blog titles in Language Log’s blogroll.)
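The difference between plain character sorting and a "first significant word" convention can be sketched as follows. The titles are made up purely for illustration, and title_key is a hypothetical helper that drops only a leading a, an, or the.

```python
ARTICLES = ("a", "an", "the")

def title_key(title: str) -> str:
    """Hypothetical key: lowercase the title and skip one leading article."""
    words = title.lower().split()
    if len(words) > 1 and words[0] in ARTICLES:
        words = words[1:]
    return " ".join(words)

# Invented titles, not taken from any real list.
titles = ["The Apple Diary", "Banana Notes", "A Zebra File"]

# Plain character sorting treats "The" and "A" as ordinary characters.
plain = sorted(titles)
# → ['A Zebra File', 'Banana Notes', 'The Apple Diary']

# Sorting on the first significant word gives a different order.
by_significant_word = sorted(titles, key=title_key)
# → ['The Apple Diary', 'Banana Notes', 'A Zebra File']

print(plain)
print(by_significant_word)
```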

Garson O’Toole pointed out that the case of the girl words has come up on Yahoo! Answers at least twice:

[2007] In alphabetizing girl’s vs. girls’ what is the common rule?

[2006] How would you alphabetize the words girls, girl’s and girls’?

Both answers were confused. The first seems to boil down to the observation that girl’s is a form of girl and girls’ is a form of girls, so since girl comes before girls, girl’s comes before girls’. The second argues for girl’s < girls < girls’, maybe because the writer put girl’s before girls by the reasoning above, and girls before girls’ on the nothing-before-something principle.

In any case: girl < girl’s < girls < girls’, without an appeal to ASCII. Or with it.

Bonus. In assembling the notes above, I looked at the entry for alphabetize in the new AHD5. Well, I looked at the entry in the iPad app, and there discovered the ordering of entries:

[separated] alpha decay < [hyphenated] alpha-blocker < [solid] alphabet

This is a version of word-by-word ordering, probably a consequence of ASCII sorting. Fine by me, but then I looked at the dictionary’s front matter, which says, laconically, “All entries … are listed in alphabetical order” but helpfully illustrates this order with the list:

TBD
TBG
Tbilisi
T-bill
TBM
T-bone
tbs.
Tc
TCA
T cell
Tchaikovsky, Peter Ilich

which is transparently letter-by-letter ordering.
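That claim can be checked mechanically. In this sketch, letters_only is a hypothetical key that lowercases each entry and keeps only its letters; the list counts as letter-by-letter if its keys are already in sorted order.

```python
front_matter = [
    "TBD", "TBG", "Tbilisi", "T-bill", "TBM", "T-bone",
    "tbs.", "Tc", "TCA", "T cell", "Tchaikovsky, Peter Ilich",
]

def letters_only(entry: str) -> str:
    """Hypothetical key: ignore case, spaces, hyphens, periods, commas."""
    return "".join(ch for ch in entry.lower() if ch.isalpha())

keys = [letters_only(entry) for entry in front_matter]

# True if the printed list is in letter-by-letter order.
is_letter_by_letter = keys == sorted(keys)

# True if the printed list were in plain code-point (ASCII) order.
is_code_point_order = front_matter == sorted(front_matter)

print(is_letter_by_letter, is_code_point_order)  # True False
```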

Ah, I then thought to look at the hard-copy AHD5, and there found

alphabet < alpha-blocker < alpha decay

just as the front matter would predict.
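Both orderings of these three entries can be reproduced with a short sketch. It assumes that the app sorts by raw code point and that the printed book sorts letter by letter; neither assumption is confirmed by the dictionary itself.

```python
entries = ["alphabet", "alpha-blocker", "alpha decay"]

# Code-point order: space (32) < hyphen (45) < "b" (98).
app_order = sorted(entries)
# → ['alpha decay', 'alpha-blocker', 'alphabet']

# Letter-by-letter order: compare "alphabet", "alphablocker", "alphadecay".
print_order = sorted(
    entries,
    key=lambda entry: "".join(ch for ch in entry.lower() if ch.isalpha()),
)
# → ['alphabet', 'alpha-blocker', 'alpha decay']

print(app_order)
print(print_order)
```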

The different alphabetizations make sense. People will use automated search when using the electronic version, so the actual ordering of entries is pretty much irrelevant to them. With the printed version they search by hand, so the alphabetization scheme matters to them; I doubt that any printed dictionary would want to explain ASCII ordering to its users, while letter-by-letter ordering is pretty easy to explain.

