Mining back-formations

Over on ADS-L, a thread has developed on quote mining, from which it turns out that the two-part verb to quote mine has been back-formed from the synthetic compound quote mining (itself possibly created on the model of the compound data mining or developed by independent invention). Which led Joel Berson to wonder if to data mine has been created yet. It certainly has, along with to data dredge, to data fish, and to data snoop. Back-formation flourishes.

(These two-part back-formed verbs are sometimes written solid, sometimes hyphenated, and sometimes separated. I’ll treat these orthographic versions as equivalent for the purposes of this posting.)

The story starts with the synthetic compound data mining (also data miner). From the Wikipedia entry:

Data mining …, a relatively young and interdisciplinary field of computer science, is the process of extracting patterns from large data sets by combining methods from statistics and artificial intelligence with database management.

… The related terms data dredging, data fishing and data snooping refer to the use of data mining techniques to sample portions of the larger population data set that are (or may be) too small for reliable statistical inferences to be made about the validity of any patterns discovered.

Back-formation then provides to data mine:

To Data Mine or Not to Data Mine in the Fight Against Terrorism (link) [intransitive]

How to Data-Mine for Trends Before You Date Around (link) [intransitive]

US Senate decides to data-mine the Internet everywhere for words like SLASH and KILL. (link) [transitive, ‘search by data mining’]

(tons of examples) and related back-formations:

If investigators don’t set out their hypotheses at the beginning, there may be the tendency to data dredge until they find something. (link) [intransitive]

The problem is that it’s not generally a good idea to data-dredge in this way. Your best bet is to think about the characteristics of the data (discrete or continuous, non-negative or real, symmetric or skewed) and try to narrow it down to a few distributions … (link) [intransitive]

There are several ways in which a trial report can make the results appear more impressive than they really are. One of these is to “data fish” amongst a large number of outcomes, rather than focus on a single, pre-specified primary outcome. (link) [intransitive]

Finally, if you are trying to create trading strategies, don’t overlook the ease with which computers allow you to data-snoop. (link) [intransitive]

Its also relatively trivial to data snoop something that vaguely fits considering the mountains of financial ratios/commodities/etc that are in the world. (link) [transitive, ‘find by data snooping’]

On to quote mining. From the Wikipedia entry on “Fallacy of quoting out of context”:

The practice of quoting out of context, sometimes referred to as “contextomy” [lovely coinage attributed to Milton Mayer] or “quote mining”, is a logical fallacy and a type of false attribution in which a passage is removed from its surrounding matter in such a way as to distort its intended meaning.

… Scientists and their supporters used the term quote mining as early as the mid-1990s in newsgroup posts to describe quoting practices of certain creationists. It is used by members of the scientific community to describe a method employed by creationists to support their arguments, though it can be and often is used outside of the creation-evolution controversy. Complaints about the practice predate known use of the term …

And then, of course, the back-formed verb:

How not to quote mine Einstein (link)

In the last post we saw that accusers are willing to quote mine the released CRU emails, selectively taking a choice phrase at face value and missing the preceding and proceeding context in the longer email. (link)

To quote mine in this way, the miner MUST have read it in context and then decided which bit best supported his cause. This tactic goes beyond cunning, sly and underhand – it enters the realm of reprehensible duplicity. (link)

Quote mine is often transitive, with an object denoting the source of the mined quotes, but (as in the third quote) can be intransitive, denoting the general practice.


3 Responses to “Mining back-formations”

  1. Rick Wojcik Says:

    Don’t forget “text mining”, which computer scientists like to distinguish from “data mining”. Text mining and data mining represent different communities of researchers, but people often use “data mining” for both. Hence, my management at work has put both data miners and text miners in the same group, even though they work with very different data (metadata records vs. text corpora) and use very different analytical techniques.

  2. xkcd back-formation « Arnold Zwicky's Blog Says:

    […] data mine and related verbs here and insider-trade here. I have hundreds in my files. […]
