Modeling competition and uncertainty in word class decisions

Adam Albright
Massachusetts Institute of Technology

In many languages, words fall unpredictably into different classes,
distinguished by different sets of endings, different changes in
their stems, or by other irregularities. When native speakers are
required to inflect unknown or novel words, they must often decide
which class to put them in. In order to do this, they must form
generalizations about what properties of a word most accurately
predict its class membership. There is a growing body of research
showing that in such cases, speakers are able to use detailed,
probabilistic generalizations about the sound and meaning of a word
to decide on its lexical class (Skousen 1989, Bybee 1995, Ramscar
2002, Zuraw 2002, Ernestus and Baayen 2003, Albright and Hayes 2003).
Furthermore, one class typically acts as the "default" or "regular"
pattern, and is used for novel words that do not have any of the
properties that are characteristic of the other patterns (Pinker and
Prince 1988, Prasada and Pinker 1993).

In this talk, I discuss two issues that arise in modeling word class
decisions. The first concerns situations where a new word contains
some properties that are typical of one class, and other properties
that are typical of another class. As might be expected, such cases
can lead to variation: both possibilities sound plausible, and
different speakers choose different outputs. Using data from
irregular vowel alternations in Spanish, I show that such situations
are best modeled using a system of stochastic rules with overlapping
contexts, as proposed by Albright and Hayes (2002). I then turn to a
more puzzling situation, in which multiple patterns may also apply,
but speakers are reluctant to produce any output at all. In this
case, the challenge is to understand why competition leads to
uncertainty, rather than variation, and why the default pattern
cannot apply. I argue that in these cases, the evidence supporting
the relevant generalizations is too sparse in the input data, leaving
speakers uncertain about the reliability of the rules involved.
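
The contrast between variation and uncertainty can be illustrated with a
minimal sketch. It is not the model presented in the talk: the names
(Rule, wilson_lower, predict), the regular-expression contexts, the floor
threshold, and all counts are hypothetical, and a Wilson lower bound is
swapped in as one simple way to discount rules supported by few words.

```python
import math
import re
from dataclasses import dataclass


@dataclass
class Rule:
    output: str   # label of the pattern the rule produces
    context: str  # regular expression the stem must match
    hits: int     # words in scope that follow the pattern
    scope: int    # words whose stems match the context


def wilson_lower(hits: int, scope: int, z: float = 1.0) -> float:
    """Lower Wilson bound on hits/scope; shrinks when scope is small."""
    if scope == 0:
        return 0.0
    p = hits / scope
    denom = 1 + z * z / scope
    centre = p + z * z / (2 * scope)
    margin = z * math.sqrt(p * (1 - p) / scope + z * z / (4 * scope * scope))
    return (centre - margin) / denom


def predict(stem: str, rules: list, floor: float = 0.3):
    """Return each output's share of support and an uncertainty flag."""
    best = {}
    for rule in rules:
        if re.search(rule.context, stem):
            conf = wilson_lower(rule.hits, rule.scope)
            # Overlapping contexts: keep the best-supported rule per output.
            best[rule.output] = max(best.get(rule.output, 0.0), conf)
    total = sum(best.values())
    shares = {out: c / total for out, c in best.items()} if total > 0 else {}
    # Uncertain when no applicable rule is reliable enough (assumed floor).
    uncertain = not best or max(best.values()) < floor
    return shares, uncertain


# Invented counts: a broad default rule and a narrower competing rule.
WELL_ATTESTED = [
    Rule("no change", r".", 900, 1000),
    Rule("diphthongize", r"e[nr]t$", 18, 20),
]
# Invented counts: two competing rules, each supported by very few words.
SPARSE = [
    Rule("pattern A", r"x$", 1, 2),
    Rule("pattern B", r"x$", 1, 2),
]

print(predict("lert", WELL_ATTESTED))  # both outputs supported: variation
print(predict("lax", SPARSE))          # low confidence everywhere: uncertainty
```

In the first call both outputs receive substantial support, so the sketch
predicts variation; in the second, every applicable rule rests on too few
words to clear the assumed floor, so the sketch flags uncertainty instead.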
