10 May Word Structure Analysis
Analyzing Morphology
In many natural languages, gender or case has a profound effect on morphology, so much so that students have to memorize declension and conjugation tables. It’s easy to teach computers these tables, and they remember them very well. If that were not so, it might be difficult to keep their attention long enough for them to become proficient. For processing text, morphology can be the first characteristic analyzed. It is a natural choice because it is the linguistic feature closest to the surface of text (in speech, pronunciation is closer to the surface), and it can be analyzed on the basis of the word strings alone.
I have not yet paid much attention to the overall structure of the text and other deeper features. These discussions will come soon. Today, I want to show how certain types of ambiguity can be resolved at the morphological level.
Understanding Context Cross-Reference | |
---|---|
Prior Post | Next Post |
Stratum Morphology | Patterns at the Boundaries |
Definitions | References |
natural language morphology | Hammond 1988 |
computers attention ambiguity | Jappinen 1986 |
linguistics structure feature | Chomsky 1986; Stemberger 1988 |
Hammond and Noonan have proposed a treatment of morphology wherein the grammar model consists of “an autonomous morphological component that would be given complete responsibility for the creation of words, thus removing the syntax from word formation entirely. The new component would consist of a dictionary, which contain [sic] all and only the words of a language; a list of morphemes, understood as being distinct from the dictionary; a list of word formation rules; and a filter, which specifies exceptions and adds any idiosyncratic information, such as the fact that REFERENCE and REFERRAL have semantic idiosyncrasies vis-à-vis the verb REFER” (1988, p.4).
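One way to picture the four parts named in that quotation is as a small data structure. The sketch below is only my reading of the description, not Hammond and Noonan's formalism: the class names, the `derive` method, the toy rules, and the wording of the filter notes are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class WordFormationRule:
    affix: str                   # the morpheme this rule attaches
    apply: Callable[[str], str]  # builds a candidate word from a base


@dataclass
class MorphologicalComponent:
    dictionary: set              # all and only the words of the language
    morphemes: set               # kept separate from the dictionary
    rules: list                  # word formation rules
    filter_notes: dict = field(default_factory=dict)  # idiosyncratic information

    def derive(self, base):
        """Apply every rule to `base`; keep candidates the dictionary accepts."""
        results = {}
        for rule in self.rules:
            if rule.affix not in self.morphemes:
                continue
            candidate = rule.apply(base)
            if candidate in self.dictionary:
                results[candidate] = self.filter_notes.get(candidate, "")
        return results


# Toy data, invented for this sketch.
component = MorphologicalComponent(
    dictionary={"refer", "reference", "referral"},
    morphemes={"refer", "-ence", "-al", "-ment"},
    rules=[
        WordFormationRule("-ence", lambda b: b + "ence"),
        WordFormationRule("-al", lambda b: b + b[-1] + "al"),  # doubles the final consonant
        WordFormationRule("-ment", lambda b: b + "ment"),
    ],
    filter_notes={
        "reference": "meaning not fully predictable from REFER",
        "referral": "meaning not fully predictable from REFER",
    },
)

derived = component.derive("refer")
for word, note in sorted(derived.items()):
    print(word, "|", note)
```

Here the dictionary acts as the final check on what the rules produce: the rule for `-ment` yields "referment", which is discarded because the dictionary does not list it.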
Hammond and Noonan suggest the introduction of a multi-strata mechanism for analyzing morphology as a way to deal with morphological irregularities, such as the interaction of inflection and compounding (1988, p.7). Another technique is to use the lexicon for explicit storage of all the possible forms of each word. This would, of course, require treating each variant form of a lexeme as a separate lexical item. I don’t think the 3-DG model requires one or the other for correctness and efficiency, as long as the rules perform morphological analysis in a way that recognizes new applications of affixes to words.
Morphology and Lexicon
In any formalism it is critical to decide what goes in the lexicon. In Chomsky’s earlier framework for transformational analysis, there was no lexicon at all. If we are to include morphological variants in a lexicon, what constraints can be used to prevent massive over-complication and ten-million-word dictionaries? If that sounds like a gross exaggeration, consider this: a Japanese company sells an incomplete electronic dictionary with two million lexical entries!
Do we need morphology at all? Jappinen and Ylilammi, Finnish computational linguists, developed a computational strategy that used “fully realized morphs as primitives in analysis…analysis amounts then to the ordered recognition of phoneme substrings (morphs) within a phoneme string (input word form). The result of an analysis is the union of the morphemes associated with the morphs. This model resulted in an efficient running system” (1986, p. 270).
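The idea of ordered morph recognition can be sketched in a few lines. This is a toy illustration, not a reconstruction of Jappinen and Ylilammi's system: it works on spelling rather than phonemes, the three ordered slots (stem, number, case) and the tiny Finnish morph tables are my own assumptions, and the function name `analyze_morphs` is hypothetical.

```python
# Each table maps a fully realized morph (a surface substring)
# to the morphemes associated with it.
STEMS = {"talo": {"HOUSE"}}
NUMBER = {"i": {"PLURAL"}}
CASE = {"ssa": {"INESSIVE"}, "sta": {"ELATIVE"}}

# Slots are tried in this order; the flag says whether a slot may be skipped.
SLOTS = [(STEMS, False), (NUMBER, True), (CASE, True)]


def analyze_morphs(word, slot=0):
    """Return one set of morphemes for each way `word` can be segmented."""
    if slot == len(SLOTS):
        # A segmentation succeeds only if the whole word has been consumed.
        return [set()] if word == "" else []
    table, optional = SLOTS[slot]
    analyses = []
    for morph, morphemes in table.items():
        if word.startswith(morph):
            for rest in analyze_morphs(word[len(morph):], slot + 1):
                # The result is the union of the morphemes of every morph found.
                analyses.append(morphemes | rest)
    if optional:
        analyses.extend(analyze_morphs(word, slot + 1))
    return analyses


for form in ["talo", "talossa", "taloissa", "taloista"]:
    print(form, analyze_morphs(form))
```

A form that cannot be fully segmented, such as "talox", returns an empty list. A form with more than one valid segmentation would return several morpheme sets, which is where ambiguity shows up at the morphological level.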
Their morphological analysis strategy, tailored to the language being interpreted, proved reasonable. Furthermore, the lexicon was of manageable size. Because the interaction of morphology and syntax has been shown to be so pervasive, even in languages less agglutinative than Finnish, an interleaved morphological/syntactic analysis can improve interpretation capabilities. Because of its proven usefulness, a morphological analysis strategy is part of 3-DG.
Stemberger and MacWhinney (1988, p. 112) propose that irregular forms and high-frequency inflected forms of words are stored in the lexicon of the human brain. Other morphological changes to words are processed on-line in the brain on the basis of morphological rules. Instead of storing the form in memory, we know the rule; instead of processing the pattern, we process the rule. This is a good model for computational analysis.
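Carried over to a program, that division of labor might look like the sketch below: stored forms are looked up first, and everything else goes through rules. This is a minimal illustration under my own assumptions, not the 3-DG implementation. The English word lists, the feature labels, and the function name `analyze_word` are invented for the example.

```python
# Irregular (and, in a fuller system, high-frequency) forms are stored whole.
STORED_FORMS = {
    "went": ("go", "PAST"),
    "ran": ("run", "PAST"),
    "mice": ("mouse", "PLURAL"),
}

# Everything else is handled by rule: strip a suffix, check the stem.
SUFFIX_RULES = [
    ("ing", "PROGRESSIVE"),
    ("ed", "PAST"),
    ("s", "PLURAL_OR_3SG"),
]

KNOWN_STEMS = {"walk", "talk", "go", "run", "mouse"}


def analyze_word(word):
    """Return (stem, feature), or None if the word cannot be analyzed."""
    # 1. Stored forms take priority over rules.
    if word in STORED_FORMS:
        return STORED_FORMS[word]
    # 2. Otherwise try each suffix rule in turn.
    for suffix, feature in SUFFIX_RULES:
        if word.endswith(suffix):
            stem = word[: -len(suffix)]
            if stem in KNOWN_STEMS:
                return (stem, feature)
    # 3. A bare stem needs no analysis.
    if word in KNOWN_STEMS:
        return (word, "BASE")
    return None


for w in ["went", "walked", "talking", "talks", "mice", "blorp"]:
    print(w, analyze_word(w))
```

The lexicon stays small because only the exceptions are stored; a newly learned stem immediately gains all its regular inflected forms once it is added to the set of known stems.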