03 May Up from Words to Sentences
Words and Sentences
The game of Scrabble is completely about words. Crossword puzzles go from sentence or phrase to word. Our analytical approach begins with the word, but doesn’t stop there. From both the cognitive and computational perspectives, the sentence exhibits far more complexity and changeability than the word. At any given moment, the structure of the sentence, in any language, is dynamic and infinitely variable. For computers, natural language (NL) analysis is done by parsing. Parsers match words in a text to words in machine-readable dictionaries. They then infer attributes from additional information in the dictionaries and from grammar rules, which are often kept in separate knowledge bases. Both human comprehension and machine analysis attempt to glean the intended message of the NL utterance or text.
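The dictionary-matching step described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `LEXICON` contents, the attribute names, and the helper functions `lookup` and `categories` are hypothetical and are not taken from any particular parser. It simply splits on whitespace and does not handle punctuation.

```python
# Hypothetical toy dictionary: each word form maps to one or more entries,
# and each entry carries attributes a parser could draw on.
LEXICON = {
    "the":   [{"category": "Det"}],
    "dog":   [{"category": "N", "number": "singular"}],
    "dogs":  [{"category": "N", "number": "plural"}],
    "barks": [{"category": "V", "number": "singular"},
              {"category": "N", "number": "plural"}],
}


def lookup(text):
    """Match each word in the text to its dictionary entries.

    Unknown words are paired with an empty list.
    """
    return [(word, LEXICON.get(word, [])) for word in text.lower().split()]


def categories(text):
    """Infer the possible categories of each word from its entries."""
    return [(word, sorted({entry["category"] for entry in entries}))
            for word, entries in lookup(text)]


if __name__ == "__main__":
    for word, cats in categories("The dog barks"):
        print(word, cats)
```

Note that "barks" comes back with two candidate categories. Choosing between them is where grammar rules and context have to take over from simple lookup.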
|Understanding Context Cross-Reference|
|cognitive inference||Grosz 1986 Matsumoto 1983|
|sentence word intent||Chomsky 1968 Hanson 1987|
|NL comprehension||Winograd 1983 Lamb 1966|
Comprehension is a non-trivial task. For humans, many years of repetitive exposure, instruction, use, and correction are required to develop comprehension skills. Despite the long-term learning process, humans still exhibit vast disparities in language competence. In fact, these disparities can become a mark of individualism and expression. In the book I’m now reading, the musical and often ill-formed speech of World War I-era Welsh laborers is an important part of the story, giving humanity, dimension, and vibrant color to the story’s protagonists. The rigid “garbage in – garbage out” model that characterizes end-user computing fails to accommodate these natural and rich human differences.
For computers, robust comprehension has been under research and development for decades. The fact that robust understanding has been achieved in machine-based systems only at a massive cost in computational resources and time attests to the complexity of the task. Many parsing strategies in use today attempt to understand sentences by extracting grammatical features and applying them to the words. Because sentences are so much more complex than words, however, a word-based approach could lead to better interpretation. The typical problem with word-based models is that, with a separate instance of each word for every word sense, the lexicon can become unmanageably large, slowing the process to a crawl.
Sentence-based grammars describe or prescribe what elements in what order make up well-formed sentences. Word-based grammar theory, however, proposes that syntax and semantics are not properties of sentences only, but also of words. Therefore, a grammar describing semantic and syntactic attributes should describe them from the word-up, rather than (or in addition to) the sentence-down. Studies in the cognitive issues of language understanding, including language acquisition and discourse analysis, tend to support word-based theory.
Because children acquire language a word or a phrase at a time, conceptual representations of meaning are likely to be tied to words (and phrases such as “all gone” and “so big”) rather than sentences. Ungrammatical discourse is often perfectly lucid, so understanding words may be more important than understanding sentences. From a computational perspective, a word-based grammar may yield more robust NL understanding. In addition, modeling the attributes of words, sentences, and discourses and arranging them in matrices can yield a symmetrical representation that seems ideally suited for parallel distributed systems.
Rather than parsing from the top down, 3-DG begins at the bottom with the words, binds them to their categories, then applies the necessary reductions to infer the clause, phrase and sentence structure. Bottom-up techniques have generally proven efficient in providing high-speed, robust understanding (Matsumoto, 1983).
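To make the bottom-up idea concrete, here is a minimal sketch of a backtracking shift-reduce recognizer. It is not 3-DG and not Matsumoto's parser. The toy `LEXICON`, the `RULES` list, and the function names `bind`, `parse`, and `bracket` are all assumptions chosen for illustration. It follows the same three steps described above: start with the words, bind them to categories, then apply reductions until a sentence structure emerges.

```python
# Hypothetical toy lexicon: binds each word to a single category.
LEXICON = {"the": "Det", "a": "Det", "dog": "N", "cat": "N",
           "chased": "V", "slept": "V"}

# Hypothetical reduction rules: (right-hand side, left-hand side).
RULES = [
    (("Det", "N"), "NP"),
    (("V", "NP"), "VP"),
    (("V",), "VP"),
    (("NP", "VP"), "S"),
]


def bind(sentence):
    """Bind each word to its category; raises KeyError on unknown words."""
    return [(LEXICON[word], word) for word in sentence.lower().split()]


def parse(sentence):
    """Return a tree (label, children) if the words reduce to S, else None."""

    def search(stack, rest):
        if not rest and len(stack) == 1 and stack[0][0] == "S":
            return stack[0]
        # Try every reduction that matches the top of the stack.
        for rhs, lhs in RULES:
            n = len(rhs)
            if len(stack) >= n and tuple(node[0] for node in stack[-n:]) == rhs:
                result = search(stack[:-n] + [(lhs, stack[-n:])], rest)
                if result is not None:
                    return result
        # If no reduction leads to S, shift the next word onto the stack.
        if rest:
            return search(stack + [rest[0]], rest[1:])
        return None

    return search([], bind(sentence))


def bracket(node):
    """Render a tree as a bracketed string."""
    label, body = node
    if isinstance(body, str):
        return f"({label} {body})"
    return f"({label} " + " ".join(bracket(child) for child in body) + ")"


if __name__ == "__main__":
    print(bracket(parse("the dog chased the cat")))
```

The backtracking matters in this sketch: a purely greedy reducer would turn "chased" into a complete VP too early and then fail on the object that follows. Production parsers typically avoid this kind of search with lookahead or parse tables, which this example leaves out.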
Bottom-up parsers have been written in both object-oriented and procedural programming languages, yielding impressive results. The logic of reduction used in bottom-up parsing also resembles the human act of logical reasoning. As we look further at automated NL understanding techniques, we will see how the parsing processes demonstrated by Matsumoto and others lead to powerful stratified models.