02 May Linguistic Building Blocks
While the sentences of a language are infinite, each language has a finite number of structures, functions and attributes. Functions and attributes are the building blocks of a grammar. Grammars, and the languages they generate, are categorized as regular, context-free (CF), context-sensitive (CS), recursive, and recursively enumerable. A context-sensitive grammar is a powerful formalism that describes a language in terms of known patterns of functions and structures that make up the phrases, clauses and sentences of a natural language. The ordering of parts (their juxtaposition and sequence) is the focus of a CS grammar. But, as mentioned above, the infinite variability of patterns is hard to express in a finite set of rules.
Individual words, however, are much more predictable than a language as a whole. A bottom-up approach that applies the grammar to the words rather than to the language therefore allows nearly infinite variability while accounting for exceptions on an individual basis. It also provides fantastic flexibility in describing the idiosyncrasies of each word, creating a powerful mechanism for dealing with idiomatic structures and other forms of ambiguity. A key to the success of this approach is building semantic and pragmatic information into each word in context, thereby permitting more generalizable processes, such as syntactic analysis, to be performed from the top down.
By applying the grammar to each word, a parser has only to test the word against applicable patterns, effectively eliminating the overhead created in immense grammars where words must be tested against every rule in the book. At the cost of increasing the size of the database or dictionary, this approach can lead to tremendous increases in the speed of disambiguation.
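To make the trade-off concrete, here is a minimal sketch of a word-driven lookup. Everything in it is an assumption made for illustration: the toy `LEXICON`, the reading labels, and the `read_words` function are hypothetical and not part of any real parser. Each word carries its own short list of patterns, so the parser only tests the patterns attached to the word in front of it instead of every rule in a global grammar.

```python
# Hypothetical toy lexicon: each word lists its own patterns, most specific first.
# A pattern is (words that must follow, reading assigned when they do).
LEXICON = {
    "kick": [
        (("the", "bucket"), "DIE (idiom)"),
        ((), "STRIKE-WITH-FOOT"),
    ],
    "the": [((), "DETERMINER")],
    "bucket": [((), "CONTAINER")],
}


def read_words(tokens):
    """Return (readings, number_of_pattern_tests) for a list of lowercase tokens."""
    readings = []
    tests = 0
    i = 0
    while i < len(tokens):
        word = tokens[i]
        # Only the patterns stored with this word are ever tested.
        for following, reading in LEXICON.get(word, []):
            tests += 1
            end = i + 1 + len(following)
            if tuple(tokens[i + 1:end]) == following:
                readings.append((" ".join(tokens[i:end]), reading))
                i = end
                break
        else:
            # No entry (or no matching pattern): record the word as unknown.
            readings.append((word, "UNKNOWN"))
            i += 1
    return readings, tests


print(read_words(["kick", "the", "bucket"]))
print(read_words(["kick", "the", "ball"]))
```

The idiom is resolved with a single test because its pattern lives with the word "kick". The cost, as noted above, is a larger dictionary: every idiosyncrasy has to be stored somewhere in a word's entry.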
What Is a Word?
The question of lexical segmentation – “What is a word?” – has long plagued morphologists, syntacticians, AI guys, and others. Consider the German practice of making compound words like this:
LEBENSVERSICHERUNGSGESELLSCHAFTSANGESTELLTER (life insurance company employee).
Obviously, this word could produce problems for a lexicon-based approach that attempts to place all possible “words” in the dictionary. One solution to this problem is to presume that people treat certain words as discrete concepts (or, as in logic, as propositions) and treat others as multiple conceptual tokens, automatically segmenting them each time they hear them.
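One way to picture the "multiple conceptual tokens" option is a segmenter that splits a compound into parts the lexicon already knows. The sketch below is only an illustration under stated assumptions: the four-word `LEXICON`, the simplified handling of the German linking "s", and the `segment` function are all hypothetical, and real compound splitting needs far more morphology than this.

```python
from functools import lru_cache

# Hypothetical mini-lexicon of discrete concepts (lowercase).
LEXICON = {"leben", "versicherung", "gesellschaft", "angestellter"}
# Simplifying assumption: parts are joined directly or with a linking "s".
LINKERS = ("", "s")


def segment(compound):
    """Split a compound into known lexicon parts, or return None if impossible."""
    word = compound.lower()

    @lru_cache(maxsize=None)
    def split(start):
        if start == len(word):
            return ()
        # Try the longest candidate part first.
        for end in range(len(word), start, -1):
            part = word[start:end]
            if part not in LEXICON:
                continue
            for linker in LINKERS:
                next_start = end + len(linker)
                if not word.startswith(linker, end):
                    continue
                if linker and next_start >= len(word):
                    continue  # a linker must be followed by another part
                rest = split(next_start)
                if rest is not None:
                    return (part,) + rest
        return None

    result = split(0)
    return list(result) if result is not None else None


print(segment("LEBENSVERSICHERUNGSGESELLSCHAFTSANGESTELLTER"))
```

With this approach the dictionary only needs the four component concepts. The long compound itself never has to be stored.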
Roger Schank’s (1972) conceptual dependency model advocates a meaning-centered approach to language understanding. Of course, the meaning of any word is based on the speaker’s background, experiences and language competence. Meaning is also based on the premise that language and lexicon give shape to perceptions and, to some extent, govern meaningfulness in our world. Words are building blocks of language. Conceptual dependency theory, as proposed by Schank and as formalized in conceptual schemata by Sowa (1984), is a model for representing concepts in the human brain or in an automated system. This neuromorphic quality makes the distributed conceptual model appealing as the basis of 3-DG.
If much of language understanding depends on real world knowledge, the relative merits of syntactic models and conceptual models must be weighed very carefully. Did you read my post on Building a Cathedral of Knowledge? In it, I suggested that the individual elements of knowledge form the bricks. The simile goes like this:
The Knowledge Base design has a foundation, binding and buttresses.
- The foundation of the cathedral is context.
  - Context is perceived through the senses and inextricably bound to time and place.
- The binding mortar is the associations between knowledge objects.
  - Built through a lifetime of remembered experiences, associated in context, cured with repetition.
- The buttresses are the statistical probabilities of the associations between knowledge objects, selectively strengthening the areas that most need reinforcement.
  - Reasoned out in our minds, the likelihoods and unlikelihoods of what comes next keep us a step ahead of complete confusion.
The bricks of the cathedral, each an independent knowledge object, are built on the foundation of context, bound together by the mortar of their associations and buttressed by the strength of their likelihoods within each context. This is my idea of a reliable architecture for knowledge. The key question for today's discussion is whether this knowledge architecture, a weighted, contextual ontology that can describe concepts, can also be designed to support language analysis. Can such an ontology power an engine capable of extracting the intent from words arranged in sentences, paragraphs, monologues, dialogues, jokes, poems and other forms of word-based communication? As you may rightly presume, my research and prototyping show that it can.
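As a rough illustration of what a weighted, contextual ontology could look like in code, the sketch below stores association strengths per context and uses them to choose between word senses. It is not the architecture described in my prototypes. The contexts, sense labels, weights, and the `score` and `best_sense` functions are all hypothetical values and names chosen for the example.

```python
# Hypothetical association weights: (context, knowledge object, associated object) -> strength.
ASSOCIATIONS = {
    ("finance", "bank/institution", "deposit"): 0.9,
    ("finance", "bank/institution", "loan"): 0.8,
    ("river", "bank/shore", "water"): 0.9,
    ("river", "bank/shore", "fishing"): 0.7,
    ("river", "bank/institution", "deposit"): 0.1,
}

# Hypothetical mapping from a word to the knowledge objects (bricks) it may name.
SENSES = {"bank": ["bank/institution", "bank/shore"]}


def score(sense, context, cues):
    """Sum the association strengths between a sense and the cue concepts in a context."""
    return sum(ASSOCIATIONS.get((context, sense, cue), 0.0) for cue in cues)


def best_sense(word, context, cues):
    """Pick the sense whose associations are strongest in the given context."""
    candidates = SENSES.get(word, [word])
    return max(candidates, key=lambda sense: score(sense, context, cues))


print(best_sense("bank", "finance", ["deposit", "loan"]))
print(best_sense("bank", "river", ["water", "deposit"]))
```

In cathedral terms, the context key is the foundation, each stored pair is a bit of mortar, and the numeric weight is the buttress that tips the decision when a word could name more than one brick.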