26 Jul Parallel Distributed Pattern Processing
We have discussed recognition processes in the brain. Connectionism, a fundamentally implicit approach to neural modeling, was championed by the parallel distributed processing (PDP) group. PDP networks use many interconnected processing elements (PEs) that, according to the PDP group, configure themselves to match input data with “minimum conflict or discrepancy” (Rumelhart & McClelland, 1986, Vol. 2, p. 545). Connectionist systems continually tune themselves by adjusting their weights, making learning continuous.
In this model, new concepts, when fed into the network, build qualitatively different state configurations. As the network learns, “information is passed among the units, not by messages, but by activation values, by scalars not symbols” (ibid). This resembles the brain in that the output consists of hot spots rather than ASCII text or streams of numbers. The results, then, arise from the active states in the system, not explicit messages.
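To make that idea concrete, here is a minimal sketch of my own (the unit counts, weights, and inputs are all invented for illustration): the output is a pattern of activation values, scalars passed among units, rather than symbolic messages.

```python
import numpy as np

# Illustrative only: 4 input units feeding 3 output units through
# randomly initialized connection weights.
rng = np.random.default_rng(0)
weights = rng.normal(size=(4, 3))

def propagate(inputs, weights):
    """Each unit's state is a weighted sum of incoming activations,
    squashed into a bounded scalar activation value, not a symbol."""
    return 1.0 / (1.0 + np.exp(-inputs @ weights))

# The "output" is simply which units end up hot: a pattern of active
# states, not ASCII text or a stream of explicit messages.
activations = propagate(np.array([1.0, 0.0, 0.5, 0.2]), weights)
print(activations)
```

Nothing in this sketch sends a message from one unit to another; each unit only sees the scalar activations arriving on its connections.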
|Understanding Context Cross-Reference
|Click on these Links to other posts and glossary/bibliography references
|Patterns in the Mind
|Rumelhart 1986
|Hinton 1984
Compare this to conventional systems, in which learning takes place through changes in the data or its structure as explicitly represented in memory. The idea of a disk drive in the brain, with numbered memory registers from which you can look up data, is clearly ludicrous. Conventional systems distinguish between the information being processed and the processes themselves. “In this new approach, learning takes place by changes in the system itself. Existing connections are modified, new connections are formed, old ones are weakened. In the PDP system, they are the same: The information is reflected in the very shape, form and operation of the processing structures” (ibid).
One of the fundamental concepts of implicit distributed-processing theory is that the representations of specific items of data in the system are neither explicit nor predetermined. As a result of learning, the network assumes a certain set of weights for each connection. These weights look arbitrary, yet they are clearly implicit representations of the learned data. In a sense, the representation of the data is not in the system; it is the system, because the entire system constitutes the data.
Determination of weights is a function of reducing error by techniques such as backpropagation or the generalized delta rule. Thus, “learned” data is implicitly wrong from the beginning and converges toward implicit correctness. Some artificial neural system (ANS) learning theory, however, has been shaped by fundamental objections to the implicit representation theory.
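As a rough illustration of the generalized delta rule (this toy example is mine, not from Rumelhart & McClelland), a single linear unit starts with wrong weights and nudges them down the error gradient until the error stops shrinking:

```python
import numpy as np

# Toy data, invented for illustration: an XOR-like pattern that a
# single linear unit cannot fit exactly, so some error always remains.
X = np.array([[0.0, 1.0],
              [1.0, 0.0],
              [1.0, 1.0]])
t = np.array([1.0, 1.0, 0.0])   # target outputs

w = np.zeros(2)                  # "learned" data starts out wrong
lr = 0.1                         # learning rate

for _ in range(200):
    y = X @ w                    # the unit's current outputs
    error = t - y                # discrepancy to be reduced
    w += lr * (X.T @ error)      # delta rule: step down the error gradient

print(w)                         # the mapping now lives implicitly in w
```

The point of the sketch is the last comment: nothing in the system stores the training pairs explicitly; what remains after learning is just a set of weights.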
Minsky and Papert raised objections to strictly implicit models as early as 1969, pointing out that “significant learning at a significant rate presupposes some significant prior structure…there is little chance of much good coming from giving a high-order problem to a quasi-universal perceptron whose partial functions have not been chosen with any particular task in mind” (1969, p. 18).
This does not necessarily contradict the implicit representation theory. Still, it raises the question of balance between the flexibility of connectionist models and the power of explicit symbolic representation of data in computational modeling of the brain. If we insist on a connectionist model with no explicit representations of data, we eliminate the possibility of combining heuristic methods with spreading activation. This makes search, often a key part of automating a system, slower and more difficult, if possible at all.
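Here is a hedged sketch of the kind of hybrid I mean: spreading activation over an explicitly labeled concept graph, where the labels are what make heuristic search possible at all. The graph, node names, and weights are invented for illustration.

```python
# A tiny concept graph with explicit symbolic labels on its nodes;
# edge weights express association strength (all values invented).
graph = {
    "bank": {"river": 0.4, "money": 0.8},
    "money": {"loan": 0.7},
    "river": {"water": 0.9},
    "loan": {},
    "water": {},
}

def spread(source, decay=0.5, threshold=0.05):
    """Propagate activation outward from a source concept, attenuated
    by edge weight and a decay factor; weak activation is pruned.
    Because the nodes carry explicit labels, the resulting hot spots
    can feed a heuristic search."""
    activation = {source: 1.0}
    frontier = [source]
    while frontier:
        node = frontier.pop()
        for neighbor, weight in graph[node].items():
            a = activation[node] * weight * decay
            if a > activation.get(neighbor, 0.0) and a > threshold:
                activation[neighbor] = a
                frontier.append(neighbor)
    return activation

print(spread("bank"))
```

In a purely implicit network there would be nothing like the `"money"` or `"loan"` labels to hang a heuristic on; that is the trade-off the paragraph above describes.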
In my prior post on Pattern Classification in Space, I described how some problems have two dimensions and others have many. In a way, we can think of constraints, if not as dimensions of a problem, at least as skewing factors. PDP networks have been used successfully in trend-line analyses, image analysis, and other problems with limited dimensionality. My deepest interest is in automated language understanding and translation. I have shown that this problem has multiple dimensions (see Three-Dimensional Model of Language, A Slice of Language, and Pairs of Language Strata). While PDP networks are fundamentally brain-like in their processing model, their purely implicit representations may limit them. They may be a part of the language understanding solution, but I think we need to look for algorithms or approaches with capabilities not strongly exhibited in PDP networks.