How does connectionism work?
They make the interesting observation that a solution to the systematicity problem may require including sources of environmental information that have so far been ignored in theories of language learning. This work complicates the systematicity debate, since it opens a new worry about what information resources are legitimate in responding to the challenge.
However, this reminds us that architecture alone, whether classical or connectionist, is not going to solve the systematicity problem in any case, so the interesting questions concern what sources of supplemental information are needed to make the learning of grammar possible. Kent Johnson argues that the whole systematicity debate is misguided.
Attempts at carefully defining the systematicity of language or thought leave us with either trivialities or falsehoods. Connectionists surely have explaining to do, but Johnson argues that it is fruitless to view their burden under the rubric of systematicity.
Aizawa also suggests the debate is no longer germane given the present climate in cognitive science. What is needed instead is the development of neurally plausible connectionist models capable of processing a language with a recursive syntax, which react immediately to the introduction of new items in the lexicon without introducing the features of classical architecture.
The authors report that their nets showed very accurate generalization at tasks that qualify as demonstrating strong semantic systematicity. However, the nets exhibited very poor performance when commands in the test set were longer, or even shorter, than those presented in the training set.
So they appeared unable to spontaneously compose the meanings of complex expressions from the meanings of their parts. New research is needed to understand the nature of these failures, whether they can be overcome in non-classical architectures, and the extent to which humans would exhibit similar mistakes under analogous circumstances. So this brief account is necessarily incomplete. Aizawa provides an excellent overview of the literature, and Calvo and Symons serves as another, more recent resource.
One of the attractions of distributed representations in connectionist models is that they suggest a solution to the problem of providing a theory of how brain states could have meaning. The idea is that the similarities and differences between activation patterns along different dimensions of neural activity record semantical information. So the similarity properties of neural activations provide intrinsic properties that determine meaning.
However, when it comes to compositional linguistic representations, Fodor and Lepore raise two problems for such accounts. The first is that human brains presumably vary significantly in the number of and connections between their neurons.
Although it is straightforward to define similarity measures on two nets that contain the same number of units, it is harder to see how this can be done when the basic architectures of two nets differ. The second problem Fodor and Lepore cite is that even if similarity measures for meanings can be successfully crafted, they are inadequate to the task of meeting the desiderata which a theory of meaning must satisfy. Churchland shows that the first of these two objections can be met.
Citing the work of Laakso and Cottrell he explains how similarity measures between activation patterns in nets with radically different structures can be defined. Not only that, Laakso and Cottrell show that nets of different structures trained on the same task develop activation patterns which are strongly similar according to the measures they recommend.
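The following sketch is in the spirit of Laakso and Cottrell's approach, though not a reproduction of their exact procedure: two nets with different numbers of hidden units are compared by correlating the pairwise distances among their activation patterns for the same stimuli, a profile that does not depend on the number of units. The function names and data here are hypothetical.

```python
# Hypothetical illustration: nets of different sizes are compared by
# correlating their pairwise activation-distance profiles.
import numpy as np

def net_similarity(acts_a, acts_b):
    """acts_a: (n_stimuli, d1) activations; acts_b: (n_stimuli, d2), d1 != d2 ok."""
    def distance_profile(acts):
        n = len(acts)
        return np.array([np.linalg.norm(acts[i] - acts[j])
                         for i in range(n) for j in range(i + 1, n)])
    # Pearson correlation of the two (dimension-independent) distance profiles
    return np.corrcoef(distance_profile(acts_a), distance_profile(acts_b))[0, 1]

rng = np.random.default_rng(0)
code = rng.normal(size=(5, 1))        # shared structure over five stimuli
acts_a = np.hstack([code] * 4)        # net A represents it over 4 units
acts_b = np.hstack([code] * 7)        # net B represents it over 7 units
print(round(net_similarity(acts_a, acts_b), 3))   # 1.0: same relational structure
```

Because only relations among a net's own activations enter the measure, the two architectures never need to be put into unit-by-unit correspondence.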
This offers hope that empirically well-defined measures of similarity of concepts and thoughts across different individuals might be forged. However, most connectionists who promote similarity-based accounts of meaning reject many of the presuppositions of standard theories. They hope to craft a working alternative which either rejects or modifies those presuppositions while still being faithful to the data on human linguistic abilities.
If concepts are defined by everything we know, then the measures for activation patterns of our concepts are bound to be far apart. This is a truly deep problem in any theory that hopes to define meaning by functional relationships between brain states.
Philosophers of many stripes must struggle with this problem. Given the lack of a successfully worked out theory of concepts in either traditional or connectionist paradigms, it is only fair to leave the question for future research. Another important application of connectionist research to philosophical debate about the mind concerns the status of folk psychology. Folk psychology is the conceptual structure that we spontaneously apply to understanding and predicting human behavior.
For example, knowing that John desires a beer and that he believes that there is one in the refrigerator allows us to explain why John just went into the kitchen. Such knowledge depends crucially on our ability to conceive of others as having desires and goals, plans for satisfying them, and beliefs to guide those plans. The idea that people have beliefs, plans and desires is a commonplace of ordinary life; but does it provide a faithful description of what is actually to be found in the brain?
Its defenders will argue that folk psychology is too good to be false (Fodor). What more can we ask for the truth of a theory than that it provides an indispensable framework for successful negotiations with others? On the other hand, eliminativists will respond that the usefulness and wide currency of a conceptual scheme does not argue for its truth (Churchland). Ancient astronomers found the notion of celestial spheres useful, even essential, to the conduct of their discipline, but now we know that there are no celestial spheres.
A viable psychology may require as radical a revolution in its conceptual foundations as is found in quantum mechanics. Eliminativists are interested in connectionism because it promises to provide a conceptual foundation that might replace folk psychology. Presuming that such nets are faithful to how the brain works, concepts of folk psychology fare no better than do celestial spheres.
Whether connectionist models undermine folk psychology in this way is still controversial. There are two main lines of response to the claim that connectionist models support eliminativist conclusions.
One objection targets the particular models used by Ramsey et al., questioning whether they are representative of connectionist architectures in general. A second line of rebuttal challenges the claim that features corresponding to beliefs and desires are necessarily absent even in the feed-forward nets at issue (Von Eckardt). The question is complicated further by disagreements about the nature of folk psychology. Many philosophers treat the beliefs and desires postulated by folk psychology as brain states with symbolic contents. For example, the belief that there is a beer in the refrigerator is thought to be a brain state that contains symbols corresponding to beer and a refrigerator.
From this point of view, the fate of folk psychology is strongly tied to the symbolic processing hypothesis. So if connectionists can establish that brain processing is essentially non-symbolic, eliminativist conclusions will follow. On the other hand, some philosophers do not think folk psychology is essentially symbolic, and some would even challenge the idea that folk psychology is to be treated as a theory in the first place.
Under this conception, it is much more difficult to forge links between results in connectionist research and the rejection of folk psychology. Two important trends worth mentioning are predictive coding and deep learning, which will be covered in the following sections. Predictive coding is a well-established information-processing tool with a wide range of applications.
It is useful, for example, in compressing the size of data sets. Suppose you wish to transmit a picture of a landscape with a blue sky. Since most of the pixels in the top half of your image are roughly the same shade, it is very inefficient to record the color value (say Red: 46, Green: 78, Blue: FF in hexadecimal) over and over again for each pixel in the top half of the image.
Since the value of one pixel strongly predicts the value of its neighbor, the efficient thing to do is to record at each pixel location the difference between the predicted value (an average of its neighbors) and the actual value for that pixel. In the case of representing an evenly shaded sky, we would only need to record the blue value once, followed by lots of zeros.
It is well known that early visual processing in the brain involves taking differences between nearby values, for example, to identify visual boundaries. It is only natural then to explore how the brain might take advantage of predictive coding in perception, inference, or even action.
See Clark for an excellent summary and entry point to the literature. There is wide variety in the models presented in the predictive coding paradigm, and they tend to be specified at a higher level of generality than are connectionist models so far discussed. Assume we have a neural net with input, hidden and output levels that has been trained on a task (say, face recognition) and so presumably has information about faces stored in the weights connecting the hidden-level nodes.
Three features would classify this net as a predictive coding (PC) model. First, the model will have downward connections from the higher levels that are able to predict the next input for that task. The prediction might be a representation of a generic face.
Second, the data sent to the higher levels for a given input is not the value recorded at the input nodes, but the difference between the predicted values and the values actually present. So in the example, the data provided tracks the differences between the face to be recognized and the generic face.
In this way the data being received by the net is already preprocessed for coding efficiency. Third, the model is trained by adjusting the weights in such a way that the error is minimized at the inputs. In so doing, it comes to be able to predict the face of the individual to be recognized, thereby eliminating the error. Some advocates of predictive coding models suggest that this scheme provides a unified account of all cognitive phenomena, including perception, reasoning, planning and motor control.
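These three features can be caricatured in a few lines of code. This is only an illustrative toy, not any published PC model: the "prediction" is a bare vector, and the numbers are arbitrary.

```python
# Toy illustration of the PC scheme: a top-down prediction, an
# upward-flowing error signal, and updates that minimize input error.
target = [0.9, 0.1, 0.5]        # the face actually presented at the inputs
prediction = [0.5, 0.5, 0.5]    # initial top-down prediction (a "generic face")
rate = 0.5
for step in range(20):
    error = [t - p for t, p in zip(target, prediction)]             # sent upward, not raw data
    prediction = [p + rate * e for p, e in zip(prediction, error)]  # adjust to reduce error
residual = sum(abs(t - p) for t, p in zip(target, prediction))
print(residual < 1e-4)          # True: the model now predicts this individual's face
```

In a full PC model the prediction would itself be generated by higher-level weights, so reducing input error reshapes the net's model of the world rather than a single vector.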
By minimizing prediction error in interacting with the environment, the net is forced to develop the conceptual resources to model the causal structure of the external world, and so navigate that world more effectively. The predictive coding paradigm has attracted a lot of attention. For example, when trained on typical visual input, PC models spontaneously develop functional areas for edge, orientation and motion detection known to exist in visual cortex.
This work also raises the interesting point that the visual architecture may develop in response to the statistics of the scenes being encountered, so that organisms in different environments have visual systems specially tuned to their needs. It must be admitted that there is still no convincing evidence that the essential features of PC models are directly implemented as anatomical structures in the brain.
Although it is conjectured that superficial pyramidal cells may transmit prediction error, and deep pyramidal cells predictions, we do not know that that is how they actually function. On the other hand, PC models do appear more neurally plausible than backpropagation architectures, for there is no need for a separate process of training on an externally provided set of training samples.
Instead, predictions replace the role of the training set, so that learning and interacting with the environment are two sides of a unified unsupervised process. PC models also show promise for explaining higher-level cognitive phenomena. An often-cited example is binocular rivalry. The PC explanation is that the system eliminates error by predicting the scene presented to one eye, but only at the cost of increasing the error for the other eye; since no single prediction can minimize both, perception alternates between the two. PC accounts of attention have also been championed.
For example, Hohwy notes that realistic PC models, which must tolerate noisy inputs, need to include parameters that track the desired precision to be used in reporting error. So PC models need to make predictions of the error precision relevant for a given situation.
Hohwy explores the idea that mechanisms for optimizing precision expectations map onto those that account for attention, and argues that attentional phenomena such as change blindness can be explained within the PC paradigm.
Predictive coding has interesting implications for themes in the philosophy of cognitive science. By integrating the processes of top-down prediction with bottom-up error detection, the PC account of perception views it as intrinsically theory-laden. Deployment of the conceptual categorization of the world embodied in higher levels of the net is essential to the very process of gathering data about the world.
This underscores, as well, tight linkages between belief, imaginative abilities, and perception (Grush). It is too early to evaluate the importance and scope of PC models in accounting for the various aspects of cognition.
Providing a unified theory of brain function in general is, after all, an impossibly high standard. One objection that is often heard is that an organism with a PC brain could be expected to curl up in a dark room and die, for this is the best way to minimize error at its sensory inputs. However, that objection may take too narrow a view of the sophistication of the predictions available to the organism.
If it is to survive at all, its genetic endowment, coupled with what it can learn along the way, may very well equip it with the expectation that it go out and seek needed resources in the environment.
Minimizing error for that prediction of its behavior will get it out of the dark room. However, it remains to be seen whether a theory of biological urges is usefully recast in PC terminology in this way, or whether PC theory is better characterized as only part of the explanation.
Another complaint is that the top-down influence on our perception coupled with the constraint that the brain receives error signals rather than raw data would impose an unrealistic divide between a represented world of fantasy and the world as it really is.
It is hard to evaluate whether that qualifies as a serious objection. Were PC models actually to provide an account of our phenomenological experience, and characterize the relations between that experience and what we count as real, then skeptical conclusions to be drawn would count as features of the view rather than objections to it.
A related complaint is that in trying to explain everything, PC models explain nothing. Without sufficient constraints on the architecture, it is too easy to pretend to explain cognitive phenomena by merely redescribing them in a story written in the vocabulary of prediction, comparison, error minimization, and optimized precision.
The real proof of the pudding will come with the development of more complex and detailed computer models in the PC framework that are biologically plausible, and able to demonstrate the defining features of cognition.
Deep learning networks have many promising applications, including recognition of objects and faces in photographs, natural language translation and text generation, prediction of protein folds, medical diagnosis and treatment, and control of autonomous vehicles. The success of the game-playing program AlphaZero (Silver et al.) is a case in point. Its ability to soundly defeat expert-knowledge-based programs at their forte has been touted as the death knell for the traditional symbolic paradigm in artificial intelligence.
However, the new capabilities of deep learning systems have brought with them new concerns. Deep networks typically learn from vastly more data than their predecessors (AlphaZero learned from millions of self-played Go games), and can extract much more subtle, structured patterns. It is natural, therefore, to have second thoughts about depending on deep learning technologies for tasks that must be responsive to human interests and goals.
The success of deep learning would not have been possible without specialized Graphics Processing Units (GPUs), massively parallel processors optimized for the computational burden of training large nets.
Although the literature describes a bewildering set of variations in deep net design (Schmidhuber), there are some common themes that help define the paradigm. The most obvious feature is a substantial increase in the number of hidden layers. Whereas Golden Age networks typically had only one or two hidden layers, deep neural nets have anywhere from five to several hundred. The key is that the patterns detected at a given layer may be used by the subsequent layers to repeatedly create more and more complex discriminations.
The number of layers is not the only feature of deep nets that explains their superior abilities. Another is their capacity to cope with nuisance parameters: sources of variation in the input that are irrelevant to the category to be detected. Examples of nuisance parameters in visual categorization tasks include pose, size, and position in the visual field; examples in auditory tasks include tone, pitch, and duration.
Successful systems must learn to recognize deeper similarities hiding under this variation to identify objects in images, or words in audio data. One of the most commonly-deployed deep architectures—deep convolutional networks—leverages a combination of strategies that are well-suited to overcoming nuisance variation.
Golden Age nets used the same activation function for all units, and units in a layer were fully connected to units in adjacent layers. However, deep convolutional nets deploy several different activation functions, and connections to units in the next higher layer are restricted to small windows, such as a square tile of an image or a temporal snippet of a sound file.
A toy example of a deep convolutional net trained to recognize objects in images will help illustrate some of the details. The input to such a net consists of a digitized scene with red, green, and blue (RGB) values for the intensity of colors in each pixel.
This input layer is fed to a layer of filter units, which are connected only to a small window of input pixels. Filter units detect specific, local features of the image using an operation called convolution. For example, they might find edges by noting where differences in the intensity of nearby pixels are the greatest. The filters' outputs are typically passed through rectified linear units (ReLUs), which zero out negative responses. ReLU units send their signals to a pooling layer, which collects data from many ReLU units and only passes along the most-activated features for each location.
This feature map can then be sent to a whole series of such sandwiches to detect larger and more abstract features. For example, one sandwich might build lines from edges, the next angles from lines, the next shapes from lines and angles, and the next objects from shapes.
A final, fully connected classification layer is then used to assign labels to the objects detected in the most abstract feature map delivered by the penultimate layer. This division of labor is extremely efficient at overcoming nuisance variation, compared to shallow Golden Age networks. Furthermore, limiting the inputs of the filter nodes to a small window significantly lowers the number of weights that must be learned at each level, compared to a fully connected network.
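One filter/ReLU/pooling "sandwich" can be sketched on a single-channel image. The edge-detecting kernel below is hand-set for illustration; real convolutional nets learn their filter weights from data.

```python
# Toy convolution -> ReLU -> max-pooling "sandwich" on a 6x6 image.
import numpy as np

def convolve(image, kernel):
    """Slide the kernel over the image, taking a weighted sum in each window."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.zeros((6, 6))
image[:, 3:] = 1.0                       # bright right half: a vertical edge
kernel = np.array([[-1.0, 0.0, 1.0]])    # responds to left-to-right increases
feature = np.maximum(convolve(image, kernel), 0.0)  # ReLU keeps positive responses
# 2x2 max pooling: pass along only the strongest response in each window
pooled = feature.reshape(3, 2, 2, 2).max(axis=(1, 3))
print(pooled.shape)                      # (3, 2): a smaller, more abstract map
```

Stacking several such sandwiches builds lines from edges, shapes from lines, and so on, while pooling makes the responses tolerant to small shifts in position.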
If features usually depend only on local relations (i.e., on nearby pixels or moments), little is lost by ignoring distant parts of the input at the early layers. Furthermore, pooling the outputs of several different filter nodes helps detect the same feature across small differences in nuisance variables like pose or location. These points also interface with the innateness controversy discussed in Section 6. For example, Buckner has recently argued that these architectural features combine to implement a form of cognitive abstraction which addresses problems facing traditional empiricist philosophy of mind, concerning the way that minds can efficiently discover abstract categorical knowledge in specific, idiosyncratic perceptions.
The increase in computational power that comes with deep net architecture brings with it additional dangers.
A closely related and extremely common aspect of connectionist models is activation. At any time a unit in the network has an activation, which is a numerical value intended to represent some aspect of the unit.
For example, if the units in the model are neurons, the activation could represent the probability that the neuron would generate an action potential (spike).
If the model is a spreading activation model then over time a unit's activation spreads to all the other units connected to it. Spreading activation is always a feature of neural network models, and it is very common in connectionist models used by cognitive psychologists. Neural networks are by far the dominant form of connectionist model today. A lot of research utilizing neural networks is carried out under the more general name "connectionist".
These connectionist models adhere to two major principles regarding the mind: first, that any mental state can be described as a vector of numerical activation values over the units in the network; and second, that memory and learning consist in modifying the strengths of the connections between units. Though there is a large variety of neural network models, they very rarely stray from these two basic principles.
Most of the variety comes from differences in how the units and their activations are interpreted and in how learning is implemented. Connectionists are in agreement that recurrent neural networks (networks wherein connections can form a directed cycle) are a better model of the brain than feedforward neural networks (networks with no directed cycles). A lot of recurrent connectionist models incorporate dynamical systems theory as well.
Many researchers, such as the connectionist Paul Smolensky, have argued that connectionist models will evolve toward fully continuous, high-dimensional, non-linear, dynamical systems approaches. The neural network branch of connectionism suggests that the study of mental activity is really the study of neural systems.
This links connectionism to neuroscience, and models involve varying degrees of biological realism. Connectionist work in general need not be biologically realistic, but some neural network researchers try to model the biological aspects of natural neural systems very closely.
As well, many authors find the clear link between neural activity and cognition to be an appealing aspect of connectionism.
However, this is also a source of criticism, as some people view this as reductionism. Connectionists generally stress the importance of learning in their models. As a result, many sophisticated learning procedures for neural networks have been developed by connectionists. Learning always involves modifying the connection weights.
These generally involve mathematical formulas to determine the change in weights when given sets of data consisting of activation vectors for some subset of the neural units. By formalizing learning in such a way connectionists have many tools at their hands.
A very common tactic in connectionist learning methods is to incorporate gradient descent over an error surface in a space defined by the weight matrix. All gradient descent learning in connectionist models involves changing each weight in the direction opposite to the partial derivative of the error with respect to that weight, usually scaled by a learning rate.
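As a concrete, drastically simplified instance, here is gradient descent on a single linear unit; the data points and learning rate are made up for illustration.

```python
# Gradient descent on one linear unit y = w*x + b with squared error
# E = 0.5 * (y - target)**2, so dE/dw = (y - target) * x and dE/db = y - target.
def train(samples, rate=0.1, epochs=200):
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, target in samples:
            err = (w * x + b) - target
            w -= rate * err * x   # move each weight opposite its gradient
            b -= rate * err
    return w, b

samples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]   # points on y = 2x + 1
w, b = train(samples)
print(abs(w - 2.0) < 0.01 and abs(b - 1.0) < 0.01)  # True: the mapping was learned
```

Backpropagation generalizes exactly this update to multi-layer networks by propagating the error derivatives backward through the layers.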
Backpropagation, first made popular in the 1980s, is probably the most commonly known connectionist gradient descent algorithm today. Connectionism can be traced back to ideas more than a century old. However, connectionist ideas were little more than speculation until the mid-to-late 20th century. It wasn't until the 1980s that connectionism became a popular perspective amongst scientists. Parallel distributed processing (PDP) was a neural network approach that stressed the parallel nature of neural processing and the distributed nature of neural representations.
PDP provided a general mathematical framework for researchers to operate in, one involving eight major aspects. The approach was presented in two volumes by James L. McClelland, David E. Rumelhart, and the PDP Research Group. Although the books are now considered seminal connectionist works, the term "connectionism" was not used by the authors to describe their framework at that point.