AlanTuring.net Reference Articles on Turing

What is Artificial Intelligence?

By Jack Copeland

© Copyright B.J. Copeland, May 2000

 

Connectionism

Connectionism, or neuron-like computing, developed out of attempts to understand how the brain works at the neural level, and in particular how we learn and remember.

A natural neural network. The Golgi method of staining brain tissue renders the neurons and their interconnecting fibres visible in silhouette.

 

In one famous connectionist experiment (conducted at the University of California at San Diego and published in 1986), David Rumelhart and James McClelland trained a network of 920 artificial neurons to form the past tenses of English verbs. The network consisted of two layers of 460 neurons:

Each of the 460 neurons in the input layer is connected to each of the 460 neurons in the output layer

Root forms of verbs--such as "come", "look", and "sleep"--were presented (in an encoded form) to one layer of neurons, the input layer. A supervisory computer program observed the difference between the actual response at the layer of output neurons and the desired response--"came", say--and then mechanically adjusted the connections throughout the network in such a way as to give the network a slight push in the direction of the correct response (this procedure is explained in more detail in what follows).

About 400 different verbs were presented one by one to the network and the connections were adjusted after each presentation. This whole procedure was repeated about 200 times using the same verbs. By this stage the network had learned its task satisfactorily and would correctly form the past tense of unfamiliar verbs as well as of the original verbs. For example, when presented for the first time with "guard" the network responded "guarded", with "weep" "wept", with "cling" "clung", and with "drip" "dripped" (notice the double "p"). This is a striking example of learning involving generalisation. (Sometimes, though, the peculiarities of English were too much for the network and it formed "squawked" from "squat", "shipped" from "shape", and "membled" from "mail".)

The simple neural network shown below illustrates the central ideas of connectionism.

A pattern-classifier

Four of the network's five neurons are for input and the fifth--to which each of the others is connected--is for output. Each of the neurons is either firing (1) or not firing (0). This network can learn to which of two groups, A and B, various simple patterns belong. An external agent is able to "clamp" the four input neurons into a desired pattern, for example 1100 (i.e. the two neurons to the left are firing and the other two are quiescent). Each such pattern has been pre-assigned to one of two groups, A and B. When a pattern is presented as input, the trained network will correctly classify it as belonging to group A or group B, producing 1 as output if the pattern belongs to A, and 0 if it belongs to B (i.e. the output neuron fires in the former case, does not fire in the latter).

Each connection leading to N, the output neuron, has a "weight". What is called the "total weighted input" into N is calculated by adding up the weights of all the connections leading to N from neurons that are firing. For example, suppose that only two of the input neurons, X and Y, are firing. Since the weight of the connection from X to N is 1.5 and the weight of the connection from Y to N is 2, it follows that the total weighted input to N is 3.5.

N has a "firing threshold" of 4. That is to say, if N's total weighted input exceeds or equals N's threshold, then N fires; and if the total weighted input is less than the threshold, then N does not fire. So, for example, N does not fire if the only input neurons to fire are X and Y, but N does fire if X, Y and Z all fire.
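The behaviour of the output neuron N can be sketched in a few lines of code. The weights for X (1.5) and Y (2) and the threshold (4) are as given above; the weights for the other two input neurons (called W and Z here) are not stated in the text, so the values below are illustrative assumptions.

```python
THRESHOLD = 4.0  # N's firing threshold

# Connection weights from the four input neurons to N.
# X -> N is 1.5 and Y -> N is 2, as stated; W and Z are assumed values.
weights = {"W": 1.0, "X": 1.5, "Y": 2.0, "Z": 1.0}

def fires(active_inputs):
    """Return 1 if N fires, given the set of input neurons that are firing.

    The total weighted input is the sum of the weights of connections
    from firing neurons; N fires iff this total reaches the threshold.
    """
    total = sum(weights[name] for name in active_inputs)
    return 1 if total >= THRESHOLD else 0

print(fires({"X", "Y"}))       # total is 3.5, below 4: N does not fire -> 0
print(fires({"X", "Y", "Z"}))  # total is 4.5, at least 4: N fires -> 1
```

With only X and Y firing the total weighted input is 3.5, short of the threshold; adding Z (here assumed to have weight 1) pushes the total to 4.5 and N fires, matching the example in the text.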

Training the network involves two steps. First, the external agent inputs a pattern and observes the behaviour of N. Second, the agent adjusts the connection-weights in accordance with the rules:

(1) If the actual output is 0 and the desired output is 1, increase by a small fixed amount the weight of each connection leading to N from neurons that are firing (thus making it more likely that N will fire next time the network is given the same pattern)

(2) If the actual output is 1 and the desired output is 0, decrease by that same small amount the weight of each connection leading to the output neuron from neurons that are firing (thus making it less likely that the output neuron will fire the next time the network is given that pattern as input).

The external agent--actually a computer program--goes through this two-step procedure with each of the patterns in the sample that the network is being trained to classify. The agent then repeats the whole process a considerable number of times. During these many repetitions, a pattern of connection weights is forged that enables the network to respond correctly to each of the patterns.
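The whole two-step procedure can be played out in a short simulation. The sample patterns, their group assignments, and the size of the "small fixed amount" below are invented for illustration; the training rules (1) and (2) are exactly as described above.

```python
STEP = 0.1        # the small fixed amount by which weights are adjusted
THRESHOLD = 4.0   # firing threshold of the output neuron N

# Each pattern of the four input neurons (1 = firing, 0 = quiescent),
# paired with its pre-assigned group: 1 for group A, 0 for group B.
# This sample is an assumed, illustrative one.
sample = [
    ((1, 1, 1, 0), 1),  # group A
    ((0, 1, 1, 1), 1),  # group A
    ((1, 1, 0, 0), 0),  # group B
    ((0, 0, 1, 1), 0),  # group B
]

weights = [0.0, 0.0, 0.0, 0.0]  # weights of the connections leading to N

def output(pattern):
    """N fires (1) iff the summed weights from firing neurons reach the threshold."""
    total = sum(w for w, firing in zip(weights, pattern) if firing)
    return 1 if total >= THRESHOLD else 0

# The external agent repeats the two-step procedure many times over the sample.
for _ in range(500):
    for pattern, desired in sample:
        actual = output(pattern)
        if actual == 0 and desired == 1:
            # Rule (1): strengthen connections from firing neurons.
            weights = [w + STEP if firing else w
                       for w, firing in zip(weights, pattern)]
        elif actual == 1 and desired == 0:
            # Rule (2): weaken connections from firing neurons.
            weights = [w - STEP if firing else w
                       for w, firing in zip(weights, pattern)]

# After training, the network classifies every pattern in the sample correctly.
print(all(output(p) == d for p, d in sample))  # -> True
```

Nothing in the loop depends on what the patterns mean: as the text emphasises, the adjustment is purely mechanical, and the same procedure serves for any classification the sample happens to encode (provided a suitable set of weights exists).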

The striking thing is that the learning process is entirely mechanistic and requires no human intervention or adjustment. The connection weights are increased or decreased mechanically by a constant amount and the procedure remains the same no matter what task the network is learning.

Another name for connectionism is "parallel distributed processing" or PDP. This terminology emphasises two important features of neuron-like computing. (1) A large number of relatively simple processors--the neurons--operate in parallel. (2) Neural networks store information in a distributed or holistic fashion, with each individual connection participating in the storage of many different items of information. The know-how that enables the past-tense network to form "wept" from "weep", for example, is not stored in one specific location in the network but is spread through the entire pattern of connection weights that was forged during training. The human brain also appears to store information in a distributed fashion, and connectionist research is contributing to attempts to understand how the brain does so.

Recent work with neural networks includes:

(1) The recognising of faces and other objects from visual data. A neural network designed by John Hummel and Irving Biederman at the University of Minnesota can identify about ten objects from simple line drawings. The network is able to recognise the objects--which include a mug and a frying pan--even when they are drawn from different angles. Networks investigated by Tomaso Poggio of MIT are able to recognise (a) bent-wire shapes drawn from different angles (b) faces photographed from different angles and showing different expressions (c) objects from cartoon drawings with grey-scale shading indicating depth and orientation. (An early commercially available neuron-like face recognition system was WISARD, designed at the beginning of the 1980s by Igor Aleksander of Imperial College London. WISARD was used for security applications.)

(2) Language processing. Neural networks are able to convert handwriting and typewritten material to standardised text. The U.S. Internal Revenue Service has commissioned a neuron-like system that will automatically read tax returns and correspondence. Neural networks also convert speech to printed text and printed text to speech.

(3) Neural networks are being used increasingly for loan risk assessment, real estate valuation, bankruptcy prediction, share price prediction, and other business applications.

(4) Medical applications include detecting lung nodules and heart arrhythmia, and predicting patients' reactions to drugs.

(5) Telecommunications applications of neural networks include control of telephone switching networks and echo cancellation in modems and on satellite links.

History of connectionism

In 1933 the psychologist Edward Thorndike suggested that human learning consists in the strengthening of some (then unknown) property of neurons, and in 1949 psychologist Donald Hebb suggested that it is specifically a strengthening of the connections between neurons in the brain that accounts for learning.

In 1943, the neurophysiologist Warren McCulloch of the University of Illinois and the mathematician Walter Pitts of the University of Chicago published an influential theory according to which each neuron in the brain is a simple digital processor and the brain as a whole is a form of computing machine. As McCulloch put it subsequently, "What we thought we were doing (and I think we succeeded fairly well) was treating the brain as a Turing machine".

McCulloch and Pitts gave little discussion of learning and apparently did not envisage fabricating networks of artificial neuron-like elements. This step was first taken, in concept, in 1947-48, when Turing theorized that a network of initially randomly connected artificial neurons--a Turing Net--could be "trained" (his word) to perform a given task by means of a process that renders certain neural pathways effective and others ineffective. Turing foresaw the procedure--now in common use by connectionists--of simulating the neurons and their interconnections within an ordinary digital computer (just as engineers create virtual models of aircraft wings and skyscrapers).

However, Turing's own research on neural networks was carried out shortly before the first stored-program electronic computers became available. It was not until 1954 (the year of Turing's death) that Belmont Farley and Wesley Clark, working at MIT, succeeded in running the first computer simulations of small neural networks. Farley and Clark were able to train networks containing at most 128 neurons to recognise simple patterns (using essentially the training procedure described above). In addition, they discovered that the random destruction of up to 10% of the neurons in a trained network does not affect the network's performance at its task--a feature that is reminiscent of the brain's ability to tolerate limited damage inflicted by surgery, an accident, or disease.

During the 1950s neuron-like computing was studied on both sides of the Atlantic. Important work was done in England by W.K. Taylor at University College, London, J.T. Allanson at Birmingham University, R.L. Beurle and A.M. Uttley at the Radar Research Establishment, Malvern; and in the U.S. by Frank Rosenblatt, at the Cornell Aeronautical Laboratory.

In 1957 Rosenblatt began investigating artificial neural networks that he called "perceptrons". Although perceptrons differed only in matters of detail from the types of neural network investigated previously by Farley and Clark in the U.S. and by Taylor, Uttley, Beurle and Allanson in Britain, Rosenblatt made major contributions to the field through his experimental investigations of the properties of perceptrons (using computer simulations) and through his detailed mathematical analyses. Rosenblatt was a charismatic communicator, and soon there were many research groups in the U.S. studying perceptrons. Rosenblatt and his followers called their approach connectionist, to emphasise the importance in learning of the creation and modification of connections between neurons, and modern researchers in neuron-like computing have adopted this term.

Rosenblatt distinguished between simple perceptrons with two layers of neurons--the networks described earlier for forming past tenses and classifying patterns both fall into this category--and multi-layer perceptrons with three or more layers.

 

A three-layer perceptron. Between the input layer (bottom) and the output layer (top) lies a so-called 'hidden layer' of neurons.

 

One of Rosenblatt's important contributions was to generalise the type of training procedure that Farley and Clark had used, which applied only to two-layer networks, so that the procedure can be applied to multi-layer networks. Rosenblatt used the phrase "back-propagating error correction" to describe his method. The method, and the term "back-propagation", are now in everyday use in neuron-like computing (with improvements and extensions due to Bernard Widrow and M.E. Hoff, Paul Werbos, David Rumelhart, Geoffrey Hinton, Ronald Williams, and others).

During the 1950s and 1960s, the top-down and bottom-up approaches to AI both flourished, until in 1969 Marvin Minsky and Seymour Papert of MIT, who were both committed to symbolic AI, published a critique of Rosenblatt's work. They proved mathematically that there are a variety of tasks that simple two-layer perceptrons cannot accomplish. Some examples they gave are:

(1) No two-layer perceptron can correctly indicate at its output neuron (or neurons) whether there are an even or an odd number of neurons firing in its input layer.

(2) No two-layer perceptron can produce at its output layer the exclusive disjunction of two binary inputs X and Y (the so-called "XOR problem").

The exclusive disjunction of two binary inputs X and Y is defined by this table:

  X   Y   X XOR Y
  0   0      0
  0   1      1
  1   0      1
  1   1      0

It is important to realise that the mathematical results obtained by Minsky and Papert about two-layer perceptrons, while interesting and technically sophisticated, showed nothing about the abilities of perceptrons in general, since multi-layer perceptrons are able to carry out tasks that no two-layer perceptron can accomplish. Indeed, the "XOR problem" illustrates this fact: a simple three-layer perceptron can form the exclusive disjunction of X and Y (as Minsky and Papert knew). Nevertheless, Minsky and Papert conjectured--without any real evidence--that the multi-layer approach is "sterile" (their word). Somehow their analysis of the limitations of two-layer perceptrons convinced the AI community--and the bodies that fund it--of the fruitlessness of pursuing work with neural networks, and the majority of researchers turned away from the approach (although a small number remained loyal).
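The point can be made concrete with a small sketch. Each unit below is a threshold neuron of the kind described earlier; the hidden layer computes "X or Y" and "X and Y", and the output neuron fires exactly when the first holds but not the second, which is the exclusive disjunction. The weights and thresholds are hand-chosen for illustration, not learned.

```python
def unit(inputs, weights, threshold):
    """A simple threshold neuron: fires (1) iff the weighted input reaches the threshold."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0

def xor(x, y):
    # Hidden layer: h1 fires iff X or Y fires; h2 fires iff both fire.
    h1 = unit((x, y), (1, 1), 1)
    h2 = unit((x, y), (1, 1), 2)
    # Output layer: fires iff h1 fires and h2 does not
    # (note the negative weight on the connection from h2).
    return unit((h1, h2), (1, -1), 1)

for x in (0, 1):
    for y in (0, 1):
        print(x, y, "->", xor(x, y))   # reproduces the XOR table
```

No single threshold neuron wired directly to X and Y can draw this boundary, since the patterns (0,1) and (1,0) must be separated from both (0,0) and (1,1); the hidden layer is what makes the separation possible.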

This hiatus in research into neuron-like computing persisted for well over a decade before a renaissance occurred. Causes of the renaissance included (1) a widespread perception that symbolic AI was stagnating (2) the possibility of simulating larger and more complex neural networks, owing to the improvements that had occurred in the speed and memory of digital computers, and (3) results published in the early and mid 1980s by McClelland, Rumelhart and their research group (for example, the past-tenses experiment) which were widely viewed as a powerful demonstration of the potential of neural networks. There followed an explosion of interest in neuron-like computing, and symbolic AI moved into the back seat.
