Many experiments have used an extinction procedure to investigate conditioned reinforcement. In most of these experiments, a conspicuous stimulus is presented just before the delivery of food. The new-response method involves pairing a distinctive stimulus such as a click with unconditioned reinforcement. After several pairings, the stimulus is presented without unconditioned reinforcement and is used to shape a new response. Another extinction technique is called the established-response method. An operant that produces unconditioned reinforcement is accompanied by a distinctive stimulus presented just prior to reinforcement. When responding is well established, extinction is implemented, but half of the subjects continue to get the stimulus that accompanied unconditioned reinforcement. The other subjects undergo extinction without the distinctive stimulus. Generally, subjects with the stimulus present respond more than subjects who do not get the stimulus associated with unconditioned reinforcement. This result is interpreted as evidence for the effects of conditioned reinforcement.
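The logic of the established-response method can be illustrated with a toy simulation (the model and all parameter values here are hypothetical, not taken from any experiment described in the text): response strength decays during extinction, but decays more slowly when each response still produces the stimulus that accompanied unconditioned reinforcement.

```python
# Toy model of the established-response method (hypothetical parameters):
# one group keeps the distinctive stimulus (putative conditioned reinforcer)
# during extinction; the other group does not.

def responses_in_extinction(sessions, decay, cr_present):
    """Total responses emitted over extinction sessions under a simple
    geometric-decay model of response strength."""
    strength = 100.0                   # responses per session at the start
    slow = 0.5 if cr_present else 0.0  # conditioned reinforcer slows the decay
    total = 0.0
    for _ in range(sessions):
        total += strength
        strength *= (1 - decay * (1 - slow))
    return round(total)

with_cr = responses_in_extinction(10, 0.3, cr_present=True)
without_cr = responses_in_extinction(10, 0.3, cr_present=False)
print(with_cr > without_cr)  # True: more responding when the stimulus remains
```

The point of the sketch is only the ordinal prediction: if the stimulus functions as a conditioned reinforcer, the group that retains it should respond more during extinction, which is the result the extinction method is designed to detect.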

Pairing, Discrimination, and Conditioned Reinforcement

Both extinction methods for analyzing conditioned reinforcement involve the presentation of a stimulus that was closely followed by unconditioned reinforcement. This procedure is similar to CS-US pairings used in respondent conditioning. One interpretation, therefore, is that conditioned reinforcement is based on classical conditioning. This interpretation is called the stimulus-stimulus or S-S account of conditioned reinforcement. That is, all CSs are also conditioned reinforcers.

Although this is a straightforward account, the experimental procedures allow for an alternative analysis. In both the new-response and established-response methods, the stimulus (e.g., a click) sets the occasion for behavior that produces unconditioned reinforcement. For example, the click of a feeder (SD) sets the occasion for approaching the food tray (operant) and eating food (Sr). Thus, the discriminative-stimulus account holds that a stimulus becomes a conditioned reinforcer only if it functions as an SD, not because it serves as a CS associated with food.

Many experiments have attempted to distinguish between the SD and S-S accounts of conditioned reinforcement (see Gollub, 1977; Hendry, 1969, for reviews). For example,

Schoenfeld, Antonitis, and Bersh (1950) presented a light for 1 s as an animal ate food. This procedure paired food and light, but the light could not be a discriminative stimulus since it did not precede the food delivery. Following this training, the animals were placed on extinction and there was no effect of conditioned reinforcement.

Given this finding, it seems reasonable to conclude that a stimulus must be discriminative in order to become a conditioned reinforcer. Unfortunately, current research shows that simultaneous pairing of CS and US results in weak conditioning (see Chapter 3). For this and other reasons, it has not been possible to have a definitive test of the SD and S-S accounts of conditioned reinforcement.

On a practical level, distinguishing between these accounts of conditioned reinforcement makes little difference. In most situations, procedures that establish a stimulus as an SD also result in that stimulus becoming a conditioned reinforcer. Similarly, when a stimulus is conditioned as a CS it almost always has an operant reinforcement function. In both cases, contemporary research (Fantino, 1977) suggests that the critical factor is the temporal delay between the onset of the stimulus and the later presentation of unconditioned reinforcement.

Information and Conditioned Reinforcement

Stimuli that provide information about unconditioned reinforcement may become effective conditioned reinforcers. Egger and Miller (1962) used the extinction method to test for conditioned reinforcement. They conditioned rats by pairing two different stimuli (S1 and S2) with food. Figure 10.3 describes the procedures and major results. In their experiment (panel A), S1 came on and S2 was presented a half-second later. Both stimuli were turned off when the animals were given food. Both S1 and S2 were paired with food, but only S1 became an effective conditioned reinforcer. In another condition (panel B), S1 and S2 were presented as before, but S1 was occasionally presented alone. Food was never given when S1 occurred by itself. Under these conditions, S2 became a conditioned reinforcer.

FIG. 10.3. Procedures and major results of an experiment using the extinction method to test for conditioned reinforcement. From "Secondary Reinforcement in Rats as a Function of Information Value and Reliability of the Stimulus," by M. D. Egger and N. E. Miller, 1962, Journal of Experimental Psychology, 64, pp. 97-104. Copyright 1962. Author figure.

To understand this experiment, consider the informativeness of S2 in each situation. When S1 and S2 are equally correlated with food, but S2 always follows S1, S2 is redundant, providing no additional information about the occurrence of food. Because it is redundant, S2 gains little conditioned reinforcement value. In the second situation, S1 predicts food only in the presence of S2, and for this reason S2 is informative and becomes a conditioned reinforcer. These results, along with later experiments (e.g., Egger & Miller, 1963), suggest that a stimulus will become a conditioned reinforcer if it provides information about the occurrence of unconditioned reinforcement.
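The redundancy argument can be made concrete with a toy calculation of the conditional probability of food given each stimulus. The trial counts below are invented for illustration and are not Egger and Miller's data:

```python
# Toy illustration of the redundancy argument (hypothetical trial counts,
# not Egger & Miller's actual results).

def p_food_given(stimulus_trials, food_trials):
    """Conditional probability of food on trials when a stimulus occurred."""
    return food_trials / stimulus_trials

# Panel A: S2 always follows S1, and food follows both.
# S1 and S2 are equally correlated with food, so S2 adds no information.
pA_s1 = p_food_given(100, 100)   # P(food | S1) = 1.0
pA_s2 = p_food_given(100, 100)   # P(food | S2) = 1.0, same as S1: redundant

# Panel B: S1 sometimes occurs alone without food; food occurs only
# when S2 is also present, so S2 predicts food better than S1 does.
pB_s1 = p_food_given(100, 60)    # S1 occurred 100 times, food on 60 of them
pB_s2 = p_food_given(60, 60)     # S2 occurred only on the 60 food trials

print(pA_s1, pA_s2)  # 1.0 1.0 -> S2 redundant, weak conditioned reinforcer
print(pB_s1, pB_s2)  # 0.6 1.0 -> S2 informative, strong conditioned reinforcer
```

On this analysis, S2 acquires reinforcing value in panel B because it raises the probability of food above what S1 alone predicts; in panel A it raises that probability not at all.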

Good News and Bad News

The informativeness of a stimulus should not depend on whether it is correlated with positive or negative events, because bad news is just as informative as good news. Wyckoff (1952, 1969) designed an observing-response procedure to evaluate the strength of a conditioned reinforcer that predicted good or bad news. In this procedure, periods of reinforcement and extinction alternate throughout a session, but the contingencies are not signaled by SDs or SΔs. The contingency is called a mixed schedule of reinforcement. A mixed schedule is the same as a multiple schedule, but without discriminative stimuli. Once the animal is responding on the mixed schedule, an observing response is added to the contingencies. The observing response is a topographically different operant that functions to produce an SD or SΔ depending on whether reinforcement or extinction is in effect. In other words, an observing response changes the mixed to a multiple schedule. Figure 10.4 shows the relationships among mixed, multiple, tandem, and chain schedules of reinforcement.
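The defining feature of the observing-response procedure can be sketched in a few lines of code. This is a minimal toy model, not any published procedure; the stimulus colors and the uninformative "white" stimulus are assumptions for illustration:

```python
import random

# Minimal sketch of a mixed schedule with an added observing response.
# Components alternate unpredictably between reinforcement and extinction.
# Without an observing response the stimulus is the same in both components
# (a mixed schedule); an observing response produces the SD or S-delta,
# converting the mixed schedule into a multiple schedule.

class MixedSchedule:
    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.component = self.rng.choice(["reinforcement", "extinction"])

    def next_component(self):
        # Components alternate unpredictably throughout the session.
        self.component = self.rng.choice(["reinforcement", "extinction"])

    def stimulus(self, observed):
        if not observed:
            return "white"  # same stimulus in both components: mixed schedule
        # Observing reveals the SD (red) or S-delta (green): multiple schedule.
        return "red" if self.component == "reinforcement" else "green"

sched = MixedSchedule(seed=0)
print(sched.stimulus(observed=False))  # "white": component is unsignaled
print(sched.stimulus(observed=True))   # "red" or "green", matching the component
```

The same skeleton captures the other two schedules in Figure 10.4: removing the observing response and chaining the components together would give a tandem schedule, and adding the signals back to that chain would give a chain schedule.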

Wyckoff (1969) showed that pigeons would stand on a pedal in order to observe red and green colors associated with FI 30-second reinforcement or EXT 30 seconds. Before the birds had an observing response available, they pecked equally in the reinforcement and extinction phases—showing failure to discriminate between the schedules. When the observing response was added, the pigeons showed a high rate of pecking in the reinforcement phase and very low rates during extinction. Because the observing response was maintained, the results suggest that stimuli correlated with either reinforcement or extinction (good or bad news) became conditioned reinforcers.

[Figure 10.4 diagram: each schedule is depicted as a response (R) producing reinforcement (Sr+), with the schedules organized by whether the components are signaled (multiple, chain) or unsignaled (mixed, tandem) and by whether unconditioned reinforcement follows one component or all components.]
FIG. 10.4. The relationships among mixed, multiple, tandem, and chain schedules of reinforcement.

Although Wyckoff's data are consistent with an information view of conditioned reinforcement, it is noteworthy that his pigeons spent only about 50% of the time making the observing response. One possibility is that the birds were observing the stimulus correlated with reinforcement (red color) but not the stimulus that signaled extinction (green color). In other words, the birds may have only responded for good news.

In fact, subsequent experiments by Dinsmoor et al. (1972) and Killeen et al. (1980) supported the good-news interpretation of conditioned reinforcement. In Dinsmoor et al. (1972), pigeons were trained to peck a key on a VI 30-s schedule of food reinforcement that alternated with unpredictable periods of extinction. The birds could peck another key in order to turn on a green light correlated with reinforcement and a red light correlated with extinction. That is, if reinforcement was in effect, an observing response turned on the green light, and if extinction was occurring, the response turned on the red light.

Observing responses were maintained when they produced information about both reinforcement and extinction. In the next part of the experiment, observing responses only produced the green light signaling reinforcement, or the red light associated with extinction. In this case, observing responses produced either good or bad news, but not both. When observing responses resulted in the green light correlated with reinforcement, the birds pecked at a high rate. In contrast, the pigeons would not peck a key that only produced a stimulus (red) signaling extinction. Thus, good news functions as conditioned reinforcement, but bad news does not.

The good-news conclusion is also supported by research using aversive, rather than positive, consequences. Badia, Harsh, Coker, and Abbott (1976) exposed rats to electric shocks. The shocks were delivered on several variable-time schedules, independent of the rats' behavior. During training, a light was always on and a tone occurred just before each shock. In Experiment 2 of their study, the researchers allowed the animals to press a lever that turned on the light for 1 min. During this time, if shocks were scheduled, they were signaled by a tone. In one condition, the light was never accompanied by tones or shocks. That is, when the light was on, the animal was completely safe from shocks. Other conditions presented more and more tones and shocks when the animal turned on the light. In these conditions, the light predicted less and less safety, and responding for the light decreased. In other words, the animals responded for a stimulus correlated with a shock-free period, but not for information about shock given by the tone signals (see also DeFran, 1972; Dinsmoor, Flint, Smith, & Viemeister, 1969). Once again, conditioned reinforcement is based on good news but not on bad news.

There are human examples of the good- and bad-news effect (see Case, Ploog, & Fantino, 1990 for good news effects; Lieberman, Cathro, Nichol, & Watson, 1997 for bad news effects). Students who usually do well on mathematics exams quickly look up their marks on posted lists, while those who have done poorly wait for their grades to come in the mail. Seeing a grade is a conditioned reinforcer for students who are skilled at mathematics, but not for those who find the subject difficult. People who have taken care of their teeth find it easy to make a dental appointment, but those with inadequate dental health postpone the visit. Visiting the dentist is a safe period for patients with good teeth, but it signals "pulling and drilling" for those with poor dental hygiene. Unfortunately, the worse things get in such situations, the less likely people are to do anything about them—until it is too late.

Overall, research has shown that stimuli correlated with positive or negative reinforcement maintain an observing response (Dinsmoor et al., 1972; Fantino, 1977), and stimuli that are correlated with extinction or punishment do not (Blanchard, 1975; Jenkins & Boakes, 1973; Katz, 1976). For this reason, the mere informativeness of a stimulus is not the basis of conditioned reinforcement.
