Saturday, May 12, 2012

       Another way of understanding substance abuse is to look at how the consumption of illicit substances progresses from an initial point, at which a drug is taken voluntarily because it is perceived as having pleasurable and reinforcing effects, to an ultimate end point, at which an addict consumes the drug compulsively and complains of being "unable to quit," and to ask how this transition correlates with changes in activity in specific areas of the brain. A study by Everitt and Robbins (2005), "Neural systems of reinforcement for drug addiction: from actions to habits to compulsion," did just that.
       Interestingly, Everitt and Robbins (2005) argued that such transitions depend upon "interactions between pavlovian and instrumental learning processes" (Everitt and Robbins, 2005, p. 1481). As Domjan (2009) explains, the difference between pavlovian (or "classical") conditioning and instrumental (or "operant") conditioning is that, while undergoing classical conditioning, an organism develops an understanding of relationships between events in its environment that it cannot directly control, and develops appropriate responses to them; classical conditioning can thus be seen as more "passive." By contrast, during instrumental conditioning (called "operant" in the sense that the organism "performs operations" on its environment in pursuit of a specific goal), "responding is necessary to produce a desired environmental outcome" (Domjan, 2009, p. 144).
       At the neural level, Everitt and Robbins hypothesized that this change from voluntary, purposeful self-administration of a drug to the compulsive drug seeking and drug taking that most people would see as characterizing an addict "... represents a transition at the neural level from prefrontal cortical to striatal control over drug seeking and drug taking behavior as well as a progression from ventral to more dorsal domains of the striatum, involving its dopaminergic innervation" (Everitt and Robbins, 2005, p. 1481). In unpacking this statement, it is important to note that while the prefrontal cortex, near the front of the brain, is involved in higher-order thinking and planning ("Mapping the brain," 2012), the striatum lies deeper within the brain and is involved in reward and the anticipation of pleasurable outcomes (Speert, 2012).
       As Everitt and Robbins go on to explain, the striatum as a whole has increasingly been implicated in drug abuse and in subsequent drug addiction and dependence. This view has gained traction due both to an increased understanding of how the various parts of the striatum are linked to one another and to an increased understanding of how behavioral output (i.e., the actions an organism engages in) is a product of both classical and instrumental conditioning. They argue that the two types of learning initially occur in parallel, but that, as drug seeking and self-administration continue, instrumental learning comes to dominate. As a whole, these processes eventually result in "action-outcome and stimulus-response ('habit') learning" (Everitt and Robbins, 2005, p. 1481).
       It is interesting to consider how the "action-outcome and stimulus-response ('habit') learning" (Everitt and Robbins, 2005, p. 1481) mentioned by Everitt and Robbins relates to the pioneering psychologist Thorndike's conceptualization of instrumental behavior and its outcomes as "... reflecting the learning of an S-R association" (Domjan, 2009, p. 146). As Domjan explains, Thorndike's Law of Effect "states that if a response in the presence of a stimulus is followed by a satisfying event, the association between the stimulus (S) and the response (R) is strengthened. If the response is followed by an annoying event, the S-R association is weakened" (Domjan, 2009, p. 146). Furthermore, as Domjan is careful to note, "... the consequence of the response is not one of the elements in the association. The satisfying or annoying consequence simply serves to strengthen or weaken the association between the preceding stimulus and response" (Domjan, 2009, p. 147).
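       To make the structure of this idea concrete, the following toy sketch (my own illustration, not anything from Domjan or from Everitt and Robbins; the class name, learning rate, and strength values are all invented) represents each S-R association as a single number that is nudged up after a satisfying event and down after an annoying one. Note that, per Domjan's caveat above, the consequence itself is never stored; it only adjusts the strength of the S-R link.

```python
# Toy sketch of Thorndike's Law of Effect (all names and values invented).
# Each (stimulus, response) pair carries a strength; the consequence of a
# response is NOT stored in the association -- it only nudges the strength.

from collections import defaultdict

class LawOfEffect:
    def __init__(self, learning_rate=0.1):
        # (stimulus, response) -> association strength
        self.strength = defaultdict(float)
        self.learning_rate = learning_rate

    def update(self, stimulus, response, satisfying):
        """Strengthen the S-R link after a satisfying event,
        weaken it after an annoying one."""
        delta = self.learning_rate if satisfying else -self.learning_rate
        self.strength[(stimulus, response)] += delta

model = LawOfEffect()
model.update("puzzle box", "pull loop", satisfying=True)    # escape + food
model.update("puzzle box", "claw at bars", satisfying=False)
print(model.strength[("puzzle box", "pull loop")])    # strengthened: 0.1
print(model.strength[("puzzle box", "claw at bars")]) # weakened: -0.1
```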
       Thus, for Thorndike's Law of Effect to map onto the increasingly compulsive drug-seeking and drug-taking behavior of addicts, the addicts' behavior would have to function as follows: the association between the response (self-administration of the drug) and the particular stimuli in whose presence it occurs (e.g., the inside of a particular apartment, building, or other environment; the presence of the addicted individual's friends who share an interest in taking the drug) would be strengthened if administration of the drug led to a desirable consequence (e.g., a pleasurable "high") and weakened if self-administration led to a negative consequence (e.g., foolishly smoking marijuana aboard an airplane and setting off the smoke detectors, which might lead to a large fine or even arrest).
       It is important to note that, while the setting of "aboard an airplane" in the above example may seem a bit contrived, it illustrates something important about how such "S-R" associations (as Thorndike saw them) function: in that case, the addict's association between smoking and being on an airplane would be weakened, but there would be no change in any associations the addict may have formed between smoking and positive or negative consequences in general.
       This is an important point, as it might serve to explain the pattern of results in an earlier study by Caddy and Lovibond (1976). In that study, participants underwent a discriminated aversive conditioning procedure in a highly controlled therapeutic setting: a specific room whose bookshelves and other surfaces displayed numerous empty containers from various alcoholic beverages, along with other stimuli relating to the consumption of alcohol. While in this setting, the participants were fitted with shock electrodes (attached to the larynx) and were encouraged by the therapeutic personnel to continue consuming alcohol beyond a prescribed limit of 0.065% (as measured by a breath analysis test). If the participants did continue drinking above this limit, electric shock was administered through the electrodes. The intent of the study was for participants to form an association between drinking above the prescribed limit and an aversive consequence. What may actually have happened, in line with Thorndike's Law of Effect, is that the participants formed an association between the environmental stimulus of being in that highly controlled therapeutic setting (i.e., being in that room with all the bottles and other alcohol-related stimuli on display) and the response of consuming alcohol. Because the consumption of alcohol was there followed by the aversive stimulus of shock [in the terms described by Domjan (2009, p. 154), "punishment" or "positive punishment"], the participants' association between that setting and the consumption of alcohol was weakened. Upon completion of the study, however, the participants returned to a world whose settings were quite different from the highly controlled therapeutic setting they had just left. Indeed, because the criteria for treatment "success" in Caddy and Lovibond's (1976) study allowed for continued moderate drinking, the participants were likely to keep frequenting the same bars and other alcohol-centered environments they had gone to before, and they had formed no correspondingly weakened association between those outside settings (a different "S") and the response of drinking (the same "R"). Consistent with this, participants were far less successful in limiting their drinking after they returned to the outside world than they had been in therapy: while Caddy and Lovibond (1976) report that as many as 85% of participants showed some type of improvement at the 12-month follow-up, this figure drops to a far lower 59% at the 24-month follow-up. Thorndike's Law of Effect, and the specific associations the participants would have formed during the experiment, can go some distance in explaining this, especially since the "S-R" association formed during conditioning includes the stimulus and the response but not the eventual consequence of making that response, so the results of this type of learning may be limited to the particular setting in which it took place.
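       One way to see this proposed explanation is that, under a strict S-R reading of the Law of Effect, each association is indexed by the specific stimulus context in which it was learned. In the hypothetical sketch below (the contexts, strengths, and punishment increment are all invented for illustration), punishing drinking in the therapy room weakens only the association tied to that room, leaving the association between a familiar bar and drinking at full strength.

```python
# Hypothetical illustration (contexts, strengths, and increments invented):
# under a strict S-R reading, punishment weakens only the association tied
# to the context in which the punishment was delivered.

strength = {("therapy room", "drink"): 0.5,
            ("neighborhood bar", "drink"): 0.9}

def punish(stimulus, response, amount=0.3):
    # Shock after drinking weakens the S-R link for THIS context only.
    strength[(stimulus, response)] -= amount

for _ in range(2):
    punish("therapy room", "drink")

print(round(strength[("therapy room", "drink")], 2))     # -0.1: suppressed here
print(round(strength[("neighborhood bar", "drink")], 2)) # 0.9: untouched
```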
       In terms of their own analysis, Everitt and Robbins (2005) argue that drugs act as "instrumental reinforcers" (Everitt and Robbins, 2005, p. 1481), making them the reinforcers in Thorndike's Law of Effect: they "increase the likelihood of the responses that produce them, resulting in drug self-administration, or 'drug taking'" (Everitt and Robbins, 2005, p. 1481). Furthermore, the "S," or stimulus, portion of Thorndike's Law of Effect would be covered by the stimuli that have contiguity (close association in time and space) with the administration of the drugs; according to Everitt and Robbins, these "gain incentive salience through pavlovian conditioning" (Everitt and Robbins, 2005, p. 1481). They argue that the "rewarding" effects of a drug likely result from the increased attention to interoceptive physiological cues that drugs produce, as well as (in the case of hallucinogens and the like) from the changes in perception of the environment and other external cues that drug taking brings about. They go on to mention that this can be particularly reinforcing if it occurs in relation to conditioned stimuli (stimuli reliably paired with the occurrence of important environmental events). Crucially, Everitt and Robbins (2005) argue that it is the sense of control over environmental and interoceptive cues that the drug of choice allows a user to obtain which acts as the instrumental reinforcer in the context of Thorndike's Law of Effect, as applied to drug abuse and addiction.
       Interestingly, Everitt and Robbins (2005) mention that conditioned stimuli (CSs) that signal the impending delivery of positive reinforcers can have several effects beyond simply eliciting approach and consummatory behaviors. For instance, when conditioned stimuli are presented unexpectedly, increased rates of responding often result, which, according to Everitt and Robbins (2005), implies that conditioned stimuli can have motivational effects. This effect is also consistent with the results of an experiment on the effects of a shift in the quantity of a reinforcer conducted by Mellgren (1972). Mellgren tested the performance of four groups of rats in a runway apparatus. Initially, rats in two of the four groups received a reward of two food pellets for every successful completion of the runway task, while rats in the other two groups received a comparatively much larger reward of twenty-two pellets per successful trip down the runway. In the second phase of the experiment, Mellgren swapped the reward conditions of one group from each pair: one group that had been receiving two pellets suddenly found itself receiving twenty-two, while one group that had been receiving twenty-two suddenly found itself receiving two. The other two groups continued to receive the same amount of food reinforcement as before. Interestingly, although at the end of the first phase the rats in the twenty-two-pellet condition ran only slightly faster than the rats in the two-pellet condition, after the switch the rats shifted from two pellets to twenty-two ran much faster than they had before, while the rats shifted from twenty-two pellets to two ran much slower than before. Mellgren called the former phenomenon, in which responding overshoots following an increase in reward from the baseline level, positive contrast, and the latter, in which responding undershoots following a decrease in reward, negative contrast (Mellgren, 1972, p. 185). These contrast effects led Mellgren to conclude that there are emotional and motivational aspects to conditioned responding, which is similar to the conclusion drawn by Everitt and Robbins (2005) regarding the emotional component of conditioned responding that can become evident upon the unexpected presentation of a conditioned stimulus.
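       The contrast idea can be expressed as a small simulation. In the sketch below (my own toy model, not Mellgren's analysis; the learning rate, contrast gain, and trial counts are illustrative), response vigor tracks the current reward plus an "emotional" term proportional to the discrepancy between the reward received and the reward expected, with the expectation updated gradually across trials.

```python
# Toy simulation of positive and negative contrast (my own model; the
# learning rate, contrast gain, and pellet counts are illustrative).
# Vigor = current reward + an "emotional" term driven by the discrepancy
# between the reward received and the reward expected.

def run_phase(rewards, expectation, alpha=0.3, contrast_gain=2.0):
    vigor_trace = []
    for r in rewards:
        surprise = r - expectation            # elation (+) or frustration (-)
        vigor_trace.append(r + contrast_gain * surprise)
        expectation += alpha * surprise       # expectation slowly catches up
    return vigor_trace, expectation

# Group shifted from 2 pellets (phase 1) up to 22 pellets (phase 2).
_, small_expect = run_phase([2] * 20, expectation=0.0)
shifted_up, _ = run_phase([22] * 5, expectation=small_expect)

# Control group given 22 pellets throughout.
_, large_expect = run_phase([22] * 20, expectation=0.0)
unshifted, _ = run_phase([22] * 5, expectation=large_expect)

# Positive contrast: the upshifted rats transiently "overshoot" controls.
print(round(shifted_up[0], 1), round(unshifted[0], 1))  # e.g. 62.0 vs 22.0
```

       An upshift in reward then produces a transient overshoot in vigor (positive contrast), and a downshift a transient undershoot (negative contrast), in line with Mellgren's findings.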
       Everitt and Robbins (2005) also argued that, at the neural level, it is the "midbrain dopamine neurons" (dopamine being a neurotransmitter involved in pleasure and reward; Kullmann and Jennings, 2011) that "show fast phasic burst firing in response to such CSs..." (Everitt and Robbins, 2005, p. 1481). Relatedly, they note that unexpected presentations of conditioned stimuli normally paired with the administration of drugs "resulted in dopamine release in the core but not in the shell region of the nucleus accumbens" (Everitt and Robbins, 2005, p. 1482). Furthermore, disabling the sensitivity of the nucleus accumbens to dopamine (whether through selective lesioning of the core of the nucleus accumbens or through administration of NMDA or dopamine antagonists during conditioning) greatly attenuates conditioned responding, while infusing NMDA or dopamine antagonists into the core region after conditioning leaves the subject with a poor memory of the conditioning procedure (Everitt and Robbins, 2005).
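       This pattern of phasic dopamine firing is commonly modeled in computational accounts (though not in Everitt and Robbins's paper itself) as a temporal-difference prediction error. The minimal sketch below, with arbitrary parameters and a five-step trial, shows how the error signal, like the recorded dopamine burst, migrates from the time of reward delivery to the time of the predictive CS over the course of training.

```python
# Minimal TD(0) sketch of the "phasic burst" idea (a standard textbook
# model of dopamine signaling, not Everitt and Robbins's own analysis;
# all parameters are arbitrary). States 0-4 tile one trial: the CS comes
# on at state 1 and reward arrives on the transition into state 4.

values = [0.0] * 5            # learned value of each within-trial state
alpha, gamma = 0.2, 1.0       # learning rate, discount factor

def run_trial():
    """One CS -> delay -> reward trial; returns the TD errors."""
    deltas = []
    for t in range(4):
        reward = 1.0 if t == 3 else 0.0          # reward on last transition
        delta = reward + gamma * values[t + 1] - values[t]
        deltas.append(delta)                      # delta ~ phasic dopamine
        if t > 0:                                 # pre-CS baseline stays 0,
            values[t] += alpha * delta            # since CS onset is a surprise
    return deltas

first = run_trial()
for _ in range(200):
    last = run_trial()

print([round(d, 2) for d in first])  # error at reward time: [0, 0, 0, 1.0]
print([round(d, 2) for d in last])   # error migrates to CS: [1.0, 0, 0, 0]
```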
       However, Everitt and Robbins (2005) add a caution. From the above data, it may seem that drug addiction depends on the factors just mentioned, i.e., on the presentation of a conditioned stimulus previously paired with a drug eliciting approach to that stimulus, or on drug seeking and self-administration being made more frequent by the unexpected presentation of a conditioned stimulus. Yet while these effects have been shown to be causally involved in the conditioning of animals with natural rewards, they have yet to be proven definitively with regard to the drug-seeking behavior of humans (Everitt and Robbins, 2005).
       Everitt and Robbins (2005) do note, however, that "In certain circumstances, CSs can also function as conditioned reinforcers" (Everitt and Robbins, 2005, p. 1482). As they go on to explain, this occurs when stimuli that were "initially motivationally neutral" (i.e., nothing about them spoke directly to an organism's biological needs) become reinforcing in their own right through association with primary reinforcers such as food or drugs; such stimuli "help to maintain instrumental responding by bridging delays to the ultimate goal..." (Everitt and Robbins, 2005, p. 1482). Such "bridging" reinforcement via conditioned reinforcers that have come to function like primary reinforcers is important, because conditioned responding can be severely disrupted by even seemingly insignificant temporal delays in the delivery of a reinforcer. Domjan (2009) explains that this occurs for several reasons, including the fact that a delay can make it difficult for a participant to determine which of the several actions or responses it has performed since the previous reinforcer (a lever press, walking about the cage in a circle, sniffing the food delivery magazine, etc.) is the one that actually produced the reinforcer. Using a conditioned stimulus that was previously trained with, and has thus come to be associated with, the reward is one way to overcome this difficulty (Domjan, 2009).
       A study by Cronin (1980), involving pigeons and a visual discrimination task, confirmed these results. Cronin divided the pigeons into four groups and varied the conditions to which each group was exposed during the delay interval between an instrumental response and reinforcement. One group, the "nondifferential" group (Cronin, 1980, p. 352), was presented with the same stimulus during the interval between an instrumental response and a reward, whether the response had been correct or incorrect. The other three groups, by contrast, were exposed to different stimuli during the delay period depending on whether they had made a correct or an incorrect choice. The first of these, the "differential" group, received the differential stimuli (differing according to whether the pigeon's response was correct or not) over the course of the entire delay period. The second, the "reinstatement" group, received the differential stimulus only during the ten seconds immediately following a response and during the ten seconds immediately preceding the reinforcer. The last, the "reversed cue" group, was treated the same way as the reinstatement group, except that the cues were indeed "reversed": the cue presented ten seconds after an incorrect response was also presented ten seconds before the reinforcer on correct trials, and the cue normally presented ten seconds after a correct response was also presented ten seconds before the withheld reward on nonreinforced trials (trials on which the birds had failed to make the correct response). The results indicated that pigeons in the nondifferential group failed to learn to make appropriate instrumental responses to the task, while birds in both the differential and reinstatement conditions had no difficulty learning it. Finally, the reversed-cue birds learned to make a response, but the wrong response! This study thus highlights the importance of the delay interval between an instrumental response and the presentation of a reinforcer (or nonreinforcement), and of the stimuli presented during it: they can aid the formation of conditioned responding, prevent it, or even condition responding of the wrong kind.
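       The credit-assignment logic behind these results can be caricatured in a few lines. In the hypothetical sketch below (the response names, credit increment, and delay length are invented; the "bridge cue" condition is loosely modeled on Cronin's reinstatement group), reward credit is either smeared across every response emitted during the delay or, when a cue marks the effective response, assigned to that response alone.

```python
# Hypothetical credit-assignment sketch (response names, increments, and
# delay length invented; the "bridge cue" condition is loosely modeled on
# Cronin's reinstatement group). Without a cue, reward credit is smeared
# over every response emitted during the delay; with a cue marking the
# effective response, credit goes to that response alone.

import random
random.seed(1)

def train(bridge_cue, n_trials=500, delay=5):
    strength = {"lever press": 0.0, "circle cage": 0.0, "sniff magazine": 0.0}
    for _ in range(n_trials):
        # Only the lever press earns the reward, but the animal emits
        # other behaviors during the delay interval as well.
        emitted = ["lever press"] + random.choices(
            ["circle cage", "sniff magazine"], k=delay - 1)
        if bridge_cue:
            strength["lever press"] += 0.1        # credit marked by the cue
        else:
            for response in emitted:              # credit diluted by delay
                strength[response] += 0.1 / len(emitted)
    return {k: round(v, 1) for k, v in strength.items()}

print(train(bridge_cue=False))  # irrelevant responses soak up most credit
print(train(bridge_cue=True))   # the lever press cleanly dominates
```

       Without the marking cue, the incidental behaviors can end up with more accumulated credit than the response that actually produced the reward, which is exactly the difficulty Domjan describes.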
       In their discussion of how CSs can sometimes come to act as conditioned reinforcers, Everitt and Robbins (2005) specify that "The effects of conditioned reinforcers, especially drug-related conditioned reinforcers, are pervasive and profound. For example, they support the learning of new drug-seeking responses, an effect that persists for at least two months without any further experience of self-administered cocaine and that is resistant to the extinction of the original CS-drug association. Drug-associated conditioned reinforcers also help maintain responding under second-order schedules of reinforcement" (Everitt and Robbins, 2005, p. 1483). Thus, drug-related conditioned reinforcement appears to feed on itself, with conditioned reinforcers supporting further increases in drug-seeking behavior; and the fact that this effect persists even after extinction of the original CS-drug association is truly remarkable, shedding light on some of the factors that make drug addictions so difficult to overcome.
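       For readers unfamiliar with second-order schedules, the toy simulation below may help (the schedule notation FR10(FR5:S) and all parameters are illustrative, not taken from any specific experiment in the review): every tenth response produces a brief CS, and the drug itself arrives only after several such CS-producing units, so the conditioned reinforcer sustains long runs of responding between infusions.

```python
# Toy simulation of a second-order schedule, FR10(FR5:S) in conventional
# notation (parameters hypothetical): every 10th response produces a brief
# CS, and only every 5th completed unit also delivers the drug. The CS
# presentations keep responding going across long drug-free stretches.

def second_order_schedule(n_responses, unit_size=10, units_per_drug=5):
    events = []
    units_completed = 0
    for response in range(1, n_responses + 1):
        if response % unit_size == 0:
            units_completed += 1
            if units_completed % units_per_drug == 0:
                events.append((response, "CS + drug infusion"))
            else:
                events.append((response, "CS only (conditioned reinforcer)"))
    return events

for response, event in second_order_schedule(100):
    print(f"response {response:3d}: {event}")
```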
       Everitt and Robbins (2005) do qualify these claims about the potency of drug-related conditioned reinforcers, however, with the statement that "The CSs must be presented as conditioned reinforcers (that is, their presentation must depend on the animal's behavior); merely presenting them unexpectedly fails to increase drug seeking. This seems to contradict the 'incentive salience' model of drug-seeking behavior, which would predict enhancement from pavlovian, or unexpected presentations of the CS" (Everitt and Robbins, 2005, p. 1483). This latter point underscores the importance of contingency, as well as contiguity, in instrumental conditioning, and fits nicely with Staddon and Simmelhag's (1971) attempted replication (and resulting reinterpretation) of Skinner's famous "superstitious behavior" experiment. In their study, Staddon and Simmelhag (1971) measured pigeon behaviors more systematically than Skinner had and, crucially, recorded when each type of behavior occurred in relation to the prior and subsequent deliveries of free food, labeling responses that occurred just prior to the next delivery of food "terminal responses" and responses that occurred closer to the middle of the interval between food deliveries "interim activities" (Staddon and Simmelhag, 1971, p. 4). Furthermore, they found the effects of accidental reinforcement (i.e., reinforcement occurring in the absence of the target response, as if "by accident") to be minimal; according to them, presentations of reinforcement in the absence of the target response merely strengthened the terminal responses, but had little other influence (Staddon and Simmelhag, 1971).

                                                                    References

1). Caddy, G. R., & Lovibond, S. H. (1976). Self-regulation and discriminated aversive conditioning in the modification of alcoholics' drinking behavior. Behavior Therapy, 7(2), 223-230. doi: 10.1016/S0005-7894(76)80279-1


2). Cronin, P. B. (1980). Reinstatement of postresponse stimuli prior to reward in delayed-reward discrimination learning by pigeons. Animal Learning & Behavior, 8(3), 352-358. Retrieved from http://www.springerlink.com/content/c016725275680566/

3). Domjan, M. (2009). Learning and behavior (6th ed.). Belmont, CA: Wadsworth, Cengage Learning.

4). Everitt, B. J., & Robbins, T. W. (2005). Neural systems of reinforcement for drug addiction: from actions to habits to compulsion. Nature Neuroscience, 8(11), 1481-1489. doi: 10.1038/nn1579


5). Kullmann, D., & Jennings, A. (2011, June 1). Dopamine and addiction. Retrieved from http://www.brainfacts.org/Diseases-Disorders/Addiction/Articles/2011/Dopamine-and-Addiction


6). Mapping the brain. (2012, April 1). Retrieved from http://www.brainfacts.org/Brain-Basics/Neuroanatomy/Articles/2012/Mapping-the-Brain


7). Mellgren, R. L. (1972). Positive and negative contrast effects using delayed reinforcement. Learning and Motivation, 3(2), 185-193. doi: 10.1016/0023-9690(72)90038-0

8). Staddon, J. E., & Simmelhag, V. L. (1971). The "superstition" experiment: A reexamination of its implications for the principles of adaptive behavior. Psychological Review, 78(1), 3-43. doi: 10.1037/h0030305


9). Speert, D. (2012, February 2). Neuroeconomics: Money and the brain. Retrieved from http://www.brainfacts.org/In-Society/In-Society/Articles/2012/Neuroeconomics-Money-and-the-Brain






