abnormal lecture on research

Lecture 3

Assessment, Classification and the Science of Abnormal Psychology

Dale L. Johnson

Chapter 3 covers a lot of territory. Only a few years ago all textbooks in abnormal psychology devoted three chapters to the topics covered here. In condensing to one chapter my guess is that the authors got rid of some of the fluff and kept the core ideas. If so, it means that a careful reading is necessary. In this lecture, I will comment on some of the key ideas.

Clinical Assessment Procedures

Psychologists are the only mental health professionals with training in systematized assessment. Psychiatrists and social workers are trained to do interviews, but only rarely are they trained to do the structured interviews that are now required for research. For many years, a major role for psychologists in hospitals and clinics was administering psychological tests and interpreting and reporting the findings. I have given hundreds of Rorschach Inkblot Tests, Thematic Apperception Tests (TATs), and Wechsler Adult Intelligence Scales (WAISs), and written hundreds of reports. However, I have not given a Rorschach since 1961, when I tested a Navajo woman deep in the reservation at the request of an anthropologist. I gave it up because in 1960 I became director of a human relations research ward at the Houston VA hospital and we adopted alternate ways of assessing our patients (or participants, as we called them). We found that widely used tests such as the Rorschach and TAT were irrelevant to the kind of treatment we provided. We needed behavioral and self-report measures, not methods based on dubious psychodynamic assumptions. We found that the Minnesota Multiphasic Personality Inventory (MMPI) was useful for diagnostic purposes and that behavioral measures provided the information needed to tailor the treatment program to individual needs. In addition, we used the measures to monitor change and to show us whether the treatment had been effective in the long run (in general, it was).

Nevertheless, clinical psychologists still do give the projective tests (as the Rorschach and TAT are called) in many practice settings. The clinical psychology training program at UH no longer teaches these tests, nor do other programs that emphasize a scientist-practitioner model. Recent research has focused on the identification of specific problems. One such line of research was the use of projective tests to detect child sexual abuse. Results showed low validity for projective tests. On the basis of projective test results, many children who had not been abused would have been said to have been abused. Conversely, many abused children would have been missed.

Reasons for Assessing

Assessment is done for clinical purposes: to aid in making a diagnosis, to identify symptoms to target for special attention, and to find personal characteristics that may facilitate or inhibit treatment. This use is elaborated in the textbook.

The other main reason for assessment is research. For example, in our research at the UT medical school in Galveston, we studied the lasting effects of otitis media with effusion (OME), or middle ear infection, on child development. We examined babies' ears for signs of OME biweekly for three years. Then we looked at what we thought, from our review of the literature and the experience of the pediatricians on the project, might be long-term consequences of having partial hearing loss for periods of time early in life. Some of the children had impaired hearing for one-third of their early lives. We measured intelligence, language development, and behavior problems at ages 3, 5 and 7, and added social relations and school achievement at age 7. All of the measures used were selected because research had shown they were highly reliable, valid, and had been standardized. As a research project we were less interested in how individual children developed than in how a group of children, representative of the Galveston area, developed. The results are mixed, with some evidence of effects on intelligence (or IQ, to follow the textbook) at age 3 that disappeared at ages 5 and 7, and effects on language development at age 5. None of the effects were very great. Although statistically significant, they were probably not clinically significant. We suggested that persistent OME does not pose a great danger to child development. Persistent OME after age three may be a different matter altogether. We did not study that. We are still working on the age 7 data.

Choice of Method

In clinical work the psychologist wants to know what the problem is. To answer this, it is necessary to obtain information about the characteristics of the client and of her or his environment. Assessment examines the client's life system. It is done with the cooperation of the client. The assessment is not limited to seeking the source of the problem, but looks at the client's way of thinking about the problem and at how the person copes with problems. For example, some people become depressed if they set goals for themselves that they do not really want. They feel they must have these goals. They go through life uncomfortable with their lack of achievement and with a vague feeling that they have missed their true calling. They become depressed. Assessment might reveal that this unrealistic goal setting is a part of the problem and lead the therapist to help the client explore alternative goals. A depressed, but fairly well-to-do, real estate salesperson might prefer to be a much less financially well-off rock musician, and not be depressed. Assessment might find that another person with depression has habitually negative modes of thinking about self, others and the world in general, and rarely identifies positive experiences for self. This assessment result sets the stage for cognitive-behavioral therapy. Assessment of this type is typically used on an ongoing basis, and the client and therapist can both see progress being made.

Reliability and Validity in Assessment

These are important matters and every student of psychology should have a clear understanding of this importance. The textbook section on this is very good. Note the following especially:

Reliability

Interrater

Test-retest

Internal consistency (split-half)

Validity

Types of Psychological Assessment

Clinical Interviews

Interviews are a standard part of any assessment; they offer flexibility, comprehensiveness, and sensitivity. The most common type of interview used is the open, unstructured interview. It has some advantages; it can be adapted to the interests of the client and can lead into a therapeutic relationship with ease. A disadvantage is that important issues for diagnoses may be missed or the client may lead the clinician down the wrong path. This type of interview fails to meet the requirements for reliability, validity and standardization.

The structured, or semi-structured interview was developed along with the newer, research-based, diagnostic criteria. In these interviews the interviewer follows a prescribed set of questions. These questions are designed to elicit all of the information necessary to make a DSM-IV diagnosis. Although many clinicians use these interviews, most prefer the flexibility and possible brevity of the open interview. I have used structured interviews for years and they have helped me discover information useful for treatment planning that I would never have found using an open interview. Structured interviews were designed for research use, and are nearly always used when researchers must define a distinct clinical group for their research. Thus, if one is studying obsessive-compulsive disorder, all of the subjects must have obsessive-compulsive disorder (OCD) and not just something that might be OCD. These interviews can be reliable (with training), valid and standardized.

Psychological Tests

The first psychological tests were developed in England in the middle 1800s, but the first widely used test was that of Binet and Simon in Paris about 100 years ago. Their test of intelligence, designed to predict success in school, formed the basis for all subsequent IQ tests. The method of selecting tasks, presenting them to subjects in a standard way, and comparing children of the same age, is basic for later tests.

Projective

The Rorschach inkblots, the Thematic Apperception Test (TAT), and the Incomplete Sentences Test are only a few of the projective tests that grew out of the fascination with uncovering unconscious feelings and thoughts. After World War II, when American clinical psychology began to prosper, these tests made up the core of the battery of tests used. Every clinical psychologist was trained in their use and expected to administer them as one of the unique contributions of psychology.

Gradually, a sense of disquietude appeared. There were problems. On the Rorschach, for example, how do you compare a person who gives only one response per card (there are ten cards) with a person who gives 100 responses? And how can you make sense of the following? Give the test to a person one day, ask him to return the next day and take the test again, but this time to give different responses, and then ask experts to pick out the two sets of responses that came from the same person. They cannot do it. The same person apparently can have two different personalities at will. These problems have led some psychologists to doubt the value of the test. More important, however, is that it became increasingly obvious that the interpretations of Rorschach test results were simply irrelevant; e.g., "This patient has latent homosexual wishes." So what? Or, "This patient often feels anger toward father figures." So what else is new?

On the positive side, Rorschach responses may have some value in identifying signs of thought disorder in patients suspected of having schizophrenia. However, patients with bipolar disorder, major depression and OCD may also show signs of thought disorder. The Exner system, mentioned in the textbook, does improve reliability of administration and scoring, but it has not resolved the question of validity. The test has not been standardized in an adequate way.

The same can be said of other projective techniques. If we were to assign them grades, they would get a low C.

Personality Inventories

Personality inventories such as the MMPI-2, the Millon, and various measures of specific symptoms such as the Beck Depression Inventory, are quite another matter. They are highly reliable, valid, and are standardized. They are widely used in clinical practice and research. The MMPI-2 can be scored in a great many different ways, many more than are suggested in the textbook. I do not hesitate to say that I can look at the results of an MMPI-2 and tell you a great deal about a person, and chances are good that I would be right, not because I am a clinical whiz, but because I would follow standard procedures for making interpretations. A computer could do as well.

Cognitive Tests

These include the Wechsler series such as the Wechsler Adult Intelligence Scale (WAIS), the Stanford-Binet-Fourth Edition (SBIV), and many other tests. The SAT and GRE are cognitive tests, and whether they should be considered aptitude tests, intelligence tests, or tests of achievement is a matter for discussion. Suffice it to say here that they are very reliable, remarkably valid, and all are beautifully standardized. However, none are standardized in the way the textbook says on page 66. If "an African-American male, 19 years old, and from a middle-class background" took the test, his results would never be compared with a group of 19-year-old, middle-class African-American males. He would be compared with a group of adults from a sample representative of the American population, which would include African-Americans. It would be impossible to make such specific standardizations as the book suggests, and totally unnecessary.

IQ tests seem always to draw controversy, but in fact they are probably the best predictors of behavior in the whole library full of psychological tests.

[Something to ponder: President George W. Bush had an SAT Verbal score of 566 when he was admitted to Yale. Early presidential candidate Bill Bradley had a 438. Quantitative scores were not given in the report as it appeared in WWW.Slate.com. George W. had a father and grandfather who were Yale alums, and Bill B. was a basketball star. Both men completed their freshman year with a C average. Bill B. raised his average enough to get a Rhodes scholarship, but George W. B. continued to get Cs. That intelligence or aptitude tests tend to agree is shown by George W.'s results when he took the LSAT when applying to the UT School of Law. He was rejected because his score was too low. Of course, he did get an MBA from Harvard.]

Neuropsychological Testing

In seeking an understanding of the relations between brain function and behavior, neuropsychological testing plays a central role. When I was a psychologist with the VA Hospital I spent many hours working with neurologists in trying to determine whether the symptoms presented by a patient could be attributed to brain damage. Today no psychologist does that. The identification of brain damage is done with MRI scans, or some other form of brain imaging (see textbook pp. 78-79). Neuropsychologists have not joined the ranks of the unemployed; they have shifted their work to an exploration of how specific brain damage affects behavior. Thus, they might use the Wisconsin Card Sorting Test to understand how the cognitive function called "executive functioning" is affected by damage to the frontal lobes. Neuropsychological testing is essential in cases of traumatic brain injury, as in automobile accidents, to see which cognitive functions are intact and which show impairment. In suspected Alzheimer's Disease, the neuropsychologist does the testing of memory functions that makes the first diagnosis.

Neuropsychological testing is also used to plan rehabilitation. This is true for traumatic injury, strokes, and psychiatric disorders involving brain dysfunction such as schizophrenia.

The University of Houston has one of the most highly regarded neuropsychological training programs in the country. It is part of the clinical psychology training program.

Behavioral Assessment

The textbook chapter is very good on this topic and does not need to be repeated. The authors point out how important behavioral assessment is, and I would like to support that opinion. It does get past some of the problems of interviews and of self-report, and it is an answer to the criticism of psychological tests that they lack ecological validity (i.e., they do not account for the real-life situation). On the other hand, behavioral assessments typically do not fit in with managed care limitations on time, and there are observer reliability problems. For example, in assessing children for the evaluation of the Houston Parent-Child Development Center (H-PCDC), we wanted to know if the program had resulted in a reduction of expected behavior problems. We had mothers, teachers and the children themselves, now young adolescents, describe behavior problems on such standard measures as the Child Behavior Checklist (CBCL). The measures used were reliable, valid, and standardized. All three measures showed that children who had been in the H-PCDC had fewer problems than children in the control group. However, the three sources of information (mother, teacher and child) were not correlated significantly. Each source saw the behaviors somewhat differently. Our finding is typical of research in this area. In a sense, we failed to obtain inter-rater reliability. We might have obtained higher reliability if we had trained mothers, teachers and children in the use of the measure, but this was not practical when assessing 250 children. It is a research problem that is unresolved.
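A rough sketch of how agreement among the three sources (mother, teacher, child) could be checked: correlate every pair of informants across the same set of children. The function names and the dictionary layout are hypothetical, chosen for illustration; this is not the analysis used in the H-PCDC evaluation, and no real data are shown.

```python
from itertools import combinations
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def pairwise_agreement(ratings):
    """ratings maps each informant to a list of problem scores,
    one score per child, with children in the same order in every list.
    Returns the correlation for every pair of informants."""
    return {
        (a, b): pearson_r(ratings[a], ratings[b])
        for a, b in combinations(ratings, 2)
    }
```

Called with a dictionary holding "mother", "teacher" and "child" score lists, the function returns three correlations. Low values for every pair would correspond to the situation described above, where each source sees the behaviors somewhat differently.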

Gordon Paul's research (He is in the Psychology Department at UH) on behavior observation is regarded as excellent. His system is designed to assess behavior with carefully trained observers in institutional settings. The results of the observations are used on a daily basis in his social learning treatment program.

Neuroimaging and Psychophysiological Assessment

See the book for information on these topics.

Assessment of Anxiety--Elusive Construct

Some clinical problems require several kinds of assessment. Anxiety disorders are an example of this. A clinical interview is essential, as is a self-report on some measure of anxiety. In addition, however, it is often useful for treatment planning to include psychophysiological assessment and observation reports by people who know the client well.

Classification and Diagnosis

Classification is not unique to abnormal psychology. All humans classify, as do other mammals. My dog knows which animals are also dogs, and knows that cats and cows are not dogs. She seems to find these other animals interesting, but knows they are not of her kind, and not as interesting to her as are other dogs.

The history of psychiatric classification is long. For example, the Greek physician, Hippocrates, wrote that he could tell if a pregnant woman would have a girl or boy baby. He found a way to classify women. Those who were pale in face would have girls and those who were rosy cheeked would have boys. How many cases did he have to see to arrive at this apparently fairly accurate classification system? He kept track, talked with other observers and came to some conclusions. Note that this is a predictive system: classes determined at one time predict later events.

Early Classification

There is early evidence in writing that shows that the Egyptians of more than 2000 years ago classified war wounds. They sorted them into various types so that each type could be treated in an appropriate way.

The Greek, Empedocles, believed the number of elements was four: Earth, Water, Fire and Air. This led to an elaboration of the four elements into the four humors (yellow bile, blood, phlegm, black bile), the four temperaments (choleric, sanguine, phlegmatic, melancholic), the four ages of man, and the four seasons. He created an inter-related explanatory system.

Galen developed a classification by function, but one that was based on physical structure. Thus, "hot" diseases were treated by "cold" remedies.

Hippocrates developed the idea of acute and chronic diseases. We still use this idea.

It was not until the 18th century that the idea of a specific physical location for mental disorders was rejected by Boussard. He asked, "What is the seat of mania?" No one could show it to him.

Psychiatric Classification

Pinel, in France, was the first to observe and describe what we think of as schizophrenia. Kraepelin added to the specificity of the diagnosis around the turn of the last century, and E. Bleuler gave the disorder the name we use today. The Japanese no longer use the term "schizophrenia." They call the same set of symptoms "integrative disorder." Now diagnosis of this disorder is done with greater reliability, but the question of validity remains open. One question is whether there is one schizophrenia or many.

The tradition of classification is very strong in medicine. You feel ill, go to a doctor, and expect to hear a diagnosis: "You have measles, not the flu." One either has measles or not. You may have it in a severe form or just a touch, but it is measles.

The problem is somewhat different in psychiatric classification. The Diagnostic and Statistical Manual, fourth version (DSM-IV) uses behavioral symptoms for the diagnosis of all of the disorders listed. In no case is it possible to do a blood test and say, "Ah, the results show that you have moderate depression." The problem then is to find sets of behaviors (syndromes) that reliably occur together to form a specific disorder. We will explore these sets of symptoms or behaviors as we go further into our subject matter. What you will see, however, is that the sets are sometimes comprised of an odd lot of behaviors. Furthermore, it is sometimes possible for two people to be given the same diagnosis and not have any symptomatic behaviors in common.

Psychiatric classification is essential for research, record keeping and practice, in that order. Researchers have to be able to assemble groups of people with similar problems in order to develop an understanding of the disorders. For example, research on schizophrenia was not very productive until researchers came to some agreement about what behaviors or symptoms comprised schizophrenia.

Record keeping also requires some information about diagnosis. Insurance payments make use of this information. Ironically, we have no idea of the incidence of schizophrenia, or of any of the other DSM-IV disorders, in America; that is, we do not know the number of first-time cases. We know the incidence of polio or whooping cough, but not of psychiatric disorders. Scandinavians, the Dutch and Germans have these data, but Americans do not. Psychiatric providers do not want to be bothered with having to compile the data and send them to the Centers for Disease Control and Prevention (CDC) for compilation. At present there is concern about an apparent increase in the incidence of autism. Is there really an increase? We don't know because we do not collect incidence data in a systematic way.

Clinicians use diagnoses in an approximate way, but they really base treatment plans on measures of the severity of various symptoms. That is, "schizophrenia" is not treated as much as are positive symptoms such as delusions and hallucinations.

Problems with Classification

Granted we need classification, but there are problems. The discussion of this in the textbook is good and I will just add a few things.

Stigmatizing

To have psychiatric disorder is to be stigmatized to some degree. This is less true in the United States than in some other countries, and less true today than 50 years ago, but stigma still exists.

Children given a diagnosis may be red flagged: expected to fail, or to present problems. An adult with a diagnosis of epilepsy may not be able to get an automobile license or insurance coverage for some forms of employment even though his epilepsy is controlled; an adult with schizophrenia may not be able to get a job even though recovered, if the employer knows the diagnosis. These are reasons for not assigning diagnostic labels, but people who have a psychiatric problem need help and a diagnosis is part of the helping process.

Until quite recently there was great reluctance to assign a diagnosis of schizophrenia or to communicate it to the patient or family. Patients and families would just go along wondering what is the matter. I visited Turkey recently and learned that patients with schizophrenia are kept in hospitals, crowded two to a bed, for two years before learning of their diagnosis. The psychiatrists said they did not want to give it because "it was a death sentence." Patients and their families almost always say that they appreciate knowing what the diagnosis is. It helps them to understand unusual behaviors.

Criticism of Current Diagnostic Practice

There are many critics of current diagnostic practices. It should be noted, however, that no one has offered anything better that can be accepted by other mental health professionals. These criticisms include the following:

1. Heterogeneity within diagnostic classes.

Categories are very broad and there is great overlap of symptoms, but the importance of the specific symptoms is not addressed. The revisions seen in the DSM-IV are a partial solution to this problem. Nevertheless, two patients may receive the same diagnosis and not have any of the same symptoms. This is true for only some disorders that have long lists of symptoms from which the diagnostic specialist must select the ones that apply to a certain patient. There are 93 ways of getting a diagnosis of Borderline Disorder and two people may have the diagnosis and have no common symptoms.
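The arithmetic behind a count like this can be sketched briefly. The lecture does not state the rule that produces 93; one rule that reproduces the figure is "at least five of eight listed criteria," and that rule is an assumption made here purely for illustration. The function name is hypothetical.

```python
from math import comb

def ways_to_meet(n_criteria, threshold):
    """Count the distinct symptom combinations that satisfy a rule of the
    form 'at least `threshold` of `n_criteria` listed criteria'."""
    return sum(comb(n_criteria, k) for k in range(threshold, n_criteria + 1))

# Assumed rule: at least five of eight criteria.
print(ways_to_meet(8, 5))
```

Under the assumed rule the count is 56 + 28 + 8 + 1 = 93. The same function shows how quickly the number of qualifying combinations grows as the list of criteria gets longer.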

Too many childhood behaviors have become psychiatric illnesses. Is Mathematics Disorder more appropriate for the realm of psychiatry or for education?

2. Lack of Reliability:

This is still a problem, but much less than it was before diagnosis became based on research evidence, not clinical theory. There are several sources of the problem. The percentages shown are estimates of how much of the unreliability is accounted for by each source. These estimates were based on DSM-I in 1962.

a. Patient inconsistencies in telling of symptoms (5%).

b. Inconsistencies on the part of the examiner (32.5%). The use of standard, structured interviews helps correct this. Note that it is not only psychiatric classification that presents this type of problem. There are inconsistencies in fingerprint identification owing to analyst errors. A recent FBI study of fingerprint identification in local law enforcement agencies found an error rate of 20%. The FBI expert error rate was only 0.03%.

c. Inadequacies of the diagnostic classification system (62.5%). Much of this has been corrected in the DSM-IV.

The DSM-IV has improved reliability. It is now moderately good for Axis I disorders, but still not adequate for Axis II disorders.

3. Lack of Validity

a. etiological--same set of antecedents is assumed for many disorders; e.g., early child abuse may be involved in dissociative disorder, post-traumatic stress disorder, depression and anxiety.

b. concurrent--are other symptoms part of the pattern; e.g., difficulty holding job if schizophrenic. Why isn't this a symptom?

c. predictive--does the diagnosis have anything to do with outcome or the course of the illness? This is the issue of prognosis. What long-term outcome is expected for this disorder?

DSM-IV

Multiaxial (p. 84). The classification system calls for a recording of five kinds of information.

I. Major categories

II. Personality disorders and childhood disorders

III. Physical disorders

IV. Severity of psychosocial stressors

V. Global Assessment of Functioning Scale (GAF)

The DSM-III, DSM-III-R and DSM-IV show many changes from the DSM-II, which was based on theory only. Since DSM-III (in 1980, not 1990 as the text suggests) criteria have been assigned on the basis of empirical research.

The distinction between I. Major categories and II. Personality disorders and childhood disorders seems artificial and will probably be dropped in the next revision of the system.

The category of neurosis was dropped because it was too vague.

Hysteria was omitted because it was found to be a meaningless compendium of symptoms.

Homosexuality as a psychiatric disorder was dropped by a vote of the members of the American Psychiatric Association. It is now not seen as pathological. The psychoanalytic faction of the Psychiatric Association wanted to keep it, insisting that it is a pathological condition, but other, more biologically-oriented psychiatrists, objected saying they could not find evidence of pathology.

The concept "psychosomatic" was dropped in recognition of a psychological role in all illnesses.

Some added:

Many child disorders.

Psychosexual dysfunctions.

Improved distinctions between types of depression.

Provided warnings about cultural appropriateness.

SCIENCE AND SCIENTIFIC METHOD

This is related to chapter 4 in your textbook and it provides a quick overview of the science of psychology. You have already had something on this in the Introduction to Psychology course, but it bears repeating. It is so easy to slip into non-scientific popular psychology.

About 50 years ago, Carl Rogers and his associates advocated using client centered therapy to treat people with schizophrenia. They believed that schizophrenia was caused by a lack of self-regard, a terrible self-concept, and reasoned that their therapy would help restore or build a positive view of self. Client centered therapy was adopted by many clinicians. To Rogers' credit, he used scientific methods to study his hypothesis (see textbook). He found that his therapy did nothing at all for people with schizophrenia, and he abandoned its use with this population. Apparently his theory was wrong; it was not supported by research results. Also in the 1950s it appeared that two new drugs, reserpine and chlorpromazine, might have positive effects on the symptoms of schizophrenia, but it was not clear which might be better. Many papers were published based on the opinions of psychiatrists who had used one drug or the other. Finally, a multi-site, double-blind, multi-measure study was done within the VA Hospital system. The results favored chlorpromazine by a large margin, and it was the standard drug for many years.

Today, people are told in advertisements that herbal remedies such as kava kava for anxiety and St. John's Wort for depression are as effective as prescription drugs for the disorders mentioned. But are they? What is the nature of the evidence? Most of the claims for kava kava are based on the reports of individuals. I have not seen any controlled research. St. John's Wort occupies a middle position. There are many controlled studies showing positive effects on depression, with few side effects, but none of the studies meets the standards of the Food and Drug Administration (FDA), which in the United States rules on the efficacy of medications. A large study funded by the National Institute of Mental Health (NIMH) was recently completed and the results were mixed. St. John's Wort seems to help some people, but not all.

Claims can be made for the efficacy of many treatments, but until they have been investigated with scientific procedures, we will not know if they are efficacious or not. Note that in the section on assessment, nothing was said about the use of astrology. It is a method of determining something important about individuals that has been around for a very long time, and is still popular. St. Augustine in the 400s AD said it was a false method because he observed that twins have the same birth date and yet have different fates. Francis Bacon (1561-1626) regarded astrology as a superstition because it was not based on observations in the scientific method (as he was then inventing it).

Case Study Method

This is also called the idiographic, or pertaining to the individual, method. It is often used. It was Freud's basic method. It was also used by Masters and Johnson in their research on the treatment of sexual problems. Joseph Wolpe used it in his development of behavior therapy. It is a vital and interesting method, but has some problems:

How can one generalize results from one person to others?

It does not use the scientific method.

Coincidence is not controlled.

I have had several discussions with friends about the loss of the Taos Planned Parenthood center. One woman insists that it is gone because Hispanics are Catholic and are opposed to contraception. How does she know? She has a neighbor who is Hispanic and has repeatedly voiced her opposition to birth control. In contrast, I have data from 600 Hispanic mothers of very young children in San Antonio. I found that 92% were using contraception and were highly in favor of effective means of controlling the size of their families. My friend's method was case study or idiographic and mine was nomothetic. Which is more persuasive?
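One way to see why the larger sample is more persuasive is to attach a margin of error to the 92% figure. The sketch below uses the standard normal-approximation confidence interval for a proportion. It assumes the 600 mothers can be treated as a simple random sample, which the lecture does not claim, and the function name is hypothetical.

```python
from math import sqrt

def proportion_ci(p_hat, n, z=1.96):
    """Approximate 95% confidence interval for a sample proportion
    (normal approximation; reasonable only when n is fairly large)."""
    standard_error = sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * standard_error, p_hat + z * standard_error

low, high = proportion_ci(0.92, 600)
print(f"{low:.3f} to {high:.3f}")
```

With 600 respondents the interval is narrow, roughly two percentage points on either side of 92%. A single neighbor's opinion supports no interval at all, which is the weakness of generalizing from one case.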

The case study method is basic to most popular psychology and most claims of cures via alternative medicine are based on case studies. The same is true of many of the claims about new forms of psychotherapy.

It does have some value if it can:

Provide an account of a rare phenomenon.

Disconfirm alleged universal aspects of a theory.

Generate hypotheses, preliminary to science.

Help to understand the link between variables.

Help to understand normative data as it relates to individual persons. Necessary for the development of personality theories.

But, it is never a substitute for the controlled trial using a large number of people.

Empirical

The more scientific method is nomothetic: pertaining to the formulation of general laws and principles. It has to do with groups of people, allowing the use of statistical procedures and experiments. Scientific procedures also make use of mathematical formulations. In psychology the mathematics most often used is statistical.

We begin with Lord Kelvin's dictum that all things that exist exist to some degree and therefore can be measured. This is basic to everything that follows in these lectures. For example, I was recently in a discussion group in which the topic was "happiness." The discussion ranged broadly and I remarked that having money is related to happiness, but only up to a point. That point was where basic needs and expectations had been satisfied, but that there was no evidence that the very rich were more happy than ordinary middle-class people. The discussion continued and one person asserted forcefully that the rich were happier because they had more freedom. I repeated that empirical research has shown that the rich were not happier. The other person would not recognize that degree of happiness could be measured and that research supported my contention. We measure psychological phenomena.

Some Useful Terms

Operational Definitions

Terms used must be precisely defined by means of measuring them. For example, in a study designed to test whether anxiety is relieved by a therapeutic method, systematic desensitization, G. Paul hypothesized that with systematic desensitization anxiety would decrease. What do we mean by anxiety? He had to define it and provide a way of measuring it in terms of the definition. Paul used three measures: self-report, observations by others, and physiological. He defined anxiety in terms of how it could be measured.

Hypotheses

These are often stated in terms that suggest the researcher knows the outcome. "Anxiety will be reduced by systematic desensitization." However, philosopher Karl Popper points out that scientists do not set out “to prove” something. They test hypotheses in order to determine whether a theory receives research support or not.

Testability

This is the researcher's ability to confirm or disconfirm the hypothesis. The question asked is, Has the hypothesis been stated in such a way that it can be proven wrong? Much of psychoanalytic theory fails this test.

Dependent Variable

This is the outcome measure. In the case of the anxiety study, it is the measure of anxiety after the intervention.

Independent Variable

This is a factor that has an influence on the dependent variable. In the anxiety study it was the systematic desensitization treatment.

Internal and External Validity

Confound

What if some of the clients in Paul’s anxiety study were taking an anti-anxiety medication such as Xanax? Could the results be attributed to the psychotherapy or to the medication? Not clear. This is a confound.

The degree to which confounds are present is an index of internal validity. One can improve internal validity by using:

1) a control group.

2) randomization to experimental or control groups.

3) analog models. Replicating aspects of the study under the controlled conditions in a laboratory.
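Randomization to groups can be sketched in a few lines of Python. This is only an illustration: the participant IDs and the function name `randomize` are made up here, and real trials often use more elaborate schemes (for example, stratifying by age or gender).

```python
import random

def randomize(participants, seed=None):
    """Randomly split participants into experimental (E) and control (C) groups."""
    rng = random.Random(seed)      # a fixed seed makes the assignment reproducible
    shuffled = list(participants)  # copy so the caller's data is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"E": shuffled[:half], "C": shuffled[half:]}

# Hypothetical participant IDs, for illustration only.
groups = randomize(range(1, 21), seed=42)
print("E:", groups["E"])
print("C:", groups["C"])
```

Because chance alone decides who lands in E or C, a confound such as medication use should be spread about evenly across both groups rather than piling up in one.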

Generalizability

The extent to which the results apply to others. Try to think of exceptions: cultural or national variation? For example, it is sometimes said that obesity is common because of metabolic disorders. One must then ask why obesity is not common in other nations where people are similar to Americans genetically (and metabolically), such as Holland, where there is little obesity. Other limits to consider are restrictive settings (e.g., prisons), age, and gender.

Necessary, Sufficient and Contributory Conditions

Necessary

This is something that must be present for an event to occur. Is a genetic condition necessary for schizophrenia to occur? Probably not, because it seems there may be other causes. A genetic condition is both necessary and sufficient for the development of Huntington's Chorea because it is a dominant genetic condition.

Sufficient

A condition that is enough by itself to cause an abnormal reaction. A genetic predisposition may be a necessary condition, but does not in itself cause the condition. A sense of hopelessness may be enough in itself to cause depression, but when would this ever exist by itself?

Contributory

Conditions that are neither necessary nor sufficient, but may be involved. The death of a loved one may contribute to depression.

Statistical Versus Clinical Significance

Statistical significance is the product of statistical tests that tell whether the probability of obtaining an observed effect by chance alone is small. It is possible to obtain statistically significant effects that have little practical meaning.

Example: In one Houston Parent-Child Development Center (H-PCDC) follow-up of effects on child behavior problems our outcome measure was a teacher rating of problems. We obtained statistically significant differences between groups and concluded that the prevention program had lasting effects. However, one of the editors of the journal to which we sent the manuscript wanted to know how important the differences were between groups, that is, to test for clinical significance. We found the measure we used to assess behavior problems had been used in other research. In one study the measure was used to assess behavior problems of children who had been independently referred for clinical services. If children had a score above a certain level they tended to have been referred for services. We used this as a cut-off point and examined the scores of children in our study to indicate clinical significance. We found that this method also showed statistical significance; i.e., fewer H-PCDC children would have been referred than control group children.
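The cut-off approach can be sketched as follows. The ratings and the cut-off value below are invented for illustration and are not the H-PCDC data. The function name `proportion_above` is also just a label chosen for this sketch.

```python
def proportion_above(scores, cutoff):
    """Share of children whose rating exceeds the clinical cut-off."""
    return sum(score > cutoff for score in scores) / len(scores)

# Hypothetical teacher ratings and cut-off (assumptions, not real data).
CUTOFF = 15
program = [4, 7, 9, 10, 11, 12, 13, 14, 16, 18]
control = [6, 9, 12, 14, 16, 17, 18, 19, 21, 23]

print(proportion_above(program, CUTOFF))  # 0.2
print(proportion_above(control, CUTOFF))  # 0.6
```

The comparison of interest is then the share of each group falling in the range where referral would be likely, rather than the difference in average ratings.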

Effect Size

This measure is widely used today to compare the results of different studies of the same problem, such as how effective cognitive-behavior therapy is for depression. It takes the difference between the mean scores of groups on some key outcome measure, e.g., the Beck Depression Inventory, and divides it by the pooled standard deviation. Thus, it is a measure of how large the differences are between groups after taking into consideration variation within the groups. It is a measure of the clinical magnitude of a treatment effect.
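One common version of this calculation is Cohen's d, the difference between group means divided by the pooled standard deviation. A minimal sketch, assuming two independent groups and using invented depression scores (not data from any study):

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(group_a, group_b):
    """Difference between group means divided by the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((n_a - 1) * variance(group_a) +
                  (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)

# Hypothetical post-treatment depression scores (higher = more depressed).
control = [22, 25, 19, 24, 21, 27, 23, 20]
treated = [15, 18, 12, 20, 14, 17, 16, 13]

print(round(cohens_d(control, treated), 2))
```

Because d is expressed in standard deviation units, results from studies that used different outcome measures can be placed on the same scale.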

Social Validity

Do other people think the person treated has been improved? What about relatives, employers, therapists? Interestingly, many studies have shown that the best predictor of later performance is not from therapists, psychological assessors, employers, or the person herself or himself, but from peers, that is, people like the person who know the person well. They seem most free of biases that would interfere with accurate prediction and perhaps they have the best real-world view of the person's performance.

Replicability

If you do a study and get a result, can you do it again and get the same result? This is at the heart of science. Whenever some chemist makes a chemical discovery and it is reported, there will soon be a dozen or more replications of the study. We are less likely to do this in psychology, because it is more difficult to do replications, more expensive to find subjects and to conduct the research. Nevertheless it must be done. My H-PCDC experiment on primary prevention was done with one primary study and seven replications. We got essentially the same outcome results each time. Thus, we demonstrated replicability.

This insistence on replicability limits the kinds of things that can be studied. Whether UFOs exist or not is in question because one cannot bring them forward (if they exist at all) for study at will. Dreams pose difficulties for the scientist because they are so ephemeral. A person dreams many times each night and only once in a while are dreams remembered. Even then, the memory fades quickly. What can be done? Perhaps the best method is to observe the sleeping person, notice when eye movements indicate dreaming is underway, wake the person and obtain a record of the recalled dream at that time.

Role of Theory

Theory guides research and is essential to the scientific process. It helps pull ideas together and structures subsequent research. Philosophers of science have asked about how it works. Does science proceed from a collection of facts and an examination of these to build a theory, or from a general theory that leads to the collection of facts? Medawar (1969) favors the latter. I am not so sure. In our work on otitis media, our theorizing was based on facts available to us. We did have a general theory before collecting data, but it was not specific and yet our research has been productive. Perhaps now that we have a collection of facts a more specific theory can be created.

The Scientific Method and the Search for Answers

Does television watching enhance aggression in children? Nearly everyone has an opinion on this matter, but what is the truth? This is a question that has produced an enormous amount of research, and the answer is pretty clear by now, but it has not always been clear. There has been a sequence of studies.

One of the earliest was by Eron (1963) who found a significant correlation between the amount of time children watched television and the amount of aggression they showed. This result leaves us with the question, does TV watching cause aggression or does aggression cause TV watching? As it was a correlational study, we do not know.
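The limitation of a correlational result can be seen in the calculation itself. The sketch below computes a Pearson correlation on invented numbers (not Eron's data). The variable names are assumptions made for the example.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(ss_x * ss_y)

# Hypothetical data: daily hours of TV and an aggression rating per child.
tv_hours = [1, 2, 2, 3, 4, 5, 6]
aggression = [2, 1, 3, 3, 5, 4, 6]

print(round(pearson_r(tv_hours, aggression), 2))
print(round(pearson_r(aggression, tv_hours), 2))  # same value with the order swapped
```

The coefficient is identical whichever variable is listed first, so the number by itself carries no information about which one is the cause.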

The next study on the topic was by Liebert and Baron (1972) who did an experiment. They randomly assigned children to experimental and control groups to watch aggressive TV programs or neutral programs. The children were then to observe another child in a learning situation and were told they could, if they wished, punish the other child by giving him or her a small electric shock when learning mistakes were made (The child did not really get a shock).

Sample

Group:  E                      C

X:      Watch aggressive TV    Watch neutral TV

O:      "Hurt" most            "Hurt" least

Children who watched aggressive TV were more likely to administer the shock. The authors concluded that TV watching of aggression causes aggressive behavior; it increases the rate of aggressive behavior.

This was an experimental study and not subject to the criticisms brought forward against the earlier study. But there is still the question of external validity: "The extent to which the results of any particular piece of research can be generalized beyond their immediate experiment." There is also the question of ecological validity: this was an artificial situation. Would the result be the same in a natural environment?

This led to the third study, a field experiment.

Friedrich-Cofer (who was on the faculty at UH for many years) and Stein (1973) carried this out with children in Kansas. Here the measure of aggressive behavior was observation of natural behavior on the preschool playground.

Sample

Group:  E                      C

O:      Baseline: observe children at play (both groups)

X:      Watch violent TV       Watch nonviolent TV

O:      Repeat baseline observation (both groups)

They found that watching violent TV resulted in an increase in the aggressive behavior of children who were already quite aggressive. Watching violent TV had little effect on nonaggressive children.

Conclusion to the series of studies: There is more watching of violent TV by children, and playing violent computer games, than could have been imagined in the 1960s and 1970s. The scientific results are in, but they have had a very small effect on TV policy which is driven by advertising revenue, not science.

Internal Validity

Research in which the results can be attributed with confidence to the manipulation of the experimental variables.

Double Blind

This is commonly used in medical experiments, as in drug studies. It is used in randomized clinical trials (RCTs). The "blind" is that neither the patient nor the staff knows who is receiving the drug in question. Only the research staff know and they have nothing to do with administering the medication or making prescription adjustments. An example is the study of the efficacy of clozapine, an atypical anti-psychotic medication used to treat the symptoms of schizophrenia.

The research began with a sample of patients who had not benefited from a typical antipsychotic medication. All patients had been placed on haloperidol (Haldol). This was tried for 6 weeks. Then the researchers took the group that showed no improvement and assigned patients randomly to one of two groups:

Group:  E                      C

O:      Baseline observations (both groups)

X:      Clozapine              Thorazine

O:      Repeat observations (both groups)

Double blind: Neither the patients nor the persons administering the drug knew who was in E and who was in C. Of course, the researchers knew, but the observers did not.

Result: 30% improved in E (clozapine); 4% improved in C (Thorazine).

In addition, clozapine had fewer side effects. The conclusion was that clozapine was a superior drug for patients who had not benefited from typical anti-psychotic medication.

E = Experimental

C = Control

O = Observation

X = Treatment

Single-Subject Experiments

Reversal design

ABAB

Baseline, intervention, return to baseline, intervention.

Does diet cause hyperactivity?

A: Baseline observation on the normal diet

B: Remove the suspect foods and observe

A: Resume the normal diet and observe again

B: Remove the suspect foods again and observe

This method is useful for exploratory research, but its results still cannot be generalized because it is limited to one person. Many studies of this kind showed that the Feingold diet has positive effects for hyperactive children, but controlled experiments did not. Perhaps diet is involved in some children, but which ones? One solution is to try the diet and see what happens. However, in general the Feingold diet (no sugar or food dyes) is no longer recommended because of the lack of evidence of efficacy in the controlled group trials.
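A simple way to read the record from a reversal design is to compare phase averages. The sketch below assumes A marks phases on the normal diet and B marks phases with the suspect foods removed. The daily ratings are invented, and the function names are only labels for this example.

```python
from statistics import mean

# Hypothetical daily hyperactivity ratings for one child (higher = more active).
phases = [
    ("A", [8, 9, 7, 8]),   # baseline, normal diet
    ("B", [5, 4, 5, 4]),   # suspect foods removed
    ("A", [8, 7, 9, 8]),   # normal diet resumed
    ("B", [4, 5, 4, 3]),   # suspect foods removed again
]

def phase_means(phases):
    """Average rating within each phase, in order."""
    return [(label, mean(scores)) for label, scores in phases]

def shows_reversal(phases):
    """True if ratings drop at every A-to-B change and rise at every B-to-A change."""
    means = phase_means(phases)
    for (_, prev_mean), (label, cur_mean) in zip(means, means[1:]):
        if label == "B" and not cur_mean < prev_mean:
            return False
        if label == "A" and not cur_mean > prev_mean:
            return False
    return True

print(phase_means(phases))
print(shows_reversal(phases))  # True
```

A pattern that follows the phase changes this closely suggests the diet matters for this child, but it says nothing about other children.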

Longitudinal Research

It is finally being recognized that it is necessary to follow people over long periods of time to understand the effects of treatments. I have been an advocate for this point of view as a member of the National Advisory Council of the NIMH. I argued that psychiatric drug trials tell us too little because they cover only a few weeks or possibly months of a person's life with the drugs. As an example, some patients with schizophrenia continue to show benefits from clozapine several years after beginning treatment. This is not recognized by brief drug trials.

Cross-sectional research tells us what is happening at one point in time. Longitudinal research tells what happens across many points in time. For example, we evaluated the efficacy of the H-PCDC at the end of the program when children were 3 years of age and got positive effects. Would they last? As a program designed to prevent school failure and behavior problems we had to follow the children. We did, at ages 4 and 5, then again at 7-9, at 11-13, and finally at 13-17. We found lasting effects, although only on some measures. We found that many effects that were present soon after the program (such as increased IQ) dwindled with time. Other researchers of child behavior problems have obtained similar results.

Recent research on the treatment of depression has shown the value of follow-up or longitudinal research. The effectiveness of both anti-depressive medication and psychotherapy has been demonstrated many times. All of the studies were short term. Now there are a number of longer follow-ups. They show very high relapse rates, especially for medication. This has resulted in a revolution in thought about how to treat depression. It is now regarded as a chronic condition in need of continued observation and treatment.

Epidemiological Research

Epidemiological research began in London with Dr. John Snow and the cholera epidemic of 1854. There were two water systems involved in the cholera area: Lambeth and Vauxhall, one up-Thames and one down. Snow found that the Broad Street water pump was the source of drinking water for people in the area. He carefully charted the cases of cholera and where people lived. He then determined that there were more cases associated with water from the downstream source. He did not know that cholera was caused by a water-borne germ because Pasteur's germ theory was still to come, but he suspected that there was something bad about the water. Snow carefully mapped the area, marking where each case was located. His search of the area was thorough. He removed the pump handle and cases of cholera decreased. His methods have been used with virtually all diseases ever since. For example, they were used in the successful world-wide eradication of smallpox.

How much of a problem is mental illness? How could we know? In the United States there have been two national studies: the Epidemiologic Catchment Area study of the 1980s and Kessler’s study of about 10 years ago. These samples of the American population have told us just how many people have each kind of psychiatric disorder. We will review results as we go along.

There are no national studies of the rates of mental disorder for children in the USA. We must rely on data from Canada and Holland to have some idea of prevalence.

Two concepts are of use.

Incidence. This is the number of new cases of an illness in a given period of time. If there is an outbreak of measles in the county each new case is a matter for the incidence record. Prevention research attempts to reduce the incidence of illnesses. We have no national incidence data on psychiatric disorders in the USA. Again, we rely on research in Scandinavia and other European countries.

Prevalence. This has to do with the number of cases of an illness in the community at a given time. This is the count that is most often used. We will refer to these figures as we go along. Measles would not be included in prevalence records because it is not an illness that lingers. Schizophrenia, on the other hand, lasts for a life-time and so a count of the number of people in the community with the disorder is very important.
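The difference between the two counts can be shown with a small sketch. The case records, dates, and function names below are hypothetical, chosen only to illustrate the definitions: incidence counts onsets within a period, and prevalence counts cases that are active on a given day.

```python
from datetime import date

# Hypothetical case records: (onset date, recovery date or None if still ill).
cases = [
    (date(2023, 11, 3), None),
    (date(2024, 2, 10), date(2024, 2, 24)),
    (date(2024, 5, 1), None),
    (date(2024, 8, 15), date(2024, 9, 1)),
]

def incidence(cases, start, end):
    """Number of new cases whose onset falls between start and end, inclusive."""
    return sum(start <= onset <= end for onset, _ in cases)

def point_prevalence(cases, on_day):
    """Number of cases that have begun and not yet ended on the given day."""
    return sum(
        onset <= on_day and (recovered is None or recovered > on_day)
        for onset, recovered in cases
    )

print(incidence(cases, date(2024, 1, 1), date(2024, 12, 31)))  # 3
print(point_prevalence(cases, date(2024, 12, 31)))             # 2
```

Short illnesses add to incidence but have usually ended by the day of the count, while long-lasting disorders accumulate in the prevalence figure year after year.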