Lecture 3
Assessment, Classification
and the Science of Abnormal Psychology
Dale L. Johnson
Chapter 3 covers a lot of territory. Only a few years ago all textbooks in abnormal
psychology devoted three chapters to the topics covered here. In condensing to
one chapter my guess is that the authors got rid of some of the fluff and kept
the core ideas. If so, it means that a careful reading is necessary. In this
lecture, I will comment on some of the key ideas.
Clinical Assessment Procedures
Psychologists are the only mental
health professionals with training in systematized assessment. Psychiatrists and
social workers are trained to do interviews, but only rarely are they trained
to do the structured interviews that are now required for research. For many years,
a major role for psychologists in hospitals and clinics was the administration
of psychological tests and interpreting and reporting the findings. I have
given hundreds of Rorschach Inkblot Tests, Thematic Apperception Tests (TATs),
and Wechsler Adult Intelligence Scales (WAISs), and written hundreds of
reports. However, I have not given a Rorschach since 1961 when I tested a
Navajo woman deep in the reservation at the request of an anthropologist. I
gave it up because in 1960 I became director of a human relations research ward
at the Houston VA hospital and we adopted alternate ways of assessing our
patients (or participants, as we called them). We found the widely used tests
such as Rorschachs and TATs were irrelevant to the kind of treatment we provided.
We needed behavioral and self-report measures, not methods based on dubious
psychodynamic assumptions. We found the Minnesota Multiphasic Personality
Inventory (MMPI) was useful for diagnostic purposes and behavioral measures
provided information needed to tailor the treatment program to meet individual
needs. In addition, we used the measures to monitor change and to show us
whether the treatment had been effective in the long run (in general, it was).
Nevertheless, clinical psychologists
still do give the projective tests (as the Rorschach and TAT are called) in
many practice settings. The clinical psychology training program at UH no
longer teaches these tests, nor do other programs that emphasize a
scientist-practitioner model. Recent research has focused on the identification
of specific problems. One such line of research was the use of projective tests
to detect child sexual abuse. Results showed low validity for projective tests.
Many children who had not been abused would be said to have been abused by
projective test results. Conversely, many abused children would have been
missed.
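The two kinds of error just described can be put into numbers. The sketch below is only an illustration, not data from any study: the counts (1,000 children, 100 of them abused, a test that flags 60 percent of the abused and 30 percent of the non-abused children) are hypothetical, and so is the function name `screening_metrics`.

```python
def screening_metrics(true_pos, false_pos, false_neg, true_neg):
    """Summarize how well a yes/no test result matches the known truth."""
    sensitivity = true_pos / (true_pos + false_neg)  # abused children correctly flagged
    specificity = true_neg / (true_neg + false_pos)  # non-abused children correctly cleared
    ppv = true_pos / (true_pos + false_pos)          # flagged children who really were abused
    return {"sensitivity": sensitivity, "specificity": specificity, "ppv": ppv}

# Hypothetical counts, chosen only for illustration.
result = screening_metrics(true_pos=60, false_pos=270, false_neg=40, true_neg=630)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

With these made-up numbers, fewer than one in five of the children flagged by the test had actually been abused, and 40 of the 100 abused children were missed.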
Reasons for Assessing
Assessment is done for clinical
purposes, to aid in making a diagnosis, identifying symptoms to target for
special attention, and to find personal characteristics that may facilitate or
inhibit treatment. This use is elaborated in the textbook.
The other main reason for assessment
is in research. For example, our research at the UT medical school in
Galveston examined the lasting effects of otitis media with effusion (OME), or middle
ear infection, on child development. We examined babies' ears for signs of OME
biweekly for three years. Then we looked at
effects on what we thought, from our review of the literature and experience of
the pediatricians on the project, might be long term consequences of having
partial hearing loss for periods of time early in life. Some of the children
had impaired hearing for one-third of their early lives. We measured
intelligence, language development, and behavior problems at ages 3, 5 and 7,
and added social relations and school achievement at age 7. All of the measures
used were selected because research had shown they were highly reliable, valid,
and had been standardized. As a research project we were less interested in how
individual children developed than how a group of children, representative of
the Galveston area, developed. The results are mixed, with some evidence of
effects on intelligence, or IQ to follow the textbook, at age three that
disappeared at ages 5 and 7; and effects on language development at age 5. None
of the effects were very great. Although statistically significant, they were
probably not clinically significant. We suggested that persistent OME does not
pose a great danger to child development. Persistent OME after age three may be
a different matter altogether. We did not study that. We are still working on
the age 7 data.
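The difference between statistical and clinical significance can be shown with a small calculation. This is a sketch with invented numbers, not our Galveston data: two groups of 500 children with mean IQs of 100 and 98 and a standard deviation of 15 in each group. The function names are hypothetical, and the pooled standard deviation here assumes equal group sizes.

```python
from math import sqrt

def cohens_d(mean_a, mean_b, sd_a, sd_b):
    """Standardized mean difference using the pooled SD (equal group sizes assumed)."""
    pooled_sd = sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_a - mean_b) / pooled_sd

def welch_t(mean_a, mean_b, sd_a, sd_b, n_a, n_b):
    """t statistic for the difference between two independent group means."""
    standard_error = sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_a - mean_b) / standard_error

# Hypothetical summary statistics, for illustration only.
d = cohens_d(100, 98, 15, 15)
t = welch_t(100, 98, 15, 15, 500, 500)
print(f"t = {t:.2f}, d = {d:.2f}")
```

With groups this large, a two-point difference gives a t above the conventional cutoff of about 1.96, so it would be called statistically significant, yet the effect size d is small, which is the sense in which a result may not be clinically significant.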
Choice of Method
In clinical work the psychologist
wants to know what the problem is. To answer this, it is necessary to obtain
information about the characteristics of the client and of her or his
environment. Assessment examines the client's life system. It is done with the
cooperation of the client. The assessment is not limited to seeking the source
of the problem, but looks at the client's way of thinking about the problem and
at how the person copes with problems. For example, some people become
depressed if they set goals for themselves that they do not really want. They
feel they must have these goals. They go through life uncomfortable with their
lack of achievement and a vague feeling that they have missed their true
calling. They become depressed. Assessment might reveal that this unrealistic
goal setting is a part of the problem and lead the therapist to help the client
to explore alternate goals. A depressed, but fairly well-to-do, real estate
salesperson might prefer to be a much less financially well-off rock musician,
and not be depressed. Assessment might find that another person with depression
has habitually negative modes of thinking about self, others and the world in
general, and rarely identifies positive experiences for self. This assessment
result sets the stage for cognitive-behavioral therapy. Assessment of this type
is typically used in an on-going basis, and the client and therapist can both
see progress being made.
Reliability and Validity in Assessment
These are important matters and
every student of psychology should have a clear understanding of this importance.
The textbook section on this is very good. Note the following especially:
Reliability
Interrater
Test-retest
Internal consistency (split-half)
Validity
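For students who like to see the arithmetic, the three kinds of reliability listed above can each be expressed as a simple statistic. What follows is a minimal sketch, assuming the usual choices: a Pearson correlation for test-retest, a split-half correlation stepped up with the Spearman-Brown formula for internal consistency, and Cohen's kappa for agreement between two raters. All scores and ratings are invented, and the function names are mine, not the textbook's.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def spearman_brown(half_r):
    """Step a split-half correlation up to an estimate for the full-length test."""
    return 2 * half_r / (1 + half_r)

def cohen_kappa(rater_a, rater_b):
    """Agreement between two raters on categories, corrected for chance agreement."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    categories = set(rater_a) | set(rater_b)
    expected = sum((rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories)
    return (observed - expected) / (1 - expected)

# Hypothetical data: six people tested twice, their odd- and even-item
# half scores, and two clinicians' diagnoses of the same six people.
first_testing = [10, 12, 9, 15, 11, 14]
second_testing = [11, 13, 9, 14, 12, 15]
odd_items = [5, 6, 4, 8, 5, 7]
even_items = [5, 6, 5, 7, 6, 7]
clinician_1 = ["OCD", "OCD", "other", "other", "OCD", "other"]
clinician_2 = ["OCD", "other", "other", "other", "OCD", "other"]

print("test-retest r:", round(pearson_r(first_testing, second_testing), 2))
print("split-half, corrected:", round(spearman_brown(pearson_r(odd_items, even_items)), 2))
print("interrater kappa:", round(cohen_kappa(clinician_1, clinician_2), 2))
```

Values near 1 indicate high reliability. Kappa is used for the raters because two clinicians will agree part of the time by chance alone.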
Types of Psychological Assessment
Clinical Interviews
Interviews are a standard part of any
assessment; they offer flexibility, comprehensiveness, and sensitivity. The
most common type of interview used is the open, unstructured interview. It has
some advantages; it can be adapted to
the interests of the client and can lead into a therapeutic relationship with
ease. A disadvantage is that important issues for diagnoses may be missed or
the client may lead the clinician down the wrong path. This type of interview
fails to meet the requirements for reliability, validity and standardization.
The structured, or semi-structured
interview was developed along with the newer, research-based, diagnostic
criteria. In these interviews the interviewer follows a prescribed set of
questions. These questions are designed to elicit all of the information necessary
to make a DSM-IV diagnosis. Although many clinicians use these interviews, most
prefer the flexibility and possible brevity of the open interview. I have used
structured interviews for years and they have helped me discover information
useful for treatment planning that I would never have found using an open
interview. Structured interviews were designed for research use, and are nearly
always used when researchers must define a distinct clinical group for their
research. Thus, if one is studying obsessive-compulsive disorder, all of the
subjects must have obsessive-compulsive disorder (OCD) and not just something
that might be OCD. These interviews can be reliable (with training), valid and
standardized.
Psychological Tests
The first psychological tests were
developed in England in the middle 1800s, but the first widely used test was
that of Binet and Simon in Paris about 100 years ago. Their test of
intelligence, designed to predict success in school, formed the basis for all
subsequent IQ tests. The method of selecting tasks, presenting them to subjects
in a standard way, and comparing children of the same age, is basic for later
tests.
Projective
The Rorschach inkblots, Thematic
Apperception Test (TAT), Incomplete Sentences Test, are only a few of the
projective tests that grew out of the fascination with uncovering unconscious
feelings and thoughts. After World War II when American clinical psychology
began to prosper, these tests comprised the core of the battery of tests used.
Every clinical psychologist was trained in their use and expected to administer
them as one of the unique contributions of psychology.
Gradually, a sense of disquietude
appeared. There were problems. On the Rorschach, for example, how do you
compare a person who gives only one response per card (there are ten cards)
with a person who gives 100 responses? And how can you make sense of this: give
the test to a person one day, ask him to return the next day and take the test
again, but this time to give different responses, and then ask experts to pick
out the two sets of responses that came from the same person. They cannot do it. The same person
apparently can have two different personalities at will. These problems have
led some psychologists to doubt the value of the test. More important, however,
is that increasingly, it was obvious that the interpretations of Rorschach test
results were simply irrelevant; e.g., "This patient has latent homosexual
wishes." So what? Or, "This patient often feels anger toward father
figures." So what else is new?
On the positive side, Rorschach
responses may have some value in identifying signs of thought disorder in
patients suspected of having schizophrenia. However, patients with bipolar
disorder, major depression and OCD may also show signs of thought disorder. The
Exner system, mentioned in the textbook, does improve reliability of
administration and scoring, but has not improved the validity question. The
test has not been standardized in an adequate way.
The same can be said of other
projective techniques. If we were to assign them grades, they would get a low
C.
Personality Inventories
Personality inventories such as the
MMPI-2, the Millon, and various measures of specific symptoms such as the Beck
Depression Inventory, are quite another matter. They are highly reliable,
valid, and are standardized. They are widely used in clinical practice and
research. The MMPI-2 can be scored in a great many different ways, many more
than are suggested in the textbook. I do not hesitate to say that I can look at
the results of an MMPI-2 and tell you a great deal about a person, and chances
are good that I would be right, not because I am a clinical whiz, but because I
would follow standard procedures for making interpretations. A computer could
do as well.
Cognitive Tests
These include the Wechsler series
such as the Wechsler Adult Intelligence Scale (WAIS) or the
Stanford-Binet-Fourth Edition (SBIV), and many other tests. The SAT and GRE are
cognitive tests, and whether they should be considered aptitude or intelligence
tests or tests of achievement is a matter for discussion. Suffice it to say
here they are very reliable, remarkably valid, and all are beautifully
standardized. However, none are standardized in the way the textbook says on
page 66. If "an African-American male, 19 years old, and from a
middle-class background" took the test, his results would never be
compared with a group of AA, male, age 19, middle-class. He would be compared
with a group of adults from a sample representative of the American population
which would include African-Americans. It would be impossible to make such
specific standardizations as the book suggests, and totally unnecessary.
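The idea of comparing one person with a representative norm group can be sketched in a few lines. This is an illustration only: the tiny norm sample is invented, the function name `deviation_score` is hypothetical, and the scale mean of 100 with a standard deviation of 15 is the convention used for Wechsler-type deviation IQs.

```python
from statistics import mean, stdev

def deviation_score(raw_score, norm_scores, scale_mean=100, scale_sd=15):
    """Place a raw score on a standard scale relative to a norm group."""
    norm_mean = mean(norm_scores)
    norm_sd = stdev(norm_scores)
    z = (raw_score - norm_mean) / norm_sd  # distance from the norm mean in SD units
    return scale_mean + scale_sd * z

# Invented raw scores standing in for a representative norm sample.
norm_sample = [40, 45, 50, 55, 60]

print(deviation_score(50, norm_sample))          # raw score at the norm mean
print(round(deviation_score(58, norm_sample)))   # raw score above the norm mean
```

Every person tested is scored against the same norm group, which is why a separate standardization for each combination of race, sex, age and social class is not needed.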
IQ tests seem always to draw
controversy, but in fact they are probably the best predictors of behavior in
the whole library full of psychological tests.
[Something to ponder: President George W. Bush had an SAT Verbal
score of 566 when he was admitted to Yale. Early presidential candidate Bill
Bradley had a 438. Quantitative scores were not given in the report as it
appeared in WWW.Slate.com. George W. had a father and grandfather who were Yale
alums, and Bill B. was a basketball star. Both men completed their freshman
year with a C average. Bill B. raised his average enough to get a Rhodes
scholarship, but George W. B. continued to get Cs. That intelligence or
aptitude tests tend to agree is shown by George W.'s results when he took the
LSAT when applying to the UT School of Law. He was rejected because his score
was too low. Of course, he did get an MBA from Harvard.]
Neuropsychological Testing
In seeking an understanding of the
relations between brain function and behavior neuropsychological testing plays
a central role. When I was a psychologist with the VA Hospital I spent many
hours working with neurologists in trying to determine whether the symptoms
presented by a patient could be attributed to brain damage. Today no
psychologist does that. The identification of brain damage is done with MRI
scans, or some other form of brain imaging (See textbook pp. 78-79). Neuropsychologists have not joined the ranks
of the unemployed; they have shifted their work to an exploration of how
specific brain damage affects behavior. Thus, they might use the Wisconsin Card
Sorting Test to understand how the cognitive function called "executive
functioning" is affected by damage to the frontal lobes.
Neuropsychological testing is essential in cases of traumatic brain injury, as
in automobile accidents, to see which cognitive functions are intact and which
show impairment. In suspected Alzheimer's Disease, the neuropsychologist does
the testing of memory functions that makes the first diagnosis.
Neuropsychological testing is also
used to plan rehabilitation. This is true for traumatic injury, strokes, and
psychiatric disorders involving brain dysfunction such as schizophrenia.
The University of Houston has one of
the most highly regarded neuropsychological training programs in the country.
It is part of the clinical psychology training program.
Behavioral Assessment
The textbook chapter is very good on
this topic and does not need to be repeated. The authors point out how
important behavioral assessment is, and I would like to support that opinion.
It does get past some of the problems of interviews and of self-report, and it
is an answer to the criticism of psychological tests that they lack ecological
validity (i.e., they do not account for the real-life situation). On the other
hand, behavioral assessments typically do not fit in with managed care limitations on time, and
there are observer reliability problems. For example, in assessing children for
the evaluation of the Houston Parent-Child Development Center (H-PCDC), we
wanted to know if the program had resulted in a reduction of expected behavior
problems. We had mothers, teachers and the children themselves, now young
adolescents, describe behavior problems on such standard measures as the Child
Behavior Checklist (CBCL). The measures used were reliable, valid, and
standardized. All three measures showed that children who had been in the
H-PCDC had fewer problems than children in the control group. However, the
three sources of information (mother, teacher and child) were not correlated
significantly. Each source saw the behaviors somewhat differently. Our finding
is typical of research in this area. In a sense, we failed to obtain
inter-rater reliability. We might have obtained higher reliability if we had
trained mothers, teachers and children in the use of the measure, but this was
not practical when assessing 250 children. It is a research problem that is
unresolved.
Gordon Paul's research (he is in the
Psychology Department at UH) on behavior observation is regarded as excellent.
His system is designed to assess behavior with carefully trained observers in
institutional settings. The results of the observations are used on a daily
basis in his social learning treatment program.
Neuroimaging and Psychophysiological Assessment
See the book for information on
these topics.
Assessment of Anxiety--Elusive Construct
Some clinical problems require
several kinds of assessment. Anxiety disorders are an example of this. A
clinical interview is essential, as is a self-report on some measure of
anxiety. In addition, however, it is often useful for treatment planning to
include psychophysiological assessment and observation reports by people who
know the client well.
Classification and Diagnosis
Classification is not unique to abnormal psychology. All humans classify, as do other mammals.
My dog knows which animals are also dogs, and knows that cats and cows are not
dogs. She seems to find these other animals interesting, but knows they are not
of her kind, and not as interesting to her as are other dogs.
The history of psychiatric classification is long. For example, the Greek physician,
Hippocrates, wrote that he could tell if a pregnant woman would have a girl or
boy baby. He found a way to classify women. Those who were pale in face would
have girls and those who were rosy cheeked would have boys. How many cases did
he have to see to arrive at this apparently fairly accurate classification
system? He kept track, talked with other observers and came to some
conclusions. Note that this is a predictive system: classes determined at one
time predict later events.
Early Classification
There is early evidence in writing that shows that the Egyptians of more than 2000
years ago classified war wounds. They sorted them into various types so that
each type could be treated in an appropriate way.
The Greek, Empedocles, believed the number of elements was four: Earth, Water, Fire
and Air. This led to an elaboration of the four elements into the four humors
(yellow bile, blood, phlegm, black bile), the four temperaments (sanguine, phlegmatic,
choleric, melancholic), the four ages of man, and the four seasons. He created
an inter-related explanatory system.
Galen developed a classification by function, but one that was based on physical
structure. Thus, "hot" diseases were treated by "cold"
remedies.
Hippocrates developed the idea of acute and chronic diseases. We still use this idea.
It was not until the 18th century that the idea of a specific physical location for
mental disorders was rejected by Boussard. He asked, "What is the seat of
mania?" No one could show it to him.
Psychiatric Classification
Pinel, in France, was the first to observe and describe what we think of as
schizophrenia. Kraepelin added to the specificity of the diagnosis around the
turn of the last century, and E. Bleuler gave the disorder the name we use
today. The Japanese no longer use the term "schizophrenia." They call
the same set of symptoms "integrative disorder." Now diagnosis of
this disorder is done with greater reliability, but the question of validity
remains open. One question is whether there is one schizophrenia or many.
The tradition of classification is very strong in medicine. You feel ill, go to a
doctor, and expect to hear a diagnosis: "You have measles, not the
flu." One either has measles or not. You may have it in a severe form or
just a touch, but it is measles.
The problem is somewhat different in psychiatric classification. The Diagnostic and
Statistical Manual, fourth edition (DSM-IV), uses behavioral symptoms for the
diagnosis of all of the disorders listed. In no case is it possible to do a
blood test and say, "Ah, the results show that you have moderate
depression." The problem then is to find sets of behaviors (syndromes)
that reliably occur together to form a specific disorder. We will explore these
sets of symptoms or behaviors as we go further into our subject matter. What
you will see, however, is that the sets are sometimes comprised of an odd lot
of behaviors. Furthermore, it is sometimes possible for two people to be given
the same diagnosis and not have any symptomatic behaviors in common.
Psychiatric classification is essential for research, record keeping and practice, in that
order. Researchers have to be able to assemble groups of people with similar
problems in order to develop an understanding of the disorders. For example,
research on schizophrenia was not very productive until researchers came to
some agreement about what behaviors or symptoms comprised schizophrenia.
Record keeping also requires some information about diagnosis. Insurance payments make
use of this information. Ironically, we have no idea how often schizophrenia, or any
of the other DSM-IV disorders, appears in America; that is, we do not know the
incidence, or number of first-time cases. We know the incidence of polio, or
whooping cough, but not psychiatric disorders. Scandinavians, the Dutch and
Germans have these data, but the USA does not. Psychiatric providers do not want to
be bothered with having to compile the data and send it to the Centers for
Disease Control and Prevention (CDC) for compilation. At present there is concern about an
apparent increase in the incidence of autism. Is there really an increase? We
don't know because we do not collect incidence data in a systematic way.
Clinicians use diagnoses in an approximate way, but they really base treatment plans on
measures of the severity of various symptoms. That is,
"schizophrenia" is not treated as much as are positive symptoms such
as delusions and hallucinations.
Problems with Classification
Granted we need classification, but there are problems. The discussion of this in the
textbook is good and I will just add a few things.
Stigmatizing
To have a psychiatric disorder is to be stigmatized to some degree. This is less true in
the United States than in some other countries, and less true today than 50
years ago, but stigma still exists.
Children given a diagnosis may be red-flagged: expected to fail, or to present problems.
An adult with a diagnosis of epilepsy may not be able to get an automobile
license or insurance coverage for some forms of employment even though his
epilepsy is controlled; an adult with schizophrenia may not be able to get a
job even though recovered, if the employer knows the diagnosis. These are
reasons for not assigning diagnostic labels, but people who have a psychiatric
problem need help and a diagnosis is part of the helping process.
Until quite recently there was great reluctance to assign a diagnosis of
schizophrenia or to communicate it to the patient or family. Patients and
families would just go along wondering what is the matter. I visited Turkey
recently and learned that patients with schizophrenia are kept in hospitals,
crowded two to a bed, for two years before learning of their diagnosis. The
psychiatrists said they did not want to give it because "it was a death
sentence." Patients and their families almost always say that they
appreciate knowing what the diagnosis is. It helps them to understand unusual
behaviors.
Criticism of Current Diagnostic Practice
There are many critics of current diagnostic practices. It should be noted, however,
that no one has offered anything better that can be accepted by other mental
health professionals. These criticisms include the following:
1. Heterogeneity within diagnostic classes.
Categories are very broad and there is great
overlap of symptoms, but the importance of the specific symptoms is not
addressed. The revisions seen in the
DSM-IV are a partial solution to this problem. Nevertheless, two patients may
receive the same diagnosis and not have any of the same symptoms. This is true
for only some disorders that have long lists of symptoms from which the
diagnostic specialist must select the ones that apply to a certain patient.
There are 93 ways of getting a diagnosis of Borderline Disorder and two people
may have the diagnosis and have no common symptoms.
Too many childhood behaviors have become psychiatric illnesses. Is Mathematics
Disorder more appropriate for the realm of psychiatry or for education?
2. Lack of Reliability:
This is still a problem, but much less than it was before diagnosis became based on
research evidence, not clinical theory. There are several sources of the
problem. The percentages shown are estimates of how much of the unreliability
is accounted for by each source. These estimates were based on DSM-II in 1962.
a. Patient inconsistencies in telling of symptoms (5%).
b. Inconsistencies on the part of the examiner
(32.5%). The use of standard, structured interviews helps correct this. Note
that it is not only psychiatric classification that presents this type of
problem. There are inconsistencies in fingerprint identification owing to
analyst errors. A recent FBI study of fingerprint identification in local law
enforcement agencies found an error rate of 20%. The FBI expert error rate was
only 0.03%.
c. Inadequacies of the diagnostic classification
system (62%). Much of this has been corrected in the DSM-IV.
The DSM-IV has improved reliability. It is now moderately good for Axis I
disorders, but still not adequate for Axis II disorders.
3. Lack of Validity
a. etiological--same set of antecedents is assumed for many disorders;
e.g., early child abuse may be involved in dissociative disorder,
post-traumatic stress disorder, depression and anxiety.
b. concurrent--are other symptoms part of the pattern; e.g., difficulty
holding job if schizophrenic. Why isn't this a symptom?
c. predictive--does the diagnosis have anything to do with outcome or the
course of the illness? This is the issue of prognosis. What long-term outcome
is expected for this disorder?
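Returning to the first criticism, heterogeneity within diagnostic classes: the number of ways to qualify for a diagnosis follows directly from the form of the rule. The sketch below assumes a rule of "at least 5 of 8 listed criteria." That rule is my assumption, chosen only because it reproduces the figure of 93 given above; the actual criteria set should be checked against the manual. The function names are hypothetical.

```python
from math import comb

def ways_to_meet(n_criteria, threshold):
    """Count the distinct symptom sets that satisfy 'at least threshold of n_criteria'."""
    return sum(comb(n_criteria, k) for k in range(threshold, n_criteria + 1))

def minimum_shared(n_criteria, threshold):
    """Fewest criteria that two people who both meet the rule must have in common."""
    return max(0, 2 * threshold - n_criteria)

# Assumed rule: at least 5 of 8 criteria.
print(ways_to_meet(8, 5))      # 93
print(minimum_shared(8, 5))    # 2

# A rule needing 5 of 10 criteria would allow two people to share nothing.
print(minimum_shared(10, 5))   # 0
```

Whether two people with the same diagnosis can have no symptoms at all in common depends on the rule: it requires a threshold of no more than half the length of the list. Under the assumed 5-of-8 rule, any two people would share at least two criteria, although their overall pictures could still look very different.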
DSM-IV
Multiaxial (p. 84). The classification system calls for a recording of five kinds of
information.
I. Major categories
II. Personality disorders and childhood disorders
III. Physical disorders
IV. Severity of psychosocial stressors
V. Global Assessment of Functioning Scale (GAF)
The DSM-III, DSM-III-R and DSM-IV show many changes from the DSM-II, which was
based on theory only. Since DSM-III (in 1980, not 1990 as the text suggests)
criteria have been assigned on the basis of empirical research.
The distinction between I. Major categories and II. Personality disorders and
childhood disorders seems artificial and will probably be dropped in the next
revision of the system.
The category of neurosis was dropped because it was too vague.
Hysteria was omitted because it was found to be a meaningless compendium of symptoms.
Homosexuality as a psychiatric disorder was dropped by a vote of the members of the American
Psychiatric Association. It is now not seen as pathological. The psychoanalytic
faction of the Psychiatric Association wanted to keep it, insisting that it is
a pathological condition, but other, more biologically-oriented psychiatrists,
objected saying they could not find evidence of pathology.
The concept "psychosomatic" was dropped in recognition of a psychological
role in all illnesses.
Some added:
Many child disorders.
Psychosexual dysfunctions.
Improved distinctions between types of depression.
Provided warnings about cultural appropriateness.
SCIENCE AND SCIENTIFIC METHOD
This is related to chapter 4 in your textbook and it
provides a quick overview of the science of psychology. You have already had
something on this in the Introduction to Psychology course, but it bears
repeating. It is so easy to slip into non-scientific popular psychology.
About 50 years ago, Carl Rogers and his associates
advocated using client centered therapy to treat people with schizophrenia.
They believed that schizophrenia was caused by a lack of self-regard, a
terrible self-concept, and reasoned that their therapy would help restore or
build a positive view of self. Client centered therapy was adopted by many
clinicians. To Rogers' credit, he used scientific methods to study his
hypothesis (see textbook). He found that his therapy did nothing at all for
people with schizophrenia, and he abandoned its use with this population.
Apparently his theory was wrong; it was not supported by research results. Also
in the 1950s it appeared that two new drugs, reserpine and chlorpromazine,
might have positive effects on the symptoms of schizophrenia, but it was not
clear which might be better. Many papers were published based on the opinions
of psychiatrists who had used one drug or the other. Finally, a multi-site,
double-blind, multi-measure study was done within the VA Hospital system. The
results favored chlorpromazine by a large margin, and it was the standard drug
for many years.
Today, people are told in advertisements that herbal
remedies such as kava kava for anxiety and St. John's Wort for depression are
as effective as prescription drugs for the disorders mentioned. But are they?
What is the nature of the evidence? Most of the claims for kava kava are based
on the reports of individuals. I have not seen any controlled research. St.
John's Wort occupies a middle position. There are many controlled studies
showing positive effects on depression, with few side effects, but none of the
studies meets the standards of the Food and Drug Administration (FDA) which in
the United States rules on the efficacy of medications. A large study funded by
the National Institute of Mental Health (NIMH) was recently completed and the
results were mixed. St. John's Wort seems to help some people, but not all.
Claims can be made for the efficacy of many treatments,
but until they have been investigated with scientific procedures, we will not
know if they are efficacious or not. Note that in the section on assessment,
nothing was said about the use of astrology. It is a method of determining
something important about individuals that has been around for a very long
time, and is still popular. St. Augustine in the 400s AD said it was a false
method because he observed that twins have the same birth date and yet have
different fates. Francis Bacon (1561-1626) regarded astrology as a superstition
because it was not based on observations in the scientific method (as he was
then inventing it).
Case Study Method
This is also called the idiographic, or pertaining to the
individual, method. It is often used. It was Freud's basic method. It was also
used by Masters and Johnson in their research on the treatment of sexual
problems. Joseph Wolpe used it in his development of behavior therapy. It is
a vital and interesting method, but has some problems:
How can one generalize results from one person to others?
It does not use the scientific method.
Coincidence is not controlled.
I have had several discussions with friends about the
loss of the Taos Planned Parenthood center. One woman insists that it is gone
because Hispanics are Catholic and are opposed to contraception. How does she
know? She has a neighbor who is Hispanic and has repeatedly voiced her
opposition to birth control. In contrast, I have data from 600 Hispanic mothers
of very young children in San Antonio. I found that 92% were using contraception
and were highly in favor of effective means of controlling the size of their
families. My friend's method was case study or idiographic and mine was
nomothetic. Which is more persuasive?
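One way to see the difference is to ask how much uncertainty is attached to each kind of evidence. The sketch below puts an approximate 95 percent interval around the 92 percent figure from the sample of 600. It is a rough illustration that assumes a simple random sample and uses the normal approximation; the function name `proportion_interval` is hypothetical.

```python
from math import sqrt

def proportion_interval(p_hat, n, z=1.96):
    """Approximate confidence interval for a proportion (normal approximation)."""
    standard_error = sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * standard_error, p_hat + z * standard_error

low, high = proportion_interval(0.92, 600)
print(f"{low:.3f} to {high:.3f}")  # 0.898 to 0.942
```

With 600 mothers the estimate is pinned down to within about two percentage points. A single neighbor offers no comparable way to estimate a rate for a whole group.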
The case study method is basic to most popular psychology
and most claims of cures via alternative medicine are based on case studies.
The same is true of many of the claims about new forms of psychotherapy.
It does have some value if it can:
Provide an account of a rare phenomenon.
Disconfirm alleged universal aspects of a theory.
Generate hypotheses, preliminary to science.
Help to understand the link between variables.
Help to understand normative data as it relates to individual persons. Necessary
for the development of personality theories.
But, it is never a substitute for the controlled trial
using a large number of people.
Empirical
The more scientific method is nomothetic: pertaining to
the formulation of general laws and principles. It has to do with groups of
people, allowing the use of statistical procedures and experiments. Scientific
procedures also make use of mathematical formulations. In psychology the
mathematics most often used is statistical.
We begin with Lord Kelvin's dictum that all things that
exist exist to some degree and therefore can be measured. This is basic to everything that follows in these
lectures. For example, I was recently in a discussion group in which the topic
was "happiness." The discussion ranged broadly and I remarked that
having money is related to happiness, but only up to a point: the point
where basic needs and expectations have been satisfied. Beyond that, there is no
evidence that the very rich are happier than ordinary middle-class
people. The discussion continued and
one person asserted forcefully that the rich were happier because they had more
freedom. I repeated that empirical research has shown that the rich were not
happier. The other person would not recognize that degree of happiness could be
measured and that research supported my contention. We measure psychological
phenomena.
Some Useful Terms
Operational
Definitions
Terms used must be precisely defined by means of
measuring them. For example, in a study designed to test whether anxiety is
relieved by a therapeutic method, systematic desensitization, G. Paul
hypothesized that with systematic desensitization anxiety would decrease. What
do we mean by anxiety? He had to define it and provide a way of measuring it in
terms of the definition. Paul used three measures--self-report, observations by
others, and physiological. He defined anxiety in terms of how it could be
measured.
Hypotheses
These are often stated in terms that
suggest the researcher knows the outcome. "Anxiety will be reduced by
systematic desensitization." However, philosopher Karl Popper points out
that scientists do not set out “to prove” something. They test hypotheses in
order to determine whether a theory receives research support or not.
Testability
This is the researcher's ability to confirm or disconfirm
the hypothesis. The question asked is,
Has the hypothesis been stated in such a way that it can be proven wrong? Much
of psychoanalytic theory fails this test.
Dependent Variable
This is the outcome measure. In the
case of the anxiety study, it is the measure of anxiety after the intervention.
Independent Variable
This is a factor that has an
influence on the dependent variable. In the anxiety study it was the systematic
desensitization treatment.
Internal and External
Validity
Confound: What if some of the clients in Paul’s
anxiety study were taking an anti-anxiety medication such as Xanax? Could the
results be attributed to the psychotherapy or to the medication? It would not be clear.
This is a confound.
The degree to which confounds are
absent is an index of internal validity: the fewer the confounds, the higher
the internal validity. One can improve internal validity by
using:
1) a control group.
2) randomization to experimental or
control groups.
3) analog models. Replicating
aspects of the study under the controlled conditions in a laboratory.
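Random assignment to experimental and control groups can be sketched in a few lines of Python. This is only an illustration: the participant labels, the seed, and the function name assign_groups are assumptions made for the example and do not come from any study described in this lecture.

```python
import random

def assign_groups(participants, seed=None):
    """Randomly split participants into experimental (E) and control (C) groups."""
    rng = random.Random(seed)      # a seed makes the illustration repeatable
    shuffled = list(participants)  # copy so the original list is untouched
    rng.shuffle(shuffled)          # chance alone decides the order
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical participant labels, not data from a real study
participants = [f"P{i:02d}" for i in range(1, 21)]
experimental, control = assign_groups(participants, seed=1)
print(len(experimental), len(control))
```

Because chance alone decides who lands in E and who in C, a possible confound such as medication use should be spread about evenly across the two groups rather than piling up in one of them.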
Generalizability
The extent to which the results
apply to others. Try to think of exceptions--cultural or national
variation? For example, it is sometimes
said that obesity is common because of metabolic disorders. One must then ask,
why is obesity not common in other nations where people are similar to Americans
genetically (and metabolically), such as Holland, where there is little obesity?
Other possible exceptions: restrictive settings, e.g., prisons. Age? Gender?
Necessary, Sufficient and
Contributory Conditions
Necessary
This is something that must be
present for an event to occur. Is a genetic condition necessary for
schizophrenia to occur? Probably not
because it seems there may be other causes. A genetic condition is necessary
and sufficient for the development of Huntington's Chorea because it is a dominant
genetic condition.
Sufficient
A condition that is enough by itself
to cause an abnormal reaction. A
genetic predisposition may be a necessary condition, but does not in itself
cause the condition. A sense of hopelessness may be enough in itself to cause
depression, but when would this ever exist by itself?
Contributory
Conditions that are neither
necessary nor sufficient, but may be involved. The death of a loved one may
contribute to depression.
Statistical Versus Clinical
Significance
Statistical significance is the
product of statistical tests that tell whether the probability of obtaining an
observed effect by chance alone is small. It is possible to obtain statistically significant
effects that have little practical meaning.
Example: In one Houston Parent-Child
Development Center (H-PCDC) follow-up of effects on child behavior problems our
outcome measure was a teacher rating of problems. We obtained statistically
significant differences between groups and concluded that the prevention
program had lasting effects. However, one of the editors of the journal to which we
sent the manuscript wanted to know how important the differences were between
groups, that is, to test for clinical significance. We found the measure we
used to assess behavior problems had been used in other research. In one study
the measure was used to assess behavior problems of children who had been
independently referred for clinical services. If children had a score above a
certain level they tended to have been referred for services. We used this as a
cut-off point and examined the scores of children in our study to indicate
clinical significance. We found that this method also showed statistical significance; i.e., fewer H-PCDC children
would have been referred than control group children.
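The cut-off approach described above can be sketched as follows. The ratings, the cut-off value, and the function name proportion_above are all invented for illustration; they are not the H-PCDC data or the actual referral cut-off.

```python
def proportion_above(scores, cutoff):
    """Share of children whose problem rating exceeds the referral cut-off."""
    return sum(1 for s in scores if s > cutoff) / len(scores)

# Hypothetical teacher ratings and cut-off, invented for illustration only
program_scores = [4, 6, 7, 9, 10, 11, 12, 13, 15, 21]
control_scores = [5, 8, 11, 13, 16, 18, 19, 21, 22, 25]
CUTOFF = 17

program_rate = proportion_above(program_scores, CUTOFF)
control_rate = proportion_above(control_scores, CUTOFF)
print(program_rate, control_rate)
```

The comparison here is between the proportions of children who would probably have been referred for services, which speaks to practical importance in a way that a difference between group means does not.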
Effect
Size
This measure is widely used today to
compare the results of different studies of the same problem, such as how
effective cognitive-behavior therapy is for depression. It is the
difference between the mean scores of groups on some key outcome measure, e.g.,
the Beck Depression Inventory, divided by the pooled standard deviation. Thus,
it is a measure of how large the differences are between groups after taking
into consideration variation within the groups. It is a measure of the clinical
magnitude of a treatment effect.
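As a rough sketch, the calculation looks like this in Python. The means, standard deviations, and group sizes are hypothetical Beck Depression Inventory values chosen to make the arithmetic easy, and the pooling formula is the usual one for two independent groups, which is an assumption since the lecture does not spell it out.

```python
import math

def pooled_sd(sd1, n1, sd2, n2):
    """Pooled standard deviation of two independent groups."""
    return math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))

def effect_size(mean_control, mean_treated, sd_control, sd_treated,
                n_control, n_treated):
    """Difference between group means in pooled standard deviation units."""
    sd = pooled_sd(sd_control, n_control, sd_treated, n_treated)
    return (mean_control - mean_treated) / sd

# Hypothetical depression scores: lower is better, so control minus treated
d = effect_size(mean_control=22.0, mean_treated=16.0,
                sd_control=8.0, sd_treated=8.0,
                n_control=30, n_treated=30)
print(round(d, 2))  # 0.75
```

Here the treated group scores six points lower and the pooled standard deviation is eight points, so the effect size is three quarters of a standard deviation. Because the result is expressed in standard deviation units, it can be compared across studies that used different outcome measures.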
Social Validity
Do other people think the person
treated has been improved? What about relatives, employers, therapists?
Interestingly, many studies have shown that the best predictor of later
performance does not come from therapists, psychological assessors, employers, or the
person herself or himself, but from peers, that is, people like the person who
know the person well. They seem most free of biases that would interfere with
accurate prediction and perhaps they have the best real-world view of the
person's performance.
Replicability
If you do a study and get a result, can you do it again
and get the same result? This is at the heart of science. Whenever some chemist
makes a chemical discovery and it is reported, there will soon be a dozen or
more replications of the study. We are less likely to do this in psychology,
because it is more difficult to do replications, more expensive to find
subjects and to conduct the research. Nevertheless it must be done. My H-PCDC
experiment on primary prevention was done with one primary study and seven
replications. We got essentially the same outcome results each time. Thus, we demonstrated replicability.
This insistence on replicability limits the kinds of
things that can be studied. Whether UFOs exist or not is in question because
one cannot bring them forward (if they exist at all) for study at will. Dreams
pose difficulties for the scientist because they are so ephemeral. A person
dreams many times each night and only once in a while are dreams remembered.
Even then, the memory fades quickly. What can be done? Perhaps the best method
is to observe the sleeping person, notice when eye movements indicate dreaming
is underway, wake the person and obtain a record of the recalled dream at that
time.
Role of Theory
Theory guides research and is essential to the scientific
process. It helps pull ideas together and structures subsequent research.
Philosophers of science have asked how it works. Does science proceed
from a collection of facts and an examination of these to build a theory, or
from a general theory that leads to the collection of facts? Medawar (1969)
favors the latter. I am not so sure. In our work on otitis media, our
theorizing was based on facts available to us. We did have a general theory
before collecting data, but it was not specific and yet our research has been
productive. Perhaps now that we have a collection of facts a more specific
theory can be created.
The Scientific Method and
the Search for Answers
Does television watching enhance aggression in children?
Nearly everyone has an opinion on this matter, but what is the truth? This is a
question that has produced an enormous amount of research, and the answer is
pretty clear by now, but it has not always been clear. There has been a
sequence of studies.
One of the earliest was by Eron (1963) who found a
significant correlation between the amount of time children watched television
and the amount of aggression they showed.
This result leaves us with the question, does TV watching cause
aggression or does aggression cause TV watching? As it was a correlational
study, we do not know.
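A small Python sketch shows why a correlation cannot settle the direction of cause. The hours of viewing and the aggression ratings below are invented numbers, not Eron's data, and the function pearson_r is written out by hand for the illustration.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical daily TV hours and aggression ratings for seven children
tv_hours = [1, 2, 2, 3, 4, 5, 6]
aggression = [2, 1, 3, 3, 5, 4, 6]

r_xy = pearson_r(tv_hours, aggression)
r_yx = pearson_r(aggression, tv_hours)
print(r_xy, r_yx)  # the two values are identical
```

The coefficient is the same whichever variable is named first, so the statistic itself carries no information about which variable is the cause and which is the effect.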
The next study on the topic was by Liebert and Baron
(1972) who did an experiment. They randomly assigned children to experimental and
control groups to watch aggressive TV programs or neutral programs. The
children were then to observe another child in a learning situation and were
told they could, if they wished, punish the other child by giving him or her a
small electric shock when learning mistakes were made (The child did not really
get a shock).
Sample
        E                       C
X       Watch aggressive TV     Watch neutral TV
O       "hurt" most             "hurt" least
Children who watched aggressive TV were more likely to
administer the shock. The authors concluded that TV watching of aggression
causes aggressive behavior; it increases the rate of aggressive behavior.
This was an experimental study and not subject to the
criticisms brought forward against the earlier study. But there is still the
question of external validity: "The extent to which the results of any
particular piece of research can be generalized beyond their immediate
experiment." There is also the question of ecological validity: this was
an artificial situation. Would the result be the same in a natural environment?
This led to the third study, a field
experiment.
Friedrich-Cofer (who was on
the faculty at UH for many years) and Stein (1973) carried this out with
children in Kansas. Here the measure of aggressive behavior was observation of
natural behavior on the preschool playground.
Sample
        E                       C
O       Baseline: observe children at play
X       Watch violent TV        Watch nonviolent TV
O       Repeat baseline observation
They found that watching violent TV resulted in an
increase in the aggressive behavior of children who were already quite
aggressive. Watching violent TV had little effect on nonaggressive children.
Conclusion to the series of studies: There is more
watching of violent TV by children, and playing violent computer games, than
could have been imagined in the 1960s and 1970s. The scientific results are in,
but they have had a very small effect on TV policy, which is driven by
advertising revenue, not science.
Internal Validity
Research in which the results can be attributed with
confidence to the manipulation of the experimental variables.
Double Blind
This is commonly used in medical experiments, as in drug
studies. It is used in randomized clinical trials (RCTs). The "blind"
is that neither the patient nor the staff know who is receiving the drug in
question. Only the research staff know and they have nothing to do with
administering the medication or making prescription adjustments. An example is
the study of the efficacy of clozapine, an atypical anti-psychotic medication
used to treat the symptoms of schizophrenia.
The research began with a sample of patients who had not
benefited from a typical antipsychotic medication. All patients had been placed
on haloperidol (Haldol). This was tried for 6 weeks. Then the researchers took
the group that showed no improvement and assigned patients randomly to E or C:
        E                       C
O       Baseline observations
X       Clozapine               Thorazine
O       Repeat observations
Double blind: Neither the patients nor the persons administering
the drug knew who was in E and who was in C. Of course, the researchers knew, but
the observers did not.
Result:
        E                       C
        30% improved            4% improved
In addition, clozapine had
fewer side effects. The conclusion was that clozapine was a superior drug for
patients who had not benefited from typical anti-psychotic medication.
E = Experimental
C = Control
O = Observation
X = Treatment
Single-Subject Experiments
Reversal design
ABAB
Baseline, intervention, observation, intervention.
Does diet cause hyperactivity?
A: Baseline observation
B: Remove suspect diet
A: Observe again
B: Resume normal diet
A: Observe
This method is useful for exploratory research, but the results still
cannot be generalized because the design is limited to one person. Many studies of this
kind showed that the Feingold diet has
positive effects for hyperactive children, but controlled experiments did not.
Perhaps diet is involved in some children, but which ones? One solution is to try the diet and see what
happens. However, in general the Feingold diet (no sugar or food dyes) is no
longer recommended because of the lack of evidence of efficacy in the
controlled group trials.
Longitudinal Research
It is finally being recognized that
it is necessary to follow people over long periods of time to understand the
effects of treatments. I have been an advocate for this point of view as a
member of the National Advisory Council of the NIMH. I argued that psychiatric
drug trials tell us too little because they cover only a few weeks or possibly
months of a person's life with the drugs. As an example, some patients with schizophrenia
continue to show benefits from clozapine several years after beginning
treatment. This is not recognized by brief drug trials.
Cross-sectional research tells us
what is happening at one point in time. Longitudinal research tells what
happens across many points in time. For example, we evaluated the efficacy of
the H-PCDC at the end of the program when children were 3 years of age and got
positive effects. Would they last? As a program designed to prevent school
failure and behavior problems we had to follow the children. We did, at ages 4
and 5, then again at 7-9, at 11-13, and finally at 13-17. We found lasting
effects, although only on some measures. We found that many effects that were
present soon after the program (such as increased IQ) dwindled with time. Other
researchers of child behavior problems have obtained similar results.
Recent research on the treatment of
depression has shown the value of follow-up or longitudinal research. The
effectiveness of both anti-depressive medication and psychotherapy has been
demonstrated many times. All of the studies were short term. Now there are a
number of longer follow-ups. They show very high relapse rates, especially for
medication. This has resulted in a revolution in thought about how to treat
depression. It is now regarded as a chronic condition in need of continued
observation and treatment.
Epidemiological Research
Epidemiological research began in London with Dr. John
Snow and the cholera epidemic of 1854. There were two water systems involved in
the cholera area: Lambeth and Vauxhall, one up-Thames and one down. Snow found
that the Broad Street water pump was the source of drinking water for people in the
area. He carefully charted the cases of cholera and where people lived. He then
determined that there were more cases associated with water from the down-stream
source. He did not know that cholera was caused by a water-borne germ because
Pasteur's germ theory was still to come, but he suspected that there was
something bad about the water. Snow carefully mapped the area, marking where
each case was located. His search of the area was thorough. He removed the pump
handle and cases of cholera decreased. His methods have been used with
virtually all diseases ever since. For
example, they were used in the successful world-wide eradication of smallpox.
How much of a problem is mental illness? How could we know? In the United States
there have been two national studies. The first was the Epidemiological Catchment
Area study of the 1980s and the second was Kessler’s study of about 10 years ago. These
samples of the American population have told us just how many people have each
kind of psychiatric disorder. We will review results as we go along.
There are no national studies of the rates of mental
disorder for children in the USA. We must rely on data from Canada and
Holland to have some idea of prevalence.
Two concepts are of use.
Incidence. This is the number of new cases of an
illness in a given period of time. If there is an outbreak of measles in the
county each new case is a matter for the incidence record. Prevention research
attempts to reduce the incidence of illnesses. We have no national incidence data on
psychiatric disorders in the USA. Again, we rely on research in Scandinavia and
other European countries.
Prevalence. This has to do with the number of
cases of an illness in the community at a given time. This is the count that is
most often used. We will refer to these figures as we go along. Measles would
not be included in prevalence records because it is not an illness that
lingers. Schizophrenia, on the other hand, lasts for a life-time and so a count
of the number of people in the community with the disorder is very important.
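The difference between the two counts can be made concrete with a small Python sketch. The case records are invented: each is a pair of hypothetical onset and recovery weeks, with None standing for a condition that has not remitted, and the function names are assumptions made for the example.

```python
def incidence(cases, start, end):
    """Number of new cases whose onset falls in the period [start, end)."""
    return sum(1 for onset, _ in cases if start <= onset < end)

def point_prevalence(cases, week):
    """Number of cases that are active at the given week."""
    return sum(
        1 for onset, recovery in cases
        if onset <= week and (recovery is None or recovery > week)
    )

# Hypothetical records: (onset week, recovery week or None if still ill)
short_illness = [(1, 3), (2, 4), (10, 12)]           # brief, like measles
chronic_illness = [(1, None), (2, None), (10, None)]  # lasting, like schizophrenia

print(incidence(short_illness, 0, 52), point_prevalence(short_illness, 52))
print(incidence(chronic_illness, 0, 52), point_prevalence(chronic_illness, 52))
```

Both illnesses produce three new cases during the year, so their incidence is the same. At the end of the year, however, none of the short-illness cases is still active while all three chronic cases are, which is why prevalence is the more informative count for a lasting disorder.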