I. THE STRUCTURE OF SCIENTIFIC INQUIRY
A. Science, Theory, and Research
Research starts with the
researcher: the position where you stand, the world around you, your ethics,
and so on. The researcher's conceptions influence both the research topic and
the methodology with which it is approached. Research is not just a matter
of technique or methods.
What is specific to social-science
research, compared with, say, journalism, is the quest to examine and understand
social reality in a systematic way. What is observed is as important as
how it is observed.
General outline of a research project:
theory; conceptualization of theoretical constructs into concepts; formalization
of relationships; operationalization; measurement or observation; data
analysis or interpretation; report.
1. Science and Reality
Science, as a system of propositions
about the world, is a grasp of reality; it is systematic, logical, and empirically
founded. Epistemology is the theory of knowledge (what is knowledge?),
and methodology is the study of how knowledge is gathered (how do we acquire
knowledge?). Scientific inferences can be causal or probabilistic, and science
may also seek to offer an understanding of social processes. Factors that
intervene in the process of scientific inquiry include the available tradition
of research and the status of the researcher.
Scientific inquiry should
reduce errors in observation (mistakes, incorrect inferences) and avoid
overgeneralization and selective observation (e.g. only studying that which
conforms to a previously found pattern).
Mistakes include: a) ex post
facto reasoning: a theory is made up after the facts are observed, which
is not wrong as such, but the hypothesis derived from it still needs to be
tested against new observations before it can be accepted; b) over-involvement
of the researcher (researcher bias); c) mystification: findings are attributed
to supernatural causes, whereas in social-science research, although we cannot
understand everything, everything is potentially knowable.
Basically, the two necessary
pillars of science are logic and observation (to retrieve patterns in
social life, i.e. at the aggregate level). Note that people are not researched
directly: social-science research studies variables and the attributes
that compose them. A variable is a characteristic that is associated with
persons, objects or events, and a variable's attributes are the different
modalities in which the variable can occur (e.g. the attributes male and
female for the variable sex). Theories explain relationships between variables,
in terms of causation or understanding. Typically, this leads us to identify
independent and dependent variables (cause and effect), or situation, actor,
and meaning (interpretation).
2. From Theory to Research
Different purposes of social-science
research can be identified: 1) to test a theoretical hypothesis, usually
a causal relationship (e.g. division of labor produces suicide); 2) to
explore unstructured interests, which usually involves breaking through
the empirical cycle, shifting from induction to deduction (e.g. what
is so peculiar about drug abuse among young black females?); 3) applied
research, for policy purposes (e.g. market research).
The basic model of research
is: 1) theory, theoretical proposition; 2) conceptualization of the theoretical
constructs, and formalization of a model, i.e. the relationships between
variables; 3) operationalization of the variables stated in the theory, so
that they can be measured (indicators); and 4) observation, the actual
measurements. The inquiry can be deductive, from theoretical logic to empirical
observations (theory testing), or inductive, from empirical observations to
the search for a theoretical understanding of the findings (theory construction).
(Note that, basically, it is always both, cf. Feyerabend; this is more
than just an alternation, since the two are mutually constitutive.) Together,
these steps make up the wheel of science.
Deduction:
the logical derivation of testable hypotheses from a general theory
Induction:
the development of general principles on the basis of specific observations
B. Research Design, Measurement, and Operationalization
1. Research Design
Research design concerns
the planning of scientific inquiry, the development of a strategy for finding
out something. This involves: theory, conceptualization, formalization,
operationalization of variables, preparations for observation (choice of
methods, selection of units of observation and analysis), observation,
data analysis, report (and back to theory).
a) Purposes of Research
The purposes of research
are basically three-fold:
1) Exploration: to investigate
something new of which little is known, guided by a general interest, or
to prepare a further study, or to develop methods. The disadvantage of
most exploratory studies is their lack of representativeness and the fact
that their findings are very rudimentary.
2) Description: events or
actions are observed and reported (what is going on?). Of course, the quality
of the observations is crucial, as well as the issue of generalizability.
3) Explanation: this is research
into causation (why is something going on?). This is, of course, extremely
valuable research, but note that most research involves elements of all
three types.
b) Units of Analysis
The units of analysis refer
to what or who is being studied (people, nation-states). Units
of analysis can be (and often are) the units of observation, but not necessarily
(e.g. we ask individuals about their attitudes towards abortion,
but analyze the religious categories they belong to). Units of analysis
in social-science research typically include: individuals within a certain
area at a given period of time; groups (e.g. the family); organizations
(e.g. social movements); products of human action (e.g. newspapers in a
content-analysis); and so on.
Two common problems are:
the ecological fallacy, i.e. making assertions about individuals on the
basis of findings about groups or aggregations (e.g. higher crime rates
in cities with a high percentage of blacks are attributed to blacks, but
the crimes could actually have been committed by the whites in those areas);
and reductionism, i.e. illegitimate inferences from a too limited, narrow
(individual-level) conception of the variables that are considered to have
caused something broader (societal) (e.g. Durkheim does not explain any
individual's suicide, but only the suicide rates among certain categories of
people; explaining those rates by individual-level variables alone would be
reductionist).
c) Focus and Time of Research
The focus in a research can
be on: 1) characteristics or states of being (e.g. sex of an individual,
number of employees in a company); 2) orientations or attitudes (e.g. prejudice
of an individual; the political orientation of a group), and 3) actions,
what was done (e.g. voting behavior of individuals; the riot participation
of a group).
Research, considered in its
time dimension, can be 1) cross-sectional at any given point in time; 2)
longitudinal over a period of time to trace change or stability (e.g. panel
study of the same people after two elections to see if and how their voting
behavior changed); 3) quasi-longitudinal by investigating certain variables
in a cross-sectional study (e.g. a comparison of older and younger people
indicates a process over time).
2. Conceptualization and Measurement
a) Conceptualization
Theories consist of
statements that indicate relationships between constructs, i.e. particular
conceptions that are labeled by a term. These constructs should be conceptualized,
i.e. their meaning must be specified, as a working agreement,
into clearly defined concepts (which are still mental images). Then we
can operationalize those concepts, i.e. specify indicators that measure
the concept in terms of its different dimensions (e.g. the actions or
ideas that are referred to by the concept of crime). Note that this process
reminds us that terms should not be reified into things.
Concepts, then, should be
defined in two steps: first, a nominal definition of the concept gives
a more precise meaning to the term, but it cannot yet be observed as such;
therefore, second, the operational definition of the concept spells out
how it is to be measured or observed, so that the actual measurement can
be undertaken. Example: theoretical construct = social control; nominal
definition of concept = social control as the individual's bonding to society;
operational definition = attachment to primary institutions, which can
be high or low; measure = years of education. Note that these specifications
are absolutely necessary in explanatory research.
b) Measurement Quality
Measurements should ideally
be precise, reliable, and valid. Note that reliability and validity refer to
the relationship between the measure and the concept.
1) Reliability: does the
replication of a measurement technique lead to the same results?
This refers to the consistency
of the measurement techniques. Reliability can be assessed through the
test-retest method, i.e. the replication of a method on a phenomenon that
could not, or should not, have changed, or of which the amount of expected
change is known (e.g. asking for age, and asking again the next year, should
lead to a difference of one year). Another technique for checking reliability
is the split-half method: e.g. if you have ten indicators for a phenomenon,
put five randomly chosen ones in one questionnaire and the other five
in another, and administer the two questionnaires to two random samples;
there should then be no differences between the two in the distribution
of attributes on the measured variable. Other ways to improve reliability
are the use of established methods and the training of researchers.
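The split-half procedure described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: respondents are represented as dictionaries that map hypothetical item names (`q1` to `q10`) to numeric scores, and the two halves are compared only on their mean summed score. That comparison simplifies the comparison of full distributions described in the text.

```python
import random
from statistics import mean

def split_items(items, seed=None):
    """Randomly divide a list of indicators into two halves of equal size."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

def mean_scale_score(sample, items):
    """Mean, over respondents, of each respondent's summed score on `items`."""
    return mean(sum(respondent[item] for item in items) for respondent in sample)

# Hypothetical usage: ten indicators, split into two questionnaires of five.
items = [f"q{i}" for i in range(1, 11)]
half_a, half_b = split_items(items, seed=42)
# Questionnaire A (half_a) goes to random sample A and questionnaire B (half_b)
# to random sample B. A reliable measure should give similar mean_scale_score values.
```

A large gap between the two mean scores would suggest that the two halves do not measure the phenomenon consistently.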
2) Validity: does the method
of measurement measure what one wants to measure?
This means different things:
first, face validity is based on common-sense knowledge (e.g. the number
of children is an invalid measure of religiosity); second, criterion or
predictive validity is based on other criteria that are related to the
measurement (e.g. racist actions should be related to responses to racist
attitude scales); third, construct validity is based on logical relationships
between variables (e.g. marital satisfaction measurements should correlate
with measurements of marital fidelity); finally, content validity refers
to the degree to which a measure covers all the meanings of a concept (e.g.
a measure of racism should cover all kinds of racism: against women, ethnic
groups, etc.).
Note that reliability is,
all in all, an easier requirement to meet, while with validity we are never
sure. Note also the tension between reliability and validity: there is often
a trade-off between the two (e.g. compare in-depth interviewing with
questionnaire surveys).
3. Operationalization
Operationalization is the
specification of concrete measures for the concepts in a research project (the
determination of indicators). Some guidelines: be clear about the range of
variation you want included (e.g. income, age), the amount of precision you
want, and the dimensions of a concept you consider relevant.
In addition, every variable
should have two qualities: 1) its attributes should be exhaustive: all the
relevant attributes of a variable must be included (e.g. the magical 'other'
category had best not be too big); and 2) its attributes should be mutually
exclusive (e.g. the attributes employed and unemployed are not mutually
exclusive, since some people can be part-time employed and part-time unemployed).
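Both requirements can be checked mechanically once each attribute is written as a coding rule. In the sketch below, the employment rules and the case fields (`hours_worked`, `seeking_work`) are hypothetical and serve only to reproduce the part-time example above.

```python
def check_attributes(attributes, cases):
    """Check whether a set of attribute rules is exhaustive and mutually exclusive.

    attributes: dict mapping an attribute label to a rule (a function case -> bool)
    cases: the observed cases to classify
    Returns (exhaustive, mutually_exclusive).
    """
    exhaustive = mutually_exclusive = True
    for case in cases:
        matches = [label for label, rule in attributes.items() if rule(case)]
        if not matches:          # the case fits no attribute
            exhaustive = False
        if len(matches) > 1:     # the case fits several attributes at once
            mutually_exclusive = False
    return exhaustive, mutually_exclusive

# Hypothetical coding rules for employment status.
employment = {
    "employed": lambda c: c["hours_worked"] > 0,
    "unemployed": lambda c: c["seeking_work"],
}
cases = [
    {"hours_worked": 40, "seeking_work": False},
    {"hours_worked": 20, "seeking_work": True},   # part-time employed and part-time unemployed
    {"hours_worked": 0, "seeking_work": False},   # e.g. retired: fits neither attribute
]
print(check_attributes(employment, cases))  # (False, False)
```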
Variables are 1) nominal,
when their attributes indicate different, mutually exclusive, and exhaustive
qualities (e.g. sex: male or female); 2) ordinal, when the attributes
can also be ranked in an order (e.g. level of education); 3) interval, when
the distance between attributes in an order is precise and meaningful (e.g.
IQ scores); and 4) ratio, when, in addition, these attributes have a true
zero point (e.g. age). Note that variables usually do not in and of themselves
indicate whether they are nominal, ordinal, etc., and that you can convert
them from one type to another (e.g. dummy variables, from nominal to metric).
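The conversion of a nominal variable into dummy variables can be sketched as follows. The religion data are hypothetical. Leaving out one reference category is the usual convention, since its dummy would be redundant with the others.

```python
def dummy_code(values, reference):
    """Turn a nominal variable into 0/1 dummy variables.

    One dummy is created per category except `reference`, which serves as
    the baseline (all dummies 0) and so avoids a redundant column.
    """
    categories = sorted(set(values) - {reference})
    return {cat: [1 if v == cat else 0 for v in values] for cat in categories}

religion = ["Catholic", "Protestant", "None", "Catholic"]   # hypothetical data
print(dummy_code(religion, reference="None"))
# {'Catholic': [1, 0, 0, 1], 'Protestant': [0, 1, 0, 0]}
```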
Finally, note that you can
use one or multiple indicators for a variable; sometimes a composite
measure is even necessary (note: see questionnaire design for an application
of operationalization).
4. Indexes, Scales and Typologies
There are commonalities between
indexes and scales: they both typically involve ordinal variables, and
they are both composite measures of variables.
An index is constructed by
accumulating scores assigned to individual attributes. The requirements
of an index are face validity (each item should measure the same attribute)
and unidimensionality (only one dimension should be represented by the composite
measure). Then you consider all the bivariate relationships between the
items in the index: the relationships should be high, but not perfect.
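A small sketch of index construction: summing item scores per respondent and inspecting every bivariate item correlation. The 0/1 answers are hypothetical. Pearson correlation is written out by hand so that the example runs on any Python 3 version.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("an item without variation cannot be correlated")
    return cov / sqrt(ss_x * ss_y)

def index_scores(responses):
    """Index score of each respondent: the sum of his or her item scores."""
    return [sum(items) for items in responses]

def inter_item_correlations(responses):
    """Correlation for every pair of items (items are the columns of `responses`)."""
    columns = list(zip(*responses))
    return {(i, j): pearson(columns[i], columns[j])
            for i, j in combinations(range(len(columns)), 2)}

# Hypothetical 0/1 answers of four respondents to three items.
responses = [[1, 1, 1], [0, 0, 1], [1, 1, 0], [0, 0, 0]]
print(index_scores(responses))            # [3, 1, 2, 0]
print(inter_item_correlations(responses))
```

In this toy data, items 0 and 1 correlate perfectly, which suggests they are redundant. Item 2 does not correlate with either, which suggests it measures something else. Neither pattern meets the "high, but not perfect" guideline.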
A scale is constructed by
accumulating scores assigned to patterns of attributes. The advantage is
that it indicates the ordinal nature of the different items: a lower-ranked
item is in a sense included in a higher-ranked one (whoever endorses a
stronger item can be expected to endorse the weaker ones as well).
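The cumulative logic of a scale can be expressed as a check on response patterns. Assume the items are ordered from weakest to strongest. A pattern fits the scale when no item is endorsed after a weaker one has been rejected. The answers below are hypothetical, and the sketch leaves aside more formal measures of scale fit.

```python
def is_scale_type(pattern):
    """True if 0/1 answers, ordered from weakest to strongest item, fit the scale.

    In a scale pattern, once an item is rejected (0), no stronger item is endorsed (1).
    """
    rejected_earlier = False
    for answer in pattern:
        if answer == 0:
            rejected_earlier = True
        elif rejected_earlier:
            return False
    return True

def share_of_scale_types(patterns):
    """Proportion of respondents whose answers form a scale pattern."""
    return sum(is_scale_type(p) for p in patterns) / len(patterns)

# Hypothetical answers to three items ordered from weakest to strongest.
answers = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 1, 0]]
print(share_of_scale_types(answers))  # 0.75
```

For a scale-type pattern, the score alone (the number of items endorsed) reveals exactly which items were endorsed. That is the advantage of a scale over an index.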
A typology is a classification
of observations in terms of their attributes on two or more variables. Typologies
are difficult to use as dependent variables, since any one cell in the typology
can be under-represented (it is best, then, to undertake a new analysis, making
sure each cell is well represented).
C. Causal Modelling
1. Assumptions of Causal Inquiry
The first step in causal
modelling involves conceptualization (what are the relevant concepts?) and,
second, operationalization of these concepts. The next step is formalization,
i.e. specification of the relationships between the variables. This seems
to destroy the richness of the theory, but it helps to achieve comprehensibility
and avoids logical inconsistencies. Note that this model is ideally based
on a deductive approach, but it does not exclude a more dynamic approach
which moves back and forth (from theory to data).
The causal model itself specifies
not only the direction (from X to Y) but also the sign of each relationship
(positive or negative). A positive relationship means that when X goes
up, Y goes up; a negative relationship between X and Y means that as X
goes up, Y goes down. Along an indirect path, the signs of the successive
links should be multiplied to determine the net effect of that path. A causal
system is consistent when all the causal paths push the relationship in the
same direction (indicated by the fact that their net signs are all the same).
When some net signs are positive and others negative, the system is
inconsistent (suppressors).
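The sign rule can be written down directly. In the sketch, each path is a list of link signs (+1 or -1). The three-variable model (X to Z positive, Z to Y negative, plus a direct X to Y link) is hypothetical.

```python
from math import prod

def path_sign(link_signs):
    """Net sign of one causal path: the product of the signs (+1 or -1) of its links."""
    return prod(link_signs)

def is_consistent(paths):
    """A causal system is consistent when all paths from X to Y have the same net sign."""
    return len({path_sign(path) for path in paths}) == 1

# Hypothetical model: X -> Z is positive, Z -> Y is negative, and X -> Y is direct.
indirect = [+1, -1]
print(path_sign(indirect))                  # -1
print(is_consistent([indirect, [-1]]))      # True: both paths are negative
print(is_consistent([indirect, [+1]]))      # False: a suppressor situation
```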
Please note that causality
is not necessarily in reality (perhaps it is); above all, it is put into the
model by virtue of the theory. This involves a notion of determinism (for the
sake of the model), and the decision to stop looking for further causes or
effects at some point. Also note that the variables in a causal model are
(ideally) all at the same level of abstraction.
Causal explanations can be
idiographic or nomothetic: 1) idiographic explanations seek to explain
a particular event in terms of all its causes (deterministic model); 2)
nomothetic explanations seek to explain general classes of actions or events
in terms of the most important causes (probabilistic model).
2. Causal Order: Definitions and Logic
Prior (unknown or not considered)
variables precede the independent variable. Intervening variables are located
in between the independent and dependent variable. Consequent variables
are all variables coming after the dependent variable (unknown or not considered).
Note that the identification of prior, independent, intervening, dependent,
and consequent variables is relative to the model at hand.
The causal order among
a number of variables is determined by the assumptions that define the causal
system, i.e. the relationships between those variables. (Note that variables
in a loop have no order, i.e. when the path leading away from X to other
variables returns from those variables back to X.)
The following possibilities
can be distinguished:
- X causes Y
- X and Y influence each other
- X and Y correlate
Variable X causes variable Y
when change in X leads to change in Y, or when fixed attributes of X are
associated with certain attributes of Y. This implies, of course, that
we are talking about certain tendencies: X is a (and not the) cause. And it
implies correlation as a minimum, necessary condition (the causation itself
is theoretical).
3. Minimum-Criteria for Causality
Rule 1: Covariation
Two variables must be empirically
correlated with one another, i.e. they must co-vary, or neither can have
caused the other. This leads us to distinguish direct from indirect effects.
Rule 2: Time-order
When Y appears after X, Y
cannot have caused X; in other words, the cause must have preceded the
effect in time. Derived from this is the rule that when X is relatively
stable, hard to change, and fertile (it produces many other effects), it
is likely to be the independent variable.
Rule 3: Non-Spuriousness
When the observed correlation
between two variables is the result of a third variable that influences
both of those two separately, then the correlation between the two is spurious.
This is indicated by a variable having a causal path to the two variables
that correlate.
Basic to causality is the
control of variables. Ideally, this is done by randomization in experiments:
the attributes of any prior variables are then randomly distributed over
the control and the experimental group. We can also purposely control for
prior variables when we select the ones we consider relevant. In bivariate
relationships, no variables are controlled, while in partial relationships,
one or more of the prior and intervening variables that might interfere
are controlled. It is better still to identify the necessary and sufficient
causes of certain effects, but usually we are satisfied with either one.
Some common errors are: biased
selection of the variables to be included in the model, unwarranted
interpretation, suppression of evidence, and so on. It is instructive to look
at the different steps involved in a typical causal research project and at
what can go wrong at each step. First, the step from theory to conceptualization
is rarely clear-cut. Second, the step into operationalization is in a way
always arbitrary (since the concept indicates more than any measurement).
Third, the empirical associations found between measured variables are rarely,
if ever, perfect. Finally, any finding therefore requires additional
studies, and any conclusion is in principle falsifiable (variables are
shown to be associated, but then the question is how they are associated).
Strategies for causal analysis:
- When a non-zero bivariate relationship between X and Y is reduced to
zero when controlling for a third variable, then the third variable accounts
for the bivariate relationship: either the relationship is spurious (if the
third variable is prior to both X and Y) or it runs through the third variable
(if that variable intervenes). Causality can never be proven by data analysis;
- Check for the effects of prior variables;
- Path analysis.
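One common way to "control for" a third variable Z with correlational data is the standard first-order partial correlation. It is used here as an illustration. Cross-tabulating X and Y within categories of Z is another option, and these notes do not prescribe either. The correlation values in the example are hypothetical.

```python
from math import sqrt

def partial_correlation(r_xy, r_xz, r_yz):
    """First-order partial correlation of X and Y, controlling for Z.

    Takes the three bivariate (Pearson) correlations among X, Y and Z.
    """
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical correlations: X and Y correlate at 0.30, but both also depend on Z.
print(partial_correlation(r_xy=0.30, r_xz=0.60, r_yz=0.50))  # approximately 0
```

A result near zero only shows that Z accounts for the X-Y association. Whether that means spuriousness or an intervening variable depends on the causal order that the theory assumes, not on the computation.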
D. Sampling Procedures
Sampling refers to the systematic
selection of a limited number of elements (persons, objects or events)
out of a theoretically specified population of elements, from which information
will be collected. This selection is systematic so that bias can be avoided.
Observations are made on observation units, which can be elements (individuals)
or aggregations of elements (families). A population is theoretically constructed
and is often not directly accessible for research. Therefore, the study
population, the set of elements from which the sample is actually selected,
may differ (ideally only slightly) from the population. In multi-stage samples,
the sampling units refer to elements or sets of elements considered for
selection at a sampling stage. The sampling frame is the actual list of
sampling units from which the samples are selected.
The sampling procedures are
designed to best suit the collection of data, i.e. to measure the attributes
of the observation units with regard to certain variables. Depending on
theoretical concerns and choice of method, probability or non-probability
sampling designs are appropriate in research.
1. Probability Sampling
Probability sampling is based
on principles of probability theory, which state that increasing the sample
size will make the distribution of a statistic (the summary description
of a variable in the sample) cluster more closely around the parameter
(the summary description of that variable in the population). The standard
error, which is inversely related to (the square root of) the sample size,
indicates how closely a sample statistic approximates the population parameter.
These conditions are only met when samples are randomly selected out of a
population, i.e. when every element in the population has an equal chance of
being selected in the sample.
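The relation between standard error and sample size can be illustrated with the simplest statistic, a proportion, using the textbook formula SE = sqrt(p(1 - p) / n). The sketch assumes simple random sampling from a large population, with no finite population correction. This is also why only the absolute sample size enters the formula.

```python
from math import sqrt

def standard_error_of_proportion(p, n):
    """Standard error of a sample proportion under simple random sampling.

    Assumes a large population (no finite population correction), so only
    the absolute sample size n matters.
    """
    return sqrt(p * (1 - p) / n)

for n in (100, 400, 1600):
    print(n, round(standard_error_of_proportion(0.5, n), 4))
# 100 0.05
# 400 0.025
# 1600 0.0125
```

Quadrupling the sample size halves the standard error.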
A randomly selected sample
of sufficiently large size (absolute size, not size proportionate to the
population) is assumed to be more representative of the population because
the relevant statistics will more closely approximate the parameters; in other
words, the findings in the sample are more generalizable to the population. Representativeness
of samples, or generalizability of sample findings, both matters of degree,
are the main advantages of probability sampling designs. The accuracy of
a sample statistic is described in terms of a level of confidence with
which the statistic falls within a specified interval from the parameter
(the broader the interval, the higher the confidence). The main disadvantage
of probability sampling is that the theoretical assumptions (of infinity)
never "really" apply.
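The trade-off between confidence level and interval width can be made concrete for a proportion. This is a sketch of the normal-approximation interval, assuming simple random sampling and a sample large enough for the approximation to hold. The sample proportion (0.5) and size (400) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def confidence_interval(p, n, confidence=0.95):
    """Normal-approximation confidence interval for a population proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 1.96 for 95%
    half_width = z * sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width

for level in (0.90, 0.95, 0.99):
    low, high = confidence_interval(0.5, 400, level)
    print(level, round(low, 3), round(high, 3))
```

A higher confidence level yields a broader interval around the same sample statistic.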
a) Simple Random Sampling
In simple random sampling,
each element is randomly selected from the sampling frame. Example: in
an alphabetical list of all students enrolled at CU-Boulder, each student
is given a number ascending from 1, and 400 students are selected using
a table of random numbers.
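Today, a pseudo-random number generator usually takes the place of the table of random numbers. A minimal sketch follows. The enrollment figure of 30,000 is hypothetical, and the list positions stand in for the numbered students.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n elements from the sampling frame without replacement, each equally likely."""
    rng = random.Random(seed)
    return rng.sample(frame, n)

# Hypothetical frame: students numbered 1..30000 in an alphabetical list.
frame = list(range(1, 30001))
sample = simple_random_sample(frame, 400, seed=2024)
```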
b) Systematic Sampling
In systematic sampling, every
kth element in a list is selected for the sample, the distance k being
the sampling interval. The systematic sample has a random start when the
first element is randomly chosen (out of the numbers between 1 and k). Systematic
sampling has the advantage of being more practical while being about as
efficient as (sometimes more efficient than) simple random sampling. A
disadvantage is the danger of an arrangement of elements forming a pattern
that coincides with the sampling interval. Example: in a list of all students
enrolled at CU-Boulder, every 100th student, starting with the randomly chosen
205th, is selected. Later it turns out that the list alternates between female
and male students (so the entire sample is female), since the composer of the
list thought that such "perfect randomness" would lead to perfect probability samples.
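The periodicity problem is easy to reproduce. In this sketch, a hypothetical list alternates female and male students. Because the interval k = 100 is even, every selected position has the same parity as the start, so the sample contains only one sex whatever the random start is.

```python
import random

def systematic_sample(frame, k, seed=None):
    """Select every k-th element of the list, after a random start between 1 and k."""
    rng = random.Random(seed)
    start = rng.randint(1, k)        # 1-based position of the first selected element
    return frame[start - 1::k]

# Hypothetical list that alternates female and male students.
frame = ["female", "male"] * 5000
sample = systematic_sample(frame, k=100, seed=7)
print(len(sample), set(sample))  # 100 elements, all of one sex
```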
c) Stratified Sampling
Stratified sampling is a
modification to the use of simple random and systematic sampling. It is
based on the principle that samples are more representative when the population
out of which they are selected is homogeneous. To make samples
more representative, strata of elements are created that are homogeneous
with respect to the (stratification) variables which are considered to
correlate with other variables relevant for research (the standard error
for the stratification variable equals zero). Example (stratified &
systematic): luckily we know how stupid composers of student lists are,
so we stratify students by sex (taking every other student in our "perfectly
randomized" list); we thus get two strata of students based on sex, and
select every 40th student in each stratum.
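The stratified and systematic example above can be sketched as follows. The frame of 4,000 female and 4,000 male students is hypothetical. Each stratum gets its own random start, and every 40th member of each stratum is selected.

```python
import random
from collections import defaultdict

def stratified_systematic_sample(frame, stratum_of, k, seed=None):
    """Split the frame into strata, then take a systematic sample within each stratum.

    stratum_of: function returning the stratum (e.g. sex) of an element
    k: sampling interval used inside every stratum
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for element in frame:
        strata[stratum_of(element)].append(element)
    sample = []
    for members in strata.values():
        start = rng.randrange(k)      # random 0-based start within the first interval
        sample.extend(members[start::k])
    return sample

# Hypothetical frame of (sex, student_id) pairs: 4000 women and 4000 men, alternating.
frame = [("female" if i % 2 == 0 else "male", i) for i in range(8000)]
sample = stratified_systematic_sample(frame, stratum_of=lambda s: s[0], k=40, seed=3)
```

Whatever the order of the original list, each sex is now represented in proportion to its size (100 students each).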
d) Cluster Sampling
In cluster sampling, elements
are grouped into clusters, a sample of clusters is selected, and elements are
then selected out of each selected cluster. This method is advantageous since
complete lists of the population are often unavailable. Cluster sampling is multi-stage when first
clusters are selected, then clusters within clusters (on the basis of simple
random or systematic sampling, stratified or not), and so on, up until
elements within clusters. While cluster sampling is more efficient, the
disadvantage is that there are sampling errors (of representativeness)
involved at each stage of sampling, a problem which is not only repeated
at each stage, but also intensified since sample size grows smaller at
each stage. However, since elements in clusters are often found to be homogeneous,
this problem can be overcome by selecting relatively more clusters and
fewer elements in each cluster (at the expense of administrative efficiency).
When information is available on the size of clusters (the number of elements
they contain), we can decide to give each cluster a different chance of
selection proportionate to its size (then selecting a fixed number within
each cluster). This method has the advantage of being more efficient: since
elements in clusters are typically more homogeneous, only a limited number
of elements for each cluster has to be selected. Finally, disproportionate
sampling can be useful to focus on any one sample separately, or for the
comparison of several samples. In this case, generalizability of sample
findings to the entire population should not and cannot be considered.
Example (multi-stage cluster,
proportionate to size, stratified): for research on political attitudes
of students in the USA, no list of all students is available, but we have
a list of all US states; we select a number of states (clusters); they
are given a chance of selection proportionate to the "size" of (number
of universities in) each state, because, for instance, there are more universities
in the north-eastern states (probability proportionate to size); out of
the selected states, we select cities (again proportionate to size, since
metropolitan areas have more universities), select universities out of
each selected city, take the student lists of each selected university,
and select a relatively small number of students (assuming homogeneity
among them since we know all students at Harvard are conservative and everybody
at CU-Boulder is a liberal).
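Selecting clusters with probability proportionate to size can be done in several ways. The sketch uses systematic PPS selection, one common procedure, which these notes do not themselves specify. The state names and university counts are hypothetical.

```python
import random
from bisect import bisect_right
from itertools import accumulate

def pps_systematic(clusters, sizes, n_clusters, seed=None):
    """Select clusters with probability proportionate to size (systematic PPS).

    Cluster sizes are laid end to end on a line; n_clusters equally spaced
    points with a random start are placed on it, and each point selects the
    cluster it falls in. A cluster larger than the interval can be hit twice.
    """
    rng = random.Random(seed)
    cumulative = list(accumulate(sizes))      # running totals of cluster sizes
    interval = cumulative[-1] / n_clusters
    start = rng.random() * interval           # random start in [0, interval)
    selected = []
    for i in range(n_clusters):
        point = start + i * interval
        index = min(bisect_right(cumulative, point), len(clusters) - 1)
        selected.append(clusters[index])
    return selected

# Hypothetical states and the number of universities in each.
states = ["State A", "State B", "State C", "State D"]
universities = [50, 30, 15, 5]
print(pps_systematic(states, universities, n_clusters=2, seed=11))
```

Within each selected cluster, a fixed number of elements would then be drawn, as described above.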
2. Non-Probability Sampling
The choice between probability
and non-probability designs depends on theoretical premises and choice
of method. While probability sampling can avoid biases in the selection
of elements and increase the generalizability of findings (these are its two
big advantages), it is sometimes methodologically infeasible or theoretically
inappropriate. Then non-probability sampling designs can be used.
a) Quota Sampling
In quota sampling, a matrix
is created whose cells combine attributes of different variables that are
known to be distributed in the population in a particular way. Elements
with the attributes of each cell are selected in proportion to that cell's
share of the population (e.g. take 90% white and 10% black respondents because,
based on census data, that is the racial composition of the entire population).
Although the information on which the proportionate distribution of elements is
based can be inaccurate, quota sampling does strive for representativeness
(but it is not based on probability theory).
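Quota sampling can be sketched as computing cell targets and then filling them with whoever becomes available. The cell shares reuse the census example above, and the other data are hypothetical. Rounding means the targets may not always add up exactly to the intended sample size.

```python
def quota_targets(cell_shares, sample_size):
    """Number of elements to select per cell, proportional to the cell's population share."""
    return {cell: round(share * sample_size) for cell, share in cell_shares.items()}

def fill_quotas(available, cell_of, targets):
    """Accept elements in the order they become available until each quota is filled.

    Nothing here is random: who ends up in the sample depends on who happens
    to be available first, which is why quota samples are non-probability samples.
    """
    counts = {cell: 0 for cell in targets}
    accepted = []
    for element in available:
        cell = cell_of(element)
        if cell in counts and counts[cell] < targets[cell]:
            counts[cell] += 1
            accepted.append(element)
    return accepted

targets = quota_targets({"white": 0.9, "black": 0.1}, sample_size=200)
print(targets)  # {'white': 180, 'black': 20}
```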
b) Purposive Sampling
Purposive or judgmental sampling
can be useful in explorative studies or as a test of research instruments.
In explorative studies, elements can purposively be selected to disclose
data on an unknown issue, which can later be studied in a probability sample.
Questionnaires and other research instruments can be tested (on their applicability)
by purposively selecting "extreme" elements (after which a probability
sample is selected for the actual research).
c) Sampling by Availability
When samples are being selected
simply by the availability of elements, issues of representativeness about
the population cannot justifiably be addressed. A researcher may decide
to just pick any element that s/he bumps into. As such, there is nothing
wrong with this method, as long as it is remembered that the selection
of samples may be influenced by dozens of biases and cannot be assumed
to represent anything more than the selected elements.
d) Theoretical Reasons for Non-Probability Sampling
The previous non-probability
sampling designs are related to methodological concerns. In fact, the issue
of representativeness does matter in the background of these designs, but
it is considered infeasible or, worse, claimed to be feasible without being
founded on probability theory. However, more interesting and scientifically
valuable are the non-probability sampling designs based on theoretical insight.
In some theoretical models, it is unwise to conceive of the world in terms
of probability, and sometimes even as something to be sampled at all (this is
a kind of purposive sampling, but now because of theoretical concerns).
First, in field research,
the researcher may be interested in acquiring a total, holistic understanding
of a natural setting. As such, there is no real sampling of anything at
all. However, since observations on "everything" or "everybody" can in
effect never be achieved, it is best to study only those elements relevant
from a particular research perspective (sometimes called "theoretical sampling"
or "creative sampling").
Second, when the elements
in a natural setting clearly appear in different categories, quota sampling
"in the field" can be used. This is the same as regular quota sampling,
but the decisions on relevant cells and proportions of elements in cells
are based on field observations.
Snowball sampling is used
when access to the population is impossible (methodological concern) or
theoretically irrelevant. The selection of one element leads to the identification
and selection of others and these in turn to others, and so on. (The principle
of saturation, indicating the point when no more new data are revealed,
determines when the snowball stops). Example (cluster and snowball): in
a study of drug users in the USA, a number of cities (clusters) are randomly
selected; a drug user is selected in each city (e.g. through clinics),
is interviewed, and is asked for friends who use drugs too, and so on. Example
(snowball): a researcher is interested in African-American HIV-infected
males in Hyde Park, Chicago; the research aims at an in-depth understanding
of this setting, and inferences about other HIV-infected males are beside the
point (apart from being impossible).
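The referral logic of snowball sampling resembles a breadth-first walk through a network. The sketch below stops when no new names come up, which is only a rough stand-in for saturation. In the text, saturation means that no new data are revealed, and that remains a judgment made in the field. The referral network is hypothetical.

```python
from collections import deque

def snowball_sample(seeds, referrals, max_waves=None):
    """Snowball sample over a referral network.

    seeds: the first people contacted (e.g. through clinics)
    referrals: dict mapping each person to the people he or she names
    max_waves: optional limit on the number of referral waves
    """
    sampled = list(dict.fromkeys(seeds))      # keep order, drop duplicate seeds
    seen = set(sampled)
    queue = deque((person, 0) for person in sampled)
    while queue:
        person, wave = queue.popleft()
        if max_waves is not None and wave >= max_waves:
            continue                          # do not follow referrals any further
        for named in referrals.get(person, []):
            if named not in seen:
                seen.add(named)
                sampled.append(named)
                queue.append((named, wave + 1))
    return sampled                            # stops when no new names come up

# Hypothetical referral network.
referrals = {"A": ["B", "C"], "B": ["A", "D"], "C": [], "D": ["E"]}
print(snowball_sample(["A"], referrals))               # ['A', 'B', 'C', 'D', 'E']
print(snowball_sample(["A"], referrals, max_waves=1))  # ['A', 'B', 'C']
```

In the cluster-and-snowball example, `seeds` would hold the first drug user contacted in each randomly selected city.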
Third, the sampling of deviant
cases can be interesting to learn more about a general pattern by selecting
those elements that do not conform to the pattern. Example: 99% of the
students at CU voted for Clinton, so I select those that did not, to find
out why they are "deviant".
These samples are purposive
samples with a theoretically founded purpose. As long as that is the case,
their use may be perfectly justified and, according to some theories, even
the only applicable ones. The main disadvantage of non-probability sampling
designs is the lack of representativeness for a wider population. But again,
based on some theories, these difficulties can precisely be advantages
(as long as the methodological and theoretical positions are clearly stated,
both probability and non-probability sampling designs can be equally "scientific").
II. METHODS OF OBSERVATION
A full research design is
not just a matter of determining the right methods of observation; there
is always (or there had better be) theory first. The following procedure can
be suggested.
First, there should be a
theory that states what is to be researched, and how this connects to the
already available body of literature (to ensure, or strive towards, cumulative
knowledge). There is no "naked" or mind-less observation.
Second, the theory has to
be conceptualized, so that the different variables of the theory are clearly
defined and identified. This may also involve acknowledgment of the limitations
of the approach.
Third, the research topic
and methodology are formalized into observable phenomena. This involves
specification of the research topic (where, when) and the methods of observation
(how), as well as of the way in which the data are to be analyzed and what
the anticipated findings are.
Finally, after the research
is conducted, a report is drawn up, indicating theory, methodology, as
well as findings.
A. Experimental Designs
The most important issue
in an experiment is randomization (as a matter of internal validity). The
sections below address internal and external validity, including the
problems of external validity and their solutions. Note the strengths and
limitations of experiments with regard to the control of variables, i.e.
of all the variables we know might interfere.
1. The Structure of Experiments
A classical experiment involves
four basic components.
1) An experiment examines
the effect of an independent variable on a dependent variable. Typically,
a stimulus is either absent or present. In this way, a hypothesis on the
causal influence between two variables can be tested (see logic of causal
modelling). Both variables are, of course, operationalized.
2) An experiment involves
pretesting and posttesting, i.e. the attributes of a dependent variable
are measured, first before manipulation of the independent variable, and
second after the manipulation. Of course, applied to one group, this may
affect the validity of the results, since the group is aware of what is
being measured (research affects what is being researched).
3) Therefore, it is better
to work with experimental groups and control groups. We select two groups
for study and apply pretesting and posttesting to both, so that any effect
of the tests themselves must occur in both groups. There can
indeed be a Hawthorne effect, i.e. the attention given to the group by
the researchers affects the group's behavior. Note that there can also
be experimenter bias, which calls for accurate techniques for observing
the expected change in the dependent variable.
4) Selecting Subjects:
Note that there can always
be some bias because students are often selected as subjects (problem of
generalizability). Also, note that samples of 100 are not very
representative, and that experiments often have fewer than 100 subjects.
Randomization means that the
subjects (who are often non-randomly selected from a population)
should be randomly assigned to either the experimental or the control group.
This does not ensure that the subjects are representative of the wider
population from which they were drawn (which they usually are not), but
it does make the experimental and the control group alike, i.e.
the variables that might interfere with the results of the experiment will,
based on the logic of probability, tend to be equally distributed over the
two groups. Note that randomization is related to random sampling only in
the sense that both are based on principles of probability (the two groups
together are a "population", and the split into two separate groups is a
random sampling into two samples that mirror each other and together
constitute this "population").
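Random assignment can be sketched as shuffling the available subjects and
splitting the list in two. In this minimal sketch the subjects and their
ages are hypothetical; age merely stands in for any variable that might
interfere. With many subjects the group means of such a variable should
come out close to each other, although randomization guarantees this only
in probability, not in every single split.

```python
import random
from statistics import mean


def randomize(subjects, seed=None):
    """Randomly assign subjects to an experimental and a control group."""
    rng = random.Random(seed)
    shuffled = list(subjects)  # copy, so the original list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical subjects as (id, age) pairs.
gen = random.Random(1)
subjects = [(i, gen.randint(18, 80)) for i in range(1000)]

experimental, control = randomize(subjects, seed=42)
print(len(experimental), len(control))  # 500 500
# Compare the average age of the two groups (a balance check).
print(round(mean(a for _, a in experimental), 1), round(mean(a for _, a in control), 1))
```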
Matching refers to the fact
that subjects are purposely assigned by the researcher to either the control
or the experimental group on the basis of knowledge of the variables that
might interfere with the experiment. This is based on the same logic as
quota sampling. Matching has the disadvantage that the relevant variables
for matching decisions are often not all known, and that data analysis
techniques assume randomness (therefore, randomization is better).
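Matching on a single known variable can be sketched as sorting the subjects
on that variable, pairing neighbours, and sending one member of each pair to
each group. This is a hypothetical illustration: the subjects, their ages,
and the alternating rule (which keeps one group from systematically getting
the lower value of every pair) are assumptions; with an odd number of
subjects, the last one is left unassigned. A common variant flips a coin
within each pair, which combines matching with randomization.

```python
def match_and_assign(subjects, key):
    """Pair subjects with similar values on a known variable, then split each pair."""
    ordered = sorted(subjects, key=key)
    experimental, control = [], []
    for i in range(0, len(ordered) - 1, 2):
        a, b = ordered[i], ordered[i + 1]
        # Alternate which member of the pair goes to the experimental group.
        if (i // 2) % 2 == 0:
            experimental.append(a)
            control.append(b)
        else:
            experimental.append(b)
            control.append(a)
    return experimental, control


# Hypothetical subjects as (name, age) pairs; age is the known matching variable.
subjects = [("p1", 25), ("p2", 31), ("p3", 24), ("p4", 40), ("p5", 33), ("p6", 41)]
experimental, control = match_and_assign(subjects, key=lambda s: s[1])
print([s[0] for s in experimental])  # ['p3', 'p5', 'p4']
print([s[0] for s in control])       # ['p1', 'p2', 'p6']
```

Both groups end up with the same total age here, but only on the variable
that was matched; anything the researcher did not know about remains
uncontrolled, which is the disadvantage noted above.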
Finally, the experiments
should be conducted in such a way that the only difference between the
experimental and the control group is the manipulation of a variable during
the experiment.
Taken together, randomization
(or matching) and the fact that the manipulation during experimentation
is the only difference between the two groups allow the researcher to
control all variables, other than the manipulated one, that might interfere
in the outcome of the experiment (internal validity!).
Note on the One-Shot Case
Study:
A single group is manipulated
on an independent variable, and then measured on a dependent variable.
This method must involve a pretest and a posttest to be of any significance
(otherwise there is nothing to compare), i.e. the one-group pretest-posttest
design, but even then we cannot be sure that it was the manipulated variable
that caused the observed difference.
2. Internal Validity and
External Validity
a) Internal Validity: did
the experimental treatment cause the observed difference?
The problem of internal validity
refers to the logic of the design, i.e. whether other variables that may
intervene were controlled (the integrity of the study). The problem is
that the conclusions of an experiment may not be warranted by what
happened during the experiment. This can come about because of: a)
history (accident): historical events can occur during the experiment
and affect its outcome; b) maturation (time): people change, mature,
during the period of experimentation; c) testing: the pretest makes the
groups aware of what is being researched; d) instrumentation: the
techniques used to measure pretest and posttest results are not identical
(reliability); e) statistical regression: results are biased because the
subjects were selected for their extreme values on a variable, and extreme
scores tend to move back toward the average when measured a second time;
and f) other problems, e.g. that a relationship is only temporal (one
thing follows another) but not causal, or that the control group becomes
frustrated at being deprived of the treatment.
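Statistical regression can be made concrete with a small simulation under
assumed numbers: each hypothetical subject has a "true" score (mean 100,
standard deviation 10), and pretest and posttest each add independent
measurement error, with no treatment whatsoever. Subjects selected for
extreme (top 10%) pretest scores still appear to "change" at the posttest,
because part of their extreme score was measurement error.

```python
import random
from statistics import mean


def regression_demo(n=10_000, seed=0):
    """Mean pretest and posttest scores of subjects selected for extreme pretests."""
    rng = random.Random(seed)
    true_scores = [rng.gauss(100, 10) for _ in range(n)]
    # Pretest and posttest each add independent measurement noise; no treatment is applied.
    pre = [t + rng.gauss(0, 10) for t in true_scores]
    post = [t + rng.gauss(0, 10) for t in true_scores]
    # Select the subjects whose pretest score is in the top 10%.
    cutoff = sorted(pre)[int(0.9 * n)]
    chosen = [i for i in range(n) if pre[i] >= cutoff]
    return mean(pre[i] for i in chosen), mean(post[i] for i in chosen)


pre_mean, post_mean = regression_demo()
print(f"pretest {pre_mean:.1f} -> posttest {post_mean:.1f}")
```

Without a control group selected in the same way, this drop toward the
average could easily be mistaken for an effect of the experimental treatment.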
Randomization of subjects
into an experimental and a control group (to ensure that only the experimental
manipulation intervened, while other variables are controlled), and reliable
measurements in pretest and posttest are guards against problems of internal
validity.
b) External Validity: are
the results of the experiment generalizable?
The problem of external validity
refers to the issue of generalizability: what does the experiment, even
when it is internally valid, tell us about the real, i.e. non-manipulated,
world?
A good solution is a four-group
experimental design (the Solomon four-group design), i.e. first an
experimental and a control group with pretest and posttest, and second,
an experimental and a control group with posttest only. Better still, when
randomization is good, is a two-group design with posttest only: since
randomization distributes all variables evenly (in probability) between
the experimental and the control group, we do not need a pretest.
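The comparisons that the four-group design makes possible can be sketched
from the groups' mean posttest scores. In this sketch the group labels
(`E_pre`, `C_pre`, `E_nopre`, `C_nopre`) and the numbers are hypothetical,
and simple differences of means assume that subjects were randomly assigned
to all four groups. The two posttest-only groups by themselves form the
two-group posttest-only design.

```python
def solomon_effects(post):
    """Estimate effects from the mean posttest scores of the four groups.

    post maps group labels to mean posttest scores:
      E_pre / C_pre      experimental / control group, pretested
      E_nopre / C_nopre  experimental / control group, posttest only
    """
    with_pretest = post["E_pre"] - post["C_pre"]
    without_pretest = post["E_nopre"] - post["C_nopre"]
    return {
        "treatment_with_pretest": with_pretest,
        "treatment_without_pretest": without_pretest,
        # Effect of taking the pretest itself, seen in the two untreated groups.
        "testing_effect": post["C_pre"] - post["C_nopre"],
        # Does the pretest change how subjects respond to the treatment?
        "pretest_x_treatment": with_pretest - without_pretest,
    }


# Hypothetical mean posttest scores.
post = {"E_pre": 62.0, "C_pre": 54.0, "E_nopre": 59.0, "C_nopre": 51.0}
print(solomon_effects(post))
# {'treatment_with_pretest': 8.0, 'treatment_without_pretest': 8.0, 'testing_effect': 3.0, 'pretest_x_treatment': 0.0}
```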
An experimental manipulation
as close as possible to natural conditions, without destroying internal
validity, is the best way to ensure external validity.
c) Note on Ex-Post Facto
Experiment
This is not a true experiment
since there is (was) no true control group: the manipulation of the
independent variable has occurred naturally (e.g. an earthquake). When we
compare with a group where the natural "manipulation" did not take place,
we cannot be sure whether other variables are (or are not) involved
(this is very weak on the control of variables).
3. Advantages and Disadvantages
of Experiments
The isolation of the one
crucial variable, when all others are controlled, is the main advantage
of experiments (it can lead to hypothesis falsification). Experiments are
well-suited for projects with clearly defined concepts and hypotheses;
thus they are the ideal model for testing causality. They can also be used
in the study of small-group interaction, possibly in field research, i.e.
as a natural experiment. Experiments can also be repeated.
The big disadvantage is the
artificial character of the research, and, in the social sciences,
experiments often involve ethical difficulties, or simply cannot be
carried out.
B. Survey Research
A note on quantification,
which is essential in survey research: numbers are representations; they
are created, they represent something, so do not reify them (e.g. they are
limited to the sample, and therefore to the sampling procedure, typically
a probability sample design). You have to know the process that created
the numbers or you cannot make any inferences. The power of the analytical
tools (quantitative data analysis) should not be abused.
Note that quantitative methods are generally stronger on reliability,
while qualitative methods are stronger on validity.
The main advantage of survey
research is of course the generalizability of its findings because of the
representativeness of the sample (see sampling - as a matter of external
validity). Note that a pre-test of the questionnaire is always, I said
always, necessary (as a matter of validity).
1. The Questionnaire
Survey research typically
involves administering a questionnaire to a sample of respondents to draw
conclusions on the population from which the sample is drawn. The questionnaire
is standardized to ensure that the same observation method is used on all
respondents. This involves considerations of questionnaire construction,
question wording, and the way in which the questionnaire is administered
to the respondents.
a) Questionnaire Construction
In the construction of the
questionnaire, attention is devoted to increasing the respondents'
cooperation and avoiding misunderstanding of the questions. First, the
questionnaire format should be presentable, not too densely packed, and
clear. This involves using intelligible contingency questions ("if no/yes
go to..."), or matrix questions that present several items with the same
set of response options. Second, the effects of question order have to be
considered; these can be pre-tested with different versions of the
questionnaire, and by being sensitive to the research problem. Third, clear
instructions on how to answer the questions should be given, and it is
best to divide the questionnaire into different sections, each preceded
by instructions.
b) Question Wording
Question wording should
likewise ensure that the questionnaire is unambiguous. Several options
are available depending on the research perspective: attitudes, for
instance, can be measured with Likert scale questions (responses ranging
from strongly disagree to strongly agree). Questions can also be open-ended
(and coded by the researcher for analysis) or closed-ended (an exhaustive
list of mutually exclusive alternatives). Note that open-ended questions
may pose problems for analysis (too many different responses), while
closed-ended questions may impose too rigid a framework on the
respondents. Also, each statement should be short, not negatively phrased,
and posed in neutral, unambiguous terms to avoid social desirability
effects and bias in any one (pro/con) direction. Also avoid double-barreled
questions (questions that ask about two things at once), and make sure to
ask comprehensible and relevant questions.
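Before analysis, Likert answers are usually coded as numbers and combined
into a score per respondent. The sketch below assumes a 5-point format with
codes 1 to 5 and averages a respondent's items; the labels, item names, and
the optional reverse-coding (for items deliberately worded in the opposite
pro/con direction) are illustrative assumptions, not a prescribed scheme.

```python
# Assumed 5-point response format; labels and numeric codes are illustrative.
LIKERT = {
    "strongly disagree": 1,
    "disagree": 2,
    "neither agree nor disagree": 3,
    "agree": 4,
    "strongly agree": 5,
}


def likert_score(answers, reverse_items=()):
    """Average one respondent's coded answers on a set of Likert items."""
    scores = []
    for item, label in answers.items():
        value = LIKERT[label.lower()]  # unknown labels raise KeyError instead of being guessed
        if item in reverse_items:
            value = 6 - value  # flip 1<->5 and 2<->4 for items worded in the opposite direction
        scores.append(value)
    return sum(scores) / len(scores)


# Hypothetical respondent.
answers = {"q1": "Agree", "q2": "Strongly agree", "q3": "Disagree"}
print(round(likert_score(answers), 2))                        # 3.67
print(round(likert_score(answers, reverse_items={"q3"}), 2))  # 4.33
```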
2. The Administration of
a Questionnaire
Questionnaires can be administered
in a variety of ways.
a) Self-Administered Questionnaire
In this type of survey, respondents
fill out a questionnaire themselves. It can be delivered by mail, in which
case precautions must be taken to ensure a sufficiently high response
rate, or "on the spot", e.g. in a factory or school. The basic problem is
monitoring returns, which have to be identified: you draw up a return
graph to track the response rate (aim for over 50%), and you send
follow-up mailings to non-respondents.
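Monitoring returns comes down to two small computations: the cumulative
response rate per day (the values plotted on a return graph) and the list of
identified non-respondents who should receive a follow-up mailing. The
numbers of questionnaires sent and returned, and the respondent IDs, are
hypothetical; the 50% threshold is the target mentioned above.

```python
from itertools import accumulate


def return_graph(sent, returns_per_day):
    """Cumulative response rate (%) by day, as plotted on a return graph."""
    return [100 * total / sent for total in accumulate(returns_per_day)]


def non_respondents(sample_ids, returned_ids):
    """Identify the sampled respondents who should receive a follow-up mailing."""
    returned = set(returned_ids)
    return [i for i in sample_ids if i not in returned]


rates = return_graph(sent=400, returns_per_day=[20, 60, 50, 30, 10])
print(rates)  # [5.0, 20.0, 32.5, 40.0, 42.5]
if rates[-1] < 50:  # below the target, so a follow-up mailing is needed
    print("follow up:", non_respondents(sample_ids=[101, 102, 103, 104], returned_ids=[102]))
```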
b) Interview Survey
In a (more time-consuming
and expensive) interview survey, sensitive and complicated issues can be
explored face-to-face. This method typically yields a higher response rate
(which benefits generalizability) and fewer "don't know" answers. The
interviewer has more control over the data collection process (note that
observations can be made during the interview) and can clarify unclear
questions in a standardized way.
Since the questionnaire is the main measurement instrument, the interviewer
must make sure that the questions have identical meaning to all respondents:
interviewers should be (and are trained to be) familiar with the
questionnaire, dress like the respondents, behave in a neutral way during
the interview, follow the given question wording and order, record the
answers exactly, and probe for answers.
c) Telephone Survey
A telephone survey is a cheaper
and less time-consuming method; moreover, the researcher can keep an eye
on the interviewers, but respondents can also simply hang up.
3. Advantages and Disadvantages
of Survey Research
Survey research generally
has the advantage that, depending on the research objective, it can serve
descriptive, explanatory, as well as exploratory purposes. But more important
than anything else, depending on sampling techniques, it can generalize
findings to large populations, while the standardization of the questionnaire
(and the way it is administered) ensures reliability of the measurement
instrument. In addition, many respondents can be researched, relatively
many questions can be asked of them (flexibility), and statistical
techniques allow for precise analysis. Note that pre-collected data can also be analyzed
for a different purpose (secondary data-analysis).
The main weakness of survey
research is its rather superficial approach to social life: because all
subjects are treated in a unified way, the particularities of each cannot
be explored in any great detail, and no knowledge is acquired of the social
context of the respondents' answers. Also, surveys measure only answers,
and not what these actually refer to (you know whether a person has
described him/herself as "conservative" but not whether s/he is). Next,
surveys are not so good at measuring action; they rather measure thoughts
about action. This raises
questions of validity: perhaps the questionnaire does not reveal anything
"real", that is, anything of genuine concern for the respondents themselves.
C. Field Research
While surveys typically produce
quantitative data, field research yields qualitative data. Also notice
how field research often produces not only data but also theory
(alternation of deduction and induction).
1. Entering the Field
Depending on sampling procedure,
a research site is selected and observations will be made and questions
asked within the natural setting.
a) The Role of the Field
Researcher
1) complete participant:
the researcher is covertly present in the field and fully participates
as if s/he were a member of the community under investigation; the
problems are ethical, the researcher's mere presence might affect what
goes on, and there are practical problems (e.g. when and how to leave the
field?);
2) participant-as-observer:
the researcher participates, yet his/her identity is known;
3) observer-as-participant:
the researcher observes and his/her identity is known; in these two
roles, since the researcher's identity is known, s/he may affect what is
going on in the field, and may even be expelled from the field;
4) complete observer:
the researcher merely observes and his/her identity is not known.
b) Preparing for the Field
and Sampling in the Field
Start with a literature review
(as always), then examine yourself: why are you interested? What will
you bring to the field? Then search for informants and gate-keepers,
and make a good impression (or simply join the group you want to study).
Establishing rapport is very important, and if your identity is known,
it is important to tell people what you are there for (although you may
choose to lie). Then sample in the field (see above). Remember that the
overall goal of field research is to acquire the richest possible data.
2. In-Depth Interviewing
a) In-Depth Interviewing
versus Questionnaire
While standardized questionnaires
are typically, though not necessarily, employed in quantitative research,
in-depth or unstructured interviewing is closely associated with qualitative
field research. Like any interview, an in-depth interview can be defined
as a "conversation with a purpose": an interview involves a talk between
at least two people, in which the interviewer always has some control since
s/he wants to elicit information. In survey interviews, the purpose of
the conversation is dominant, especially when it involves the testing of
hypotheses (a relationship between two or more variables). In-depth interviewing,
in comparison, takes the "human element" more into account, particularly
to explore a research problem which is not well defined in advance of the
observation process. In-depth interviewing does not use a questionnaire,
but the interviewer has a list of topics (an interview-guide) which are
freely explored during the interview, allowing the respondent to bring
up new issues that may prove relevant to the interviewer. The in-depth
interviewer is the central instrument of investigation rather than the
interview guide.
b) Procedure of In-Depth
Interviewing
The procedure of in-depth
interviewing first involves establishing a relationship with the respondent:
even more than is the case with questionnaires, it is crucial that the
interviewer gains the trust of the respondent, otherwise the interview
will hardly reveal in-depth insight into the respondent's knowledge of,
and attitudes towards, events and circumstances. Since the kind of information
elicited in the interview is not pre-determined in a questionnaire, tape-recording
(after negotiating permission) is appropriate. The role of the in-depth
interviewer involves a delicate balance between being active and passive:
active because s/he guides the respondent tactfully to reveal more information
on an issue considered relevant, passive because the interviewer leaves
the respondent free to bring up issues that were unforeseen but nevertheless
turn out to be relevant. Since the interviewer should talk, listen, and
think during the interview, his/her experience and skill greatly contribute
to the quality of the research findings. Note that in field research,
the interview can be formal or informal: in formal in-depth interviewing
the researcher's identity is known and the respondent knows that an interview
is going on, while an informal in-depth interview appears to be (to the
respondent) just a conversation with someone (who is actually a covert
researcher).
c) Characteristics of In-depth
Interviewing
In-depth interviewing has
the advantage of being able to acquire a hermeneutic understanding of the
knowledge and attitudes specific to the respondent (without an "alien",
super-imposed questionnaire). It is often called a more valid research
method. However, this assertion needs qualification: both in-depth and
survey interviews approach human subjects with a perspective in mind, but
only in in-depth interviewing is this perspective amenable to change (given
the quest for what is unique to the person being interviewed), while in
surveys it is not allowed to change (given the quest for generalizability
of the findings). During a research process involving several in-depth
interviews, the "big wheel of science" can freely rotate between induction
and deduction (finding new things and asking about them, cf. grounded theory).
In addition, the method is well suited to exploratory research on a (sociologically)
new issue. The main weakness of in-depth interviewing is its lack of reliability:
without a fixed questionnaire, the interviewer's flexibility, while allowing
for new information, may affect the research findings, not because of respondents'
characteristics, but because of the different ways in which they were interviewed.
Since in-depth interviewing often does not rely on random sampling of respondents,
issues of generalizability cannot (but often do not have to) be addressed.
Finally, the results of in-depth interviews are harder to analyze than
survey questionnaire findings, since they cannot easily be transferred
into numbers (allowing for statistical analysis) but have to be brought
together comprehensively in meaningful categories that do not destroy the
uniqueness of the findings (the recent use of computerized techniques of
qualitative data-analysis is helpful in this regard).
3. Making Observations
In your observations, be
sure to see as much as you can and to remain open-minded about what you
see; you want to understand, not to condemn or approve. Once you have
taken up your role, become neither over-involved nor completely disengaged.
It is very important to record
what you observe accurately, preferably as soon as possible after the
event has occurred. Therefore, you should keep a field journal (or tape). Field notes
include what is observed and interpretations of what is observed. Also,
keep notes in stages, first rather sketchy and then more in detail. Finally,
keep as many notes as you can (anything can turn out to be important).
Apart from that, a separate file can be kept on theoretical and methodological
concerns, as well as reports of the researcher's own personal experiences
and feelings.
As an initial step for analysis,
the notes must be kept in files (with multiple entries, i.e. the same note
can be filed under several headings) to discover patterns of behavior or
practices, instances of attitudes and meanings of events for those
observed, encounters of people in interaction, episodes of behavior
(in which a sudden event can be crucial), and roles, lifestyles and
hierarchies. These analytically conceived files should bring order to the
chaos of observation. Be flexible about your files.
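Filing notes "with multiple entries" can be sketched as an index from each
analytic heading to the notes filed under it, where one note may appear
under several headings. The notes, headings, and tagging below are
hypothetical; in practice the researcher decides which headings a note
belongs to, and the files can be re-organized as the analysis develops.

```python
from collections import defaultdict


def build_files(notes):
    """File each note under every analytic heading it was tagged with."""
    files = defaultdict(list)
    for note_id, _text, headings in notes:
        for heading in headings:
            files[heading].append(note_id)
    return dict(files)


# Hypothetical field notes: (id, text, headings chosen by the researcher).
notes = [
    (1, "Foreman scolds new worker in front of crew", {"hierarchies", "episodes"}),
    (2, "Crew shares lunch, jokes about management", {"encounters", "attitudes"}),
    (3, "New worker skips lunch break", {"practices", "hierarchies"}),
]
files = build_files(notes)
print(files["hierarchies"])  # [1, 3]
```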
The analysis itself can then
proceed to discover similarities and differences: what re-appears in the
field, which events seem to indicate the same pattern of behavior or thought,
as well as what is "deviant" in the research site, and so on. Note, of
course, that it is typical for field research that observing, formulating
theory, evaluating theory, and analyzing data, can all occur throughout
the research process.
Important tools to avoid
problems of misinterpretation or biased observations include: adding
quantitative findings to your field observations (triangulation), keeping
in touch with a supervisor, and cultivating self-awareness (introspection).
In writing up the report,
an account of the method of observation and/or participation, as well as
reflections on the researcher's experiences and motives, are indispensable.
4. Advantages and Disadvantages
of Field Research
Field research is especially
appropriate if you want to research a social phenomenon as completely as
possible (comprehensiveness), within its natural setting, and over some
period of time. Also, the method is flexible and can move freely between
induction and deduction, and it is relatively inexpensive.
With regard to validity,
field research is generally stronger than survey research. But as a matter
of reliability, the method may be too closely tied to the person who did
the research (which is why their methods and experiences have to be reported
and evaluated). Finally, field research lacks generalizability, because
of the uniqueness of the researcher's investigative qualities, because
the comprehensiveness of research essentially excludes generalizability,
and because of selectivity in observations and question asking. Therefore,
the findings of field research are suggestive (not definitive).
D. Unobtrusive Research
Survey research and in-depth
interviewing affect their object of study in at least (and hopefully only)
one way: people are confronted with social-science research! Unobtrusive
methods of inquiry, on the other hand, have no impact on what is being
studied. There are three methods of unobtrusive research: content analysis,
analysis of statistics, and historical analysis.
1. Content and Document Analysis
Content analysis refers to
the quantitative study of written and oral documents. This requires
sampling of the units of analysis in a source (preferably probability
sampling), coding of the units, and finally classification of the units
to reveal their manifest and latent content.
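The three steps of content analysis (probability sampling of units, coding,
classification and counting) can be sketched as follows. The codebook, its
indicator words, and the headlines are hypothetical, and simple keyword
matching only captures manifest content; coding latent content requires
the researcher's judgment and cannot be reduced to a word list.

```python
import random
from collections import Counter

# Hypothetical codebook: category -> indicator words (manifest content only).
CODEBOOK = {
    "economy": {"jobs", "taxes", "inflation"},
    "crime": {"police", "theft", "violence"},
}


def code_unit(text):
    """Return the categories whose indicator words occur in one unit of analysis."""
    words = set(text.lower().split())
    return [category for category, terms in CODEBOOK.items() if words & terms]


def content_analysis(units, sample_size, seed=None):
    """Draw a simple random sample of units, code each unit, and count categories."""
    rng = random.Random(seed)
    sample = rng.sample(units, sample_size)
    counts = Counter()
    for unit in sample:
        counts.update(code_unit(unit))
    return counts


# Hypothetical units of analysis (newspaper headlines).
units = [
    "Taxes rise again",
    "Police blame violence on gangs",
    "New jobs in town",
    "Inflation and jobs worry voters",
    "Local team wins",
]
print(content_analysis(units, sample_size=3, seed=7))
```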
Document analysis refers
to the qualitative study of traces of the past: it involves the in-depth
investigation of sources and aims at hermeneutic understanding.
2. Historical Analysis
Historical research refers
to the study of the past through an examination of the traces the past
has left behind (written documents, oral histories, and artefacts). The
procedure of historical research typically involves: 1) selection of sources
relevant for research; 2) identification and registration of sources according
to formal and substantial criteria; 3) confrontation and (internal/external)
critique of sources; 4) interpretation and analysis of sources to determine
who said what to whom, why, how, and with what effect.
Three methods of data collection
can be used in historical research (note that these methods do not have
to be, but can be historical): content analysis, document analysis, and
historical study of statistics. The historical investigation of statistics
can trace a pattern over time (e.g. crime reports). Of course, you are
again limited to what you find (validity!).
See Comparative
and Historical Sociology: Lecture Notes for much more.
3. Advantages and Disadvantages
of Unobtrusive Research
The unobtrusive nature of
research is the main advantage of the method: the researcher cannot affect
what has happened. Several topics can be studied from this perspective,
particularly forms of communication (who says what to whom, why and with
what effect). Note that the techniques can be very rigidly applied (good
on reliability). Also, it has the advantage that it saves time and money,
and you can study long periods of time. Moreover, unobtrusive historical
research can fulfill several purposes: 1) the parallel testing of theories,
to apply a theory to several historical cases; 2) the interpretation of
contrasting contexts, to reveal the particularities of historical events;
and 3) analyzing causalities, to explain why historical events took place.
The main weakness of historical
research is that it is probably the least developed
method of social-science research. Although many reputed sociologists used
historical research methods (e.g. Durkheim on the division of labor, Marx
and Weber on capitalism, Merton on science and technology), the idea that
a study of the past can be meaningful in and by itself, or to grasp the
present, only rarely inspires research. In addition, historical research
can only reveal the past inasmuch as it is still present today: important
documents, for instance, may be lost or destroyed (bad on validity). Finally,
because of the often less rigid nature of this method of inquiry, the
researcher's own perspective can (invalidly) shape his/her picture of what happened. Therefore,
corroboration, the cross-checking of various sources, is helpful.
E. Evaluation Research
Evaluation research is intended
to evaluate the impact of social interventions; as an instance of applied
research, it intends to have a real-world effect.
Just about any topic related
to past or planned social interventions can be researched. Basically,
it asks whether the intended result of an intervention strategy
was produced.
1. Measurement in Evaluation
Research
The basic task is coming
to grips with the intended result: how can it be measured? The goal
of an intervention program has to be operationalized for it to be assessed
in terms of success (or failure).
The outcome of a program
has to be measured, best by specifying the different aspects of the desired
outcome. The context within which an outcome occurred has to be analyzed.
The intervention, as an experimental manipulation, has to be measured too.
Other variables that can be researched include the population of subjects
that are involved in the program. Because measurement is crucial,
new techniques may be developed (validity), or older ones adopted (reliability).
The outcome can be measured
in terms of whether an intended effect occurred or not, or whether the
benefits of an intervention outweighed the costs thereof (cost/benefit
analysis). The criteria of success and failure ultimately rest on an agreement.
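A cost/benefit comparison can be sketched once benefits and costs have been
expressed in money per year, which is itself one of the agreed-upon criteria
mentioned above. The figures are hypothetical, and the optional discount
rate (which gives later amounts less weight) is a common convention added
here as an assumption, not something these notes prescribe.

```python
def present_value(values, rate=0.0):
    """Discount a yearly series (year 0 first) to its present value."""
    return sum(v / (1 + rate) ** t for t, v in enumerate(values))


def cost_benefit(benefits, costs, rate=0.0):
    """Return (net benefit, benefit/cost ratio) of a program."""
    pv_benefits = present_value(benefits, rate)
    pv_costs = present_value(costs, rate)
    return pv_benefits - pv_costs, pv_benefits / pv_costs


# Hypothetical program: monetized benefits and costs in years 0, 1 and 2.
benefits = [0, 40_000, 70_000]
costs = [60_000, 20_000, 10_000]

net, ratio = cost_benefit(benefits, costs)
print(f"net {net:.0f}, ratio {ratio:.2f}")  # net 20000, ratio 1.22
net_5, _ = cost_benefit(benefits, costs, rate=0.05)
print(f"net at a 5% discount rate {net_5:.0f}")
```

A ratio above 1 (or a positive net benefit) means that, under these
assumptions, the benefits outweighed the costs; a different discount rate
or valuation of benefits can change the verdict.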
The evaluation can be carried out
by experiment or by quasi-experiment. Time-series analysis, for instance,
can analyze what happened for a longer period before and after an intervention,
and with the use of multiple time-series designs, we can also compare with
a pseudo control group.
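The logic of a multiple time-series comparison can be sketched with a very
simple calculation: the change in mean level before and after the
intervention in the treated series, minus the same change in a comparison
series (the pseudo control group). This difference-of-differences of means
is a deliberate simplification that ignores trends and seasonal patterns,
which a full time-series analysis would model; the monthly crime counts and
the intervention point are hypothetical.

```python
from statistics import mean


def before_after_change(series, intervention_index):
    """Mean level after the intervention minus mean level before it."""
    return mean(series[intervention_index:]) - mean(series[:intervention_index])


def compare_with_pseudo_control(treated, comparison, intervention_index):
    """Change in the treated series beyond the change in the comparison series."""
    treated_change = before_after_change(treated, intervention_index)
    comparison_change = before_after_change(comparison, intervention_index)
    return treated_change - comparison_change


# Hypothetical monthly crime counts; the intervention starts at month index 4.
treated = [50, 52, 51, 49, 40, 38, 39, 37]
comparison = [48, 50, 49, 47, 46, 45, 47, 46]
print(compare_with_pseudo_control(treated, comparison, 4))  # -9.5
```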
2. The Context in Evaluation
Research
There are a number of problems
to be overcome in evaluation research. First, logistical problems refer
to getting the subjects to do what they are supposed to do. This includes
getting them motivated and ensuring proper administration. Second,
ethical problems include concerns over the control group (which is not
manipulated, and whose members may experience deprivation).
It is also hard to control what
is done with the findings of evaluation research; they may be ignored, for
instance, because they are not comprehensible to the subjects, because
they contradict 'intuitive' beliefs, or because they run against vested
interests.
Social indicators research is
a special type of evaluation research: the analysis of social indicators
over time (pattern of evolution) and/or across societies (comparison).
These indicators are aggregated statistics that reflect the condition of
a society or a grouping.
3. Advantages and Disadvantages
of Evaluation Research
The main advantage is that
evaluation research can reveal whether policies work, or at least identify
when they do not work (pragmatism), either right away (when we use
experiments) or over a long period of time and across societies
(indicators). Different research instruments can be used in evaluation
research.
The disadvantages include
the special logistical and administrative problems, as well as the ethical
considerations. Also, it can usually only assess the means, given certain
program goals, and cannot question those goals themselves.