Validity
A measure is valid if it measures
what it is supposed to measure. In the case of direct measures, validity is
self-evident, while it is only approximate in the case of indirect measures,
such as indexes and scales. In fact, there is no way to guarantee that an
indirect measure is valid for measuring a concept. However, researchers have
devised ways to deal with the issue.
To examine the validity of a measure,
at least three different types of validity based on different perspectives are
considered: face validity, predictive validity, and construct validity. Face validity
implies that the items chosen to measure a variable are logically related to
it. Suppose that the researcher is measuring the variable ‘religiosity’. In this
case, items such as “Did you get your children vaccinated during the last six
months?” or “Do you know the average family size in Guatemala?” do not seem, at
least apparently, to be related logically to religiosity. A scale or index
based on such items would not have face validity for measuring religiosity,
although the same items may give rise to a measure with very high face validity
for some concept related to health status or population policy. The basic
problem with using face validity to judge whether a measure is valid is that
the judgement is highly subjective: one researcher may find that a measure
possesses high face validity while another may find that the same measure
possesses low face validity.
To examine whether a measure is valid
or not, predictive validity is the most useful. A measure is said to have
predictive validity if a high correlation can be demonstrated between the
behavior predicted by the measure, and the behavior subsequently exhibited. For
example, suppose that a researcher has developed a composite measure of the
variable ‘religiosity’ on the basis of ten items drawn from a universe of items
thought to reflect religiosity. Each individual in the sample has a score on this
composite measure that locates his position on the religiosity continuum. An individual
with a high score is thought to have a high degree of religiosity. The religiosity
of the same individuals is observed subsequently in practice. If it is observed
that the individuals who scored high on the measure are also more religious in reality
than those who scored low, the measure has predictive validity. It is to be
noted that a high numerical association alone does not guarantee that the
measure is measuring the variable: it simply supports the contention
that it may be a valid measure. Another problem is that identifying a
subsequent behavior as the behavior predicted for a respondent is subjective,
since many different interpretations of the same behavior are possible, and
the choice of one as the predicted behavior is a matter of judgement.
The third basic method of validity
determination is construct validity. In construct validity the researcher
specifies, on the basis of theoretical considerations, the kinds of
relationships he expects between the measure and other variables. He then
correlates the scores based on the measure with those variables and compares
the observed correlations with the expected relationships. A small difference
between the observed and expected relationships would boost confidence in the validity
of the measure. For example, social status is expected to be positively
correlated with occupation and education and negatively correlated with the
number of children ever born. The correlations of the scores on social status
scale with these variables in the expected directions will provide evidence in
support of the validity of the measure of social status.
Reliability
If a measure, when applied repeatedly
to the same object under constant conditions, produces the same result each time,
it is called reliable. The measure is unreliable if it produces different results.
For example, suppose that the researcher is interested in studying the attitude
towards democracy of a number of newspapers. To measure this attitude, he can
follow a number of procedures. One of these procedures may be that he reads the
editorials of all the newspapers for a specific number of days, and on the
basis of his judgement, rank-orders the newspapers according to the degree of prodemocracy
attitude they possess. This strategy has
problems of reliability inherent in it. If several evaluators read the same
editorials, they may draw conclusions different from each other: a newspaper
that appears prodemocracy to one evaluator may not be so to another. Thus,
this procedure for measuring attitude towards democracy, reading newspaper
editorials and judging them, may produce different results when applied
repeatedly. In other words, the conclusions depend heavily on subjective
judgement, and the measurement procedure is not reliable.
Another procedure may be to count the
number of times the words “democracy”, “freedom”, and “liberal”, for example,
appear in the newspaper editorials for a specified number of days. The assumption
is that the greater this number in a newspaper editorial, the more prodemocracy
the newspaper. This measurement procedure is more reliable, since several
evaluators may count this number over and over and still reach the same
conclusion. Whether this number really measures the prodemocracy attitude
is, of course, a question of validity, discussed earlier.
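The word-counting procedure is easy to make explicit, which is precisely why it is reliable: any evaluator applying the same rule to the same text gets the same count. A minimal sketch, with invented editorial text:

```python
import re

# Keyword list follows the example in the text.
KEYWORDS = {"democracy", "freedom", "liberal"}

def prodemocracy_count(editorial: str) -> int:
    """Count keyword occurrences: an objective, repeatable scoring rule."""
    words = re.findall(r"[a-z]+", editorial.lower())
    return sum(1 for w in words if w in KEYWORDS)

# Hypothetical editorials from two invented newspapers.
editorials = {
    "Daily A": "Freedom and democracy demand a liberal press. Democracy matters.",
    "Daily B": "Grain prices rose again this quarter.",
}
for paper, text in editorials.items():
    print(paper, prodemocracy_count(text))
```

Because the rule is mechanical, repeated application by different evaluators yields identical scores; whether those scores are a valid measure of prodemocracy attitude is a separate question.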
There are a number of techniques for determining
the reliability of a measure. These are: (1) test-retest, (2) parallel forms,
and (3) split-half.
In the test-retest technique individuals
are scored on the measure at a certain time. These are the test scores. The same
individuals are scored for the same measure at some later date, to obtain
retest scores. A high correlation (0.90 or more) between the two sets of scores
would imply that the measure is reliable, on the assumption that no intervening
variables interfered significantly with the retest scores, an assumption
that hardly holds in practice. There are always some intervening events that
may influence the retest scores.
