Difference between Validity and Reliability

When the variables embedded in hypotheses, or otherwise relevant to a particular piece of research, are operationally defined, their meanings may undergo changes. If these changes are significantly large, tests of the relevant hypotheses may be rendered meaningless. It is therefore important that the operationally defined measures, whether direct or indirect, satisfy certain stipulated properties. The two most important properties against which the success or failure of a measure is judged are validity and reliability.

Validity

A measure is valid if it measures what it is supposed to measure. In the case of direct measures, validity is self-evident, while it is only approximate in the case of indirect measures: indexes and scales. In fact, there is no way to guarantee that an indirect measure is valid for measuring a concept. However, researchers have devised ways to deal with the issue.

To examine the validity of a measure, at least three different types of validity based on different perspectives are considered: face validity, predictive validity, and construct validity. Face validity implies that the items chosen to measure a variable are logically related to it. Suppose that the researcher is measuring the variable ‘religiosity’. In this case, items such as “Did you get your children vaccinated during the last six months?” or “Do you know the average family size in Guatemala?” do not seem, at least apparently, to be logically related to religiosity. A scale or index based on such items would not have face validity for measuring religiosity, although the same items may give rise to a measure with very high face validity for some concept related to health status or population policy. The basic problem with using face validity to judge whether a measure is valid is that it is highly subjective. It is possible that one researcher finds a measure to possess high face validity while another researcher finds the same measure to possess low face validity.

To examine whether a measure is valid or not, predictive validity is the most useful. A measure is said to have predictive validity if a high correlation can be demonstrated between the behavior predicted by the measure and the behavior subsequently exhibited. For example, suppose that a researcher has developed a composite measure for the variable ‘religiosity’ on the basis of ten items derived from a universe of items thought to reflect religiosity. Each individual in the sample has a score on this composite measure that locates his position on the religiosity continuum. An individual with a high score is thought to have a high degree of religiosity. The religiosity of the same individuals is observed subsequently in practice. If it is observed that the individuals who scored high on the measure are also more religious in reality than those who scored low, the measure has predictive validity. It is to be noted that a high numerical association alone does not guarantee that the measure is measuring the variable: it simply provides support to the contention that it may be a valid measure. Another problem is that the identification of a subsequent behavior as the predicted behavior of a respondent is subjective, since many different interpretations of the same behavior are possible, and the choice of one as the predicted behavior is a matter of subjectivity.
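The predictive-validity check described above can be sketched in code. The scores and later behavioural ratings below are invented for illustration, and the correlation coefficient used is the ordinary Pearson product-moment correlation:

```python
# Sketch of a predictive-validity check with made-up data: composite
# religiosity scores for ten respondents, and a later behavioural
# rating of the same respondents (both hypothetical).
def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

scale_scores    = [12, 35, 40, 8, 27, 31, 15, 44, 22, 38]  # composite measure
observed_rating = [2,  7,  8,  1,  5,  6,  3,  9,  4,  8]  # later behaviour

r = pearson(scale_scores, observed_rating)
print(round(r, 2))  # a high r supports, but does not prove, validity
```

As the text cautions, even a correlation near 1 only supports the contention that the measure may be valid; it does not establish that the later ratings really capture the predicted behavior.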

The third basic method of validity determination is construct validity. In construct validity, the researcher specifies, on the basis of theoretical considerations, the kinds of relationships he expects between the measure and other variables. He then correlates the scores based on the measure with those variables and compares the observed correlations with his expected relationships. A small difference between the observed and expected relationships would boost confidence in the validity of the measure. For example, social status is expected to be positively correlated with occupation and education and negatively correlated with the number of children ever born. Correlations of the scores on a social-status scale with these variables in the expected directions will provide evidence in support of the validity of the measure of social status.
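A construct-validity check of this kind amounts to comparing the signs of the observed correlations with the theoretically expected signs. The following sketch uses invented data for the social-status example:

```python
# Sketch of a construct-validity check with hypothetical data: theory
# predicts the sign of each correlation between the social-status scale
# and other variables; we compare observed signs to expectations.
def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

status    = [10, 25, 40, 55, 70, 85]   # social-status scale scores
education = [6, 8, 10, 12, 16, 18]     # years of schooling
children  = [6, 5, 4, 3, 2, 1]         # children ever born

expected = {"education": "+", "children": "-"}  # theoretical expectations
observed = {
    "education": "+" if pearson(status, education) > 0 else "-",
    "children":  "+" if pearson(status, children) > 0 else "-",
}
print(observed == expected)  # True: evidence in support of validity
```

In practice the researcher would also compare the magnitudes of the correlations, not just their signs, against the expected relationships.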

Reliability

If a measure, when applied repeatedly to the same object under constant conditions, produces the same result each time, it is called reliable. The measure is unreliable if it produces different results. For example, suppose that the researcher is interested in studying the attitude towards democracy of a number of newspapers. To measure this attitude, he can follow a number of procedures. One of these procedures may be that he reads the editorials of all the newspapers for a specific number of days and, on the basis of his judgement, rank-orders the newspapers according to the degree of prodemocracy attitude they possess. This strategy has problems of reliability inherent in it. If several evaluators read the same editorials, they may draw conclusions different from each other: a newspaper that appears prodemocracy to one evaluator may not be so to another. Thus, this procedure --- reading newspaper editorials --- for measuring attitude towards democracy may produce different results when applied repeatedly. In other words, the conclusions are heavily dependent on subjectivity, and the measurement procedure is not reliable.

Another procedure may be to count the number of times the words “democracy”, “freedom”, and “liberal”, for example, appear in the newspaper editorials for a specified number of days. The assumption is that the greater this number in a newspaper's editorials, the more prodemocracy the newspaper. This measurement procedure is more reliable, since several evaluators may count this number over and over and will still reach the same conclusion. It is to be noted that whether this number really measures the prodemocracy attitude or not is a question of validity, discussed earlier.
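The counting procedure is reliable precisely because it is mechanical. A minimal sketch, with an invented editorial and word list, shows that any evaluator running it obtains the same number:

```python
# Sketch of the word-count procedure: count how often a fixed list of
# words appears in an editorial. The text and word list are invented;
# the point is that every evaluator running this gets the same number.
import re

KEYWORDS = {"democracy", "freedom", "liberal"}

def prodemocracy_score(editorial: str) -> int:
    """Number of keyword occurrences in the editorial text."""
    words = re.findall(r"[a-z]+", editorial.lower())
    return sum(1 for w in words if w in KEYWORDS)

editorial = ("Democracy thrives where freedom of the press is protected, "
             "and a liberal order defends that freedom.")
print(prodemocracy_score(editorial))  # 4: democracy, freedom x2, liberal
```

Whether this count is a valid measure of prodemocracy attitude is, as the text notes, a separate question from its reliability.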

There are a number of techniques for determining the reliability of measure. These are: (1) test-retest, (2) parallel forms, and (3) split-half.

In the test-retest technique, individuals are scored on the measure at a certain time. These are the test scores. The same individuals are scored on the same measure at some later date to obtain the retest scores. A high correlation (0.90 or more) between the two sets of scores would imply that the measure is reliable, on the assumption that no intervening variables interfered significantly with the retest scores --- an assumption that hardly holds in practice. There are always some intervening events that may influence the retest scores.
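The test-retest computation itself is straightforward; the difficulty lies in the assumption about intervening events. A sketch with hypothetical scores:

```python
# Sketch of a test-retest reliability check: the same hypothetical
# individuals are scored twice, and a high correlation between the two
# sets of scores (conventionally 0.90 or more) indicates reliability.
def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

test_scores   = [55, 62, 70, 48, 81, 66, 59, 74]  # first administration
retest_scores = [57, 60, 72, 50, 79, 68, 58, 75]  # later administration

r = pearson(test_scores, retest_scores)
print("reliable" if r >= 0.90 else "unreliable")
```

The 0.90 threshold follows the convention stated in the text; even a high value only supports reliability under the (rarely tenable) assumption that nothing intervened between the two administrations.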
