When a test measures what it was designed to measure and what it purports to measure, this test is said to be?

Whenever a test or other measuring device is used as part of the data collection process, the validity and reliability of that test are important.  Just as we would not use a math test to assess verbal skills, we would not want to use a measuring device for research that was not truly measuring what we purport it to measure.  After all, we are relying on the results to show support or a lack of support for our theory, and if the data collection methods are erroneous, the data we analyze will also be erroneous.

Test Validity.

Validity refers to the degree to which our test or other measuring device is truly measuring what we intended it to measure.  The test question “1 + 1 = _____” is certainly a valid basic addition question because it is truly measuring a student’s ability to perform basic addition.  It becomes less valid as a measurement of advanced addition because, while it addresses some of the knowledge required for addition, it does not represent all of the knowledge required for an advanced understanding of addition.  On a test designed to measure knowledge of American History, this question becomes completely invalid.  The ability to add two single digits has nothing to do with history.

For many constructs, or variables that are artificial or difficult to measure, the concept of validity becomes more complex.  Most of us agree that “1 + 1 = _____” would represent basic addition, but does this question also represent the construct of intelligence?  Other constructs include motivation, depression, anger, and practically any human emotion or trait.  If we have a difficult time defining the construct, we are going to have an even more difficult time measuring it.  Construct validity is the term given to a test that measures a construct accurately, and there are different types of construct validity that we should be concerned with.  Three of these, concurrent validity, content validity, and predictive validity, are discussed below.

Concurrent Validity.  Concurrent Validity refers to a measurement device’s ability to vary directly with a measure of the same construct or indirectly with a measure of an opposite construct.  It allows you to show that your test is valid by comparing it with an already valid test.  A new test of adult intelligence, for example, would have concurrent validity if it had a high positive correlation with the Wechsler Adult Intelligence Scale since the Wechsler is an accepted measure of the construct we call intelligence.  An obvious concern relates to the validity of the test against which you are comparing your test.  Some assumptions must be made because there are many who argue the Wechsler scales, for example, are not good measures of intelligence.

Content Validity.  Content validity is concerned with a test’s ability to include or represent all of the content of a particular construct.  The question “1 + 1 = ___” may be a valid basic addition question.  Would it represent all of the content that makes up the study of mathematics?  It may be included on a scale of intelligence, but does it represent all of intelligence?  The answer to these questions is obviously no.  To develop a valid test of intelligence, not only must there be questions on math, but also questions on verbal reasoning, analytical ability, and every other aspect of the construct we call intelligence.  There is no easy way to determine content validity aside from expert opinion.

Predictive Validity.  In order for a test to be a valid screening device for some future behavior, it must have predictive validity.  The SAT is used by college screening committees as one way to predict college grades.  The GMAT is used to predict success in business school.  And the LSAT is used as a means to predict law school performance.  The main concern with these, and many other predictive measures, is predictive validity because without it, they would be worthless.

We determine predictive validity by computing a correlation coefficient comparing SAT scores, for example, with college grades.  If they are directly related, then we can make a prediction regarding college grades based on SAT scores.  We can show that students who score high on the SAT tend to receive high grades in college.
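As a minimal sketch of this computation, the Pearson correlation coefficient between admission-test scores and later grades can be calculated directly.  All scores below are invented for illustration; they are not real SAT data.

```python
# Pearson correlation between an entrance-test score and a later outcome.
# A coefficient near +1 would support the test's predictive validity.

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Numerator: how the two variables co-vary around their means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: the spread of each variable on its own.
    sd_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    sd_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return cov / (sd_x * sd_y)

sat_scores = [1000, 1100, 1200, 1300, 1400]   # hypothetical SAT scores
college_gpas = [2.4, 2.8, 3.0, 3.4, 3.6]      # hypothetical first-year GPAs

r = pearson_r(sat_scores, college_gpas)
print(f"predictive validity coefficient: r = {r:.2f}")
```

With these made-up numbers the coefficient comes out close to +1, which is the pattern a valid screening test should show: higher test scores going with higher grades.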

Test Reliability.

Reliability is synonymous with the consistency of a test, survey, observation, or other measuring device.  Imagine stepping on your bathroom scale and weighing 140 pounds only to find that your weight on the same scale changes to 180 pounds an hour later and 100 pounds an hour after that.  Based on the inconsistency of this scale, any research relying on it would certainly be unreliable.  Consider an important study on a new diet program that relies on your inconsistent or unreliable bathroom scale as the main way to collect information regarding weight change.  Would you consider its results accurate?

A reliability coefficient is often the statistic of choice in determining the reliability of a test.  This coefficient merely represents a correlation (discussed in chapter 8), which measures the intensity and direction of a relationship between two or more variables.

Test-Retest Reliability.  Test-retest reliability refers to the test’s consistency among different administrations.  To determine the coefficient for this type of reliability, the same test is given to a group of subjects on at least two separate occasions.  If the test is reliable, the scores that each student receives on the first administration should be similar to the scores on the second.  We would expect the relationship between the first and second administrations to be a high positive correlation.

One major concern with test-retest reliability is what has been termed the memory effect.  This is especially true when the two administrations are close together in time.  For example, imagine taking a short 10-question test on vocabulary and then ten minutes later being asked to complete the same test.  Most of us will remember our responses and, when we begin to answer again, we may just answer the way we did on the first test rather than reading through the questions carefully.  This can create an artificially high reliability coefficient as subjects respond from their memory rather than from the test itself.  When the pre-test and post-test for an experiment are the same, the memory effect can play a role in the results.

Parallel Forms Reliability.  One way to assure that memory effects do not occur is to use a different pre-test and post-test.  In order for these two tests to be used in this manner, however, they must be parallel or equal in what they measure.  To determine parallel forms reliability, a reliability coefficient is calculated on the scores of the two measures taken by the same group of subjects.  Once again, we would expect a high positive correlation if we are to say the two forms are parallel.

Inter-Rater Reliability.  Whenever observations of behavior are used as data in research, we want to assure that these observations are reliable.  One way to determine this is to have two or more observers rate the same subjects and then correlate their observations.  If, for example, rater A observed a child act out aggressively eight times, we would want rater B to observe the same number of aggressive acts.  If rater B witnessed 16 aggressive acts, then we know at least one of these two raters is incorrect.  If their ratings are positively correlated, however, we can be reasonably sure that they are measuring the same construct of aggression.  It does not, however, assure that they are measuring it correctly, only that they are both measuring it the same way.
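The inter-rater check above can be sketched the same way: correlate the two raters' counts for the same subjects.  The counts below are hypothetical, chosen so the raters disagree slightly on totals but rank the children the same way.

```python
# Correlate two raters' counts of aggressive acts for the same children.
# A high positive r says the raters order the children consistently;
# it does NOT say either rater's counts are correct.

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    sd_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return cov / (sd_x * sd_y)

rater_a = [8, 3, 5, 10, 2]    # aggressive acts counted by rater A, per child
rater_b = [9, 4, 5, 12, 2]    # rater B's counts for the same five children

print(f"inter-rater reliability: r = {pearson_r(rater_a, rater_b):.2f}")
```

Even though the raters never agree exactly here, the coefficient is close to +1, illustrating the point in the paragraph above: consistency between raters, not correctness of either one.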

A test or question paper works as a tool to measure students' level of knowledge. It also helps to adjust the learning material accordingly. Validity, reliability, and objectivity are the characteristics of a test.

Key Points

The statement above relates to the test's validity:

  • A measuring instrument possesses validity when it actually measures what it claims to measure.
  • For example, if a test is designed to measure aptitude, then it must measure aptitude and not personality, intelligence, or any other traits.
  • Validity generally refers to how accurately a conclusion, measurement, or concept corresponds to what is being tested.
  • It is defined as the extent to which an assessment accurately measures what it is intended to measure.

Hence, it can be concluded that validity is the extent to which a test measures what it purports to measure.

Additional Information 

Other qualities of a test:

Reliability:

  • One of the most important criteria for the quality of measurement is the reliability of the measuring instrument.
  • Reliability means consistency with which an instrument yields similar results.

Objectivity:

  • Objectivity is also referred to as rater reliability. It affects both the validity and reliability of test scores. 
  • The objectivity of a measuring instrument means the degree to which different persons scoring the answer script arrive at the same result.
