When researchers describe study results as being generalizable they are referring to a study that has high?

The goal of scientific research is to increase our understanding of the world around us. To do this, researchers study different groups of people or populations. These populations can be as small as a few individuals from one workplace or as large as thousands of people representing a cross-section of Canadian society. The results of this research often provide insights into how work and health interact in those groups. But how do we know if a study's results can be applied to another group or population?

To answer this question, we first need to understand the concept of generalizability.

In its simplest form, generalizability can be described as making predictions based on past observations.

In other words, if something has often happened in the past, it will likely occur in the future. In studies, once researchers have collected enough data to support a hypothesis, they can develop a premise to predict the outcome in similar circumstances with a certain degree of accuracy.

Two aspects of generalizability

Generalizing to a population. Sometimes when scientists talk about generalizability, they are applying results from a study sample to the larger population from which the sample was selected. For instance, consider the question, “What percentage of the Canadian population supports the Liberal party?” In this case, it would be important for researchers to survey people who represent the population at large. Therefore they must ensure that the survey respondents include relevant groups from the larger population in the correct proportions. Examples of relevant groups could be based on race, gender or age group.

Generalizing to a theory. More broadly, the concept of generalizability deals with moving from observations to scientific theories or hypotheses. This type of generalization amounts to taking time- and place-specific observations to create a universal hypothesis or theory. For instance, in the 1940s and 1950s, British researchers Richard Doll and Bradford Hill found that 647 out of 649 lung cancer patients in London hospitals were smokers. This led to many more research studies, with increasing sample sizes, with differing groups of people, with differing amounts of smoking and so on. When the results were found to be consistent across person, time and place, the observations were generalized into a theory: “cigarette smoking causes lung cancer.”
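The survey example under "Generalizing to a population" asks for respondents from each relevant group "in the correct proportions." One common way to do this is proportional allocation, sketched below. The function name, the age groups, and the population counts are all invented for illustration and are not taken from any real survey.

```python
def proportional_allocation(population_counts, sample_size):
    """Split a sample across groups in proportion to each group's population size."""
    total = sum(population_counts.values())
    quotas, remainders = {}, {}
    for group, count in population_counts.items():
        # Integer arithmetic avoids floating-point rounding surprises.
        quotas[group], remainders[group] = divmod(count * sample_size, total)
    # Largest-remainder rounding: hand out any leftover seats so the
    # quotas add up to exactly sample_size.
    leftover = sample_size - sum(quotas.values())
    for group in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        quotas[group] += 1
    return quotas


# Hypothetical population counts by age group (illustrative only).
population = {"18-34": 2700, "35-54": 3400, "55+": 3900}
quotas = proportional_allocation(population, 1000)
print(quotas)  # {'18-34': 270, '35-54': 340, '55+': 390}
```

In practice the same idea would be applied across several characteristics at once (for example, age group within gender), which this sketch does not attempt.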

Requirements for generalizability

For generalizability we require a study sample that represents some population of interest — but we also need to understand the contexts in which the studies are done and how those might influence the results.

Suppose you read an article about a Swedish study of a new exercise program for male workers with back pain. The study was performed on male workers from fitness centres. Researchers compared two approaches. Half of the participants got a pamphlet on exercise from their therapist, and half were put on an exercise program led by a former Olympic athlete. The study findings showed that workers in the exercise group returned to work more quickly than workers who received the pamphlet.

Assuming the study was well conducted, with a strong design and rigorous reporting, we can trust the results. But to what populations could you generalize these results?

Some factors that need to be considered include: How important is it to have an Olympian delivering the exercise program? Would the exercise program work if delivered by an unknown therapist? Would the program work if delivered by the same Olympian but in a country where he or she is not well-known? Would the results apply to employees of other workplaces that differ from fitness centres? Would women respond the same way to the exercise program?

To increase our confidence in the generalizability of the study, it would have to be repeated with the same exercise program but with different providers in different settings (either worksites or countries) and yield the same results.

Source: At Work, Issue 45, Summer 2006: Institute for Work & Health, Toronto

Internal and external validity are concepts that reflect whether or not the results of a study are trustworthy and meaningful. While internal validity relates to how well a study is conducted (its structure), external validity relates to how applicable the findings are to the real world.

Internal validity is the extent to which a study establishes a trustworthy cause-and-effect relationship between a treatment and an outcome. Internal validity also reflects that a given study makes it possible to eliminate alternative explanations for a finding.

For example, if you implement a smoking cessation program with a group of individuals, how sure can you be that any improvement seen in the treatment group is due to the treatment that you administered?

Internal validity depends largely on the procedures of a study and how rigorously it is performed.

Internal validity is not a "yes or no" type of concept. Instead, we consider how confident we can be with the findings of a study, based on whether it avoids traps that may make the findings questionable.

The less chance there is for "confounding" in a study, the higher the internal validity and the more confident we can be in the findings. Confounding refers to a situation in which other factors come into play and confuse the outcome of a study. When that happens, we can no longer be sure that we have identified a true cause-and-effect relationship between the treatment and the outcome.

In short, you can only be confident that your study is internally valid if you can rule out alternative explanations for your findings. You can only assume cause and effect when your study meets the following three criteria:

The cause preceded the effect in terms of time.
The cause and effect vary together.
There are no other likely explanations for this relationship that you have observed.

If you are looking to improve the internal validity of a study, you will want to consider aspects of your research design that will make it more likely that you can reject alternative hypotheses. There are many factors that can improve internal validity.

Blinding: Keeping participants, and sometimes researchers, unaware of which intervention each participant is receiving (such as by using a placebo in a medication study) so that this knowledge does not bias their perceptions and behaviors and thus the outcome of the study
Experimental manipulation: Manipulating an independent variable in a study (for instance, giving smokers a cessation program) instead of just observing an association without conducting any intervention (for instance, examining the relationship between exercise and smoking behavior)
Random selection: Choosing your participants at random or in a manner in which they are representative of the population that you wish to study
Randomization: Randomly assigning participants to treatment and control groups, which ensures that there is no systematic bias between groups
Study protocol: Following specific procedures for the administration of treatment so as not to introduce any effects of, for example, doing things differently with one group of people versus another group of people
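To make the randomization item above concrete, here is a minimal sketch of random assignment to two groups. The function name, the participant IDs, and the fixed seed are assumptions made for illustration. A real trial would typically use a pre-registered allocation procedure rather than an ad hoc script.

```python
import random


def randomize(participants, seed=None):
    """Randomly split participants into treatment and control groups."""
    rng = random.Random(seed)   # a seeded generator makes the split reproducible
    shuffled = list(participants)
    rng.shuffle(shuffled)       # every ordering is equally likely
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}


# Hypothetical participant IDs P01..P20.
participant_ids = [f"P{i:02d}" for i in range(1, 21)]
groups = randomize(participant_ids, seed=42)
print(len(groups["treatment"]), len(groups["control"]))  # 10 10
```

Because assignment depends only on the shuffle, no participant characteristic (such as motivation) can systematically steer people into one group.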

Just as there are many ways to ensure that a study is internally valid, there is also a list of potential threats to internal validity that should be considered when planning a study.

Attrition: Participants dropping out or leaving a study, which means that the results are based on a biased sample of only the people who did not choose to leave (and possibly who all have something in common, such as higher motivation)
Confounding: A situation in which changes in an outcome variable can be thought to have resulted from some third variable that is related to the treatment that you administered.
Diffusion: This refers to the treatment in a study spreading from the treatment group to the control group through the groups interacting and talking with or observing one another. This can also lead to another issue called resentful demoralization, in which a control group tries less hard because they feel resentful over the group that they are in.
Experimenter bias: An experimenter behaving in a different way with different groups in a study, which leads to an impact on the results of this study (and can be reduced through blinding)
Historical events: Events that occur while a study runs over a period of time, such as a change in political leadership or a natural disaster, and that influence how study participants feel and act
Instrumentation: It's possible to "prime" participants in a study in certain ways with the measures that you use, which causes them to react in a way that is different than they would have otherwise.
Maturation: This describes the impact of time as a variable in a study. If a study takes place over a period of time in which it is possible that participants naturally changed in some way (grew older, became tired), then it may be impossible to rule out whether effects seen in the study were simply due to the effect of time.
Statistical regression: The natural tendency of participants who score at the extreme ends of a measure to score closer to the average when measured again, regardless of any intervention (also known as regression to the mean)
Testing: Repeatedly testing participants using the same measures influences outcomes. If you give someone the same test three times, isn't it likely that they will do better as they learn the test or become used to the testing process so that they answer differently?
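The statistical regression threat listed above can be hard to picture, so here is a small simulation sketch. It assumes each person has a stable underlying level plus independent measurement noise on each test. The means, spreads, and sample size are arbitrary choices for illustration, and no intervention is applied between the two tests.

```python
import random
import statistics


def simulate_regression_to_mean(n=10_000, top_fraction=0.1, seed=0):
    """Return (mean first score, mean second score) for the top scorers on test 1."""
    rng = random.Random(seed)
    # Stable "true" level per person; each test adds its own random noise.
    true_level = [rng.gauss(50, 10) for _ in range(n)]
    test1 = [t + rng.gauss(0, 10) for t in true_level]
    test2 = [t + rng.gauss(0, 10) for t in true_level]
    # Pick the people with the most extreme (highest) scores on the first test.
    order = sorted(range(n), key=lambda i: test1[i], reverse=True)
    extreme = order[: max(1, int(n * top_fraction))]
    return (
        statistics.mean(test1[i] for i in extreme),
        statistics.mean(test2[i] for i in extreme),
    )


first, second = simulate_regression_to_mean()
print(f"top scorers, test 1: {first:.1f}; same people, test 2: {second:.1f}")
```

The selected group scores lower on the second test even though nothing was done to them, which is why a control group is needed to separate this effect from a real treatment effect.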

External validity refers to how well the outcome of a study can be expected to apply to other settings. In other words, this type of validity refers to how generalizable the findings are. For instance, do the findings apply to other people, settings, situations, and time periods?

Ecological validity, an aspect of external validity, refers to whether a study's findings can be generalized to the real world.

Rigorous research methods help ensure internal validity, but the tight controls they involve may limit external validity.

A related term, transferability, is used in qualitative research in place of external validity. It refers to whether results transfer to situations with similar characteristics.

What can you do to improve the external validity of your study?

Consider psychological realism: Make sure that participants are experiencing the events of a study as a real event by telling them a "cover story" about the aim of the study. Otherwise, in some cases, participants might behave differently than they would in real life if they know what to expect or know what the aim of the study is.
Do reprocessing or calibration: Use statistical methods to adjust for problems related to external validity. For example, if a study had uneven groups for some characteristic (such as age), reweighting might be used.
Replicate: Conduct the study again with different samples or in different settings to see if you get the same results. When many studies have been conducted, meta-analysis can also be used to determine if the effect of an independent variable is reliable (based on examining the findings of a large number of studies on one topic).
Try field experiments: Conduct a study outside the laboratory in a natural setting.
Use inclusion and exclusion criteria: This will ensure that you have clearly defined the population that you are studying in your research.
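The reweighting mentioned in the list above can be sketched as a simple post-stratification adjustment, which is one possible approach among several. The function name, the two age groups, the outcome values, and the 50/50 population shares are all made up for illustration.

```python
def poststratified_mean(sample, population_shares):
    """Average an outcome after reweighting groups to match the target population.

    sample: list of (group, value) pairs.
    population_shares: group -> share of the target population (summing to 1).
    """
    totals, counts = {}, {}
    for group, value in sample:
        totals[group] = totals.get(group, 0.0) + value
        counts[group] = counts.get(group, 0) + 1
    # Assumption: every population group has at least one sample member.
    missing = set(population_shares) - set(counts)
    if missing:
        raise ValueError(f"no sample members in groups: {sorted(missing)}")
    # Weight each group's sample mean by that group's population share.
    return sum(share * totals[g] / counts[g] for g, share in population_shares.items())


# Hypothetical sample that over-represents the younger group (8 of 10 people).
sample = [("under 40", 10.0)] * 8 + [("40 and over", 20.0)] * 2
shares = {"under 40": 0.5, "40 and over": 0.5}
unweighted = sum(value for _, value in sample) / len(sample)
reweighted = poststratified_mean(sample, shares)
print(unweighted, reweighted)  # 12.0 15.0
```

Reweighting corrects only for the characteristics that were measured and used as groups. It cannot fix differences on characteristics nobody recorded.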

External validity is threatened when a study does not take into account the interactions of variables in the real world.

Pre- and post-test effects: When the pre- or post-test is in some way related to the effect seen in the study, such that the cause-and-effect relationship disappears without these added tests
Sample features: When some feature of the particular sample was responsible for the effect (or partially responsible), leading to limited generalizability of the findings
Selection bias: Also considered a threat to internal validity, selection bias describes differences between groups in a study that may relate to the independent variable (for instance, motivation or willingness to take part in the study, or specific demographic groups being more likely to take part in an online survey).
Situational factors: Time of day, location, noise, researcher characteristics, and how many measures are used may affect the generalizability of findings.

Internal and external validity are like two sides of the same coin. You can have a study with good internal validity, but overall it could be irrelevant to the real world. On the other hand, you could conduct a field study that is highly relevant to the real world, but that doesn't have trustworthy results in terms of knowing what variables caused the outcomes that you see.

What are the similarities between internal and external validity? They are both factors that should be considered when designing a study, and both have implications in terms of whether the results of a study have meaning. Neither is an "either/or" concept, so you will always be deciding to what degree your study performs in terms of both types of validity.

Each of these concepts is typically reported in a research article that is published in a scholarly journal. This is so that other researchers can evaluate the study and make decisions about whether the results are useful and valid.

The essential difference between internal and external validity is that internal validity refers to the structure of a study and its variables while external validity relates to how universal the results are. There are further differences between the two as well.

Internal Validity

Conclusions are warranted
Controls extraneous variables
Eliminates alternative explanations
Focus on accuracy and strong research methods

External Validity

Findings can be generalized
Outcomes apply to practical situations
Results apply to the world at large
Results can be translated into another context

Internal validity focuses on showing a difference that is due to the independent variable alone, whereas external validity concerns whether the results can be translated to the world at large.

As an example of a study with good internal validity, suppose a researcher hypothesizes that using a particular mindfulness app will reduce negative mood. To test this hypothesis, the researcher randomly assigns a sample of participants to one of two groups: those who will use the app over a defined period, and those who engage in a control task.

The researcher ensures that there is no systematic bias in how participants are assigned to the groups, and also blinds the research assistants to the groups the participants are in during experimentation.

A strict study protocol is used that outlines the procedures of the study. Potential confounding variables are measured along with mood, such as the participants' socioeconomic status, gender, and age, among other factors. If participants drop out of the study, their characteristics are examined to make sure there is no systematic bias in terms of who stays in the study.

To give the same study good external validity, the researcher would also have participants use the app at home rather than in the laboratory. The researcher would clearly define the population of interest, choose a representative sample, and replicate the study on different technological devices.

Setting up an experiment so that it has sound internal and external validity involves being mindful from the start about factors that can influence each aspect of your research.

It's best to spend extra time designing a structurally sound study that has far-reaching implications rather than to quickly rush through the design phase only to discover problems later on. Only when both internal and external validity are high can strong conclusions be made about your results.