Best Practice: Small-N Research
- BioSource Faculty
- Jun 5

We have based our Best Practice series on Dr. Beth Morling's Research Methods in Psychology (5th ed.). We encourage you to purchase it for your bookshelf. If you teach research methods, consider adopting this best-in-class text for your classes.
Dr. Beth Morling is a distinguished Fulbright scholar and was honored as the 2014 Professor of the Year by the Carnegie Foundation for the Advancement of Teaching.

With more than two decades of experience as a researcher and professor of research methods, she is an internationally recognized expert and a passionate advocate for the Research Methods course. Morling's primary objective is to empower students to become discerning critical thinkers, capable of evaluating research and claims presented in the media.

This post explores one of the most fascinating and flexible approaches in psychological research: small-N designs. These methods, covered in Chapter 13, emphasize the systematic study of just one or a few individuals, rather than drawing conclusions from large groups.
We will examine the rationale behind single-participant experiments, describe the mechanics and logic of small-N designs, and assess the scientific contributions of a particularly famous case study, Henry Molaison, also known as H.M.
Our discussion will continue with a critical analysis of the practical challenges and trade-offs in case study research, the necessity of experimental control, and the scientific value of studying special cases. We will carefully consider the disadvantages of small-N designs and survey three foundational structures used in small-N research: the stable-baseline, multiple-baseline, and reversal designs.
We will also highlight examples from classic and contemporary research that illustrate the power and limitations of small-N designs, including work by Jean Piaget, Hermann Ebbinghaus, and others.
Lastly, we will evaluate small-N research through the lens of four key types of validity: internal, external, construct, and statistical. Each of these components will be explained in depth and contextualized through practical examples, with the aim of equipping you to understand, critique, and possibly implement small-N research in your own academic or clinical work.
What is the value of an experiment with just one participant?
At first glance, conducting an experiment with just one participant may seem scientifically limited or even anecdotal. However, small-N or single-case experiments are rooted in methodological rigor and are capable of yielding precise and informative results. One key advantage is the ability to observe and measure changes within the same individual over time.
Instead of comparing a treatment group to a control group, researchers observe a single individual across different conditions, enabling them to isolate the impact of the intervention more clearly. This within-subjects design enhances experimental control because it eliminates variability between subjects that often confounds large-group designs. For example, differences in age, personality, or background do not interfere with results when the same individual is used as their own comparison.
Such designs are invaluable when the target behavior is rare or when ethical concerns prevent random assignment.
Another strength of single-participant experiments is their adaptability. Researchers can make real-time adjustments to protocols, increasing the practical utility of these designs in clinical settings. They can respond to emerging trends in the data, ensuring that the research remains both scientifically sound and clinically meaningful.
Moreover, these designs are often preferred in applied behavior analysis, where continuous monitoring and individualized intervention are necessary. Through frequent measurement and graphing of outcomes, patterns emerge that can substantiate causal claims even without a comparison group.
Visual analysis plays a central role in small-N research, helping to identify trends, level changes, and stability within phases. These visual tools are complemented by statistical techniques such as randomization tests or effect size estimation, providing further analytical depth. Overall, experiments with just one participant provide a high-resolution lens on behavior, yielding insights that are both scientifically valid and practically relevant.
Small-N Designs: Studying Only a Few Individuals
Small-N designs refer to a class of research methods that focus intensively on a small number of subjects, often ranging from one to five individuals. Unlike large-N studies, which aim to generalize findings across populations, small-N studies seek to identify causal relationships and behavioral mechanisms within individual subjects. This makes them particularly well-suited to clinical psychology, neuropsychology, and behavior analysis, where individualized understanding is crucial.
A defining feature of small-N research is its reliance on repeated measures. Researchers observe the same behavior under baseline conditions, during intervention, and often during a withdrawal or follow-up phase. This continuous measurement allows for the identification of treatment effects and ensures that observed changes are attributable to the intervention rather than to random variation.
These designs are particularly powerful when the treatment effect is large and immediate, allowing researchers to infer causality with confidence. Another advantage is their ability to detect subtle patterns and behavioral changes that might be lost in group averages.
Since data are collected with high frequency, researchers can observe moment-to-moment changes and identify both short-term and long-term effects. These insights are especially valuable when evaluating behavioral treatments that need to be personalized to each client.
In terms of implementation, small-N studies are often more feasible and cost-effective than large-scale trials. They require fewer resources, can be conducted in naturalistic settings, and provide rapid feedback for treatment adjustments. Despite these strengths, small-N designs are not immune to criticism. Concerns about generalizability, statistical inference, and bias must be carefully addressed. Nevertheless, when conducted with rigor, small-N studies offer a powerful tool for psychological research and practice.
Research on Human Memory (Henry Molaison case)
The case of Henry Molaison, known to the scientific community as H.M., is one of the most iconic examples of a small-N design in neuroscience and psychology. After undergoing surgery in 1953 to alleviate severe epilepsy, H.M. lost his ability to form new declarative memories, a condition known as anterograde amnesia. The surgery removed large portions of his medial temporal lobes, including the hippocampus, which later research identified as crucial for memory consolidation.
Over the following decades, H.M. participated in hundreds of studies that collectively shaped our understanding of memory systems. His case revealed that memory is not a single, unitary function but comprises multiple systems, including declarative (explicit) and procedural (implicit) memory. For instance, H.M. was able to learn new motor skills, such as mirror drawing, despite having no conscious recollection of practicing the task. This finding supported the idea that procedural memory operates independently of the hippocampus, a revolutionary insight at the time.
The studies conducted with H.M. were characterized by careful experimental control, repeated measures, and cross-validation with other neuropsychological cases and animal studies. Each task was meticulously designed to isolate specific components of memory, and findings were replicated both within H.M. and across other patients with similar lesions. Importantly, the longitudinal nature of his participation enabled researchers to observe the stability and consistency of his deficits over time, a level of detail rarely possible in group-based studies.
His case also set ethical standards for small-N research, as his identity was protected during his lifetime, and his brain was preserved and mapped in high resolution after his death. Overall, the study of H.M. not only advanced theoretical models of memory but also demonstrated the immense scientific value of small-N research when executed with precision and ethical care.
Balancing Priorities in Case Study Research
Conducting small-N research, particularly with rare or special cases, requires careful balancing of scientific, practical, and ethical priorities. One of the most important considerations is the trade-off between experimental control and ecological validity. Researchers must design studies that isolate causal relationships while ensuring that procedures are realistic and ethically acceptable. For instance, while randomized controlled trials offer high internal validity, they may be impractical or unethical in individualized therapeutic contexts.
Small-N researchers often rely on repeated observations, structured phases, and within-subject comparisons to draw valid conclusions without sacrificing participant welfare. Measurement intensity is another key concern. High-frequency data collection increases reliability and interpretive clarity but also demands significant time and energy from both researchers and participants.
Fatigue and compliance issues can threaten data quality, so protocols must be designed to be sustainable over extended periods. Ethical responsibility is especially salient in small-N research due to the close interaction between researchers and participants. Informed consent must be ongoing, and participants should be made aware of their rights and the goals of the study throughout their involvement.
Long-term collaborations, such as that with H.M., highlight the importance of maintaining ethical standards over time as technologies, expectations, and methodologies evolve. Another consideration is generalizability. Although small-N designs do not aim for statistical generalization, they can support what is known as analytic generalization. This means the results inform theoretical principles that can be tested in broader contexts. Triangulation is also crucial: using multiple measures, observers, or tasks can help validate findings and enhance their robustness. Ultimately, case study research demands a nuanced approach that balances methodological rigor with contextual sensitivity and ethical care.
Experimental Control
A central strength of small-N research is its capacity for experimental control, even in the absence of random assignment. Experimental control refers to the ability to manipulate independent variables while holding other factors constant, thereby establishing a causal link with the dependent variable.
In small-N designs, this control is often achieved through structured phases, such as baseline, intervention, and withdrawal or reversal. Researchers observe whether changes in the independent variable correspond with predictable changes in behavior, thus supporting causal inference.
One powerful strategy is the use of consistent measurement tools and procedures across conditions. For instance, keeping the testing environment, instructions, and materials identical ensures that any observed changes are due to the intervention rather than external factors.
Repetition is another tool for control. If the same effect is observed repeatedly within the same participant, confidence in the causal relationship increases. This form of intra-subject replication can serve as a substitute for the inter-subject replication used in large-N designs.
Multiple-baseline strategies offer another layer of control by staggering the introduction of the intervention across different behaviors, settings, or individuals. If behavior changes only when and where the intervention is introduced, researchers can more confidently attribute effects to the treatment rather than to time or other external variables.
Visual analysis of graphed data also contributes to experimental control. Researchers look for changes in level, trend, and variability that coincide with intervention phases. These patterns provide strong inferential evidence, even when formal statistical testing is not feasible. In some cases, researchers also use statistical tools like randomization tests or permutation analyses to bolster their conclusions. Ultimately, experimental control in small-N research stems not from sample size but from careful design, consistent procedures, and rigorous observation.
Studying Special Cases
Small-N designs are especially powerful when they are used to examine individuals who present with rare, exceptional, or neurologically unusual conditions. These "special cases" provide a unique window into cognitive or behavioral mechanisms that may otherwise go unobserved in large-N research. For example, studying individuals with split-brain surgery or those who have acquired savant syndrome after brain injury allows researchers to isolate brain-behavior relationships in highly specific ways.
A first strength of studying special cases is the design of customized experimental paradigms. These tasks are carefully crafted to isolate one or more psychological functions, such as working memory, language processing, or motor learning.
In Henry Molaison's case, mirror-tracing tasks provided compelling evidence of preserved procedural memory despite profound declarative memory deficits. Such studies illuminate how functions dissociate under neurological damage, providing vital clues about underlying architecture.
A second strength of special case research lies in its ability to reveal dissociations—that is, cases where one ability is preserved while another is lost. These dissociations offer direct support for theories proposing modular or functionally segregated brain systems. They also provide data that challenge or refine existing models. For example, a patient may retain reading comprehension but lose the ability to write, forcing researchers to reconsider the structure of language processing.
Third, studying special cases can yield insights into neural plasticity. Individuals who adapt to neurological impairments often show compensatory strategies that provide data on how the brain reorganizes functions. These findings have clinical implications for rehabilitation and cognitive retraining.
Fourth, special cases help researchers test the boundaries of psychological generalizations. What is assumed to be a universal principle may not hold under all conditions, and exceptions provide necessary complexity to our models.
Fifth, these studies also highlight individual variability in response to interventions, underscoring the need for personalized assessment.
Finally, it is critical to remember that special case studies, while powerful, must be interpreted with care. Findings should be validated through replication, triangulation with other methods, and convergence with broader theoretical frameworks. When studied thoughtfully, special cases enhance our understanding of cognition, behavior, and the human experience.
Disadvantages of Small-N Studies
Despite their precision and adaptability, small-N studies are not without notable drawbacks.
One of the most pressing concerns is the issue of generalizability. Because findings are derived from such a small number of individuals, often with unique characteristics, it becomes difficult to claim that the results would apply to a wider population.
For example, while a treatment may work exceptionally well for one individual with autism, this does not mean it will be equally effective for others with different backgrounds or symptom profiles. Researchers must therefore be cautious in framing their conclusions, emphasizing analytic rather than statistical generalization.
A second limitation involves the vulnerability to idiosyncratic influences. In small-N research, there is no averaging effect to cancel out anomalies; individual quirks, environmental fluctuations, or motivational states can distort results. This makes it essential to design studies with strong internal controls and clear phase distinctions. Moreover, while repeated measurement helps stabilize observations, it does not eliminate the risk that a participant’s behavior may be influenced by factors unrelated to the treatment, such as fatigue or external stressors.
Third, the statistical analysis of small-N data can be challenging. Traditional inferential tests often require larger sample sizes to produce valid results. Although visual inspection and nonparametric methods offer useful alternatives, these approaches may be perceived as less rigorous by some reviewers and journals, potentially limiting publication opportunities.
Fourth, small-N studies can be resource-intensive despite involving fewer participants. Because data collection is so frequent and detailed, researchers must invest substantial time in designing tasks, managing sessions, and analyzing data. Participants, too, may experience fatigue or disengagement over long study periods, especially if interventions are demanding.
Fifth, ethical concerns must be carefully managed. The close relationship between researcher and participant in small-N designs necessitates heightened attention to informed consent, confidentiality, and the potential for undue influence. When studies involve sensitive topics or vulnerable populations, these concerns are magnified.
Finally, small-N research often faces institutional or funding barriers. Granting agencies and ethics boards may be less familiar with the methodology or skeptical of its scientific merit, making it harder to secure resources or approval. For these reasons, researchers must be well-prepared to justify their approach and communicate its advantages clearly. Despite these limitations, small-N designs remain invaluable when used appropriately and interpreted with care.
Three Small-N Designs
Small-N research is distinguished by its reliance on specific design structures that allow for strong causal inference even with very few participants. Among the most widely used are the stable-baseline, multiple-baseline, and reversal designs. Each design brings its own strengths and limitations but shares a common goal: to isolate the effect of an intervention by controlling the timing and context in which it is introduced.
A hallmark of these approaches is their use of repeated measurements and careful phase transitions. Researchers begin by observing a stable baseline period during which no intervention is applied, establishing a pattern of behavior that serves as a reference.
Changes that occur only after the intervention is introduced—and not before—support the hypothesis that the treatment, and not some extraneous factor, is responsible for the observed effect. Each of the three designs provides a different strategy for establishing this causal link.
In the stable-baseline design, behavior is tracked over a long period prior to any treatment. This ensures that the baseline is truly stable and any subsequent change can be more confidently attributed to the intervention. This design is particularly effective when behaviors are predictable and unlikely to change spontaneously. For example, if a child with a speech delay consistently fails to produce certain phonemes across ten consecutive sessions, and then improves immediately following a phonetic intervention, the causal inference is strengthened. However, if behavior is unstable during baseline, interpreting the effect of treatment becomes more difficult.
The multiple-baseline design introduces the intervention at different times across different individuals, settings, or behaviors. This staggered approach helps control for external influences such as maturation or environmental changes. If improvement consistently occurs only when the intervention is introduced—regardless of timing—it strengthens the argument that the treatment, rather than outside factors, caused the improvement. For example, if three students each begin a self-monitoring intervention for homework completion on different weeks and each shows improvement only after their respective start date, this provides compelling evidence of treatment efficacy.
The reversal design (also known as A-B-A or A-B-A-B) involves introducing and then withdrawing the intervention to see whether the behavior returns to baseline. If behavior worsens when the treatment is removed and improves again when it is reintroduced, the researcher can be more confident that the intervention is having an effect. This method provides very strong internal validity but is only appropriate when withdrawal of the treatment is ethical and practical. For example, in treating disruptive classroom behavior with a token economy, the intervention could be briefly removed to observe if misbehavior resurges, then reintroduced to confirm its effect. This design is less suitable for treatments that have lasting effects or for populations where removing treatment may cause harm.
While all three designs share the strength of establishing clear temporal links between intervention and outcome, each is best suited to different contexts. The choice of design should be based on the research question, the nature of the behavior, ethical considerations, and practical feasibility. Researchers may also combine elements of these designs or supplement them with statistical tools like randomization tests and effect size estimates to bolster their conclusions. Ultimately, understanding these foundational designs equips researchers to conduct small-N studies with rigor and confidence.
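The logic of these phase-based designs can be made concrete with a small sketch: summarize behavior within each phase and check that it tracks the presence of the treatment. The counts and phase labels below are hypothetical illustrations of an A-B-A-B reversal design, not data from any study cited here.

```python
# Minimal sketch: summarizing a hypothetical A-B-A-B reversal design
# by phase means, the quantity a visual analysis centers on.

def phase_means(scores, phases):
    """Return the mean score for each phase, in order of first appearance."""
    order, sums, counts = [], {}, {}
    for score, phase in zip(scores, phases):
        if phase not in sums:
            order.append(phase)
            sums[phase] = 0.0
            counts[phase] = 0
        sums[phase] += score
        counts[phase] += 1
    return {p: sums[p] / counts[p] for p in order}

# Hypothetical disruptive-behavior counts across baseline (A) and
# token-economy (B) phases
scores = [9, 8, 9, 3, 2, 2, 8, 9, 2, 1]
phases = ["A1", "A1", "A1", "B1", "B1", "B1", "A2", "A2", "B2", "B2"]

means = phase_means(scores, phases)
# Behavior drops whenever the token economy (B) is in place and
# rebounds when it is withdrawn (A2) -- the reversal-design signature.
```

If behavior failed to rebound during the second A phase, the design itself would warn the researcher that something other than the treatment (or a lasting treatment effect) was at work.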
Other Examples of Small-N Studies
In addition to well-known clinical cases like H.M., the history of psychology is rich with small-N studies that have advanced theory and practice across subfields. One of the earliest examples is the work of Jean Piaget, the Swiss developmental psychologist who closely studied his own children to develop a stage theory of cognitive development. Through systematic observation and ingenious tasks designed to test logical reasoning, conservation, and egocentrism, Piaget identified four major stages of development: sensorimotor, preoperational, concrete operational, and formal operational. While his sample was limited, the theoretical contributions from these intensive observations have guided decades of developmental research and educational practice.
Similarly, Hermann Ebbinghaus conducted pioneering studies on memory using himself as the sole participant. By memorizing and recalling lists of nonsense syllables, Ebbinghaus was able to quantify learning curves, forgetting rates, and the effects of overlearning and spacing. His findings gave rise to foundational concepts like the forgetting curve and the spacing effect, both of which remain central to memory research today. These studies demonstrated that carefully controlled self-experimentation could yield replicable, generalizable psychological principles.
Another significant small-N study was carried out by Anders Ericsson and his colleagues with a participant known as "S.F.," a college student who underwent extensive training to increase his memory span for digits. Over several years, S.F. improved from an average digit span of seven to nearly eighty by developing sophisticated chunking strategies. This case challenged the assumption that short-term memory capacity is fixed and suggested that expertise and strategy play a critical role in cognitive performance. The study of S.F. contributed to the development of theories about the relationship between working memory, long-term memory, and deliberate practice.
In applied behavioral settings, small-N studies have been instrumental in validating interventions. A notable example is the work of Fox and colleagues (1987) who implemented a token economy to improve safety behavior in a mining operation. Using a reversal design across multiple mine sites, they were able to demonstrate significant reductions in injury rates, supporting the practical utility of reinforcement-based systems in industrial psychology.
Even perceptual psychology has benefited from small-N methods. Psychophysical research often uses individual subjects tested over many trials to establish sensory thresholds and discrimination abilities. The high internal consistency of data obtained from such designs allows researchers to detect fine-grained perceptual effects that might be lost in larger, noisier datasets. These findings are frequently replicated across subjects to ensure generality, but the precision of the original small-N measurements is what allows such insights to emerge in the first place.
Taken together, these examples illustrate that small-N designs are not merely fallback options when large samples are unavailable. Instead, they are powerful methodological tools that have shaped psychological science across domains.
Their utility lies not just in what they reveal about individuals but in how those insights, when replicated and theoretically integrated, inform our broader understanding of the human mind.
Evaluating the Four Validities in Small-N Designs
When assessing the quality of small-N research, it is crucial to apply the same criteria used in evaluating larger group studies—namely, the four types of research validity: internal, external, construct, and statistical validity. These dimensions provide a comprehensive framework for determining whether the results of a small-N study can be trusted and how they can be interpreted. Internal validity refers to the extent to which the observed changes in the dependent variable can be confidently attributed to the manipulation of the independent variable rather than to extraneous variables.
Small-N designs typically achieve high internal validity through structured designs like baseline, treatment, and withdrawal phases. Because the same individual serves as both control and experimental subject, many confounding variables related to individual differences are inherently controlled. Reversal and multiple-baseline designs further strengthen internal validity by demonstrating that changes occur only when the treatment is applied and revert when it is withdrawn or withheld.
External validity, on the other hand, pertains to the generalizability of findings beyond the specific context of the study. This is where small-N research often faces its greatest challenge. Since results are derived from a limited number of individuals, often with unique traits or circumstances, applying those results to larger or different populations must be done with caution. Nonetheless, external validity can be supported through replication across similar cases and through analytic generalization, where findings inform broader theories that can be tested in new contexts. It’s also enhanced when small-N studies are conducted in naturalistic or applied settings that mirror real-life environments.
Construct validity addresses whether the variables under study are accurately and meaningfully operationalized. In small-N designs, this often involves using standardized instruments, well-defined behaviors, and reliable data collection methods to ensure that the constructs being studied truly reflect the theoretical concepts of interest. For example, when studying memory, researchers might use validated tasks like digit span or recall tests to ensure they are measuring cognitive retention and not confounded abilities like attention or motivation. Given the intensive nature of small-N data collection, researchers are usually well-positioned to ensure construct validity through repeated trials and fine-tuned instrumentation.
Statistical validity, the final domain, refers to the appropriateness and reliability of the statistical conclusions drawn from the data. While traditional inferential statistics may not be feasible due to small sample sizes, alternative methods such as visual analysis, effect size estimation, and randomization tests provide robust tools for evaluating change.
Visual inspection remains central to small-N research, allowing researchers to detect level changes, trends, and stability across phases. These visual insights can be supplemented with statistical techniques that account for autocorrelation and trend direction. Although critics may question the rigor of such approaches, they have been well-validated and provide meaningful ways to support causal inference.
Together, these four validities form a balanced lens through which small-N studies can be evaluated. Researchers must be intentional about meeting each criterion through careful design, rigorous implementation, and transparent reporting.
When internal, external, construct, and statistical validity are all reasonably addressed, small-N research offers conclusions that are both credible and useful.
These designs, while sometimes seen as methodologically narrow, actually embody many of the most stringent scientific principles when applied thoughtfully. In the final analysis, evaluating small-N studies with the same critical tools used for large-N research reaffirms their value and solidifies their role in the broader psychological science enterprise.
Time Series Analysis
A time series is a set of data points collected in a specific order across time—often at regular intervals such as daily, weekly, or per session. In psychological research, time series designs are commonly used to monitor how symptoms or behaviors change across time, particularly in response to an intervention. For example, a clinician might record a client’s anxiety rating before, during, and after a neurofeedback protocol. These repeated observations allow the clinician to examine whether there is a meaningful pattern of change that coincides with the timing of the intervention.
A central feature of time series data is that each observation is not independent of the ones that came before it. This dependency is known as autocorrelation—a statistical phenomenon in which current data points are influenced by previous values in the series. For instance, a client’s anxiety rating today is likely to be similar to their rating yesterday, especially in the absence of dramatic environmental changes.
This temporal dependency violates the assumption of independence that underlies many traditional statistical tests such as the t-test or ANOVA. Failing to account for autocorrelation can lead to inflated Type I error rates, meaning researchers might incorrectly conclude that a treatment was effective when the observed pattern could simply be due to natural momentum or trend in the data.
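The degree of serial dependency can be estimated directly. A minimal sketch of a lag-1 autocorrelation computed by hand; the ratings below are hypothetical illustrations:

```python
def lag1_autocorrelation(series):
    """Correlation between each observation and the one before it."""
    n = len(series)
    mean = sum(series) / n
    numerator = sum((series[t] - mean) * (series[t - 1] - mean)
                    for t in range(1, n))
    denominator = sum((x - mean) ** 2 for x in series)
    return numerator / denominator

# Hypothetical daily anxiety ratings that drift downward slowly:
# today's rating strongly resembles yesterday's.
ratings = [8, 8, 7, 7, 6, 6, 5, 5, 4, 4]
r = lag1_autocorrelation(ratings)  # substantial positive autocorrelation
```

A value near zero would suggest the independence assumption of a t-test or ANOVA is tenable; a value this large signals that those tests would understate the true uncertainty.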
To analyze symptom change across a time series, the number of observations required depends largely on the complexity of the pattern being studied and the statistical approach being used. In general, when using statistical models that explicitly account for autocorrelation—such as interrupted time series analysis or autoregressive models—a minimum of 30 to 50 data points is recommended.
This quantity of data allows for the detection of level shifts (abrupt changes) or slope changes (gradual trends) that occur in response to an intervention. In contrast, for smaller datasets with perhaps 6 to 12 observations per phase, alternative methods such as randomization tests or visual analysis can still yield useful insights. These methods are especially suitable in clinical settings where lengthy observation periods may be impractical or unethical.
Among the appropriate statistical tools, randomization tests are particularly well suited to small-N time series designs. In these tests, the timing of intervention phases is randomly permuted many times to determine how likely the observed change is to occur by chance. Because randomization tests do not assume a specific distribution of the data and are more robust to autocorrelation than parametric alternatives, they offer a powerful option for analyzing phase-based changes in short time series.
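A simplified sketch of this logic, assuming the intervention start point could have fallen at any admissible session; the symptom ratings and minimum phase length are hypothetical choices for illustration:

```python
# Randomization test for a level shift: permute the intervention start
# point over all admissible positions and ask how often the resulting
# mean shift is at least as large as the one actually observed.

def mean_shift(series, start):
    """Treatment-phase mean minus baseline-phase mean for a start point."""
    base, treat = series[:start], series[start:]
    return sum(treat) / len(treat) - sum(base) / len(base)

def randomization_p(series, observed_start, min_phase=3):
    """One-tailed p: share of admissible start points whose mean shift
    equals or exceeds the observed one."""
    observed = mean_shift(series, observed_start)
    starts = range(min_phase, len(series) - min_phase + 1)
    extreme = sum(1 for s in starts if mean_shift(series, s) >= observed)
    return extreme / len(starts)

# Hypothetical symptom ratings; treatment actually began at session 6
series = [2, 3, 2, 3, 2, 7, 8, 7, 8, 7]
p = randomization_p(series, observed_start=5)  # 1 of 5 start points
```

Note that with only five admissible start points the smallest attainable p is 1/5, which illustrates concretely why very short series limit statistical power; strict versions of this test also require that the start point actually be chosen at random in advance.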
When more data are available—typically 30 or more time points—researchers often use interrupted time series analysis or piecewise regression, both of which model changes in level and trend following the introduction of an intervention. These approaches explicitly model autocorrelation, making them more statistically valid for analyzing serial data. For longer series with more complex temporal dynamics, ARIMA models (AutoRegressive Integrated Moving Average) may be used to model and remove trends and dependencies, providing robust estimates of treatment effects while accommodating autocorrelation, seasonality, and noise.
In sum, time series analysis is a powerful method for assessing symptom change over time, particularly in small-N or clinical contexts.
Understanding and addressing autocorrelation is essential to ensure valid inferences. Whether using randomization tests for short series or autoregressive models for longer ones, researchers must choose their statistical tools carefully based on the number of observations and the structure of their data.
Incorporating Small-N Research into Neurofeedback Practice
For neurofeedback professionals, conducting randomized controlled trials (RCTs) within a clinical setting is often impractical or ethically constrained. Instead, small-N designs offer a scientifically rigorous and clinically feasible way to evaluate the efficacy of neurofeedback interventions on an individual basis.
These designs allow practitioners to collect repeated measures of client performance, track progress over time, and implement structured intervention protocols without the need for large sample sizes or control groups. For example, a clinician treating a client with generalized anxiety disorder could use a stable-baseline design to monitor baseline EEG patterns and anxiety symptom ratings for several weeks prior to beginning training.
Once neurofeedback is introduced, any measurable improvements in brainwave patterns or self-reported anxiety—particularly if they occur shortly after the start of training and not before—can support a causal interpretation.
Similarly, a multiple-baseline design could be employed across clients or across different symptoms for the same individual. Suppose a neurofeedback practitioner is treating three clients for attention difficulties using a theta/beta ratio protocol. By introducing the training at different times for each client, the practitioner can rule out external events or seasonal influences as the cause of improvement. If gains in attentional focus and EEG normalization consistently occur only after training begins for each client, the evidence in favor of the protocol’s effectiveness becomes stronger.
This same principle can be applied to different symptom domains within one client—such as using neurofeedback first to target sleep problems and later to address irritability—thus establishing treatment-specific effects.
Reversal designs can also be adapted to neurofeedback, provided that ethical standards allow for the withdrawal of treatment. For instance, a practitioner working with a client experiencing migraine symptoms could initiate neurofeedback, observe symptom reduction, then briefly pause training to see whether symptoms return. If headaches re-emerge during the withdrawal phase and resolve again upon reintroduction of the training, this pattern bolsters the causal link between the neurofeedback intervention and the observed effects.
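The logic of such a reversal design can be summarized numerically by comparing phase means across the A-B-A-B sequence. The migraine ratings and phase boundaries below are hypothetical, chosen only to show the expected high-low-high-low pattern:

```python
# Phase-mean summary for an A-B-A-B reversal series: symptoms should
# worsen whenever treatment is withdrawn and improve when it returns.

def phase_means(series, boundaries):
    """Mean of each phase; boundaries are the start indices of phases 2+."""
    edges = [0] + list(boundaries) + [len(series)]
    return [sum(series[i:j]) / (j - i) for i, j in zip(edges, edges[1:])]

# A (baseline), B (training), A (withdrawal), B (training resumed)
ratings = [6, 7, 6, 7, 3, 2, 3, 2, 5, 6, 5, 2, 2, 3, 2]
means = phase_means(ratings, boundaries=[4, 8, 11])
print([round(m, 2) for m in means])  # high, low, high, low
```

When the means alternate in step with the phases, as here, the pattern supports the inference that the behavior is treatment-dependent rather than trending on its own.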
Visual analysis plays a crucial role in interpreting results. Neurofeedback practitioners can chart session-by-session changes in EEG metrics and behavioral or self-report outcomes, allowing for the identification of trends, level shifts, and phase stability. These graphical depictions can be supplemented with effect size estimations or randomization tests to enhance the credibility of findings. Additionally, clients often appreciate seeing tangible, visual representations of their progress, which can foster engagement and adherence.
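One widely used effect-size estimate that complements visual analysis is Nonoverlap of All Pairs (NAP): the proportion of baseline-treatment pairs in which the treatment observation improves on the baseline one. The pre/post ratings here are invented for illustration:

```python
# Nonoverlap of All Pairs (NAP): a distribution-free effect size for
# two-phase data. Illustrative ratings only, not from the article.

def nap(baseline, treatment, improvement="decrease"):
    """Share of (baseline, treatment) pairs where the treatment
    observation improves on the baseline one; ties count half."""
    better = 0.0
    for a in baseline:
        for b in treatment:
            if b == a:
                better += 0.5
            elif (b < a) if improvement == "decrease" else (b > a):
                better += 1.0
    return better / (len(baseline) * len(treatment))

pre = [7, 6, 7, 7, 6, 7]   # symptom ratings before neurofeedback
post = [4, 6, 4, 5, 3, 2]  # ratings during training (one overlapping value)
v = nap(pre, post)
print(round(v, 3))  # close to 1.0: near-complete separation of phases
```

Values near 0.5 indicate chance-level overlap between phases, while values near 1.0 indicate near-complete separation, giving clinicians a single number to report alongside the session-by-session graph.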
Importantly, using small-N methods does not preclude collaboration or generalization. By replicating protocols across similar cases and sharing de-identified data within professional networks, neurofeedback practitioners contribute to a growing practice-based evidence base.
These replications add weight to individual findings and help establish norms, refine protocols, and identify client characteristics that moderate treatment response.
In sum, small-N research designs empower neurofeedback professionals to conduct meaningful, individualized research in real-world settings. They support ethical, flexible, and scientifically sound inquiry when RCTs are not feasible. Through structured protocols, careful measurement, and critical analysis, clinicians can demonstrate the efficacy of their interventions and contribute to the advancement of the field one case at a time.
Conclusion
Small-N designs represent a rigorous and versatile methodological framework in psychological science, emphasizing intensive study of one or a few individuals rather than statistical generalization from large samples. This approach is particularly valuable in contexts where individualized analysis, ethical constraints, or rare conditions preclude large-scale trials.
The foundational rationale is to achieve high internal validity by using repeated measures, structured phase changes, and within-subject comparisons to establish causal inferences.
Iconic cases like Henry Molaison (H.M.) demonstrate how small-N research can lead to major theoretical breakthroughs, while historical examples from Ebbinghaus, Piaget, and Ericsson underscore its enduring value across subfields.
These methods are not only historically influential but are also adaptable to contemporary clinical applications. In neurofeedback, for example, small-N designs provide a means to rigorously evaluate interventions when randomized controlled trials are unfeasible. Multiple-baseline and reversal designs help isolate treatment effects, while visual analysis and nonparametric statistics provide strong interpretive tools.
Despite their strengths, small-N studies face challenges in generalizability, statistical inference, and institutional recognition. However, by carefully addressing internal, external, construct, and statistical validity, small-N research can yield findings that are both credible and actionable.
Ultimately, small-N designs serve as a powerful methodological alternative for both scientific discovery and clinical evaluation. When executed with rigor, ethical care, and analytic transparency, they not only contribute to theory development but also enhance practice-based evidence in fields such as neuropsychology, behavioral therapy, and applied neuroscience.
Key Takeaways
Small-N designs enable causal inference within individuals through repeated measures, structured phases, and high-frequency data collection, enhancing experimental control without requiring large sample sizes.
Classic case studies—such as H.M. and Piaget’s children—demonstrate that small-N research can yield foundational insights, particularly when longitudinal, theoretically grounded, and methodologically rigorous.
Three foundational small-N structures—stable-baseline, multiple-baseline, and reversal designs—provide flexible, robust tools for isolating treatment effects and testing behavioral interventions.
Despite limitations in generalizability and statistical power, small-N research can achieve strong internal, construct, and statistical validity when triangulated with replication and visual or nonparametric analysis.
Neurofeedback professionals can integrate small-N designs into clinical practice to track client progress, evaluate protocol efficacy, and contribute to a replicable, ethically responsible evidence base without the need for randomized trials.
Glossary
analytic generalization: a form of inference in which findings from a case study inform broader theoretical principles rather than being statistically generalized to a population.
ARIMA models (Autoregressive Integrated Moving Average): a family of statistical models used to analyze and forecast time series data by accounting for autocorrelation, trends, and seasonal patterns. ARIMA combines three components: autoregression (AR), differencing to remove trends (I for integration), and moving averages (MA), making it suitable for complex, long-term time series analysis.
baseline: a phase in small-N designs where behavior is measured before any intervention is introduced, serving as a reference for evaluating treatment effects.
causal inference: the process of concluding that a change in one variable is responsible for a change in another, typically achieved through experimental control.
confounding variables: factors other than the independent variable that may influence the dependent variable, potentially distorting causal conclusions.
construct validity: the extent to which the variables in a study accurately represent the theoretical concepts they are intended to measure.
Ebbinghaus: a pioneering German psychologist who conducted the first systematic studies of memory using himself as a participant. He is best known for discovering the forgetting curve, the spacing effect, and for introducing the use of nonsense syllables to study memory retention and recall.
effect size estimation: a quantitative measure of the strength or magnitude of a treatment effect, often used to supplement visual analysis in small-N research.
Ericsson: a Swedish psychologist best known for his research on expertise and deliberate practice. He conducted influential small-N studies, including work with the subject "S.F.," who dramatically expanded his memory span through structured training. Ericsson's work emphasized that expert performance results from sustained, goal-directed practice rather than innate talent.
experimental control: the practice of systematically manipulating the independent variable while keeping all other variables constant to establish causal relationships.
external validity: the extent to which the findings of a study can be generalized to other people, settings, or times.
generalizability: the applicability of study findings beyond the specific context or participants of the original research.
Henry Molaison (H.M.): an American patient (1926–2008) who became the most extensively studied case in the history of cognitive neuroscience after undergoing bilateral medial temporal lobe resection in 1953 to treat intractable epilepsy. The surgery resulted in severe anterograde amnesia, rendering him unable to form new declarative memories. Studied under the initials H.M. during his lifetime, Molaison provided crucial evidence for the role of the hippocampus in memory consolidation and demonstrated the dissociation between declarative and procedural memory systems.
independent variable: the factor that is manipulated by the researcher to observe its effect on the dependent variable.
internal validity: the degree to which observed changes in the dependent variable are caused by manipulations of the independent variable, not by other factors.
interrupted time series analysis: a statistical method used to evaluate the effect of an intervention by analyzing data collected at multiple time points before and after the intervention. It models changes in level and trend while accounting for pre-existing patterns, enabling causal inference when randomized control is not feasible.
inter-subject replication: the repetition of an effect across different participants, often used in large-N designs to confirm generalizability.
intra-subject replication: the repetition of an effect within the same individual across different conditions or time points in a small-N design.
multiple-baseline design: a small-N research design in which an intervention is introduced at different times across different subjects, settings, or behaviors to control for confounding variables.
nonparametric methods: statistical techniques that do not assume a normal distribution of the data, often used in small-N research due to limited sample sizes.
operationalized: defined in a measurable and observable way that allows for empirical testing of a theoretical construct.
Piaget: a Swiss developmental psychologist who used detailed observations of his own children to formulate a stage theory of cognitive development. Through small-N studies, he identified distinct stages—sensorimotor, preoperational, concrete operational, and formal operational—each characterized by qualitatively different modes of thinking.
piecewise regression: a form of regression analysis that fits separate linear models to different segments of a dataset, typically before and after an intervention point. It allows researchers to detect shifts in intercept (level) and slope (trend), providing a structured way to analyze changes across phases in time series data.
randomization tests: a type of statistical test that involves comparing the observed effect to a distribution of effects obtained by randomly reassigning treatment conditions, useful in small-N designs.
reversal design: a small-N research design (also called A-B-A or A-B-A-B) that introduces and then withdraws an intervention to assess whether behavioral changes are dependent on the treatment.
stable-baseline design: a small-N research design in which a long and consistent pre-treatment phase establishes behavioral stability, strengthening causal claims when a treatment is introduced.
statistical inference: the process of drawing conclusions about a population based on data collected from a sample, often using hypothesis testing or confidence intervals.
statistical validity: the extent to which the conclusions drawn from statistical analyses are accurate and reliable, including whether the study has enough power and appropriate analysis techniques.
triangulation: the use of multiple methods, observers, or measures to cross-validate findings and enhance the credibility of research conclusions.
visual analysis: the examination of graphed data to identify patterns such as trends, level changes, and variability across phases in small-N research.
References
Borckardt, J. J., Nash, M. R., Murphy, M. D., Moore, M., Shaw, D., & O’Neil, P. (2008). Clinical practice as natural laboratory for psychotherapy research: A guide to case-based time-series analysis. American Psychologist, 63(2), 77–95. https://doi.org/10.1037/0003-066X.63.2.77
Edgington, E. S., & Onghena, P. (2007). Randomization tests (4th ed.). Chapman and Hall/CRC.
Kazdin, A. E. (2011). Single-case research designs: Methods for clinical and applied settings (2nd ed.). Oxford University Press.