Learning to Learn: How Better Reinforcement Design Makes Neurofeedback and Biofeedback Work (and Stick)
- Fred Shaffer

This post summarizes the practical findings from Kerson, Sherlin, and Davelaar's (2025) open-access article, "Neurofeedback, Biofeedback, and Basic Learning Theory: Revisiting the 2011 Conceptual Framework," published in Applied Psychophysiology and Biofeedback.
Why This Paper Matters In The Clinic
Learning theory can feel like a first-year textbook problem, but this paper treats it like a clinical performance problem. The authors’ central claim is simple: neurofeedback and biofeedback outcomes improve when we stop treating feedback as “information” and start treating it as an engineered learning environment. In practice, your protocol is not only a physiological target. It is a training ecosystem that either facilitates learning and stability or confuses and destabilizes it. The paper makes this point across signals, whether you are reinforcing an EEG feature, a heart rhythm pattern, a muscle activation level, or a sympathetic arousal marker, because all of them become “behaviors” once they are tied to consequences.
That framing is useful because it explains some familiar clinical puzzles. Why does a client “do great” in session but cannot reproduce the state at home? Why do improvements plateau even when the client is motivated and compliant?
Why do some clients improve even when the protocol is not perfectly matched, while others need a very tight setup? The paper answers that these patterns often reflect basic learning variables such as reinforcement consistency, timing, and the client’s ability to recognize and re-enter the trained state on demand.
It also explains why providers who move fluidly between neurofeedback and biofeedback often recognize the same failure modes in different costumes: the signal changes, but the learning problem stays the same.
The Core Reframe: Self-Regulation Is Learned Behavior
The paper revisits the idea that neurofeedback and biofeedback are forms of operant learning, meaning the brain and body change because consequences follow successful patterns. “Operant” here matters because it shifts responsibility from “tell the client what to do” to “shape what the nervous system repeats.” Your feedback becomes a consequence that strengthens whatever state immediately preceded it.
In that sense, EEG training and modalities such as HRV, EDA, temperature, and sEMG share the same foundation: the client is not “trying harder” so much as learning which internal state reliably elicits reward.
Clinically, you can think of training as running on two tracks at once. One track is explicit learning, meaning the client intentionally employs strategies such as softening the gaze, relaxing the jaw, pacing the breath, or focusing attention in a particular way. The other track is implicit learning, meaning the nervous system gradually discovers stable patterns that elicit reward without the client being able to fully explain what changed.
Many clients improve through a combination of both, and the balance can vary by age, cognitive style, symptom profile, and even fatigue level on that day.
Biofeedback modalities often make this dual-track process easier to coach because the “strategy space” is more tangible. A client can deliberately adjust the breathing rhythm during HRV training or release frontalis tension during sEMG training. At the same time, deeper learning continues to occur implicitly as the autonomic and motor systems converge on a steadier set point.
A practical implication is that it is not always a red flag when a client says, “I don’t know how I did it.” That can be normal implicit learning. The red flag is when the client’s success appears random to them and in the data, such as reward rates that fluctuate widely despite similar effort. That pattern suggests a learning-environment problem rather than a motivational problem.
This is where integrated practice helps: if “random success” shows up in EEG, a brief shift to sEMG or HRV can reveal whether the client can learn cleanly when the contingency is easier to perceive, which often clarifies whether the barrier is the client or the feedback loop.
Reinforcement Schedules Are Clinical Levers, Not Software Settings
A reinforcement schedule is the rule that determines when feedback is delivered. Many systems operate under continuous reinforcement, meaning the client receives an immediate reward whenever the signal meets the criteria. Other systems operate as interval schedules, meaning they evaluate performance at fixed intervals and reward only if the criteria are met at that instant.
The paper’s point is not that one schedule is always superior, but that schedule choice determines what the nervous system practices across EEG and peripheral channels alike.
This difference sounds technical, but it changes what clients experience. If reinforcement is essentially continuous, clients tend to feel a smoother, more “responsive” system. They can experiment and notice what increases reward. If reinforcement is sampled at intervals, brief moments of doing it right may be missed, creating the subjective experience that the system is stingy or inconsistent, even when the client is intermittently producing the target state.
That same “missed moment” problem can happen in HRV training when breathing and heart rhythm briefly synchronize between sample checks, or in EDA training when a brief downshift in arousal is not captured as reinforcement.
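The "missed moment" effect is easy to demonstrate with a toy simulation. The Python sketch below uses made-up numbers and is not modeled on any specific platform; it simply compares continuous reinforcement against a fixed-interval check on the same simulated in-state signal:

```python
import random

random.seed(1)  # deterministic toy example

# Toy per-tick indicator of whether the client is "in state";
# brief successes scattered across a session of 200 ticks.
in_state = [1 if random.random() < 0.3 else 0 for _ in range(200)]

# Continuous reinforcement: reward every tick the criterion is met.
continuous_rewards = sum(in_state)

# Fixed-interval schedule: evaluate only every 10th tick, so brief
# successes that fall between checks earn nothing.
interval = 10
interval_rewards = sum(in_state[i] for i in range(0, len(in_state), interval))

print("continuous rewards:", continuous_rewards)
print("interval rewards:  ", interval_rewards)
```

The interval schedule necessarily catches only a fraction of the in-state moments, which is exactly the "stingy system" experience described above.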
In clinical terms, you can treat reward feel as a diagnostic signal. If a competent, engaged client reports that the reward seems disconnected from their efforts, you should suspect a schedule, threshold, or artifact issue before assuming resistance or poor insight. You can test this by temporarily simplifying the task, raising reward probability, or switching feedback modalities, for example, pairing a clear auditory tone with a simple visual indicator. If the client suddenly “finds it,” the learning environment was probably too hard or too noisy.
Crucially, modality switching is not a detour from neurofeedback. Within the learning-theory framework, it is a way to preserve operant momentum by training a cleaner or more controllable signal while refining the original target.
When “Sham-Like” Learning Happens In Real Clinics
Random reinforcement, sometimes used as a sham condition in research, means feedback is not tightly linked to the client’s actual physiological state. The paper’s discussion is clinically relevant because a session can unintentionally drift toward “sham-like” conditions whenever reinforcement loses contingency. This risk is not unique to EEG. Any modality can become effectively noncontingent when artifacts, drift, or poorly chosen thresholds cause the reward to detach from the intended state.
This can occur in routine practice for reasons that appear mundane. EEG artifact can inflate or suppress the very metrics you are rewarding. Muscle tension can masquerade as high-frequency activity. Eye movements can contaminate frontal sites. Poor electrode contact can introduce slow drift. Line noise can add rhythmic contamination that a client cannot control. In all of these cases, the system may deliver rewards that are unrelated to the intended brain state.
From the client’s perspective, the task becomes a slot machine. Some clients continue to improve because they learn relaxation, attentional stability, or expectancy-driven control. Other clients stagnate because the nervous system cannot reliably discover what earns reward.
Biofeedback shows parallel problems with different faces. In EDA work, skin hydration, temperature shifts, or sensor instability can imitate “improvement.” In HRV analysis, breath holds or exaggerated breathing effort can produce transient patterns that appear to indicate success but do not reflect healthy regulation. In sEMG work, posture changes can reduce the signal while the client is still bracing elsewhere, so the system rewards a workaround instead of true release.
A practical way to operationalize this is to track reinforcement integrity as a routine clinical variable. Ask, in plain language, “Is the client being rewarded for the thing we think we are rewarding?” If you are unsure, make your uncertainty visible in the workflow. Increase artifact monitoring, reduce complexity, tighten signal quality checks, or temporarily shift training to a simpler physiological channel, such as respiration pacing or peripheral temperature, to restore a clean contingency while you troubleshoot the EEG.
Because the paper treats modalities as variations on the same learning architecture, this move is not “adding biofeedback.” It is using the full menu of controllable signals to keep the learning environment honest.
Timing And Latency: Reward That Arrives Late Teaches The Wrong Thing
The paper emphasizes that feedback is not only about whether the reward occurs, but also about when it occurs. Reinforcement learning depends on a tight coupling between a successful response and its consequence.
If the consequence arrives late, the nervous system may strengthen the response that occurred closer in time to the reward, which might be a compensatory strategy, an artifact, or a brief change unrelated to the intended target. In plain clinical language, late feedback can accidentally teach whatever the client did most recently, not what you intended to shape, regardless of whether the signal is EEG, HRV, EDA, or sEMG.
For neurofeedback providers, this becomes a practical engineering question. Your platform has a sequence of delays that may include signal acquisition, filtering, artifact handling, feature computation, threshold comparison, and visual or auditory display. Even when each step is fast, total latency can creep upward. The paper’s discussion encourages clinicians to treat latency as clinically meaningful rather than an invisible detail. The same operational thinking applies to biofeedback: smoothing windows, peak detection, coherence calculations, and display refresh rates can all introduce delay that alters what is reinforced.
You can use simple behavioral signs to detect a latency problem. If clients report that the feedback feels “behind,” they may be noticing a real mismatch. If they can increase reward only by performing abrupt, effortful actions, such as tensing, blinking, or holding their breath, your system may be rewarding short-lived transients or artifacts that are temporally aligned with reward.
When you correct latency or simplify processing, you often see a qualitative change: the client can succeed with calmer, steadier strategies, and the reward becomes easier to sustain. Across modalities, that shift often looks like healthier regulation: smoother breath pacing without strain in HRV training, quieter baseline arousal with fewer spikes in EDA work, and genuine muscle release rather than rigid stillness in sEMG training.
Expectancy, Meaning, And The Ethics Of Using Them Well
The paper treats expectancy as part of the learning environment. In clinical reality, clients do not arrive as blank slates. They arrive with hope, skepticism, fear, and stories about what their symptoms mean. Those expectations shape attention, effort, and persistence, all of which are learning-relevant behaviors. This is true whether the client is viewing a brain-based display or a peripheral physiology display, because expectancy influences what they attempt, how long they attempt it, and how they interpret sensations during learning.
For providers, the goal is not to eliminate expectancy.
The goal is to harness it ethically and keep it aligned with skill acquisition. You can do this by offering a credible rationale, making the task understandable, and explaining that progress is often nonlinear.
You can also set expectations around what learning typically looks like, such as early gains in state control that later need to consolidate into trait change. Integrated practice helps here because biofeedback can make the rationale feel concrete. When clients observe changes in heart rhythm patterns with breathing or detect muscle release reflected in sEMG, expectancy becomes grounded in visible cause-and-effect rather than hype, which often strengthens adherence without inflating claims.
A helpful clinician stance is to treat hope as fuel and data quality as steering. You can be encouraging while still insisting on clean signals, good thresholds, and honest interpretation. This prevents the common failure mode in which optimism masks a weak learning environment, leading to long stretches of training that appear busy but do not yield stable skills.
The Clinician Becomes A Learning Coach, Not A Technician
A distinctive practical thread in this paper is the emphasis on cultivating the client’s awareness of the trained state. Phenomenological awareness means the client learns to notice and describe what the target state feels like, how it arises, and how to return to it. In clinical terms, this is how you reduce “I can do it in session but not in life.”
The paper’s integrated message is that awareness is not a luxury add-on. It is a bridge between operant learning in the chair and self-regulation in the real world, across neurofeedback and biofeedback alike.
You can implement this without turning sessions into therapy talk.
After a good run, ask the client to describe the state in sensory language. What changed in breathing, muscle tone, gaze, posture, or emotional tone? What was the smallest action that helped?
If the client cannot describe it, provide options rather than forcing introspection. Over time, the client builds a personal map of the state. Biofeedback can accelerate this mapping because clients can often feel and label the shift more easily, such as a warmer hand during temperature training, a softer belly during breathing work, or a quieter jaw during sEMG training, and then carry that felt signature back into EEG work.
This also helps you detect when learning is happening through an unhelpful route. For example, a client might increase reward by becoming rigidly focused, which may appear as “success” in the signal but feel subjectively like strain. If you capture phenomenology, you can redirect learning toward a calmer strategy that supports generalization. When you treat modalities as a unified learning toolkit, you can also “triangulate” the strategy. If EEG reward rises while sEMG tension rises, you may be reinforcing effort. If HRV reward rises while the client reports air hunger, you may be reinforcing overbreathing. The point is to protect the learning target from being hijacked by compensations.
Practical Ways To Upgrade Your Training Environment
One useful way to apply the paper is to treat each session as a learning experiment with controllable variables.
Start with reward probability. If the task is too hard at the beginning, the client does not get enough successful trials to learn.
Many clients benefit from an initial phase in which rewards are frequent enough to allow the nervous system to discover the pathway, followed by a gradual tightening of criteria.
This is shaping, meaning you reinforce successive approximations toward the final target rather than demanding it immediately. Shaping is also where integrated biofeedback and neurofeedback practice becomes clinically elegant. You can begin with a modality that offers clear, controllable wins, such as reducing frontalis sEMG, stabilizing breathing for HRV training, or lowering tonic arousal in EDA, then layer EEG goals once the client has learned how “success” feels in the body.
Next, stabilize what “success” means. Threshold drift can make success unpredictable. If thresholds auto-adjust too aggressively, the client experiences a moving target. If thresholds never adjust, the client can reach a ceiling at which improvement no longer changes the reward. A clinically sensible approach is to adjust thresholds deliberately, in small steps, and to explain the purpose: “We are making it slightly harder so your brain keeps learning,” or “We are making it slightly easier so you can find the pattern again.”
This same transparency works in biofeedback. Clients tend to tolerate tighter criteria when they understand that you are training a skill, not chasing a number.
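The "small, deliberate steps" idea can be sketched as a simple adjustment rule. This is an illustrative toy, not a vendor algorithm; it assumes a target where the metric must exceed the threshold to earn reward, and the reward-rate band and step size are arbitrary placeholders:

```python
def adjust_threshold(threshold, recent_reward_rate,
                     target_low=0.6, target_high=0.8, step=0.05):
    """Nudge a reward threshold in small, deliberate steps.

    Assumes reward requires the metric to EXCEED the threshold, so
    raising the threshold tightens the task. If the client succeeds
    too often, tighten slightly; too rarely, loosen slightly;
    otherwise hold steady so "success" stays predictable.
    """
    if recent_reward_rate > target_high:
        return threshold * (1 + step)   # make it slightly harder
    if recent_reward_rate < target_low:
        return threshold * (1 - step)   # make it slightly easier
    return threshold

print(adjust_threshold(10.0, 0.90))  # high success rate -> tighten
print(adjust_threshold(10.0, 0.40))  # low success rate -> loosen
print(adjust_threshold(10.0, 0.70))  # in band -> hold steady
```

The design point is the hold-steady branch: rather than continuously chasing the signal, the threshold moves only when the reward rate leaves a clinically chosen band, which is what keeps the target from feeling like a moving one.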
Then, build transfer on purpose. Transfer trials reduce feedback, so the client must reproduce the state without constant external cues. You can do transfer inside the session by turning feedback off for brief intervals, then turning it back on to check whether the client can regain the state. You can also prescribe between-session practice that mirrors the trained state, such as short blocks of attention training, paced breathing within a comfortable range, or relaxation routines tied to the same cues used in session.
If you integrate modalities, transfer becomes even more practical. A client who cannot “do EEG” on demand can still practice the embodied correlates of the trained state at home, such as a specific breathing cadence, a jaw-release routine, or a downshift script that stabilizes arousal, then return to the session and test whether the EEG target becomes easier to access.
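One way to make in-session transfer systematic is to lay out the feedback-on and feedback-off blocks in advance. The sketch below builds such a schedule; the block durations are placeholders for illustration, not clinical recommendations:

```python
def transfer_plan(total_min=20, on_min=4, off_min=1):
    """Build a simple in-session transfer schedule: blocks of
    feedback-on training interleaved with brief feedback-off
    (transfer) intervals. Durations are illustrative only."""
    plan, elapsed = [], 0
    while elapsed + on_min <= total_min:
        plan.append(("feedback ON", on_min))
        elapsed += on_min
        if elapsed + off_min <= total_min:
            plan.append(("feedback OFF (transfer)", off_min))
            elapsed += off_min
    return plan

for phase, minutes in transfer_plan():
    print(f"{minutes} min  {phase}")
```

Because each feedback-off interval is followed by a feedback-on block, you get the "turn it back on to check whether the client can regain the state" test built into the session structure.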
Finally, reduce accidental training. If the client discovers that jaw tension increases the reward, they may continue to do it. If they discover that breath holding increases reward, they may overuse it. If they discover that staring harder increases reward, they may train strain rather than self-regulation. You can prevent this by monitoring posture, facial tension, breathing, and effort level, and then explicitly reinforcing strategies that are sustainable and healthy.
Integrative Summary
Kerson, Sherlin, and Davelaar update the learning-theory foundation of neurofeedback by emphasizing that neurofeedback is not merely measurement; it is the construction of a learning environment where reinforcement schedules, timing, and signal integrity determine what the nervous system actually learns.
Their clinical message is that outcomes improve when providers treat reward contingency as a central variable, minimize conditions that make reinforcement effectively random, and optimize timing so feedback follows the target state closely enough to strengthen it.
They also emphasize that lasting benefits depend on bridging implicit operant learning with the client’s ability to recognize and re-enter the trained state, which can be supported by brief, structured attention to phenomenological awareness and deliberate transfer practice.
A central strength of this paper is that it refuses to silo neurofeedback and biofeedback. Instead, it treats EEG, HRV, EDA, temperature, and sEMG as different windows into the same underlying project, teaching self-regulation through well-designed consequences. That integrated stance invites clinicians to select the cleanest teachable signal at that moment, verify that the reward is truly contingent, and deliberately build transfer so that the learned state becomes a usable skill outside the clinic.
Key Takeaways
Neurofeedback is most effective when it is designed as a learning environment rather than as a physiological intervention.
Reinforcement schedules and thresholds can accelerate or impede learning; therefore, adjust them as clinical variables.
Artifact and poor signal quality can make reinforcement effectively random, undermining stable skill acquisition.
Feedback timing matters because delayed reward can strengthen the wrong response or an artifact-linked strategy.
Biofeedback modalities and neurofeedback are best treated as a unified learning toolkit, allowing you to select the clearest signal, prevent accidental training, and strengthen transfer into daily life.
Generalization improves when clients can describe and re-create the target state, with support from transfer practice.


Glossary
accidental training: an unintended learning process in which the client is reinforced for artifacts, compensatory strategies, or non-target behaviors because the feedback is not tightly contingent on the intended physiological state.
artifact: a signal component that contaminates recorded data and can distort what is reinforced during training.
biofeedback: a method for learning voluntary control over physiological processes by receiving real-time information about a biological signal.
classical conditioning: a form of learning in which a neutral cue becomes associated with a meaningful event and begins to elicit a learned response.
continuous reinforcement: a reinforcement pattern in which a reward is delivered whenever the target criterion is met.
expectancy: a client’s belief about what will happen in treatment, which can shape attention, motivation, perceived control, and symptom experience, thereby influencing learning and outcomes.
explicit learning: a conscious learning process involving deliberate strategy selection and awareness of what is being practiced.
fixed interval schedule: a reinforcement schedule in which the system checks performance at set time intervals and delivers a reward only when criteria are met at those instants.
fixed ratio schedule: a reinforcement schedule in which a reward is delivered after a set number of successful responses.
generalization: a transfer of a learned skill from the training context into daily-life situations and demands.
implicit operant learning: an unconscious learning process in which neural patterns are strengthened through reinforcement without the learner being able to articulate how the change occurred.
interoception: a sense of internal bodily states such as heart rhythm, breathing, muscle tension, and autonomic arousal.
latency: a time delay between a physiological event and the delivery of feedback that can affect what is learned.
neurofeedback: a form of biofeedback that uses measures of brain activity, commonly EEG, to reinforce targeted neural patterns.
operant conditioning: a form of learning in which behavior changes because consequences follow it, strengthening responses that produce reward.
phenomenological awareness: a clinician-supported ability to notice, describe, and make meaning of the subjective experience of the target state.
placebo effect: a change in symptoms or performance driven by expectancy, meaning, and perceived control rather than a specific active ingredient alone.
prediction error: a difference between what is expected and what occurs that drives learning in reinforcement learning models.
random reinforcement: a noncontingent feedback pattern in which rewards are unrelated or only weakly related to the learner’s real-time signal.
reinforcement learning: a learning framework in which behavior is shaped by feedback and prediction errors that update expectations.
reinforcement schedule: a rule that determines when and how feedback rewards are delivered in response to performance.
reward feel: a client’s subjective sense of how responsive and predictable the feedback is, meaning whether rewards seem clearly tied to their efforts and internal state changes.
shaping: a learning method in which reinforcement is delivered for successive approximations that move progressively toward the target behavior.
threshold setting: a parameter selection process that defines what counts as “success” for reward delivery in a feedback system.
transfer: a deliberate process for carrying the trained self-regulation skill into daily life by practicing it with reduced or no feedback and in contexts that resemble real-world demands.
transfer trials: a practice method that reduces or removes feedback so the learner must achieve the target state under more naturalistic conditions.
variable interval schedule: a reinforcement schedule in which reward opportunities occur at varying time intervals.
variable ratio schedule: a reinforcement schedule in which reward occurs after a varying number of successful responses.
References
Davelaar, E. J. (2018). Mechanisms of neurofeedback: A computational-theoretic approach. Neuroscience, 378, 175–188. https://doi.org/10.1016/j.neuroscience.2017.05.052
Hróbjartsson, A., & Gøtzsche, P. C. (2010). Placebo interventions for all clinical conditions. Cochrane Database of Systematic Reviews, 2010(1), CD003974. https://doi.org/10.1002/14651858.CD003974.pub3
Kerson, C., Sherlin, L. H., & Davelaar, E. J. (2025). Neurofeedback, biofeedback, and basic learning theory: Revisiting the 2011 conceptual framework. Applied Psychophysiology and Biofeedback. https://doi.org/10.1007/s10484-025-09756-4
Lehrer, P. M., Kaur, K., Sharma, A., Shah, K., Huseby, R., Bhavsar, J., Sgobba, P., & Zhang, Y. (2020). Heart rate variability biofeedback improves emotional and physical health and performance: A systematic review and meta-analysis. Applied Psychophysiology and Biofeedback, 45(3), 109–129. https://doi.org/10.1007/s10484-020-09466-z
Lubianiker, N., Paret, C., Dayan, P., & Hendler, T. (2022). Neurofeedback through the lens of reinforcement learning. Trends in Neurosciences, 45(8), 579–593. https://doi.org/10.1016/j.tins.2022.03.008
Luft, C. D. (2014). Learning from feedback: The neural mechanisms of feedback processing facilitating better performance. Behavioural Brain Research, 261, 356–368. https://doi.org/10.1016/j.bbr.2013.12.043
Schwartz, M. S., & Andrasik, F. (2017). Biofeedback: A practitioner’s guide (4th ed.). Guilford Press.
Sherlin, L. H., Arns, M., Lubar, J., Heinrich, H., Kerson, C., Strehl, U., & Sterman, M. B. (2011). Neurofeedback and basic learning theory: Implications for research and practice. Journal of Neurotherapy, 15(4), 292–304. https://doi.org/10.1080/10874208.2011.623089
Sitaram, R., Sanchez-Corzo, A., Vargas, G., Cortese, A., El-Deredy, W., Jackson, A., & Fetz, E. (2024). Mechanisms of brain self-regulation: Psychological factors, mechanistic models and neural substrates. Philosophical Transactions of the Royal Society B: Biological Sciences, 379(1915), 20230093. https://doi.org/10.1098/rstb.2023.0093
Thibault, R. T., Veissière, S., Olson, J. A., & Raz, A. (2018). Treating ADHD with suggestion: Neurofeedback and placebo therapeutics. Journal of Attention Disorders, 22(8), 707–711. https://doi.org/10.1177/1087054718770012
Valentin, V. V., Maddox, W. T., & Ashby, F. G. (2014). A computational model of the temporal dynamics of plasticity in procedural learning: Sensitivity to feedback timing. Frontiers in Psychology, 5, 643. https://doi.org/10.3389/fpsyg.2014.00643
About the Author
Fred Shaffer earned his PhD in Psychology from Oklahoma State University. He earned BCIA certifications in Biofeedback and HRV Biofeedback. Fred is an Allen Fellow and Professor of Psychology at Truman State University, where he has taught for 50 years. He is a Biological Psychologist who consults and lectures in heart rate variability biofeedback, Physiological Psychology, and Psychopharmacology. Fred helped to edit Evidence-Based Practice in Biofeedback and Neurofeedback (3rd and 4th eds.) and helped to maintain BCIA's certification programs.
