# PSTH (peri-stimulus time histogram)

## Origin in electrophysiology

PSTH is a term inherited from electrophysiology. In the spike-counting world, you take a long extracellular recording, find every spike, line up a window around each behavioural event of interest (cue, reward, lever press), and count spikes in fine time bins relative to that event. Average across events and you get a histogram that says: "in the 200 ms after the cue, the neuron fires at roughly 30 Hz on average; before the cue, it fires at about 10 Hz." The histogram shape is the neuron's average response, time-locked to the event.

In [fiber photometry](fiber_photometry.md) the operation is identical in spirit but the underlying signal is different. Instead of discrete spikes you have a continuous fluorescence trace (typically z-scored dF/F) that proxies bulk calcium activity in the recorded region. There is nothing to count, only to slice. The "histogram" becomes a continuous amplitude trace: an average of the z-score waveform locked to the event.

The "histogram" name is therefore a vestige of the electrophysiology origin. In photometry it is more accurate to call it an *event-aligned average*, but the field has standardised on PSTH and that is what GuPPy uses.

## Constructing and reading a PSTH

A PSTH (peri-stimulus time histogram) is the across-event average of a continuous signal in a fixed window around each event. Its purpose is to isolate the event-locked component of the signal: the part that consistently appears at the same time relative to each event, separated from spontaneous activity, drift, and noise that do not. Operationally, it is computed by taking a z-scored signal, a list of event timestamps, and a pre and post window in seconds; extracting the corresponding window around each event; and averaging the extracted windows point by point.
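The operation described above can be sketched in a few lines. This is a minimal illustration, not GuPPy's implementation: the function names, the toy trace, and the choice to skip windows that do not fit are all assumptions made for the example, and `signal` stands in for an already z-scored trace held as a plain list.

```python
def extract_windows(signal, fs, event_times, pre, post):
    """Slice one window of `pre` + `post` seconds around each event."""
    n_pre, n_post = round(pre * fs), round(post * fs)
    windows = []
    for t in event_times:
        center = round(t * fs)  # sample index closest to the event
        start, stop = center - n_pre, center + n_post + 1
        if start < 0 or stop > len(signal):
            continue  # window would run off the recording; skip it
        windows.append(signal[start:stop])
    return windows


def average_windows(windows):
    """Average the extracted windows point by point across events."""
    return [sum(samples) / len(windows) for samples in zip(*windows)]


# Toy 10 s trace at 10 Hz: flat, with a unit response 0.2 s after each event
# and one spontaneous transient that is not locked to any event.
fs = 10.0
event_times = [2.0, 5.0, 8.0]
signal = [0.0] * 100
for t in event_times:
    signal[round(t * fs) + 2] = 1.0
signal[45] = 1.5  # spontaneous transient, 0.5 s before the second event

windows = extract_windows(signal, fs, event_times, pre=1.0, post=1.0)
psth = average_windows(windows)
# psth[12] (0.2 s after the event) keeps the full response: 1.0
# psth[5] (0.5 s before the event) holds the transient diluted to 1.5 / 3 = 0.5
```

The event-locked response survives the average at full height because it appears in every window, while the one-off transient is divided by the number of events.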

PSTHs are computed on the z-scored trace rather than on raw fluorescence or dF/F because z-score puts different recordings on a comparable scale, with values expressed in standard-deviations-of-noise units. See the [explainer on z-score normalization](zscore.md) for more information.

![Four-panel figure walking through the PSTH operation. Panels 1 and 2 span the full figure width; panels 3 and 4 share the bottom row, each taking half the width. A small legend strip between panels 1 and 2 names the four colors used throughout: blue for the z-score trace, green for event timestamps, orange for the pure-noise event, and purple for the artifact event. Panel 1: full session z-score trace with event timestamps as green vertical lines. Panel 2: same trace with each pre/post extraction window shaded blue, and the pure-noise and artifact event windows shaded orange and purple. Panel 3: every extracted window overlaid on a shared event-relative time axis, showing both the consistent event-locked peak and the event-by-event variability around it. Panel 4: the event-averaged PSTH with SEM band, isolating the event-locked component.](../_static/images/psth_explainer/fig1_psth_walkthrough.svg)

A single event-aligned trace is dominated by noise and by signal unrelated to the event. The panel-3 overlay makes both visible: alongside the typical responses sit a pure-noise trace (orange) with no event-locked activity and an artifact trace (purple) with a spurious post-event peak. Averaging across events recovers the event-locked component because it is the only thing that lines up. Spontaneous transients, slow drift, and the purple trace's artifact do not recur at the same offset relative to `t = 0` from one event to the next, so they are averaged out across events and contribute negligibly to the mean. Panel 4 is what survives that smearing: the part of the signal that consistently appears at the same time across events.

The recovery only works with enough events. With small event counts a few loud individual traces can dominate the average, and the SEM (which shrinks as `1/sqrt(N)`) is itself unreliable as an uncertainty estimate. Where "too few" begins depends on the recording's SNR and the across-event variability, but it is a study-design constraint that the analysis cannot rescue.
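For reference, the SEM band can be computed per time point as the across-event standard deviation divided by `sqrt(N)`. The sketch below is illustrative only: `mean_and_sem` is a hypothetical helper, it uses the sample standard deviation (`N - 1` in the denominator), and it therefore needs at least two events.

```python
import math
import statistics


def mean_and_sem(windows):
    """Per-time-point mean and standard error across event windows."""
    n = len(windows)  # number of events; must be at least 2
    means, sems = [], []
    for samples in zip(*windows):  # one tuple of N values per time point
        means.append(statistics.fmean(samples))
        sems.append(statistics.stdev(samples) / math.sqrt(n))
    return means, sems


# Two toy windows of two samples each.
toy_windows = [[0.0, 1.0], [0.0, 3.0]]
toy_means, toy_sems = mean_and_sem(toy_windows)
```

With only two events the second time point has a mean of 2.0 and an SEM of about 1.0, half the size of the mean itself, which is the kind of unreliable estimate the paragraph above warns about.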

The flip side is that PSTH cannot reveal activity that is not consistently locked to the chosen event; transient detection and [cross-correlation](cross_correlation.md) are tools for those questions.

### Correction for long-term drift

Long photometry sessions are not flat. The trace drifts slowly up and down across the recording for reasons unrelated to the events of interest, including residual photobleaching, gradual changes in the animal's general state, and slow shifts at the rig. Z-scoring the whole session does not remove this drift. So an event-aligned trace extracted from early in the session has a different starting height than one extracted from late in the session, and the average of those traces inherits the spread: the mean PSTH no longer sits cleanly at zero before the event, and the SEM band is widened by event-to-event differences in starting height that have nothing to do with the response.

![Three-panel figure. Top: a 600-second z-score trace with a strong monotonic drift carrying the baseline from roughly z = +1.5 at the start of the session down to roughly z = -1.5 at the end; ten event windows are spread across the session, three of them highlighted in blue, red, and green at the start, middle, and end while the other seven are shaded pale gray. Middle: the three highlighted event-aligned traces extracted on a shared event-relative time axis; each has the same event-locked peak and undershoot, but their pre-event baselines sit at clearly different levels (positive z for the start-of-session trace, near zero for the middle, negative z for the end), reflecting where in the session each event happened to occur. Bottom: the same three event-aligned traces after per-event pre-event mean subtraction; every trace now starts at zero pre-event and the response itself is unchanged.](../_static/images/psth_explainer/fig2_session_drift.svg)

Per-event baseline correction takes the simplest possible approach: rather than modelling drift across the whole session, it removes whatever the drift contributed to each event-aligned trace's own pre-event window. The middle panel of the figure makes the problem concrete. The three highlighted traces have effectively identical event responses, yet their pre-event baselines sit at noticeably different levels: positive for the start-of-session trace (blue), near zero for the middle (red), negative for the end (green). Each baseline level matches where the slow drift in the top panel happens to be when that event fires. The fix is to compute the mean of each trace over the shaded yellow baseline window and subtract it from that trace. The bottom panel shows the result: every trace is anchored at zero pre-event regardless of where in the session the event happened, and the event response itself is unchanged.

This correction sits on top of the session-wide z-score rather than replacing it: the two together give a y-axis in noise units (from z-score) that starts cleanly at zero pre-event (from baseline correction).
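A minimal sketch of the per-event subtraction follows. The helper name and the convention that the baseline window is the first `n_baseline` samples of each extracted window are assumptions for illustration; the actual baseline window is whatever the analysis defines.

```python
def baseline_correct(window, n_baseline):
    """Subtract the mean of the first `n_baseline` samples from the window."""
    baseline = sum(window[:n_baseline]) / n_baseline
    return [x - baseline for x in window]


# The same response shape riding on different drift levels, as in the figure:
# two pre-event samples followed by a peak and an undershoot.
early = [1.5, 1.5, 3.5, 0.5]      # start of session, baseline near z = +1.5
late = [-1.5, -1.5, 0.5, -2.5]    # end of session, baseline near z = -1.5

early_corrected = baseline_correct(early, n_baseline=2)
late_corrected = baseline_correct(late, n_baseline=2)
# Both become [0.0, 0.0, 2.0, -1.0]: same response, anchored at zero pre-event.
```

Because each window is corrected with its own baseline, no model of the drift across the session is needed.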

Why a separate subtraction step at all? You might wonder whether better z-scoring could absorb the drift directly. It cannot, and the reason is structural. The mean $\mu$ and standard deviation $\sigma$ in $z = (x - \mu) / \sigma$ are *single numbers* computed over the entire recording, not functions of time. Subtracting $\mu$ from every sample shifts the whole trace down by the same amount, so it removes only the constant part of any drift, not its time-varying part. Concretely: if the drift is linear, $\text{drift}(t) = a \cdot t + b$, then $\mu = a \cdot T/2 + b$, and subtracting $\mu$ leaves $a \cdot t - a \cdot T/2$, still linear in $t$ with the slope $a$ unchanged. More generally, z-score is an affine map applied uniformly in time: it can shift and scale any input but not change its shape. Whatever time-varying structure was in $x(t)$ is also in $z(t)$. Removing drift therefore requires a time-aware operation that uses a different baseline at each time point, which is exactly what per-event baseline correction does, one event at a time.
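The linear-drift argument can be checked numerically. The session length, sampling rate, and drift coefficients below are made-up values chosen to resemble the figure, not measurements.

```python
import statistics

# Hypothetical drift: a 600 s session sampled at 1 Hz, falling linearly
# from z = +1.5 to z = -1.5 before any normalisation.
a, b = -0.005, 1.5
times = list(range(601))
drift = [a * t + b for t in times]

mu = statistics.fmean(drift)      # a single number for the whole session
sigma = statistics.pstdev(drift)  # likewise a single number
z = [(x - mu) / sigma for x in drift]

# Slope between the first and last sample, before and after z-scoring.
duration = times[-1] - times[0]
slope_before = (drift[-1] - drift[0]) / duration
slope_after = (z[-1] - z[0]) / duration
```

Here `mu` matches $a \cdot T/2 + b$, and `slope_after` equals `slope_before / sigma`: the slope is rescaled by a constant but not removed, so the z-scored trace still starts well above zero and ends well below it.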

### Summary statistics

Once a PSTH has been built and corrected for drift, a few scalars condense its response into single numbers that can be compared across conditions, sessions, or animals. Three are commonly reported:

- **Peak amplitude**: the largest value of the PSTH inside a chosen post-event window.
- **Peak latency**: the time at which that maximum occurs, relative to the event.
- **AUC** (area under the curve): the integral of the PSTH over the same window.

For responses that go below baseline (suppressions, omitted-reward dips), the relevant version of peak amplitude is the signed minimum rather than the maximum, and peak latency is the time of that minimum.
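The three scalars can be read off a PSTH as sketched below. `summarise` is a hypothetical helper, the trapezoidal rule is one reasonable choice of integral rather than a prescribed one, and `times` is assumed to be sorted and expressed relative to the event.

```python
def summarise(times, psth, t_start, t_end, negative=False):
    """Return (peak amplitude, peak latency, AUC) inside [t_start, t_end]."""
    idx = [i for i, t in enumerate(times) if t_start <= t <= t_end]
    pick = min if negative else max  # signed minimum for suppressions
    i_peak = pick(idx, key=lambda i: psth[i])
    # Trapezoidal integral over consecutive samples inside the window.
    auc = sum(
        0.5 * (psth[i] + psth[j]) * (times[j] - times[i])
        for i, j in zip(idx, idx[1:])
    )
    return psth[i_peak], times[i_peak], auc


times = [-1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
response = [0.0, 0.0, 0.0, 1.0, 3.0, 1.0, 0.0]
dip = [0.0, 0.0, 0.0, -1.0, -2.0, -1.0, 0.0]

peak, latency, auc = summarise(times, response, 0.0, 2.0)
dip_peak, dip_latency, dip_auc = summarise(times, dip, 0.0, 2.0, negative=True)
# response: peak 3.0 at 1.0 s, AUC 2.5; dip: peak -2.0 at 1.0 s, AUC -2.0
```

Passing `negative=True` switches both the amplitude and the latency to the minimum, matching the signed-minimum convention for suppressions.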

The three scalars answer two different kinds of question. Peak amplitude and AUC both measure *how big* the response was, and they can disagree in ways that reveal what each is actually measuring. Peak latency measures *when* the response was largest; two recordings can have the same peak amplitude but different latencies, which generally tells a different biological story (early sensory vs late cognitive, for instance). Latency does not interact with response shape the same way the magnitude pair does, so the comparison starts with peak amplitude and AUC.

![3x2 grid of PSTH plots. Each row is labelled with a centered bold header above the row: "canonical", "broad sustained", and "real + artifact". Each row shows the same response shape twice: peak amplitude on the left panel (red marker at the maximum, dotted vertical line dropping to zero, peak value labelled in the corner) and AUC on the right panel (green fill under the curve across the post-event window, AUC value labelled in the corner). Canonical row: a clean Gaussian peak; peak ~3.0 and AUC ~4.5 give consistent moderate-magnitude readings. Broad sustained row: a low-amplitude long-duration response; peak ~1.2 and AUC ~5.4 because the trace stays elevated for several seconds. Real + artifact row: a small real response with a tall narrow artifact spike; peak ~4.5 latches onto the artifact while AUC ~2.2 stays close to the real response.](../_static/images/psth_explainer/fig3_peak_vs_auc.svg)

The top row is the canonical case: a single clean event-locked peak. Peak and AUC tell the same story (peak ~3.0, AUC ~4.5) and reporting either one would be defensible. The lower two rows are where the metrics begin to disagree, and the comparison is what reveals what each metric is actually measuring.

The broad sustained response in the middle row peaks at less than half the height of the canonical case (peak ~1.2 vs ~3.0), and yet its AUC is *higher* (~5.4 vs ~4.5) because the trace stays elevated for several seconds. Peak registers only the lower amplitude; AUC sums across the window and rewards the duration. The opposite asymmetry shows up in the bottom row, where a small real response is contaminated by a tall narrow artifact spike. Peak latches onto the spike and reports a value (~4.5) much larger than the real response, while AUC stays close to what the real response alone would contribute (~2.2) because the spike is too narrow to add much area. AUC's robustness here is asymmetric, though: a *broad* artifact (a slow drift bump that survived correction, a contaminating long-duration nuisance signal) would inflate AUC the same way a narrow spike inflates peak.

The structural reason behind both disagreements is that peak is the maximum, a *single sample*, while AUC is an *integral* over the window. Two responses of equal height but different widths have the same peak and very different AUCs; a tall narrow response and a short broad one can have the same AUC and very different peaks. Reporting both is therefore standard practice: together they distinguish "taller in this condition" from "taller-and-broader in this condition", and they protect against the artifact-spike failure mode where a single noisy sample dominates the summary.
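The single-sample versus integral distinction is easy to reproduce with synthetic shapes. The sketch below builds two Gaussian responses of the same height and different widths; the shapes, widths, and the rectangle-rule integral are all illustrative choices, not values taken from the figure.

```python
import math


def gaussian(t, height, center, width):
    """Gaussian bump evaluated at time t (seconds)."""
    return height * math.exp(-0.5 * ((t - center) / width) ** 2)


dt = 0.01
times = [i * dt for i in range(501)]  # 0 to 5 s after the event

narrow = [gaussian(t, 3.0, 2.5, 0.1) for t in times]
broad = [gaussian(t, 3.0, 2.5, 0.6) for t in times]

peak_narrow, peak_broad = max(narrow), max(broad)
auc_narrow = sum(narrow) * dt  # rectangle-rule integral over the window
auc_broad = sum(broad) * dt
auc_ratio = auc_broad / auc_narrow
```

Both traces report the same peak, while the AUC of the broad one is roughly six times larger, in line with the sixfold difference in width. Peak alone would call these two responses identical.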

Peak latency tells a different kind of story. Where peak amplitude and AUC are competing answers to *how big*, latency answers *when*. Two responses can have nearly identical peak amplitude and AUC, and yet peak at very different times relative to the event, and that distinction is biology, not noise. Primary sensory regions peak in tens of milliseconds; downstream associative regions in hundreds; striatal dopamine in the few-hundred-millisecond range. A predicted reward peaks earlier than a surprising one. Peak amplitude alone cannot separate any of these.

![Two-panel figure illustrating peak latency. Both panels show a Gaussian PSTH with the same peak amplitude (~3.0). Left panel: an early peak at t = 1.0 s, labelled "early peak". Right panel: a late peak at t = 3.5 s, labelled "late peak". A red dot marks the maximum in each panel and a corner annotation reports peak latency. The two panels share the y-axis. The visual point is that peak amplitude alone cannot distinguish the two responses; latency is the scalar that does.](../_static/images/psth_explainer/fig4_peak_latency.svg)

Latency does inherit one limitation from peak amplitude. Both are read off a single sample (the maximum), so a narrow contamination spike will hijack both: not just the magnitude reading but also the timing reading. AUC, by contrast, has no analogous timing scalar; the integral has no preferred time point.

All three scalars depend on the post-event window itself, which is a user-set modelling decision rather than a fact of the data. A window that is too short truncates broad sustained responses, leaving AUC understated and possibly clipping a late peak entirely. A window that is too long dilutes a sharp response with post-event activity that is no longer event-locked, which inflates AUC without changing peak. Different windows on the same PSTH produce different but equally valid summaries that mean slightly different things; the same caveat applies to the burst-rejection threshold below.

## Event rejection

Not every event timestamp produces a usable event-aligned trace. Some sit too close to the start or end of the recording for the extraction window to fit; some arrive in clusters tight enough that adjacent windows overlap and contaminate each other. Both cases would distort the average if left in. Two filters address these failure modes.

**Edge rejection** drops events whose pre or post extraction window would extend past the recording bounds. There is no data outside the recording, so the missing samples cannot be averaged honestly, and keeping such an event would distort the event-aligned average and any downstream statistics. The figure below shows this visually: the extraction window of an event near the recording start clips past t = 0, into a "no recording" zone where no data exists.

![Single-panel figure illustrating edge rejection on a 30-second synthetic session. A rejected event at t = 2 s is marked with a red vertical line because its pre-event extraction window would extend into a gray "window out of recording" zone outside the recording start, drawn as a solid black vertical line at t = 0. A second event at t = 17 s is marked with a green vertical line because its window fits inside the recording. The recording end at t = 30 is also marked with a black line. A legend below the panel identifies red as rejected events and green as accepted events.](../_static/images/psth_explainer/fig5_edge_rejection.svg)
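In code, edge rejection reduces to a bounds check on each timestamp. The sketch below is an illustration under assumed names and values: `reject_edge_events` is hypothetical, and the 5 s pre and 10 s post windows are invented to reproduce the figure's outcome, since the figure does not state them.

```python
def reject_edge_events(event_times, pre, post, t_start, t_end):
    """Split events into those whose window fits the recording and those that do not."""
    kept, rejected = [], []
    for t in event_times:
        if t - pre < t_start or t + post > t_end:
            rejected.append(t)  # window would extend past the recording bounds
        else:
            kept.append(t)
    return kept, rejected


# 30 s session with events at 2 s and 17 s, as in the figure.
edge_kept, edge_rejected = reject_edge_events(
    [2.0, 17.0], pre=5.0, post=10.0, t_start=0.0, t_end=30.0
)
# edge_kept == [17.0]; edge_rejected == [2.0]
```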

**Burst rejection** drops events that fall closer to a previous kept event than a user-set inter-event threshold. This matters for behaviours that come in bursts (rapid licks, repeated lever presses) where adjacent extraction windows would otherwise overlap and contaminate each other. The threshold is task-dependent and is a modelling choice rather than a tunable with a formal optimum: the same recording with two different thresholds produces two valid PSTHs that mean slightly different things.

![Single-panel figure illustrating burst rejection on a synthetic session. A normal event at t = 5 s is kept (green vertical line). A cluster of three events follows in the middle of the recording: the first cluster event at t = 15 s is kept (green) and the next two (t = 16.8 s and t = 18.6 s) are marked in red because they fall within the user-set inter-event threshold of the first kept event. A final normal event at t = 30 s is kept (green). The burst sits between two well-spaced normal events, making the compressed cluster visible by contrast. A legend below the panel identifies red as rejected events and green as accepted events.](../_static/images/psth_explainer/fig6_burst_rejection.svg)
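Burst rejection can be sketched as a single pass that compares each event with the most recent kept event. The function name and the 5 s threshold are assumptions for illustration; the event times are the ones shown in the figure.

```python
def reject_burst_events(event_times, min_interval):
    """Drop events closer than `min_interval` seconds to the last kept event."""
    kept, rejected = [], []
    for t in sorted(event_times):
        if kept and t - kept[-1] < min_interval:
            rejected.append(t)  # too close to the previous kept event
        else:
            kept.append(t)
    return kept, rejected


burst_kept, burst_rejected = reject_burst_events(
    [5.0, 15.0, 16.8, 18.6, 30.0], min_interval=5.0
)
# burst_kept == [5.0, 15.0, 30.0]; burst_rejected == [16.8, 18.6]
```

Comparing against the last *kept* event rather than the immediately preceding one is what rejects the event at 18.6 s: it is only 1.8 s after the rejected event at 16.8 s, but the distance that counts is the 3.6 s back to the kept event at 15 s.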