artifact_detection¶
Iterative z-score artifact detection for continuous time-series data (primarily sleep EEG). Produces a logical mask identifying samples that should be excluded from downstream analysis.
Entry point¶
artifacts = detect_artifacts(data, Fs, 'Name', Value, ...)
data—Nx1 double, single-channel time series (required)Fs— scalar double, sampling frequency in Hz (required)Returns
1xN logical—trueat samples flagged as artifact
See the docstring in detect_artifacts.m for every name-value pair. The defaults are tuned for lab-standard sleep EEG at 100–500 Hz.
Algorithm overview¶
The pipeline runs in four stages. The input is a 1-D time series; the output is a logical mask the same length as the input.
+---------------------------+
raw data ------> | Stage 1: Pre-filter |
| - NaN / Inf samples |
| - Flat stretches |
| - ±10σ outliers |
| - Spectral slope test |
| - Interpolate bad spots |
+------------+--------------+
|
data_fixed (no NaN/gaps)
|
+--------------------+--------------------+
| |
v v
+---------+---------+ +----------+---------+
| Stage 2: HF band | | Stage 3: BB band |
| high-pass 35 Hz | | high-pass 0.1 Hz |
| envelope -> log | | envelope -> log |
| iterative z-score | | iterative z-score |
+---------+---------+ +----------+---------+
| |
+--------------------+--------------------+
v
+---------------------------+
| Stage 4: Union + buffer |
| hf | bb | pre-flagged |
| optional ±buffer secs |
+------------+--------------+
|
artifacts
Stage 1 — Preliminary bad-index detection¶
Before the main detection loop runs, the input is screened for gross problems. Flagged samples become “seeds” that are (a) excluded from z-score statistics and (b) always marked artifact in the final output. The data is then gap-filled so the downstream filters see a continuous signal.
1a. NaN / Inf samples — flagged directly.
1b. Flat regions — get_chunks(data, Fs) returns, among other things, a logical mask is_flat marking stretches of identical consecutive values (constant signal, typically disconnected electrode or saturation). Exact threshold for “flat” is defined inside get_chunks.
1c. Outlier samples — after the above are excluded, compute μ and σ of the remaining good data. Flag any sample outside μ ± 10σ. Hard-coded multiplier (outlier_scalar = 10) handles gross amplitude glitches the iterative z-score might dilute.
1d. Spectral slope test (enabled by default, slope_test=true)
Healthy EEG has a negative log–log power slope (roughly 1/f). Broken channels, amplifier saturation, or electrical contamination flatten or reverse this slope.
Compute a multitaper spectrogram of the whole recording:
Frequency range:
[1, min(55, Fs/2)]HzTime–bandwidth product: 10, 19 tapers (maximum recommended for TW=10)
Window: 10 s, step 5 s
Uses
multitaper_spectrogram_mex(compiled MEX for speed)
For each time window, fit
log(power) = a + b·log(freq)bypolyfit(..., 1).A window is “bad” when
b > slope_crit(default-0.5) — i.e., the slope is not steep enough negative.Interpolate the per-window bad flag to sample rate with
interp1(..., 'nearest', ...). NaN tails (before first / after last window center) default to bad.
1e. Gap interpolation — any of the above mark a sample bad_inds(i) = true. Linear interpolation fills those holes into a separate data_fixed array so Stages 2–3 see a continuous signal without filter transients from NaN-filled gaps.
Stage 2 & 3 — Band-specific envelope detection¶
The same iterative pipeline runs twice, once per frequency band, with different high-pass cutoffs and thresholds. Implemented in the local compute_artifacts(...) helper.
Filter design — zero-phase 4th-order Butterworth IIR high-pass, designed via designfilt('highpassiir', ...) with PassbandRipple=0.2. Two variants by default:
High-frequency (HF): cutoff
hf_pass = 35 Hz, thresholdhf_crit = 5.5. Targets muscle (EMG) contamination and electrical noise.Broadband (BB): cutoff
bb_pass = 0.1 Hz, thresholdbb_crit = 5.5. Targets slow drifts and large-amplitude motion/sweat artifacts that survive DC removal.
Filters can be prebuilt and passed via 'hpFilt_high' / 'hpFilt_broad' to avoid re-designing on repeated calls (e.g., in a per-subject loop). Passing 'return_filts_only', true returns [hpFilt_high, hpFilt_broad] and exits — useful for caching.
Per-band pipeline:
Zero-phase filter the full
data_fixedwithfiltfilt. This removes group delay so artifact timing is accurate.Envelope:
abs(hilbert(y))— instantaneous amplitude.Smooth:
movmeanoversmooth_duration = 2 s.Log transform:
log(y)— converts the approximately log-normal envelope to an approximately normal distribution, so z-score statistics are meaningful.Detrend (on by default): subtract
movmedian(y, detrend_duration = 300 s). Moving median is chosen over mean because it’s robust to the outliers we’re about to detect — using the mean would let big artifacts pull the trend toward themselves. 300 s (5 min) is a good compromise: short enough to track sleep-stage-related envelope changes, long enough to be insensitive to transient spikes.Iterative z-score:
Seed
detected_artifacts = bad_inds(Stage 1 flags).Compute
medianandMAD(ormean/stdifzscore_method='standard') over~detected_artifacts & ~isexcluded.Z-score the whole envelope:
z = (y - μ) / σ.Flag
|z| > crit.Loop: recompute statistics from the remaining unflagged samples, re-z-score, re-threshold. Repeat until no new samples exceed the criterion.
The iterative refinement is the key idea — a single pass would have artifact energy polluting the statistics. Each iteration purges the current outliers, tightens the distribution, and exposes the next tier of artifacts. Typical recordings converge in 3–6 iterations.
Why robust statistics (default) — with zscore_method='robust', the median is ~0.77× efficient relative to the mean for Gaussian data, but it is not dragged around by outliers. MAD (median absolute deviation) is similarly robust vs std. For sleep EEG, where 1–10% of a full night can be artifact, robust converges to the correct distribution faster and is the lab-standard default.
Stage 4 — Combine, optionally buffer¶
Union:
artifacts = hf_artifacts | bb_artifacts | bad_inds
Any single criterion is sufficient to flag a sample.
Exclusion mode (exclude_mode):
'data'(default) — samples markedisexcludedare ignored when computing z-score statistics, but they are not forced into the output. This is for cases where you want to suppress (e.g.) a known stimulus period from biasing the z-score without classifying it as artifact per se.'artifact'— theisexcludedmask is OR’d into the output. Use this when you want the excluded region flagged as artifact downstream.
Buffer (buffer_duration, default 0) — extend each contiguous artifact run by ±buffer_duration seconds on both sides. Detected via consecutive_runs(artifacts), then each run’s start/end indices get stretched and the buffer block is set true. Useful when filtering or spectrogram analysis can have edge effects up to a window half-length around a real artifact.
Defaults at a glance¶
Parameter |
Default |
What it controls |
|---|---|---|
|
|
median/MAD vs mean/std |
|
5.5 |
HF z-score threshold |
|
35 Hz |
HF high-pass cutoff |
|
|
remove moving median from HF envelope |
|
5.5 |
BB z-score threshold |
|
0.1 Hz |
BB high-pass cutoff |
|
|
remove moving median from BB envelope |
|
|
enable 1/f slope check |
|
-0.5 |
min acceptable log-log slope |
|
2 s |
envelope smoothing window |
|
300 s |
moving-median detrend window |
|
0 s |
edge buffer around detected artifacts |
Legacy / pre-lab-standard settings¶
Earlier versions used zscore_method='standard', hf_crit=bb_crit=4.5, slope_test=false. To reproduce results from that era:
artifacts_old = detect_artifacts(data, Fs, ...
'zscore_method', 'standard', ...
'hf_crit', 4.5, 'bb_crit', 4.5, ...
'slope_test', false);
Diagnostic modes¶
'verbose'→ prints parameter summary and per-band iteration counts.'diagnostic_plot'→ opens a stacked-axes figure per band showing filtered signal, Hilbert magnitude, smoothed magnitude, log-smoothed magnitude, and (if*_detrend=true) the detrended log-smoothed magnitude, all linked on x.'histogram_plot'→ shows the z-score histogram redrawing at each iteration of the iterative loop. Useful to visualize convergence.
Output shape¶
artifacts is always returned as a 1xN row vector (N = length(data)), regardless of whether data was row or column. Use ~artifacts to index the good samples.
Example¶
Fs = 200; % lab-standard sleep EEG rate
data = load_my_eeg(subject_id); % Nx1 double
% Basic call — robust defaults, both bands, slope test on
artifacts = detect_artifacts(data, Fs);
% Verbose with a 1-second buffer around each artifact
artifacts = detect_artifacts(data, Fs, ...
'verbose', true, ...
'buffer_duration', 1);
% Cache filters for per-epoch looping
filts = detect_artifacts([], Fs, 'return_filts_only', true);
hpFilt_high = filts(1);
hpFilt_broad = filts(2);
for i = 1:n_epochs
ep_artifacts = detect_artifacts(epochs{i}, Fs, ...
'hpFilt_high', hpFilt_high, ...
'hpFilt_broad', hpFilt_broad);
...
end
Dependencies¶
multitaper_spectrogram_mex— from the lab’smultitapersubmodule (required whenslope_test=true)get_chunks— fromutils/data_processing/consecutive_runs— fromutils/data_processing/(required whenbuffer_duration > 0)Signal Processing Toolbox:
designfilt,filtfilt,hilbert,polyfit,movmean,movmedian
All lab dependencies are committed as siblings in this repo; get_chunks and consecutive_runs are small, self-contained helpers.
Files in this directory¶
detect_artifacts.m— the entry point and all supporting local functionsREADME.md— this filerust/— pure-Rust port (band-detection path; optionalslope_testfeature pulls inmultitaper_rsfor the MTS-based branch). Validated against MATLAB R2025a on a synthetic-EEG test battery. Seerust/README.md.
Rust port¶
A Rust port lives in rust/. It implements the band-detection
pipeline (broadband + high-frequency envelope detection with iterative
robust z-score), the flat-run mask, and the ±10σ outlier-noise
backstop. The slope_test Cargo feature additionally pulls in
multitaper_rs to provide the slope-test branch.
Cross-version parity is gated by a synthetic-EEG fixture battery
(pink noise + slow oscillation + spindle bursts + line noise +
motion-artifact spikes + flat runs) generated by
DYNAM-O_DesktopApp/scripts/sim_eeg.m and dumped to .npy. Per-variant
Rust↔MATLAB-no-slope sample-disagreement budgets:
Variant |
Budget (samples) |
|---|---|
|
≤ 5 |
|
≤ 30 |
|
≤ 5 |
|
≤ 30 |
|
≤ 50 |
Build and test:
cd rust
cargo test # band-detection parity + ground-truth
cargo test --features slope_test # also runs the slope-test branch
cargo check --target wasm32-unknown-unknown