read_EDF

read_EDF(edf_fname, varargin)

READ_EDF Load EDF / EDF+ / EDF.gz / EDF.zst file with full metadata, annotations, and MEX acceleration

READ_EDF reads European Data Format (EDF / EDF+) files using a compiled MEX reader when available, and a pure MATLAB fallback otherwise. Plain ‘.edf’, gzip-compressed ‘.edf.gz’, and zstd-compressed ‘.edf.zst’ files are all accepted; compressed files are streamed directly through zlib or libzstd with no temp file (MEX path). The function provides full access to header metadata, per-signal scaling (digital-to-physical conversion), and EDF+ annotations.

Usage:
[header, signal_header, signal_cell, annotations] = ...
    read_EDF(filename, 'Channels', {'EEG Fpz-Cz'})

Inputs:

edf_fname : string - path to a .edf, .edf.gz, or .edf.zst file

‘Channels’ : cell array of strings - which channels to emit (default {} = all). Each entry is parsed as either a passthrough label or an expression (see “Re-referencing on load” below).

‘References’ : cell array of ‘NAME = expr’ strings (default {}). Defines named mean / linear-combination signals once, then makes the name usable inside any later ‘References’ or ‘Channels’ spec. Evaluated in declared order, so a later reference can build on an earlier one. References that are not also listed in ‘Channels’ are dropped from the returned outputs (they exist purely to support derivations).

‘Epochs’ : 1x2 vector [start_epoch end_epoch] (0-indexed, default: all)

‘Verbose’ : logical - print progress / status (default false)

‘RepairHeader’ : logical - correct an invalid num_data_records in the file header and save a ‘_fixed’ copy (default false; ignored with a warning for compressed inputs).

‘forceMATLAB’ : logical - disable the MEX backend and use the pure-MATLAB reader (default false). On this path, compressed inputs are decompressed to a temp file that is cleaned up automatically on return.

‘debug’ : logical - debug mode for MEX (default false)

‘deidentify’ : logical - blank PHI fields and save a ‘_deidentified’ copy (default false; ignored with a warning for compressed inputs).

Outputs:

header : struct of EDF file-level metadata

signal_header : struct array of per-signal headers

signal_cell : cell array, one signal vector per channel, in physical units (scaled and offset)

annotations : struct array of EDF+ annotations

Re-referencing on load:

Each ‘Channels’ entry is preprocessed (greedy longest-match wraps real labels in ‘$…$’ tokens) and parsed as an expression in this grammar:

spec   := [name '='] expr
expr   := [sign] term (sign term)*
term   := factor (('*'|'/') factor)*
factor := ['+'|'-'] (number | '$' label '$' | '(' expr ')' | mean '(' expr (',' expr)+ ')')
sign   := '+' | '-'

The output of parsing is always a flat linear combination sum_j coef_j * signal[leaf_j] — number literals fold into the leaf coefficients at parse time, so ‘(1/3)*C1 - 4*(C2-C3)/7’ becomes leaves={C1,C2,C3}, terms=[1/3, -4/7, 4/7].
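As a minimal sketch of what the flat form means numerically, the example above can be evaluated as one matrix-vector product. The variable names (`leaves`, `coefs`) and the synthetic sample values are illustrative assumptions, not read_EDF internals.

```matlab
% Synthetic stand-ins for three equal-length channels (hypothetical data).
C1 = [3; 6; 9];
C2 = [7; 14; 21];
C3 = [0; 7; 14];

% Flat form of '(1/3)*C1 - 4*(C2-C3)/7': one coefficient per leaf.
leaves = [C1, C2, C3];        % samples x leaves
coefs  = [1/3; -4/7; 4/7];    % number literals folded into the leaves

% sum_j coef_j * signal[leaf_j], evaluated as a matrix-vector product.
y_flat = leaves * coefs;

% Direct evaluation of the original expression, for comparison.
y_direct = (1/3)*C1 - 4*(C2 - C3)/7;

% The two agree up to floating-point rounding.
max_abs_diff = max(abs(y_flat - y_direct));
```

Because every accepted spec reduces to this shape, each output channel costs one weighted sum over its leaves, however deeply the expression was nested.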

Linearity is enforced: a single term may contain at most one signal-valued factor. The following are rejected with read_EDF:ParseError:

signal + scalar (DC offsets not supported)
signal * signal (not linear)
signal / signal (not linear)
scalar / signal (not linear)

The optional ‘name =’ prefix sets the output channel’s label (alias). Without it the output label is the raw spec string. Aliases are mandatory for References. Aliased Channels entries are visible by name to subsequent Channels entries (chaining); unaliased entries are one-shot.

mean(arg1, …, argN) requires N >= 2 and produces (1/N) * sum_i argi. Args may be expressions, not just leaves: mean(C1, C2-C3) works.
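One way to picture how mean(...) folds into leaf coefficients is to treat each argument as a row of coefficients and average the rows. This is a sketch under the assumption that the leaves are ordered {C1, C2, C3}; it is not the parser's actual data structure.

```matlab
% Coefficient rows over the leaves {C1, C2, C3} (hypothetical ordering).
arg1 = [1 0  0];    % C1
arg2 = [0 1 -1];    % C2 - C3

% mean(arg1, arg2) = (1/N) * (arg1 + arg2), here with N = 2.
args  = [arg1; arg2];
N     = size(args, 1);
coefs = sum(args, 1) / N;
```

Here `coefs` is [0.5 0.5 -0.5], that is, mean(C1, C2-C3) = 0.5*C1 + 0.5*C2 - 0.5*C3.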

Examples (each comment is the equivalent math):

% Inline linked-mastoid: C3 - (A1+A2)/2
'C3 - mean(A1, A2)'

% Same, but bind a reusable name first
'References', {'LM = mean(A1, A2)'}, ...
'Channels', {'C3-LM', 'C4-LM', 'Fpz_LM = -LM'}

% Sum of two channels: A1 + A2
'A1 + A2'

% Reference building on a reference
'References', {'M = mean(A1,A2)', 'R = C3 - M'}, ...
'Channels', {'R'}                 % emits C3 - M

% Arbitrary linear combination
'(1/3)*C1 - 4*(C2 - C3)/7'        % weighted
'0.7*C3 + 0.3*C4'                 % weighted average
'mean(C1, 2*C2 + C3)'             % expr inside mean

Every ‘Channels’ and ‘References’ spec is preprocessed before parsing. A single greedy longest-match pass walks the augmented label set (real EDF labels + earlier References + earlier aliased Channels outputs) and wraps each occurrence in ‘$…$’ tokens. The parser then sees only ‘$LABEL$’ tokens, number literals, operators (‘+’, ‘-’, ‘*’, ‘/’, ‘,’, ‘(’, ‘)’, ‘=’), and ‘mean’.

EDF labels: {‘EEG C3 - A2’, ‘EEG C3’, ‘A2’, ‘EMG’}

'EEG C3 - A2' -> '$EEG C3 - A2$'
(passthrough; longest match wraps the whole spec)

'EEG C3 - A2 - A2' -> '$EEG C3 - A2$ - $A2$'
((labeled C3-A2) minus A2; longest-first then leftover)

'mean(EEG C3, A2)' -> 'mean($EEG C3$, $A2$)'

'OUT = EEG C3 - A2' -> 'OUT = $EEG C3 - A2$'
(alias name is left alone)
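The wrapping pass can be sketched as a left-to-right scan that tries the longest label first at each position. This is an illustrative reimplementation, not the toolbox's code: it ignores alias prefixes (‘NAME =’) and word boundaries, and the error identifier `sketch:Unterminated` is made up for the example.

```matlab
labels = {'EEG C3 - A2', 'EEG C3', 'A2', 'EMG'};
spec   = 'EEG C3 - A2 - A2';

% Try longer labels first so 'EEG C3 - A2' wins over 'EEG C3'.
[~, order] = sort(cellfun(@numel, labels), 'descend');
labels = labels(order);

wrapped = '';
i = 1;
while i <= numel(spec)
    if spec(i) == '$'
        % Copy a user-supplied $...$ region verbatim.
        close_idx = find(spec(i+1:end) == '$', 1) + i;
        if isempty(close_idx)
            error('sketch:Unterminated', 'Unterminated ''$'' quote.');
        end
        wrapped = [wrapped, spec(i:close_idx)]; %#ok<AGROW>
        i = close_idx + 1;
        continue
    end
    matched = false;
    for k = 1:numel(labels)
        lab = labels{k};
        if strncmp(spec(i:end), lab, numel(lab))
            % Longest label that starts here: wrap it and skip past it.
            wrapped = [wrapped, '$', lab, '$']; %#ok<AGROW>
            i = i + numel(lab);
            matched = true;
            break
        end
    end
    if ~matched
        % Operators, spaces, and other text pass through unchanged.
        wrapped = [wrapped, spec(i)]; %#ok<AGROW>
        i = i + 1;
    end
end
```

For the spec shown, `wrapped` ends up as '$EEG C3 - A2$ - $A2$', matching the second row of the table above.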

User-supplied ‘$…$’ regions are preserved verbatim by the preprocessor, so write ‘$LABEL$’ explicitly to override the longest-match interpretation:

% EDF has both 'EEG C3 - A2' (a recording-side reref) AND
% 'EEG C3' / 'A2' as separate channels.
read_EDF(f, 'Channels', {'EEG C3 - A2'})        % the labeled channel
read_EDF(f, 'Channels', {'$EEG C3$ - $A2$'})    % computed difference

Pass ‘Verbose’, true to see the wrapped form for each spec — useful when you suspect a label-vs-expression collision.

‘$LABEL$’ is also the escape for labels containing characters the parser would otherwise eat (‘+’, ‘,’, ‘(’, ‘)’, ‘=’), labels that start with ‘mean(’, or labels that are substrings of others when you want to force the shorter one:

read_EDF(f, 'Channels', {'$EEG A+B$ - mean($A1$, $A2$)'})
read_EDF(f, 'Channels', {'$mean(LF,RF)$'})              % literal label
read_EDF(f, 'Channels', {'$Label = X$'})                % '=' is literal
read_EDF(f, 'Channels', {'C3 - mean($A1$, $A2$)'})      % force short A1

Aliased Channels entries become reusable names for any later ‘Channels’ entry, so multi-step derivations can be written inline:

'Channels', {'LM = mean(A1, A2)', ...
             'C3_LM = C3 - LM', ...
             'NewChan = 0.5 * C3_LM'}

The difference vs ‘References’: aliased Channels entries are returned as outputs; References outputs are hidden helpers (drop out of the returned cell unless explicitly named in ‘Channels’). Unaliased Channels entries (no ‘=’) don’t pollute the namespace.

read_EDF:UnknownChannel - a leaf in an expression / reference doesn’t resolve to a label in the augmented set.

read_EDF:RefCollision - a reference name collides with an EDF label or with an earlier reference (case-insensitive).

read_EDF:RateMismatch - leaves of one spec / reference span different sampling rates. The toolbox does not resample on the fly; resample after read_EDF or precompute compatible inputs.

read_EDF:BadMean - mean(…) called with fewer than 2 arguments.

read_EDF:ParseError - malformed expression syntax (missing ‘=’ in a Reference, unterminated ‘$’ quote, dangling operator, …).
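Callers that want to react differently to each identifier can branch on `ME.identifier` in a catch block. The sketch below simulates the exception so that it runs without an EDF file; the hint strings and the `advice` map are illustrative assumptions, not part of read_EDF.

```matlab
% Map each documented identifier to a caller-side hint (hints are illustrative).
advice = containers.Map();
advice('read_EDF:UnknownChannel') = 'Check the spelling against the labels in the file.';
advice('read_EDF:RefCollision')   = 'Rename the reference so it does not shadow a label.';
advice('read_EDF:RateMismatch')   = 'Resample after loading, or combine same-rate channels.';
advice('read_EDF:BadMean')        = 'Pass at least two arguments to mean(...).';
advice('read_EDF:ParseError')     = 'Check for a missing ''='', an open ''$'', or a dangling operator.';

try
    % Stand-in for a real read_EDF call that fails validation.
    error('read_EDF:RateMismatch', 'simulated error for illustration');
catch ME
    if isKey(advice, ME.identifier)
        hint = advice(ME.identifier);
        fprintf('%s\n  -> %s\n', ME.message, hint);
    else
        rethrow(ME);    % not one of the documented identifiers
    end
end
```

In real use, the `error(...)` line would be replaced by the actual read_EDF call; unknown identifiers are rethrown so unrelated failures are not swallowed.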

Validation runs whenever ‘References’ is non-empty or any ‘Channels’ entry contains ‘mean(’, ‘=’, ‘+’, or ‘$’ — even if the caller ignored read_EDF’s outputs. Plain labels and legacy ‘A-B’ strings stay on the existing fast backend (MEX or MATLAB) and are bit-identical to prior versions.

% Plain EDF, all channels
[hdr, shdr, sc, ann] = read_EDF('sleep.edf');

% Subset by label, with a legacy A-B reref
[~, ~, sc] = read_EDF('sleep.edf', ...
    'Channels', {'EEG C3-A2', 'EEG O2-A1'});

% Linked-mastoid via inline mean()
[~, ~, sc] = read_EDF('sleep.edf', ...
    'Channels', {'C3 - mean(A1, A2)'});

% Linked-mastoid via a named reference reused across channels
[~, ~, sc] = read_EDF('sleep.edf', ...
    'References', {'LM = mean(A1, A2)'}, ...
    'Channels', {'C3-LM', 'C4-LM', '-LM'});

% EDF where a label has a '+' in it (e.g. 'EEG A+B'). Bare
% 'EEG A+B' would be parsed as 'EEG A' + 'B'; escape it:
[~, ~, sc] = read_EDF('weird.edf', ...
    'Channels', {'$EEG A+B$ - mean($A1$, $A2$)'});

% EDF that has both 'C1', 'A2', AND a labeled 'C1-A2' channel.
% Bare 'C1-A2' returns the labeled channel; '$C1$-$A2$' forces
% the computed difference of the raw channels.
[~, ~, sc_labeled] = read_EDF('overlap.edf', 'Channels', {'C1-A2'});
[~, ~, sc_computed] = read_EDF('overlap.edf', 'Channels', {'$C1$-$A2$'});

% Compressed input (streamed via zlib, no temp file in MEX path)
[hdr, shdr, sc, ann] = read_EDF('sleep.edf.gz');

  • Followed by per-signal headers (16 fields x N signals)

  • Digital samples stored as int16 are scaled to physical units:

    phys_val = phys_min + (dig_val - dig_min) * (phys_max - phys_min) / (dig_max - dig_min)

  • EDF+ annotation channels (labelled ‘EDF Annotations’) contain onset times and event texts in TAL (Time-Annotation List) format.
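The digital-to-physical formula above can be applied directly in MATLAB. This is a minimal sketch with made-up header values; the variable names follow the formula rather than any particular field names in `signal_header`.

```matlab
% Hypothetical per-signal header values (not taken from a real file).
dig_min  = -32768;   dig_max  = 32767;
phys_min = -3276.8;  phys_max = 3276.7;

% Raw samples as stored in the data records.
dig_val = int16([-32768, 0, 32767]);

% Convert to double before scaling: int16 arithmetic would saturate.
gain     = (phys_max - phys_min) / (dig_max - dig_min);
phys_val = phys_min + (double(dig_val) - dig_min) * gain;
```

With these values the gain is 0.1 physical units per digital step, so the three samples map to approximately -3276.8, 0, and 3276.7.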

the bundled zlib (vendored in the ‘zlib/’ subdirectory).

  • Peak memory is one record’s worth of raw bytes plus the per-signal output arrays — files do not need to fit decompressed on disk.

  • RepairHeader and deidentify both rewrite header fields and save a modified copy of the file, which is not meaningful on a compressed archive; both are disabled with a warning for compressed inputs (.edf.gz, .edf.zst). Decompress to .edf first if you need either.