Datapack
1. The datapack concept
A datapack is the fundamental structural unit of musical content in neumaRk.
It represents a vertical block of musical information synchronized in time, typically corresponding to one or more aligned staves, and it may contain the following kinds of lines:
- markers
- chords
- notes
- articulations
- dynamics
- lyrics
- formatting
Each datapack describes a coherent temporal sequence (one or more consecutive measures). Musical datapacks are separated from one another by one or more blank lines, which act as structural delimiters. The presence of a blank line implies the conclusion of the preceding datapack.
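The blank-line rule can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the neumaRk parser. The name split_datapacks is hypothetical, whitespace-only lines are assumed to count as blank, and margin lines (§10), comments (§11) and version blocks (§12) are ignored.

```python
def split_datapacks(source: str) -> list[list[str]]:
    """Group source lines into datapacks; blank lines act as delimiters."""
    datapacks: list[list[str]] = []
    current: list[str] = []
    for line in source.splitlines():
        if line.strip() == "":
            # A blank line closes the preceding datapack, if one is open.
            if current:
                datapacks.append(current)
                current = []
        else:
            current.append(line)
    if current:  # the last datapack may not be followed by a blank line
        datapacks.append(current)
    return datapacks
```

Several consecutive blank lines therefore behave like a single delimiter, which matches the "one or more blank lines" wording above.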
2. Line types
Each line of the datapack has a semantic type.
The type may be:
- explicit, via a marker
- implicit, deduced from the content and position
2.1 Explicit line markers
An explicit marker consists of:
- one, two, or three uppercase letters
- followed by )
- followed by a space
| Marker | Type |
|---|---|
| M) | Markers |
| C) | Chords |
| A) | Articulations |
| N) | Notes |
| D) | Dynamics |
| L) | Lyrics |
| F) | Format |
The markers are optional.
The N) marker admits two 2-character variants, explicitly provided for by the specification:
- N+: new staff introduced in this datapack (§4.5);
- N2: second voice on the same staff (see neumaRk_voices.md).
The digit 2 and the sign + replace the ), preserving the 2-character width invariant required for vertical alignment.
Likewise, the C) marker admits one variant:
- C+: line of alternate chords above the base line (comment-label and line scope in neumaRk_chords.md §7).
The + sign replaces the ) as in N+, preserving the same width invariant.
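A minimal sketch of explicit-marker recognition follows. It assumes that the marker always occupies the first two characters and is followed by a space. The name explicit_marker and the returned type labels are hypothetical. Only the single-letter markers of the table and the N+, N2 and C+ variants are recognized; other one-to-three-letter markers are not.

```python
# Line types keyed by marker letter, as listed in the table of §2.1.
MARKER_TYPES = {
    "M": "Markers", "C": "Chords", "A": "Articulations", "N": "Notes",
    "D": "Dynamics", "L": "Lyrics", "F": "Format",
}

# 2-character variants: the second character replaces ")" to keep the width.
VARIANTS = {"N+": "Notes (new staff)", "N2": "Notes (voice 2)", "C+": "AlternateChords"}


def explicit_marker(line: str):
    """Return (type, body) if the line starts with an explicit marker, else None."""
    prefix = line[:3]
    if len(prefix) == 3 and prefix[2] == " ":
        if prefix[:2] in VARIANTS:
            return VARIANTS[prefix[:2]], line[3:]
        if prefix[1] == ")" and prefix[0] in MARKER_TYPES:
            return MARKER_TYPES[prefix[0]], line[3:]
    return None  # no marker: the type is deduced implicitly (§3)
```

Because every marker is exactly two characters plus a space, the body of each line starts at the same column, which is what keeps the barlines of adjacent lines aligned.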
3. Logical order of lines (implicit deduction)
In the absence of explicit markers, the type of each line is deduced following this logical order:
1. Markers (at most one line, optional)
2. Chords
   - zero or more lines
   - the last one is considered the main line
3. Note groups, each composed of:
   - one Articulations line (optional, first)
   - one Notes line (mandatory in the absence of chords)
   - one Dynamics line (optional, after the Notes line; see neumaRk_dynamics.md)
   - one Lyrics line (optional, last)
   A datapack may contain from 1 to 4 note groups: each group corresponds to a staff of the system (see §4).
4. Format (at most one line, optional, always final)
3.bis Line-type deduction (normative algorithm)
§3 gives the logical order of the types; this section formalizes the deduction algorithm, that is, how the type of a line without an explicit marker is determined from its content and position. The rule written here is normative: it is part of the language, not an implementation detail. An explicit marker (§2.1) always wins; the algorithm that follows applies only to lines without a marker.
3.bis.1 Datapack structure
Apart from the final Format line, only Markers and Chords are unique per datapack (they are shared by the whole system, §4.1); everything else is a note group, repeated once per (staff × voice):
datapack ::= [Markers]? Head Group{1..8} [Format]?
Head ::= ( [A]? Chord )* // 0+ chord-row, each possibly
// decorated by an A) above
Group ::= [A]? N [D]? [L]? // one (staff × voice)
- Head (Markers + Chords): unique and shared. An Articulations line may precede a chord-row to decorate it (a chord-row may carry its own rhythm, neumaRk_chords.md §4); for this reason A) is admitted before C).
- Group: repeats up to 8 times, i.e. at most 4 staves (§4) × 2 voices per staff (§4.6). Voice 2 cannot be deduced implicitly: from content alone there is no way to distinguish "voice 2 of the same staff" from "new staff", so it must always be declared with the explicit marker N2. Implicit deduction alone therefore produces up to 4 groups (the staves); 8 groups are reached only with N2.
- Format: unique, always final (§9).
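Encoding each line type as one letter turns the production above into a regular expression. The sketch below assumes that encoding. N stands for N), N+ and N2 alike, and the names DATAPACK_GRAMMAR and matches_grammar are hypothetical. It follows the production literally, so it rejects a datapack without any Group, including a chords-only one (note that §5 appears more lenient on that point).

```python
import re

# One letter per line type: M=Markers, C=Chords, A=Articulations,
# N=Notes (N), N+ or N2), D=Dynamics, L=Lyrics, F=Format.
# datapack ::= [Markers]? ( [A]? Chord )* ( [A]? N [D]? [L]? ){1..8} [Format]?
DATAPACK_GRAMMAR = re.compile(r"M?(?:A?C)*(?:A?ND?L?){1,8}F?")


def matches_grammar(types: str) -> bool:
    """True if the sequence of line types is a valid datapack structure."""
    return DATAPACK_GRAMMAR.fullmatch(types) is not None
```

For example, "CANDLNL" is accepted: one chord-row, followed by a staff with articulations, dynamics and lyrics, and then a second staff with lyrics. "CNC" is rejected because a chord-row cannot follow the first notes line.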
3.bis.2 Computation model
Deduction is a single-pass, per-datapack automaton that scans the lines in source order while maintaining two pieces of state:
- lastType: the type of the last classified line (initially Empty);
- headClosed: false until the first Notes line of the datapack appears, then true for the rest of the datapack. It marks the boundary between Head and Groups.
The lastType guards are the transition function of the automaton: they encode both the order within a group and the loop-back that opens the next group (re-entry on A or on N). headClosed prevents Chords from reappearing once the staves have been entered.
3.bis.3 Pre-filter: decorative lines
Before the cascade, a purely decorative line, composed only of simple barlines, :, compact repeat forms (neumaRk_flow_and_repeats.md §6.1), dots, spaces, and tabs, is skipped and receives no musical type. It remains structural, and its barline decorations belong semantically to the other lines. Exception: the >/^ rescue in §3.bis.6.
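The pre-filter reduces to a character-set test. In this sketch, is_decorative is a hypothetical name. Compact repeat forms are not modeled, and a whitespace-only line is treated as a blank separator rather than as decorative.

```python
# Simple barlines, ':', dots, spaces and tabs (compact repeat forms omitted).
DECORATIVE_CHARS = set("|:. \t")


def is_decorative(line: str) -> bool:
    """True if the line holds only decorative characters (and is not blank)."""
    return line.strip() != "" and set(line) <= DECORATIVE_CHARS
```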
3.bis.4 Classification cascade (normative precedence)
Lines that are neither decorative nor marked are classified by a first-match-wins cascade. The order is binding: the content predicates are deliberately permissive, and they avoid over-matching only thanks to (a) this order and (b) the position guards. Each type is defined by a triad: predicate + guard + exclusions.
| # | Type | Predicate (content) | Guard (position) | Exclusions |
|---|---|---|---|---|
| 1 | Markers | only […] / barline / spaces | is the first line of the datapack | none |
| 2 | Chords | ≥1 valid chord (neumaRk_chords.md §3) + barline / . / % / comment-label / spaces, or only % (TB1bis) | head open (!headClosed) | not slash-only (TB2); not rest-only r/! (TB1) |
| 3 | Articulations | charset of the A) vocabulary (neumaRk_articulations.md) | lastType ≠ Articulations | not ^-only (TB3); not a valid notes-line (TB4) |
| 4 | Dynamics | dynamics charset (< > c d f m p s z - . \| :) | lastType == Notes | none |
| 5 | Lyrics | word-char + . - _ ' \| : (the most permissive) | lastType ∈ {Notes, Dynamics, Lyrics} | none |
| 6 | Notes | default / sink | any | none |
Notes on the guards:
- Markers (line 1): Markers are deduced implicitly only on the first line of the datapack. A markers-shaped line further down is not Markers (an explicit M) remains possible wherever the spec allows it).
- Chords (line 2): the normative condition is head open. This is equivalent to saying that chords live only before the first notes line: chords are unique and shared (§4.1), and no valid datapack has a chord-row after a notes-row. The parser implements the constraint with the positional flag headClosed (closed at the first Notes line of the datapack), aligned with §3.bis since 2026-05-25 (FU.3a).
- Lyrics (line 5): the predicate is the most permissive one (it matches almost any text). This is intentional: the permissiveness is contained by the position guard, and in any case the default type is Notes. The predicate is not narrowed, because narrowing it would be non-monotonic and would risk changing the classification of existing songs.
- Notes default (line 6): this is the sink; every line that reaches the bottom of the cascade is a notes line. It is also the loop-back mechanism: two consecutive Notes lines (both reaching the sink) are two distinct staves.
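The cascade and the two pieces of automaton state (§3.bis.2) can be sketched as a small Python class. This is not the parser's implementation (which lives in C++). Classifier and all helper names are hypothetical. The content predicates are deliberately crude stand-ins for the real ones: the chord pattern is a simplified guess, and ARTIC_CHARS is an assumed subset of the A) charset. The tiebreakers (§3.bis.5) are not modeled, and decorative lines are assumed to have been removed by the pre-filter.

```python
import re
from dataclasses import dataclass

# Hypothetical, simplified stand-ins for the real content predicates.
CHORD_TOKEN = re.compile(r"[A-G][#b]?(?:m|maj|dim|aug|sus)?\d*(?:/[A-G][#b]?)?")
ARTIC_CHARS = set(">^.-~!|: ")      # assumed subset of the A) charset
DYN_CHARS = set("<>cdfmpsz-.|: ")   # dynamics charset of the table, plus space


def is_markers(line: str) -> bool:
    # Only [...] groups, barlines and spaces, with at least one [...] group.
    return "[" in line and re.fullmatch(r"(?:[\s|]|\[[^\]]*\])*", line) is not None


def is_chords(line: str) -> bool:
    # At least one chord; the other tokens may be barlines, "." or "%".
    tokens = [t for t in line.split() if t not in {"|", ".", "%"}]
    return bool(tokens) and all(CHORD_TOKEN.fullmatch(t) for t in tokens)


def is_lyrics(line: str) -> bool:
    return re.fullmatch(r"[\w.\-'|:\s]+", line) is not None


@dataclass
class Classifier:
    last_type: str = "Empty"
    head_closed: bool = False
    line_id: int = 0

    def classify(self, line: str) -> str:
        """First-match-wins cascade of §3.bis.4 (tiebreakers not modeled)."""
        if self.line_id == 0 and is_markers(line):
            kind = "Markers"
        elif not self.head_closed and is_chords(line):
            kind = "Chords"
        elif self.last_type != "Articulations" and set(line) <= ARTIC_CHARS:
            kind = "Articulations"
        elif self.last_type == "Notes" and set(line) <= DYN_CHARS:
            kind = "Dynamics"
        elif self.last_type in {"Notes", "Dynamics", "Lyrics"} and is_lyrics(line):
            kind = "Lyrics"
        else:
            kind = "Notes"  # the sink
        if kind == "Notes":
            self.head_closed = True  # no Chords for the rest of the datapack
        self.last_type = kind
        self.line_id += 1
        return kind
```

A fresh Classifier is meant to be created for each datapack, since both flags are per-datapack. Feeding it "[intro]", "| C7 | F7 |", "| > . | > . |", "| c4 e g | f a c |", "| mf | < |" and "| Hel- lo | world |" yields Markers, Chords, Articulations, Notes, Dynamics and Lyrics, in that order.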
3.bis.5 Tiebreakers
Three disambiguations resolve cases in which a permissive predicate would match the wrong type:
- TB1: rest-only → Notes. A line composed only of rest-tokens (r/!), barlines, ., %, and spaces belongs to the staff (it is a notes line containing only rests), not to an empty chord-row. It applies after Chords/Alt and after Articulations:
A7 ← Chords | > . ← Articulations
r ← Notes (TB1) r ← Notes (TB1)
Implemented since 2026-05-25 as a single rule, "rest-only forces Notes": the line is intercepted before Chords and Articulations, regardless of lastType (FU.3b after Articulations and FU.3c after Chords; both produced the wrong type, an empty chord-row and Articulations respectively). A guard requires a real rest-token (r/!) or %, so | > | (anacrusis) is still handled by the rescue (§3.bis.6).
- TB1bis: %-only without a chord-row → Chords. A line composed only of % (plus barlines, ., and spaces, with no rest-token r/!), with the head open (!headClosed) and no real chord-row in the datapack, is a chord-row made only of measure repeats. Each % repeats the last chord-measure, even from the preceding datapack (the measure-repeat lookback is cross-datapack). It is the exception to TB1: % without a rest-token is not forced to Notes, but falls into the Chords branch (whose isChordLine already accepts %).
| C7 | F7 | ← Chords
| a b c | … ← Notes (closes the head)

| % | % | ← Chords (TB1bis): repeats C7, F7 from the datapack above
| d e f | … ← Notes
Decision 2026-05-26, superseding the previous principle "if you want a chord-row of only % you must declare it with C)". Rationale: "chords that repeat while the melody changes" is the most common lead-sheet pattern, and forcing C) on every continuation was gratuitous friction. Two guards keep the case narrow:
  - no rest-token: a line with r/! always remains Notes (TB1), because rests are unambiguously notes;
  - !sawRealChordRow: if the head already has a chord-row with a real chord, a subsequent %-only line remains Notes (it is a notes-row of repeats), because the datapack's chord-row is already defined.
Escape hatch for the opposite reading (a second notes staff of only %): the explicit marker N+ or N). Known limitation: a datapack of only % (without any notes-row) is read as a chord-row and violates the grammar (A?ND?L?)+, which requires at least one N; declare it with N) if the intent is a standalone notes-row of repeats.
- TB2: slash-only → Notes. A line of standalone / only (plus barlines, ., and spaces) is slash rhythm (neumaRk_notes_and_durations.md §10), never a chord-row: / on its own is syntactically a quasi-chord but semantically a note event.
| / / | / / | ← Notes (TB2)
- TB3: ^-only → Notes. A line of ^ only (plus | and spaces) is a notes line containing only ties (neumaRk_notes_and_durations.md §6), not Articulations.
| ^ | ← Notes (TB3)
- TB4: valid notes-line → Notes. A line that is a valid notes-line has priority over Articulations, even if it falls entirely within the charset of the A) vocabulary. This is necessary because some letters of the vocabulary are also note names or durations. In particular, g (part of gl, glissando, §9) is also the note G, and the durations 1–4 are in the charset. As a result, g, g4 and g g g matched articulations_line and were wrongly classified as A) lines.
A7 ← Chords
g ← Notes (TB4), not Articulations
Predicate: every token (split on spaces, with the barlines |/: at the edges stripped) is a valid note-token according to the canonical grammar parse_notes (a single source, hence no divergence), and at least one token carries a real pitch ([a-g] or r). Articulation-only tokens (>, tr, -, o, gl, ~…) do not match parse_notes, so such a line remains Articulations. A line of only . / , / ^ (placeholders, no real pitch) also remains Articulations. Implemented 2026-05-26 (isNotesLineForTiebreak in map_functions.cpp), also gated on the >/^/~ rescue (§3.bis.6). Principle: in case of A)↔N) ambiguity, the note wins.
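The tiebreakers can be expressed as independent predicates that plug into the cascade. rest_only and percent_only_chords are checked before Chords and Articulations, with TB1bis first because it carves an exception out of TB1. slash_only excludes Chords, and tie_only and valid_notes_line exclude Articulations. The following is a hedged Python sketch, not the map_functions.cpp code. All names are hypothetical, and the NOTE pattern is a crude stand-in for parse_notes (for instance, it does not accept the "." placeholder).

```python
import re

# Hypothetical token shapes; the real note grammar is parse_notes.
REST = re.compile(r"[r!]\d*")  # assumption: a rest may carry a duration
NOTE = re.compile(r"[a-g][#b]?[',]*\d*\.?|r\d*")


def _tokens(line: str) -> list[str]:
    # In this sketch barlines and ':' act only as token separators.
    return line.replace("|", " ").replace(":", " ").split()


def rest_only(line: str) -> bool:
    """TB1: only rests, '.', '%' and barlines, with a real rest-token or '%'."""
    toks = _tokens(line)
    return (bool(toks)
            and all(REST.fullmatch(t) or t in {".", "%"} for t in toks)
            and any(REST.fullmatch(t) or t == "%" for t in toks))


def percent_only_chords(line: str, head_closed: bool, saw_real_chord_row: bool) -> bool:
    """TB1bis: '%'-only line, head open, no real chord-row yet -> Chords."""
    toks = _tokens(line)
    return ("%" in toks and all(t in {".", "%"} for t in toks)
            and not head_closed and not saw_real_chord_row)


def slash_only(line: str) -> bool:
    """TB2: standalone '/' only -> Notes (slash rhythm), never Chords."""
    toks = _tokens(line)
    return "/" in toks and all(t in {"/", "."} for t in toks)


def tie_only(line: str) -> bool:
    """TB3: '^' only (plus '|' and spaces) -> Notes, never Articulations."""
    return "^" in line and set(line) <= set("^| ")


def valid_notes_line(line: str) -> bool:
    """TB4: every token is a note-token (each with a pitch or r) -> Notes."""
    toks = _tokens(line)
    return bool(toks) and all(NOTE.fullmatch(t) for t in toks)
```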
3.bis.6 > / ^ rescue
A line that the pre-filter (§3.bis.3) would have discarded as decorative, because > can be read as an anacrusis attached to a barline, but which contains > or ^ and consists entirely of articulations-charset characters, is recovered as Articulations (here > is an accent, not an anacrusis):
| > | ← Articulations (rescue): accent on the first event of the measure
The rescue is a branch outside the cascade and always produces Articulations (articulations may open a group in any position); it is subject to the same constraint lastType ≠ Articulations and to the ^-only exclusion (TB3).
First line of the datapack. The rescue presupposes context above the line. In first position (line_id == 0), a line | > | is read as Markers/anacrusis, not as an accent: the Markers pre-filter (cascade line 1) takes precedence, and > is ambiguous between accent and upbeat when there is no notes line below it to disambiguate. This is a deliberate choice (decision 2026-05-25): in the absence of context, the anacrusis reading prevails.
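The conditions of the rescue can be collected into one predicate. This is a sketch under the same assumptions as the cascade sketch: rescue_as_articulations is a hypothetical name, ARTIC_CHARS is an assumed subset of the A) charset, and the function is meant to be called only on lines the pre-filter would drop.

```python
ARTIC_CHARS = set(">^.-~!|: ")  # assumed subset of the A) charset


def rescue_as_articulations(line: str, line_id: int, last_type: str) -> bool:
    """Sketch of the >/^ rescue for lines the pre-filter would discard."""
    if line_id == 0:
        return False  # first line: the Markers/anacrusis reading prevails
    if last_type == "Articulations":
        return False  # same guard as cascade line 3
    if ">" not in line and "^" not in line:
        return False
    if set(line) <= set("^| "):
        return False  # ^-only is a ties-only notes line (TB3)
    return set(line) <= ARTIC_CHARS
```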
3.bis.7 Post-pass: AlternateChords promotion
The base classification assigns Chords to all implicit chord-rows.
A subsequent pass relabels as AlternateChords (C+, alternate chords
above the base) the consecutive chord-rows that precede the last,
only if no chord-row of the datapack carries an explicit marker (if
the user has chosen the markers, their choice is respected). Limit: max 2 alternate
lines per datapack (E127). Full spec in
docs/feature-alternate-chords.md and neumaRk_chords.md §7.
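The post-pass can be sketched over the list of types produced by the cascade. The sketch below makes several assumptions. promote_alternates and AlternateChordsLimit are hypothetical names, explicit[i] records whether line i carried an explicit marker, and "consecutive" is read strictly, so an Articulations line between two chord-rows stops the promotion.

```python
class AlternateChordsLimit(Exception):
    """Hypothetical stand-in for diagnostic E127."""


def promote_alternates(types: list[str], explicit: list[bool]) -> list[str]:
    """Relabel the chord-rows right before the last one as AlternateChords."""
    chord_like = ("Chords", "AlternateChords")
    if any(explicit[i] for i, t in enumerate(types) if t in chord_like):
        return list(types)  # the user chose the markers: respect that choice
    out = list(types)
    rows = [i for i, t in enumerate(types) if t == "Chords"]
    if not rows:
        return out
    i = rows[-1] - 1  # the last chord-row stays the main (base) line
    while i >= 0 and types[i] == "Chords":
        out[i] = "AlternateChords"
        i -= 1
    if out.count("AlternateChords") > 2:
        raise AlternateChordsLimit("E127: at most 2 alternate chord lines per datapack")
    return out
```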
3.bis.8 Derivations and invariant
- The articulations_line charset (line 3) derives from the closed vocabulary of neumaRk_articulations.md (single source): deduction and renderer semantics must not diverge.
- Invariant: the algorithm may only be extended toward what is written here; it must never silently change an existing line→type mapping. The parser↔spec divergences on classification (FU.3a head closed, FU.3b rest-only after A), FU.3c rest-only after Chords) were closed on 2026-05-25 and verified by diffing the "classification golden" battery (wasm/tests/fixtures/classify_*). FU.1 (articulations charset derived from the closed vocabulary) remains open; it does not change the line→type mapping.
- Mapping change 2026-05-26 (TB1bis): a %-only line without a rest-token, with the head open and without a real chord-row, moves from Notes to Chords (§3.bis.5 TB1bis). This is a deliberate and documented change of the mapping, not a silent one, and it was admitted because the language is not yet widespread (no version bump). It is covered by the goldens classify_pct_only_chords (promotion) and classify_pct_after_real_chords (sawRealChordRow guard).
- Correction 2026-05-26 (TB4): a line that is a valid notes-line but falls within the A) charset (typically note-only lines such as g, g4, g g g) moves from Articulations to Notes (§3.bis.5 TB4). This is a bug-fix toward the correct behavior ("the note wins"), not a controversial extension. It is covered by the goldens classify_note_g_after_chords and classify_note_g_vs_artic (with the negative case > . . ., which remains Articulations).
4. Multiple staves per datapack (multi-stave)
A datapack may represent a system with multiple aligned staves
(e.g. melody + bass). The number of staves is given by the number of note
groups (A?ND?L?) present, up to a maximum of 4.
The number of staves may vary from one datapack to another in the same song: one datapack may contain a single staff and the next two or more, without prior declarations.
4.1 Elements shared by the system
The following elements are common to all staves of the datapack:
- the Markers line
- the Chords line(s)
- barlines and measure decorators (voltas, change of meter/key, $, @, DC, FINE, etc.)
- the key and meter of the system
The measure barlines cross all staves vertically.
4.2 Elements independent per staff
Each group (A?ND?L?) defines an independent staff with its own:
- Notes (mandatory)
- Articulations, Dynamics, Lyrics (optional)
- clef, declared via the inline directive (@…) as the first token of the Notes line (see neumaRk_notes_and_durations.md §9); in the absence of a directive, the last clef defined in the context applies (default: treble clef)
- ties, beams, and tuplets (all contained within the staff)
4.3 Horizontal alignment
All staves of the system must contain the same number of measures: the vertical alignment between staves is temporal.
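Checking this rule requires counting measures per staff. The following is a minimal sketch with hypothetical names. It assumes the body has already been stripped of its marker, treats every barline form of §6.1 as a separator, and does not model anacrusis or measure decorators.

```python
import re

# Compound barlines first, so that "||", "|:" or ":|" count as one separator.
BARLINE = re.compile(r"\|\||\|:|:\||\|\.|\.\||\|")


def count_measures(body: str) -> int:
    """Non-empty segments between barlines (no barlines: one measure)."""
    segments = [s for s in BARLINE.split(body) if s.strip()]
    return max(1, len(segments))


def staves_aligned(staff_bodies: list[str]) -> bool:
    """True if every staff of the system has the same number of measures."""
    return len({count_measures(b) for b in staff_bodies}) <= 1
```

Applied to the two Notes lines of the example in §4.4, both bodies count 4 measures, so the system is aligned.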
4.4 Example
C) | Gm6 |
N) | g | a | bb | a |
N) | (@F) g,4 bb8 d e4 d | g,4 bb8 d e4 d | g,4 bb8 d e4 d | g,4 bb8 d e4 d |
A 2-staff system: the first in treble clef (default), the second
in bass clef. The chord Gm6 is common to the system.
4.5 Staff continuity between datapacks (N+)
When a song has multiple datapacks, each staff has a persistent identity: the "first staff" of a datapack is the same "first staff" of the previous one. It inherits the pitch context, the clef in effect and durCtx, and it is rendered in the same vertical position.
The base identity is positional: the first N) of the datapack
corresponds to the first N) of the previous datapack, the second to the
second, and so on.
To add a new staff in a datapack without breaking its
identity with the previous ones, the N+ marker is used in place of N).
N+ <content of the new staff>
The + replaces the ) (it does not add to it): the prefix remains 2
characters as with N), so the vertical alignment of measures remains
identical between adjacent staves.
Rules
- N) = continuation of the next staff of the previous datapack (FIFO, in source order). It inherits that staff's context (pitch, clef, duration).
- N+ = new staff introduced in this datapack. It starts with a fresh context (the orientation reference depends on the initial clef, see neumaRk_notes_and_durations.md §2.2).
- Source order = visual order, top to bottom. The N+ marker determines only the identity (new vs. continuation), not the position.
- N+ in the first datapack of the song is admitted but redundant (with no preceding datapack, all staves are necessarily new); it is treated as a normal N).
- Unchanged limit: at most 4 staves per datapack in total (sum of N) and N+).
Errors
- E122: N) without match. The datapack has more N) lines than the previous datapack had staves: an N+ is missing (or there is one N) too many).
Example
Datapack 1 (intro with 2 staves, treble + bass):
M) [intro]
C) G7
N) |: <d b>2 <e c>4 | <f d>2 <e c>4 :|
N) (@F) g,4. d'8 e d | f4. d8 e d
Datapack 2 (Theme): adds a third staff at the top (vocal melody), keeping the intro's treble and bass in the middle and at the bottom.
M) [Theme]
C) > | G7
N+ > d8 | b'^ | b2 r8 d,
A) > | . ! | . !
N) > |: <d b>2 <e c>4 | <f d>2 <e c>4 :|
N) > | g,4. d'8 e d | f4. d8 e d
Resulting mapping (for the engine, not visible in source):
| source row | type | identity |
|---|---|---|
| 1 (N+) | new | new staff (top) |
| 2 (N)) | cont. | = first intro staff (treble) |
| 3 (N)) | cont. | = second intro staff (bass) |
The context (last_pitch, clef, durCtx) of staves 2 and 3 of the Theme comes from staves 1 and 2 of the intro; staff 1 of the Theme starts fresh.
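The FIFO rule can be sketched as a function that assigns persistent IDs to the notes rows of one datapack. map_staff_ids and UnmatchedStaff are hypothetical names, and integer IDs are an assumption, since the spec does not define how identities are represented. N2 voice lines are out of scope here.

```python
class UnmatchedStaff(Exception):
    """Hypothetical stand-in for diagnostic E122 (N) without match)."""


def map_staff_ids(markers: list[str], previous: list[int], next_id: int):
    """Assign persistent staff IDs to the notes rows of one datapack.

    markers:  "N)" or "N+" for each notes row, in source (= visual) order.
    previous: staff IDs of the previous datapack, top to bottom ([] if none).
    next_id:  first unused staff ID.
    Returns (ids, next_id).
    """
    if len(markers) > 4:
        raise ValueError("at most 4 staves per datapack")
    pending = list(previous)  # FIFO of staves that can still be continued
    ids = []
    for marker in markers:
        if marker == "N)" and previous:
            if not pending:
                raise UnmatchedStaff("E122: N) without match")
            ids.append(pending.pop(0))  # continuation: inherits pitch, clef, durCtx
        else:
            ids.append(next_id)  # N+ (or any row of the first datapack): fresh context
            next_id += 1
    return ids, next_id
```

For the example above, the intro yields IDs [0, 1], and the Theme (N+, N), N)) yields [2, 0, 1], which reproduces the mapping table.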
Cases not covered
The N)/N+ syntax expresses the common situation "I add a
staff at the top/in the middle/at the bottom", but it does not cover:
- selective drop (a datapack that keeps the 2nd staff of the previous one but drops the 1st);
- reorder (swapping the visual order of existing staves while keeping their identity).
For these cases a future extension could introduce an explicit
reference to the staff's internal ID (e.g. N<n>)). Not spec'd in this
version.
4.6 Second voice per staff (N2)
Each staff may host an independent second voice, introduced by the N2 line. Staff and voice are distinct concepts: the staff is the five-line grid itself, while the voice is a musical stream within that staff.
Voice 2 shares the clef, key, meter, barlines, and chord line with voice 1, but maintains an independent persistent musical context. Voice 1 has stems up, voice 2 stems down.
The complete specification of syntax, context, binding, and diagnostics
is in neumaRk_voices.md.
5. Datapack validity rules
A datapack is valid if:
- it contains at least one Notes line or one Chords line
- it respects the logical order of the lines
- all musical lines are temporally alignable
The following are invalid:
- a datapack with only text lines
- a datapack with a non-final Format line
6. Barlines and measures
All musical lines (Markers, Chords, Articulations, Notes, Dynamics, Lyrics):
- may contain barlines, simple and compound, and measure decorators (in the absence of barlines, all content belongs to the first measure of the staff)
- implicitly define the division into measures
The Format line is the only exception: it admits neither barlines (simple or compound) nor measure decorators.
6.1 Supported barlines
The admitted barlines are:
- | simple barline
- || double barline
- |. or .| end barline
- |: repeat start
- :| repeat end
The final barline must be preceded by a space.
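A barline scanner has to try the compound forms before the simple | so that || or |: is not read as two tokens. In the sketch below, find_barlines and final_barline_spaced are hypothetical names. The scanner is purely lexical: it does not disambiguate a dot placeholder written directly against a barline.

```python
# Compound forms first; the simple "|" is tried last.
BARLINES = [("||", "double"), ("|.", "end"), (".|", "end"),
            ("|:", "repeat start"), (":|", "repeat end"), ("|", "simple")]


def find_barlines(line: str) -> list[tuple[int, str]]:
    """Return (position, kind) for every barline in the line."""
    found, i = [], 0
    while i < len(line):
        for token, kind in BARLINES:
            if line.startswith(token, i):
                found.append((i, kind))
                i += len(token)
                break
        else:  # no barline starts here
            i += 1
    return found


def final_barline_spaced(line: str) -> bool:
    """Check that the last barline is preceded by a space (§6.1)."""
    bars = find_barlines(line)
    if not bars:
        return True
    pos = bars[-1][0]
    return pos == 0 or line[pos - 1] in " \t"
```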
7. Measure decorators
Measure decorators are elements adjacent to a barline and are divided into:
- BEGIN decorators (to the right of the barline)
- END decorators (to the left of the barline)
7.1 BEGIN decorators
Positioned immediately to the right of the barline.
Possible decorators:
- Change of meter and/or key
  - within parentheses
  - separated by a comma
  - in arbitrary order
  Examples:
  |(3/4,Dm)
  |([3+3+2]/8)
- Volta endings
  - text within square brackets
  - optional +n for the duration in measures
  Example:
  |[1.]+4
  In the absence of +n, the volta closes automatically at the first :| encountered within 4 measures (the typical case of [1.] endings). If no :| appears within the following 4 measures, the volta indicates an exit; use +n for non-standard cases.
- $ segno
- @ coda
If present, meter and key must precede every other BEGIN decorator.
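The two bracketed BEGIN decorators can be parsed with small regular expressions. This is a sketch under assumptions. parse_meter_key and parse_volta are hypothetical names, any part that does not look like a meter is taken to be a key without validating it, and the leading barline is assumed to be already stripped.

```python
import re
from typing import Optional

METER = re.compile(r"(?:\d+|\[[\d+]+\])/\d+")    # e.g. 3/4 or [3+3+2]/8
VOLTA = re.compile(r"\[([^\]]+)\](?:\+(\d+))?")  # e.g. [1.] or [1.]+4


def parse_meter_key(decorator: str) -> dict[str, str]:
    """Parse '(3/4,Dm)'-style decorators: comma-separated, in any order."""
    if not (decorator.startswith("(") and decorator.endswith(")")):
        raise ValueError("meter/key change must be within parentheses")
    result: dict[str, str] = {}
    for part in decorator[1:-1].split(","):
        part = part.strip()
        field = "meter" if METER.fullmatch(part) else "key"
        if field in result:
            raise ValueError(f"duplicate {field}: {part}")
        result[field] = part
    return result


def parse_volta(decorator: str) -> tuple[str, Optional[int]]:
    """Return (label, n); n is None when +n is omitted (auto-close rule)."""
    m = VOLTA.fullmatch(decorator)
    if m is None:
        raise ValueError(f"not a volta: {decorator!r}")
    label, n = m.group(1), m.group(2)
    return label, (int(n) if n is not None else None)
```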
7.2 END decorators
Positioned immediately to the left of the barline.
Supported decorators:
- DC
- DCal@
- DCalFINE
- D$
- D$al@
- D$alFINE
- FINE
- al@
- free text within square brackets (graphic annotation)
The $ and @ signs are BEGIN decorators (see §7.1): they identify the point where the segno or coda is located, not a jump toward them.
8. Markers line
The Markers line:
- contains markers enclosed within square brackets
- the markers refer to the start of the measure
If a marker is preceded by a barline:
- there must be a space between the barline and the marker
- to avoid ambiguity with the volta decorators
It is best practice to place the flow decorators (DC, D$, coda, etc.) on this line.
9. Format line
The Format line may be declared:
- in explicit form, via the marker F);
- in implicit form, if the content of the line is uniquely recognizable as a format line.
The Format line, if present, must be:
- unique per datapack;
- always the last line of the datapack.
In case of ambiguity with a musical line, the musical interpretation always prevails and the line is not considered Format.
It contains alignment indications:
| Symbol | Alignment |
|---|---|
| \|* | LEFT |
| *\| | RIGHT |
| \|*\| | CENTER |
| \|**\| | JUSTIFIED (default) |
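The table maps directly onto a lookup. The sketch below assumes that the Format line body, with any F) marker already removed, holds exactly one alignment symbol, and that an empty body falls back to the default. format_alignment is a hypothetical name.

```python
ALIGNMENTS = {"|*": "LEFT", "*|": "RIGHT", "|*|": "CENTER", "|**|": "JUSTIFIED"}


def format_alignment(body: str) -> str:
    """Map a Format line symbol to its alignment; JUSTIFIED is the default."""
    symbol = body.strip()
    if not symbol:
        return "JUSTIFIED"
    if symbol not in ALIGNMENTS:
        raise ValueError(f"unknown format symbol: {symbol!r}")
    return ALIGNMENTS[symbol]
```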
10. Margins between datapacks
A line:
- that follows a blank line
- that begins with -
- that contains only -, spaces, tabs, or %
indicates a vertical margin between datapacks.
The depth of the margin is given by the maximum number of consecutive -:
- "- - -" → margin 1
- "- -- -" → margin 2
- "--- -" → margin 3
The % symbol indicates a possible page break.
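The margin rule can be sketched as follows. margin_depth is a hypothetical name, and the caller is assumed to know whether the previous line was blank.

```python
import re


def margin_depth(line: str, after_blank: bool):
    """Return (depth, page_break) for a margin line, or None if it is not one."""
    if not after_blank or not line.startswith("-"):
        return None
    if any(ch not in "- \t%" for ch in line):
        return None
    depth = max(len(run) for run in re.findall(r"-+", line))  # longest run of '-'
    return depth, "%" in line
```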
11. Comments
Single-line comments are permitted in the form // comment. The scope is the line: everything that follows // up to the newline is ignored by the parser. Comments may stand on their own line (above/below a datapack, or between the lines of a datapack) or trailing a line of code.
The discriminant between the two forms is positional: if there is at least one non-whitespace character before //, the comment is trailing (the part of code before // remains valid); otherwise the entire line is a comment.
Multi-line comments (/* … */) are not supported: NRK is a column-sensitive language (the barlines | align the lines of the datapack vertically), and block comments would break the alignment.
The sequence %% at the start of a line is not a comment: it is reserved for the song version blocks (see neumaRk_versions.md).
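The positional discriminant can be sketched as follows. strip_comment is a hypothetical name. The sketch assumes that any // ends the code, since the spec gives no escape for it, and it removes trailing whitespace before the comment, which does not affect barline alignment.

```python
def strip_comment(line: str):
    """Remove a // comment; return None when the whole line is a comment."""
    if line.startswith("%%"):
        return line  # version-block delimiter, not a comment
    idx = line.find("//")
    if idx == -1:
        return line
    code = line[:idx]
    if code.strip() == "":
        return None  # nothing but whitespace before //: a standalone comment
    return code.rstrip()  # trailing comment: the code before // stays valid
```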
12. Version blocks
Between one datapack and another, version blocks may appear, marked %%NAME … %%end, which enclose one or more alternate datapacks of the song. The blocks live standalone between datapacks, not inside them. See neumaRk_versions.md for the full spec.