Datapack

1. The datapack concept

A datapack is the fundamental structural unit of musical content in neumaRk.

It represents a vertical block of musical information synchronized in time, typically corresponding to one or more aligned staves. Its lines may carry:

  • markers
  • chords
  • notes
  • articulations
  • dynamics
  • lyrics
  • formatting

Each datapack describes a coherent temporal sequence (one or more consecutive measures). Musical datapacks are separated from one another by one or more blank lines, which act as structural delimiters. The presence of a blank line implies the conclusion of the preceding datapack.
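The blank-line delimiter rule can be sketched as follows. This is a minimal illustration, not the reference parser: the function name `split_datapacks` is hypothetical, and the sketch ignores margins, comments, and version blocks, which the later sections treat separately.

```python
from typing import List


def split_datapacks(source: str) -> List[List[str]]:
    """Group consecutive non-blank lines; one or more blank lines end a datapack."""
    packs: List[List[str]] = []
    current: List[str] = []
    for line in source.splitlines():
        if line.strip():
            current.append(line)
        elif current:
            # A blank line concludes the preceding datapack.
            packs.append(current)
            current = []
    if current:
        packs.append(current)
    return packs
```

Several consecutive blank lines behave like a single delimiter, because an empty `current` is never appended.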


2. Line types

Each line of the datapack has a semantic type.

The type may be:

  • explicit, via a marker
  • implicit, deduced from the content and position

2.1 Explicit line markers

Explicit markers are:

  • one, two, or three uppercase letters
  • followed by )
  • followed by a space

| Marker | Type |
|---|---|
| M) | Markers |
| C) | Chords |
| A) | Articulations |
| N) | Notes |
| D) | Dynamics |
| L) | Lyrics |
| F) | Format |

The markers are optional.

The N) marker admits two morphological 2-character variants, explicitly provided for by the specification:

  • N+ — new staff introduced in this datapack (§4.5);
  • N2 — second voice on the same staff (see neumaRk_voices.md).

The digit 2 and the sign + replace the ), preserving the 2-character width invariant required for vertical alignment.

Likewise, the C) marker admits a variant:

  • C+ — line of alternate chords above the base line (comment-label and line scope in neumaRk_chords.md §7).

The + sign replaces the ) as in N+, preserving the same width invariant.
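As an illustration of the marker shape described in this section, the sketch below separates an explicit marker from the rest of the line. The name `split_marker` and the regular expression are assumptions made for the example. The sketch also assumes the marker starts at column 0 and does not check the marker against the table of known types.

```python
import re
from typing import Optional, Tuple

# Assumption: the width-preserving variants N+, N2 and C+ are listed explicitly;
# every other marker is 1 to 3 uppercase letters followed by ")".
MARKER_RE = re.compile(r"^(N\+|N2|C\+|[A-Z]{1,3}\)) ")


def split_marker(line: str) -> Tuple[Optional[str], str]:
    """Return (marker, content); marker is None when the line has no explicit marker."""
    match = MARKER_RE.match(line)
    if match is None:
        return None, line
    return match.group(1), line[match.end():]
```

A chord row such as `C7 | F7` is left unmarked, because `C7` is not followed by `)` or `+`.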


3. Logical order of lines (implicit deduction)

In the absence of explicit markers, the type of the lines is deduced following the logical order:

  1. Markers (at most one line, optional)
  2. Chords
       • zero or more lines
       • the last is considered the main line
  3. Note groups, each composed of:
       • one Articulations line (optional, first)
       • one Notes line (mandatory in the absence of chords)
       • one Dynamics line (optional, after the Notes line; see neumaRk_dynamics.md)
       • one Lyrics line (optional, last)
     A datapack may contain from 1 to 4 note groups: each group corresponds to a staff of the system (see §4).
  4. Format (at most one line, optional, always final)


3.bis Line-type deduction (normative algorithm)

§3 gives the logical order of the types; this section formalizes its deduction algorithm: how, in the absence of an explicit marker, the type of a line is determined from content + position. The rule written here is normative — it is the language, not an implementation detail. An explicit marker (§2.1) always wins: the algorithm that follows applies only to lines without a marker.

3.bis.1 Datapack structure

Only Markers and Chords are unique per datapack (they are shared by the whole system, §4.1); everything else is a repeated note group, once per (staff × voice):

datapack ::= [Markers]?  Head  Group{1..8}  [Format]?

Head     ::= ( [A]? Chord )*        // 0+ chord-row, each possibly
                                    //   decorated by an A) above
Group    ::= [A]? N [D]? [L]?       // one (staff × voice)
  • Head (Markers + Chords): unique, shared. The articulations line may precede a chord-row to decorate it (a chord-row may carry its own rhythm, neumaRk_chords.md §4); for this reason A) is admitted before C).
  • Group: repeats up to 8 times = max 4 staves (§4) × 2 voices per staff (§4.6). Voice 2 is not implicitly deducible — there is no way to distinguish, from content alone, "voice 2 of the same staff" from "new staff": it must always be declared with the explicit marker N2. Thus implicit deduction alone produces up to 4 groups (the staves); the 8 are reached only with N2.
  • Format: unique, always final (§9).

3.bis.2 Computation model

Deduction is a single-pass, per-datapack automaton that scans the lines in source order while maintaining two pieces of state:

  • lastType: the type of the last line (init Empty);
  • headClosed: false until the first Notes line of the datapack appears, then true forever. It marks the boundary between Head and Groups.

The lastType guards are the transition function of the automaton: they encode both the order within a group, and the loop-back that opens the next group (re-entry on A or on N). headClosed prevents Chords from reappearing once the staves have been entered.

3.bis.3 Pre-filter: decorative lines

Before the cascade, a purely decorative line — composed only of simple barlines, :, compact repeat forms (neumaRk_flow_and_repeats.md §6.1), dots, spaces, and tabs — is skipped and does not receive a musical type (it remains structural; its barline decorations belong semantically to the other lines). Exception: the >/^ rescue in §3.bis.6.
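A rough sketch of the pre-filter test, under stated assumptions: the helper name `is_decorative` is hypothetical, and only the characters named in the paragraph above are modeled. The compact repeat forms of neumaRk_flow_and_repeats.md §6.1 and the anacrusis reading of `>` are not covered here.

```python
# Characters named by the pre-filter rule: barline, ":", dot, space, tab.
DECORATIVE_CHARS = set("|:. \t")


def is_decorative(line: str) -> bool:
    """True when the line holds nothing but barline decoration."""
    if not line.strip():
        # Blank lines are datapack delimiters, not decorative lines.
        return False
    return all(ch in DECORATIVE_CHARS for ch in line)
```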

3.bis.4 Classification cascade (normative precedence)

On non-decorative lines without a marker, a first-match-wins cascade is applied. The order is binding: the content predicates are deliberately permissive and do not over-match only thanks to (a) this order and (b) the position guard. The triad that defines each type is predicate + guard + exclusions.

| # | Type | Predicate (content) | Guard (position) | Exclusions |
|---|---|---|---|---|
| 1 | Markers | only […] / barline / spaces | is the first line of the datapack | |
| 2 | Chords | ≥1 valid chord (neumaRk_chords.md §3) + barline / . / % / comment-label / spaces, or only % (TB1bis) | head open (!headClosed) | not slash-only (TB2); not rest-only r/! (TB1) |
| 3 | Articulations | charset of the A) vocabulary (neumaRk_articulations.md) | lastType ≠ Articulations | not ^-only (TB3); not a valid notes-line (TB4) |
| 4 | Dynamics | dynamics charset (< > c d f m p s z - . \| :) | lastType == Notes | |
| 5 | Lyrics | word-char + . - _ ' \| : (the most permissive) | lastType ∈ {Notes, Dynamics, Lyrics} | |
| 6 | Notes | default / sink | any | |

Notes on the guards:

  • Markers (line 1): the implicit deduction of Markers occurs only on the first line of the datapack. A markers-shaped line further down is not Markers (an explicit M) remains possible wherever the spec allows it).
  • Chords (line 2): the normative condition is head open. This is equivalent to saying that chords live only before the first notes line: chords are unique and shared (§4.1), and no valid datapack has a chord-row after a notes-row. The parser realizes the constraint with the positional flag headClosed (closed at the first Notes line of the datapack) — aligned to §3.bis since 2026-05-25 (FU.3a).
  • Lyrics (line 5): the predicate is the most permissive (it matches almost any text). This is intentional: the permissiveness is contained by the position guard, and in any case the default type is Notes. It is not narrowed (that would be non-monotonic and would risk changing the classification of existing songs).
  • Notes default: line 6 is the sink — every line that reaches the bottom of the cascade is a notes line. It is also the loop-back mechanism: two consecutive N) (both sink) are two distinct staves.
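The cascade and its two pieces of state can be sketched as a small loop. This is an illustration under loud assumptions, not the reference parser: the content predicates are passed in as a dictionary, the `TOY` predicates below are deliberately crude stand-ins invented for the demo, and the tiebreaker exclusions (TB1 to TB4), explicit markers, and the pre-filter are left out.

```python
import re
from typing import Callable, Dict, List

Predicate = Callable[[str], bool]


def classify(lines: List[str], pred: Dict[str, Predicate]) -> List[str]:
    """First-match-wins cascade over the unmarked, non-decorative lines of one datapack."""
    types: List[str] = []
    last_type = "Empty"
    head_closed = False
    for index, line in enumerate(lines):
        if index == 0 and pred["markers"](line):
            kind = "Markers"
        elif not head_closed and pred["chords"](line):
            kind = "Chords"
        elif last_type != "Articulations" and pred["articulations"](line):
            kind = "Articulations"
        elif last_type == "Notes" and pred["dynamics"](line):
            kind = "Dynamics"
        elif last_type in ("Notes", "Dynamics", "Lyrics") and pred["lyrics"](line):
            kind = "Lyrics"
        else:
            kind = "Notes"  # default / sink
        if kind == "Notes":
            head_closed = True  # the first Notes line closes the head for good
        types.append(kind)
        last_type = kind
    return types


def _tokens(line: str) -> List[str]:
    # Drop tokens made only of barline characters.
    return [t for t in line.split() if t.strip("|:")]


# Hypothetical toy predicates, far narrower than the real vocabularies.
TOY: Dict[str, Predicate] = {
    "markers": lambda s: re.fullmatch(r"[|\s]*(\[[^\]]*\][|\s]*)+", s) is not None,
    "chords": lambda s: bool(_tokens(s))
    and all(re.fullmatch(r"[A-G][#b]?[a-z0-9]*", t) for t in _tokens(s)),
    "articulations": lambda s: bool(_tokens(s)) and set(s) <= set(">.!~-| "),
    "dynamics": lambda s: bool(_tokens(s)) and set(s) <= set("<>cdfmpsz-.|: "),
    "lyrics": lambda s: re.fullmatch(r"[\w.\-_'|: ]+", s) is not None,
}

print(classify(["[intro]", "| G7 | C |", "| > . |", "| g a | b c |",
                "| p < |", "| la la | la la |"], TOY))
```

With these toy predicates, a chord-shaped line placed after a notes line is no longer a Chords candidate (the head is closed) and falls through to the permissive Lyrics predicate, which is the containment-by-guard behaviour the notes above describe.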

3.bis.5 Tiebreakers

Three disambiguations resolve cases in which a permissive predicate would match the wrong type:

  • TB1 — rest-only → Notes. A line composed only of rest-tokens (r / !), barline, ., %, and spaces belongs to the stave (it is a notes line of rests only), not to an empty chord-row. It applies after Chords/Alt and after Articulations:
A7        ← Chords          | > .       ← Articulations
r         ← Notes (TB1)     r           ← Notes (TB1)

Implemented since 2026-05-25 as a single rule "rest-only forces Notes": the line is intercepted before Chords and Articulations, regardless of lastType (FU.3b after Articulations + FU.3c after Chords — both produced the wrong type, respectively empty chord-row and Articulations). A guard requires a real rest-token r/! or %, so | > | (anacrusis) remains handled by the rescue (§3.bis.6).

  • TB1bis — %-only without a chord-row → Chords. A line composed only of % (plus barline, ., spaces — no rest-token r/!) with head open (!headClosed) and in the absence of a real chord-row in the datapack is a chord-row of measure repeats only: each % repeats the last chord-measure (even from the preceding datapack — the measure-repeat lookback is cross-datapack). It is the exception to TB1: % without a rest-token is not forced to Notes, but falls into the Chords branch (whose isChordLine already accepts %).
| C7 | F7 |     ← Chords
| a b c | …     ← Notes (closes the head)

| % | % |       ← Chords (TB1bis): repeats C7, F7 from the datapack above
| d e f | …     ← Notes

Decision 2026-05-26, supersedes the previous principle "if you want a chord-row of only % you must declare it with C)". Rationale: "chords that repeat while the melody changes" is the most common lead-sheet pattern; forcing C) on every continuation was gratuitous friction. Two guards keep the case narrow:

  1. no rest-token — a line with r/! always remains Notes (TB1): rests are unambiguously notes;
  2. !sawRealChordRow — if the head already has a chord-row with a real chord, a subsequent %-only line remains Notes (it is a notes-row of repeats), because the datapack's chord-row is already defined.

Escape-hatch for the opposite reading (a second notes staff of only %): the explicit marker N+ / N). Known limitation: a datapack of only % (without any notes-row) is read as a chord-row and violates the grammar (A?ND?L?)+ (≥1 N is required); declare it with N) if the intent is a standalone notes-row of repeats.

  • TB2 — slash-only → Notes. A line of standalone / only (plus barline, ., spaces) is slash rhythm (neumaRk_notes_and_durations.md §10), never a chord-row: / on its own is syntactically a quasi-chord but semantically a note event.
| / / | / / |     ← Notes (TB2)
  • TB3 — ^-only → Notes. A line of ^ only (plus | and spaces) is a notes line with ties only (neumaRk_notes_and_durations.md §6), not Articulations.
| ^ |     ← Notes (TB3)
  • TB4 — valid notes-line → Notes. A line that is a valid notes-line has priority over Articulations, even if it falls entirely within the charset of the A) vocabulary. This is necessary because some letters of the vocabulary are also note-names/durations: in particular g (a component of gl, glissando, §9) is also the note G, and the durations 14 are in the charset, so g, g4, g g g matched articulations_line and were captured by A) instead of being read as Notes.
A7        ← Chords
g         ← Notes (TB4), not Articulations

Predicate: every token (split on spaces; the barlines |/: at the edges are stripped) is a valid note-token according to the canonical grammar parse_notes (single source → no divergence), and at least one token carries a real pitch [a-g]/r. Articulation-only tokens (>, tr, -, o, gl, ~…) do not match parse_notes → the line remains Articulations; a line of only ./,/^ (placeholder, no real pitch) → remains Articulations. Implemented 2026-05-26 (isNotesLineForTiebreak in map_functions.cpp), gated also on the >/^/~ rescue (§3.bis.6). Principle: in case of A)↔N) ambiguity, the note wins.
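The content side of TB1, TB1bis, TB2, and TB3 can be sketched with simple token tests. The helper names, the tokenization (barlines reduced to `|` and `:`), and the assumption that rest-tokens appear bare (without durations) are all choices made for this example. TB4 is omitted because it depends on the parse_notes grammar, which is not reproduced here.

```python
from typing import List, Optional, Set


def _tokens(line: str) -> List[str]:
    # Assumption: barlines are built from "|" and ":" only; both are dropped.
    return line.replace("|", " ").replace(":", " ").split()


def _only(line: str, allowed: Set[str], required: Set[str]) -> bool:
    toks = _tokens(line)
    return all(t in allowed for t in toks) and any(t in required for t in toks)


def forced_type(line: str, head_closed: bool, saw_real_chord_row: bool) -> Optional[str]:
    """Return the type imposed by a tiebreaker, or None when the cascade decides."""
    if _only(line, {"/", "."}, {"/"}):            # TB2: slash-only
        return "Notes"
    if _only(line, {"^"}, {"^"}):                 # TB3: ^-only
        return "Notes"
    if _only(line, {"%", "."}, {"%"}):            # TB1bis: %-only, no rest-token
        if not head_closed and not saw_real_chord_row:
            return "Chords"
        return "Notes"
    if _only(line, {"r", "!", ".", "%"}, {"r", "!", "%"}):  # TB1: rest-only
        return "Notes"
    return None
```

A line such as `| > |` returns None here, so it stays available to the rescue branch.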

3.bis.6 > / ^ rescue

A line that the pre-filter (§3.bis.3) would have discarded as decorative — because > is readable as an anacrusis attached to a barline — but which contains > or ^ and is entirely articulations-charset, is recovered as Articulations (here > is an accent, not an anacrusis):

| > |     ← Articulations (rescue): accent on the first event of the measure

The rescue is a branch outside the cascade and always produces Articulations (articulations may open a group in any position); it is subject to the same constraint lastType ≠ Articulations and the ^-only exclusion (TB3).

First line of the datapack. The rescue presupposes a context above it. In first position (line_id == 0) a line | > | is read as Markers/anacrusis, not as an accent: the Markers pre-filter (cascade line 1) takes precedence, and > is ambiguous between accent and upbeat when there is no notes line below it to disambiguate. It is a deliberate choice (decision 2026-05-25): in the absence of context, the anacrusis reading prevails.

3.bis.7 Post-pass: AlternateChords promotion

The base classification assigns Chords to all implicit chord-rows. A subsequent pass relabels as AlternateChords (C+, alternate chords above the base) the consecutive chord-rows that precede the last, only if no chord-row of the datapack carries an explicit marker (if the user has chosen the markers, their choice is respected). Limit: max 2 alternate lines per datapack (E127). Full spec in docs/feature-alternate-chords.md and neumaRk_chords.md §7.
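A minimal sketch of the promotion pass, with assumptions stated up front: the function name is hypothetical, every implicit chord-row before the last one is treated as an alternate (the sketch does not model an A) line interleaved between chord-rows), and exceeding the limit is reported by raising an exception that carries the E127 code.

```python
from typing import List


def promote_alternate_chords(types: List[str], explicit: List[bool]) -> List[str]:
    """Relabel the chord-rows above the last one as AlternateChords.

    `types` holds the base classification; `explicit[i]` is True when line i
    carried an explicit marker in the source.
    """
    chord_rows = [i for i, t in enumerate(types) if t == "Chords"]
    # The user's own markers are respected: no promotion if any chord-row is explicit.
    if len(chord_rows) < 2 or any(explicit[i] for i in chord_rows):
        return list(types)
    alternates = chord_rows[:-1]
    if len(alternates) > 2:
        raise ValueError("E127: more than 2 alternate chord lines in a datapack")
    promoted = list(types)
    for i in alternates:
        promoted[i] = "AlternateChords"
    return promoted
```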

3.bis.8 Derivations and invariant

  • The articulations_line charset (line 3) derives from the closed vocabulary of neumaRk_articulations.md (single source): deduction and the renderer semantics must not diverge.
  • Invariant: the algorithm may only be extended toward what is written here, never silently change an existing line→type mapping. The parser↔spec divergences on classification (FU.3a head closed, FU.3b rest-only after A), FU.3c rest-only after Chords) were closed on 2026-05-25, verified by difference on the "classification golden" battery (wasm/tests/fixtures/classify_*). FU.1 remains open (articulations charset from the closed vocabulary), which does not change the line→type mapping.
  • Mapping change 2026-05-26 (TB1bis): the %-only line without a rest-token, with head open and without a real chord-row, moves from Notes to Chords (§3.bis.5 TB1bis). It is a deliberate and documented change of the mapping, not a silent one — admitted because the language is not yet widespread (no version bump). Covered by the goldens classify_pct_only_chords (promotion) and classify_pct_after_real_chords (sawRealChordRow guard).
  • Correction 2026-05-26 (TB4): the line that is a valid notes-line but falls within the A) charset (typically the note-only lines g: g, g4, g g g) moves from Articulations to Notes (§3.bis.5 TB4). It is a bug-fix toward the correct behavior ("the note wins"), not a controversial extension. Covered by the goldens classify_note_g_after_chords and classify_note_g_vs_artic (with the negative > . . . → remains Articulations).

4. Multiple staves per datapack (multi-stave)

A datapack may represent a system with multiple aligned staves (e.g. melody + bass). The number of staves is given by the number of note groups (A?ND?L?) present, up to a maximum of 4.

The number of staves may vary from one datapack to another in the same song: one datapack may contain a single staff and the next two or more, without prior declarations.

4.1 Elements shared by the system

The following elements are common to all staves of the datapack:

  • the Markers line
  • the Chords line(s)
  • barlines and measure decorators (voltas, change of meter/key, $, @, DC, FINE, etc.)
  • the key and meter of the system

The measure barlines cross all staves vertically.

4.2 Elements independent per staff

Each group (A?ND?L?) defines an independent staff with its own:

  • Notes (mandatory)
  • Articulations, Dynamics, Lyrics (optional)
  • clef declared via the inline directive (@…) as the first token of the Notes line — see neumaRk_notes_and_durations.md §9. In the absence of a directive, the last clef defined in the context applies (default: treble clef).
  • ties, beams, and tuplets (all contained within the staff)

4.3 Horizontal alignment

All staves of the system must contain the same number of measures: the vertical alignment between staves is temporal.

4.4 Example

C) | Gm6 |
N) | g | a | bb | a |
N) | (@F) g,4 bb8 d e4 d | g,4 bb8 d e4 d | g,4 bb8 d e4 d | g,4 bb8 d e4 d |

A 2-staff system: the first in treble clef (default), the second in bass clef. The chord Gm6 is common to the system.


4.5 Staff continuity between datapacks (N+)

When a song has multiple datapacks, each staff (stave) has a persistent identity: the "first staff" of a datapack is the same "first staff" of the previous one — it inherits pitch context, the clef in effect, durCtx, and is rendered in the same vertical position.

The base identity is positional: the first N) of the datapack corresponds to the first N) of the previous datapack, the second to the second, and so on.

To add a new staff in a datapack without breaking its identity with the previous ones, the N+ marker is used in place of N).

N+   <content of the new staff>

The + replaces the ) (it does not add to it): the prefix remains 2 characters as with N), so the vertical alignment of measures remains identical between adjacent staves.

Rules

  • N) = continuation of the next staff of the previous datapack (FIFO, in source order). It inherits its context (pitch, clef, duration).
  • N+ = new staff introduced in this datapack. It starts with fresh context (the orientation reference depends on the initial clef, see neumaRk_notes_and_durations.md §2.2).
  • Source order = visual order top→bottom. The N+ marker determines only the identity (new vs. continuation), not the position.
  • N+ in the first datapack of the song is admitted but redundant (in the absence of preceding datapacks all staves are necessarily new); it is treated as a normal N).
  • Unchanged limit: max 4 total staves per datapack (N) and N+ lines combined).

Errors

  • E122: N) without match. The datapack has more N) lines than the previous datapack had staves. Either an N+ is missing or there is one N) too many.

Example

Datapack 1 (intro with 2 staves, treble + bass):

M) [intro]
C) G7
N) |: <d b>2 <e c>4 | <f d>2 <e c>4 :|
N) (@F) g,4. d'8 e d | f4. d8 e d

Datapack 2 (Theme): adds a third staff at the top (vocal melody), keeping the intro's treble and bass in the middle and at the bottom.

M) [Theme]
C) > | G7
N+ >  d8 | b'^ | b2 r8 d,
A) >    | . ! | . !
N) > |: <d b>2 <e c>4 | <f d>2 <e c>4 :|
N) > | g,4. d'8 e d   | f4. d8 e d

Resulting mapping (for the engine, not visible in source):

| source row | type | identity |
|---|---|---|
| 1 (N+) | new | new stave (top) |
| 2 (N)) | cont. | = first intro staff (treble) |
| 3 (N)) | cont. | = second intro staff (bass) |

The context (last_pitch, clef, durCtx) of staves 2 and 3 of the Theme comes from staves 1 and 2 of the intro; staff 1 of the Theme starts fresh.
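The FIFO binding of the example above can be sketched as follows. The function name `bind_staves`, the integer staff IDs, and the use of `None` to signal "no previous datapack" are assumptions made for illustration; the engine's internal representation is not described in this document.

```python
from typing import List, Optional, Tuple


def bind_staves(previous: Optional[List[int]], markers: List[str],
                next_id: int) -> Tuple[List[int], int]:
    """Map each N) / N+ line, in source order, to a staff identity.

    `previous` lists the staff IDs of the previous datapack (None for the first
    datapack of the song); `next_id` is the first unused ID.
    """
    if len(markers) > 4:
        raise ValueError("a datapack holds at most 4 staves")
    queue = list(previous or [])
    ids: List[int] = []
    for marker in markers:
        if previous is None or marker == "N+":
            ids.append(next_id)        # new staff, fresh context
            next_id += 1
        elif queue:
            ids.append(queue.pop(0))   # continuation, FIFO in source order
        else:
            raise ValueError("E122: N) without match")
    return ids, next_id


intro, nxt = bind_staves(None, ["N)", "N)"], 0)
theme, nxt = bind_staves(intro, ["N+", "N)", "N)"], nxt)
print(intro, theme)
```

In the Theme datapack the new top staff receives a fresh ID, while the two N) lines pick up the intro's treble and bass staves in order.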

Cases not covered

The N)/N+ syntax expresses the common situation "I add a staff at the top/in the middle/at the bottom", but it does not cover:

  • selective drop (a datapack that keeps the 2nd staff of the previous one but drops the 1st);
  • reorder (swapping the visual order of existing staves while keeping their identity).

For these cases a future extension could introduce an explicit reference to the staff's internal ID (e.g. N<n>)). Not spec'd in this version.


4.6 Second voice per staff (N2)

Each staff may host an independent second voice, introduced by the N2 line. Staff and voice are distinct concepts: the staff is the set of five lines on the page, the voice is a musical stream within the staff.

Voice 2 shares the clef, key, meter, barlines, and chord line with voice 1, but maintains an independent persistent musical context. Voice 1 has stems up, voice 2 stems down.

The complete specification of syntax, context, binding, and diagnostics is in neumaRk_voices.md.


5. Datapack validity rules

A datapack is valid if:

  • it contains at least one Notes line or one Chords line
  • it respects the logical order of the lines
  • all musical lines are temporally alignable

The following are invalid:

  • a datapack with only text lines
  • a datapack with a non-final Format line

6. Barlines and measures

All musical lines (Markers, Chords, Articulations, Notes, Dynamics, Lyrics):

  • contain barlines — simple and compound — and any measure decorators (in the absence of barlines all content belongs to the first measure of the staff)
  • implicitly define the division into measures

The Format line is the only exception: it does not admit barlines (neither simple nor compound) and no measure decorators.

6.1 Supported barlines

The admitted barlines are:

  • | simple barline
  • || double barline
  • |. or .| end
  • |: repeat start
  • :| repeat end

The final barline must be preceded by a space.


7. Measure decorators

Measure decorators are elements adjacent to a barline and are divided into:

  • BEGIN decorators (to the right of the barline)
  • END decorators (to the left of the barline)

7.1 BEGIN decorators

Positioned immediately to the right of the barline.

Possible decorators:

  • Change of meter and/or key
      • within parentheses
      • separated by a comma
      • arbitrary order

    Examples:

    |(3/4,Dm)
    |([3+3+2]/8)

  • Volta endings
      • text within square brackets
      • optional +n for the duration in measures

    Example:

    |[1.]+4

In the absence of +n, the volta closes automatically at the first :| encountered within 4 measures (the typical case of [1.] endings). If no :| appears within the following 4 measures, the volta indicates an exit — use +n for non-standard cases.

  • $ segno
  • @ coda

If present, meter and key must precede every other BEGIN decorator.
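A small sketch of reading the meter/key decorator, given only the parenthesized text that follows the barline. The function name and the discriminating rule (an item containing `/` is a meter, anything else is a key) are assumptions for the example; the document itself only fixes the parentheses, the comma separator, and the free ordering.

```python
from typing import Dict


def parse_meter_key(decorator: str) -> Dict[str, str]:
    """Parse "(3/4,Dm)" style text into {"meter": ..., "key": ...}."""
    if not (decorator.startswith("(") and decorator.endswith(")")):
        raise ValueError("meter/key decorator must be enclosed in parentheses")
    result: Dict[str, str] = {}
    for item in decorator[1:-1].split(","):
        item = item.strip()
        # Assumption: only a meter contains "/".
        kind = "meter" if "/" in item else "key"
        if not item or kind in result:
            raise ValueError("malformed meter/key decorator: " + decorator)
        result[kind] = item
    return result
```

Because the items are classified by content, `(Dm,3/4)` and `(3/4,Dm)` produce the same result.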


7.2 END decorators

Positioned immediately to the left of the barline.

Supported decorators:

  • DC
  • DCal@
  • DCalFINE
  • D$
  • D$al@
  • D$alFINE
  • FINE
  • al@
  • free text within square brackets (graphic annotation)

The $ and @ signs are BEGIN decorators (see §7.1): they identify the point where the segno or coda is located, not a jump toward them.


8. Markers line

The Markers line:

  • contains markers enclosed within square brackets
  • the markers refer to the start of the measure

If a marker is preceded by a barline, there must be a space between the barline and the marker, to avoid ambiguity with the volta decorators (§7.1).

It is best practice to place the flow decorators (DC, D$, coda, etc.) on this line.


9. Format line

The Format line may be declared:

  • in explicit form, via the marker F);
  • in implicit form, if the content of the line is uniquely recognizable as a format line.

The Format line, if present, must be:

  • unique per datapack;
  • always the last line of the datapack.

In case of ambiguity with a musical line, the musical interpretation always prevails and the line is not considered Format.

It contains alignment indications:

| Symbol | Alignment |
|---|---|
| \|* | LEFT |
| *\| | RIGHT |
| \|*\| | CENTER |
| \|**\| | JUSTIFIED (default) |

10. Margins between datapacks

A line:

  • that follows a blank line
  • that begins with -
  • that contains only -, spaces, tabs, or %

indicates a vertical margin between datapacks.

The depth of the margin is given by the maximum number of consecutive -:

  • - - - → margin 1
  • - -- - → margin 2
  • --- - → margin 3

The % symbol indicates a possible page break.
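The margin rule above can be sketched in a few lines. The function name `parse_margin` and the returned pair (depth, page-break flag) are choices made for this example; the caller is assumed to know whether the previous line was blank and to pass the line without its newline.

```python
import re
from typing import Optional, Tuple


def parse_margin(line: str, follows_blank: bool) -> Optional[Tuple[int, bool]]:
    """Return (depth, page_break) for a margin line, or None if it is not one."""
    if not follows_blank or not line.startswith("-"):
        return None
    if any(ch not in "- \t%" for ch in line):
        return None
    # Depth is the longest run of consecutive "-" characters.
    depth = max(len(run) for run in re.findall(r"-+", line))
    return depth, "%" in line
```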

11. Comments

Single-line comments are permitted in the form // comment. The scope is the line: everything that follows // up to the newline is ignored by the parser. Comments may stand on their own line (above/below a datapack, or between the lines of a datapack) or trailing a line of code.

The discriminant between the two forms is positional: if there is at least one non-whitespace character before //, the comment is trailing (the part of code before // remains valid); otherwise the entire line is a comment.

Multi-line comments (/* … */) are not supported: NRK is a column-sensitive language (the barlines | align the lines of the datapack vertically) and block comments would break the alignment.

The sequence %% at the start of a line is not a comment: it is reserved for the song version blocks (see neumaRk_versions.md).
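The positional rule for // comments can be sketched as below. The function name and the three labels ("none", "trailing", "full") are hypothetical. The code part is returned untouched, so the column alignment of the remaining text is preserved.

```python
from typing import Tuple


def split_comment(line: str) -> Tuple[str, str, str]:
    """Return (code, comment, kind) with kind in {"none", "trailing", "full"}."""
    pos = line.find("//")
    if pos < 0:
        return line, "", "none"
    code, comment = line[:pos], line[pos + 2:].strip()
    if code.strip():
        # At least one non-whitespace character precedes "//": trailing comment.
        return code, comment, "trailing"
    return "", comment, "full"
```

A line starting with %% contains no // and is therefore returned unchanged with kind "none".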


12. Version blocks

Between one datapack and another, version blocks may appear, marked %%NAME … %%end, which enclose one or more alternate datapacks of the song. The blocks live standalone between datapacks, not inside them. See neumaRk_versions.md for the full spec.