propresenter-php/.sisyphus/notepads/propresenter-parser/learnings.md
Thorsten Bus 813d30dd12 test(playlist): add integration tests and update AGENTS.md
- Add ProPlaylistIntegrationTest with 8 round-trip tests
- All 4 .proplaylist test files validated in ProPlaylistReaderTest
- Update AGENTS.md with playlist module documentation
- Document reading, writing, generating, CLI usage
- Add notepad learnings from Wave 4 tasks
2026-03-01 21:28:18 +01:00

Learnings — ProPresenter Parser

Conventions & Patterns

(Agents will append findings here)

Task 1: Project Scaffolding — Composer + PHPUnit + Directory Structure

Completed

  • Created PHP 8.4 project with Composer
  • Configured PSR-4 autoloading for both namespaces:
    • ProPresenter\Parser\ → src/
    • Rv\Data\ → generated/Rv/Data/
  • Installed PHPUnit 11.5.55 with google/protobuf 4.33.5
  • Created phpunit.xml with strict settings
  • Created SmokeTest.php that passes
  • All 5 required directories created: src/, tests/, bin/, proto/, generated/

Key Findings

  • PHP 8.4.7 is available on the system
  • Composer resolves dependencies cleanly (28 packages installed)
  • PHPUnit 11 runs with strict mode enabled (beStrictAboutOutputDuringTests, failOnRisky, failOnWarning)
  • Autoloading works correctly with both namespaces configured

Verification Results

  • Composer install: Success (28 packages)
  • PHPUnit smoke test: 1 test passed
  • Autoload verification: Works correctly
  • Directory structure: All 5 directories present

Task 3: RTF Plain Text Extractor (TDD)

Completed

  • RtfExtractor::toPlainText() static method — standalone, no external deps
  • 11 PHPUnit tests all passing (TDD: RED → GREEN)
  • Handles real ProPresenter CocoaRTF 2761 format

Key RTF Patterns in ProPresenter

  • Format: Always {\rtf1\ansi\ansicpg1252\cocoartf2761 ...}
  • Encoding: Windows-1252 (ansicpg1252), hex escapes \'xx for non-ASCII
  • Soft returns: Single backslash \ followed by newline = line break in text
  • Text location: After last formatting command (often \CocoaLigature0 ), before final }
  • Nested groups: {\fonttbl ...}, {\colortbl ...}, {\*\expandedcolortbl ...} — must be stripped
  • German and accented chars: \'fc=ü, \'f6=ö, \'e4=ä, \'df=ß, \'e9=é, \'e8=è
  • Unicode: \uNNNN? where NNNN is decimal codepoint, ? is ANSI fallback (skipped)
  • Stroke formatting: Some songs have \outl0\strokewidth-40 \strokec3 before text
  • Translation boxes: Same RTF structure, different font size (e.g., fs80 vs fs84)

Implementation Approach

  • Character-by-character parser (not regex) — handles nested braces correctly
  • Strip all {...} nested groups first, then process flat content
  • Control words: \word[N] pattern, space delimiter consumed
  • Non-RTF input passes through unchanged (graceful fallback)
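
The character-by-character approach above can be sketched roughly as follows. This is a minimal illustration, not the project's RtfExtractor: the names rtfToPlainTextSketch and codepointToUtf8Sketch are hypothetical, and hex escapes are decoded as Latin-1, which matches Windows-1252 only for bytes 0xA0-0xFF. Instead of stripping nested groups in a separate pass, the sketch tracks brace depth and keeps text only at the top level.

```php
<?php

/** Hypothetical helper: encodes a BMP code point as UTF-8. */
function codepointToUtf8Sketch(int $cp): string
{
    if ($cp < 0) {
        $cp += 65536; // RTF writes code points above 32767 as negative numbers
    }
    if ($cp < 0x80) {
        return chr($cp);
    }
    if ($cp < 0x800) {
        return chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
    }
    return chr(0xE0 | ($cp >> 12)) . chr(0x80 | (($cp >> 6) & 0x3F)) . chr(0x80 | ($cp & 0x3F));
}

/** Hypothetical sketch of a character-by-character RTF to plain-text scan. */
function rtfToPlainTextSketch(string $rtf): string
{
    if (!str_starts_with($rtf, '{\rtf')) {
        return $rtf; // non-RTF input passes through unchanged
    }

    $out = '';
    $depth = 0;
    $len = strlen($rtf);

    for ($i = 0; $i < $len; $i++) {
        $ch = $rtf[$i];
        if ($ch === '{') { $depth++; continue; }
        if ($ch === '}') { $depth--; continue; }

        if ($ch !== '\\') {
            // Text counts only at the top level; raw CR/LF are layout, not text.
            if ($depth === 1 && $ch !== "\n" && $ch !== "\r") {
                $out .= $ch;
            }
            continue;
        }

        $next = $rtf[$i + 1] ?? '';
        $text = '';
        if ($next === "'") {
            // \'xx hex escape (treated as Latin-1 in this sketch)
            $text = codepointToUtf8Sketch((int) hexdec(substr($rtf, $i + 2, 2)));
            $i += 3;
        } elseif ($next === "\n" || $next === "\r") {
            $text = "\n"; // backslash + newline is a soft return
            $i++;
        } elseif (preg_match('/\G\\\\([a-z]+)(-?\d+)? ?/i', $rtf, $m, 0, $i)) {
            // Control word with optional numeric parameter; one space delimiter is consumed.
            $i += strlen($m[0]) - 1;
            if ($m[1] === 'u' && ($m[2] ?? '') !== '') {
                $text = codepointToUtf8Sketch((int) $m[2]);
                // Skip the ANSI fallback that follows \uNNNN (one char or one \'xx escape).
                $i += (substr($rtf, $i + 1, 2) === "\\'") ? 4 : 1;
            }
        } else {
            // Control symbol: \\, \{ and \} are literal characters, others (e.g. \*) are dropped.
            if ($next === '\\' || $next === '{' || $next === '}') {
                $text = $next;
            }
            $i++;
        }

        if ($depth === 1) {
            $out .= $text;
        }
    }

    return trim($out);
}

// Illustrative input modelled on the patterns listed above (not a real fixture).
$rtf = <<<'RTF'
{\rtf1\ansi\ansicpg1252\cocoartf2761
{\fonttbl\f0\fnil\fcharset0 HelveticaNeue;}
{\colortbl;\red255\green255\blue255;}
\pard\qc\partightenfactor0

\f0\fs84 \cf1 \CocoaLigature0 F\'fcr dich\
Zeile 2}
RTF;

$plain = rtfToPlainTextSketch($rtf);
```

Handling the backslash before checking depth means an escaped brace inside a skipped group does not disturb the depth counter.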

Testing Gotcha

  • PHP single-quoted strings: \' = escaped quote, NOT literal backslash-quote

  • Use nowdoc (<<<'RTF') for RTF test data with hex escapes (\'xx)

  • Regular concatenated strings work for RTF without hex escapes (soft returns \\ are fine)

  • 2026-03-01 task-2 proto import resolution: copied full Proto7.16.2/ tree (including google/protobuf/*.proto) into php/proto/; imports already resolve with --proto_path=./php/proto, no path rewrites required.

  • 2026-03-01 task-2 version extraction: application_info.platform_version from Test.pro = macOS 14.8.3; application_info.application_version = major 20, build 335544354.

  • 2026-03-01 task-6 binary fidelity baseline: decode->encode byte round-trip currently yields 0/169 identical files (168 non-empty from all-songs + Test.pro); first mismatches typically occur early (~byte offsets 700-3000), indicating systematic re-serialization differences rather than isolated corruption.
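
The quoting gotcha from the Testing Gotcha notes can be reproduced in a few lines. The sketch below only uses core PHP string syntax; the variable names are illustrative.

```php
<?php

// In a single-quoted string, \' is an escaped quote, so the backslash is lost.
$singleQuoted = 'F\'fcr';

// A nowdoc keeps every byte literally, so the RTF hex escape survives.
$nowdoc = <<<'RTF'
F\'fcr
RTF;

// Soft returns need no nowdoc: '\\' in a single-quoted string is one backslash.
$softReturn = 'Zeile 1\\' . "\n" . 'Zeile 2';

var_dump($singleQuoted === "F'fcr");             // the backslash was consumed
var_dump($nowdoc === "F\\'fcr");                 // the backslash is preserved
var_dump(str_contains($softReturn, "\\\n"));     // backslash followed by newline
```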

Task 5: Group + Arrangement Wrapper Classes (TDD)

Completed

  • Group.php wrapping Rv\Data\Presentation\CueGroup — getUuid(), getName(), getColor(), getSlideUuids(), setName(), getProto()
  • Arrangement.php wrapping Rv\Data\Presentation\Arrangement — getUuid(), getName(), getGroupUuids(), setName(), setGroupUuids(), getProto()
  • 30 tests (16 Group + 14 Arrangement), 74 assertions — all pass
  • TDD: RED confirmed (class not found errors) → GREEN (all pass)

Protobuf Structure Findings

  • CueGroup (field 12) has TWO parts: group (Rv\Data\Group with uuid/name/color) and cue_identifiers (repeated UUID = slide refs)
  • Arrangement (field 11) has: uuid, name, group_identifiers (repeated UUID = group refs, can repeat same group)
  • UUID.getString() returns the string value; UUID.setString() sets it
  • Color has getRed()/getGreen()/getBlue()/getAlpha() returning floats
  • Group also has hotKey, application_group_identifier, application_group_name (not exposed in wrapper — not needed for song parsing)

Test.pro Verified Structure

  • 4 groups: Verse 1 (2 slides), Verse 2 (1 slide), Chorus (1 slide), Ending (1 slide)
  • 2 arrangements: 'normal' (5 group refs), 'test2' (4 group refs)
  • All groups have non-empty UUIDs
  • Arrangement group UUIDs reference valid group UUIDs (cross-validated in test)

Task 4: TextElement + Slide Wrapper Classes (TDD)

Completed

  • TextElement.php wraps Graphics Element: getName(), hasText(), getRtfData(), setRtfData(), getPlainText()
  • Slide.php wraps Cue: getUuid(), getTextElements(), getAllElements(), getPlainText(), hasTranslation(), getTranslation(), getCue()
  • 24 tests (10 TextElement + 14 Slide), 47 assertions, all pass
  • TDD: RED confirmed then GREEN (all pass)
  • Integration tests verify real Test.pro data

Protobuf Navigation Path (Confirmed)

  • Cue -> getActions()[0] -> getSlide() (oneof) -> getPresentation() (oneof) -> getBaseSlide() -> getElements()[]
  • Slide Element -> getElement() -> Graphics Element
  • Graphics Element -> getName() (user-defined label), hasText(), getText() -> Graphics Text -> getRtfData()
  • Elements WITHOUT text (shapes, media) have hasText() === false, must be filtered

Key Design Decisions

  • TextElement wraps Graphics Element (not Slide Element) for clean text-focused API
  • Slide wraps Cue (not PresentationSlide) because UUID is on the Cue
  • Translation = second text element (index 1); no label detection needed
  • Lazy caching: textElements/allElements computed once per instance
  • Test.pro path from tests: dirname(__DIR__, 2) . '/ref/Test.pro' (2 levels up from php/tests/)

Task 7: Song + ProFileReader Integration (TDD)

Completed

  • Added Song aggregate wrapper (Presentation-level integration over Group/Slide/Arrangement)
  • Added ProFileReader::read(string): Song with file existence and empty-file validation
  • Added integration-heavy tests: SongTest + ProFileReaderTest (12 tests, 44 assertions)

Key Implementation Findings

  • Song constructor can eager-load all wrappers safely: cue_groups -> Group, cues -> Slide, arrangements -> Arrangement
  • UUID cross-reference resolution works best with normalized uppercase lookup maps (groupsByUuid, slidesByUuid) because UUIDs are string-based
  • Group/arrangement references can repeat the same UUID; resolution must preserve order and duplicates (important for repeated chorus)
  • ProFileReader using is_file + filesize correctly handles UTF-8 paths and catches known 0-byte fixture before protobuf parsing
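
The UUID cross-reference finding can be illustrated with plain arrays. This is a sketch under assumptions: resolveRefsSketch is a hypothetical helper, the UUIDs are made up, and group names stand in for the wrapper objects. Silently skipping unknown references is also an assumption, not confirmed behaviour of Song.

```php
<?php

/**
 * Hypothetical helper: resolves ordered UUID references against a lookup map
 * keyed by uppercase UUID, keeping order and duplicates.
 *
 * @param array<string, string> $byUuid uppercase UUID => item
 * @param list<string>          $refs   references in arrangement order
 * @return list<string>
 */
function resolveRefsSketch(array $byUuid, array $refs): array
{
    $resolved = [];
    foreach ($refs as $ref) {
        $key = strtoupper($ref);
        if (isset($byUuid[$key])) {
            $resolved[] = $byUuid[$key]; // append, so repeats stay repeats
        }
    }
    return $resolved;
}

// Made-up UUIDs with mixed casing.
$groups = [
    'aaaa-1111' => 'Chorus',
    'BBBB-2222' => 'Verse 1',
    'cccc-3333' => 'Verse 2',
];

// Normalize keys once when building the map.
$groupsByUuid = [];
foreach ($groups as $uuid => $name) {
    $groupsByUuid[strtoupper($uuid)] = $name;
}

$arrangementRefs = ['AAAA-1111', 'bbbb-2222', 'aaaa-1111', 'CCCC-3333', 'AAAA-1111'];
$ordered = resolveRefsSketch($groupsByUuid, $arrangementRefs);
```

Iterating over the references, rather than over the map, is what keeps the repeated chorus in place.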

Verified Against Fixtures

  • Test.pro: name Test, 4 groups, 5 slides, 2 arrangements

  • getSlidesForGroup(Verse 1) resolves to slide UUIDs [5A6AF946..., A18EF896...] with texts Vers1.1/Vers1.2 and Vers1.3/Vers1.4

  • getGroupsForArrangement(normal) resolves ordered names [Chorus, Verse 1, Chorus, Verse 2, Chorus]

  • Diverse reads validated through ProFileReader on 6 files, including [TRANS] and UTF-8/non-song file names

  • 2026-03-01 task-2 Zip64Fixer: ProPresenter .proplaylist archives include a ZIP64 EOCD whose central-directory size is consistently 98 bytes too large; recalculating the size as zip64_eocd_position - zip64_cd_offset and patching it into the ZIP64 EOCD (offset +40) and the EOCD (offset +12) makes ZipArchive open reliably.

  • 2026-03-01 task-2 verification: fixed bytes opened successfully for TestPlaylist + Gottesdienst, Gottesdienst 2, Gottesdienst 3 (entries: 4/25/38/38).
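
The Zip64Fixer note can be sketched as a byte-level patch. This is not the project's implementation: fixZip64CdSizeSketch is a hypothetical name, locating the records with strrpos is a simplification, and writing the recomputed size into the 32-bit EOCD field is an assumption about what "patching EOCD(+12)" means. The field offsets follow the ZIP format layout (ZIP64 EOCD: size at +40, offset at +48; EOCD: size at +12).

```php
<?php

/** Hypothetical sketch: recompute the central-directory size and patch both end records. */
function fixZip64CdSizeSketch(string $bytes): string
{
    $zip64EocdPos = strrpos($bytes, "PK\x06\x06"); // ZIP64 end of central directory record
    $eocdPos = strrpos($bytes, "PK\x05\x06");      // classic end of central directory record
    if ($zip64EocdPos === false || $eocdPos === false) {
        return $bytes; // nothing to repair
    }

    // 64-bit little-endian central-directory offset at +48.
    $cdOffset = unpack('P', $bytes, $zip64EocdPos + 48)[1];
    $cdSize = $zip64EocdPos - $cdOffset;
    if ($cdSize < 0) {
        return $bytes;
    }

    // Patch the 64-bit size at ZIP64 EOCD +40 and the 32-bit size at EOCD +12.
    $bytes = substr_replace($bytes, pack('P', $cdSize), $zip64EocdPos + 40, 8);
    $bytes = substr_replace($bytes, pack('V', min($cdSize, 0xFFFFFFFF)), $eocdPos + 12, 4);

    return $bytes;
}

// Synthetic tail of an archive: stand-in data and central directory, sizes overstated by 98.
$data = str_repeat('D', 10);
$cd = str_repeat('C', 20);
$zip64Eocd = "PK\x06\x06" . pack('P', 44) . pack('vvVV', 45, 45, 0, 0)
    . pack('PP', 1, 1) . pack('P', strlen($cd) + 98) . pack('P', strlen($data));
$locator = "PK\x06\x07" . pack('VPV', 0, strlen($data) + strlen($cd), 1);
$eocd = "PK\x05\x06" . pack('vvvv', 0, 0, 1, 1)
    . pack('VVv', strlen($cd) + 98, strlen($data), 0);

$broken = $data . $cd . $zip64Eocd . $locator . $eocd;
$fixed = fixZip64CdSizeSketch($broken);
```

A production fixer would more likely find the ZIP64 record through the locator in front of the EOCD instead of searching for signature bytes, which can also occur inside file data.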

Task 5 (Playlist): PlaylistNode Wrapper (TDD)

Completed

  • PlaylistNode.php wrapping Rv\Data\Playlist — getUuid(), getName(), getType(), isContainer(), isLeaf(), getChildNodes(), getEntries(), getEntryCount(), getPlaylist()
  • 15 tests, 37 assertions — all pass
  • TDD: RED confirmed (class not found) → GREEN (all pass)

Key Findings

  • Playlist proto uses oneof ChildrenType with getChildrenType() returning string: 'playlists' | 'items' | '' (null/unset)
  • Container nodes: getPlaylists() returns PlaylistArray which has getPlaylists() (confusing double-nesting)
  • Leaf nodes: getItems() returns PlaylistItems which has getItems() (same double-nesting pattern)
  • A playlist with neither items nor playlists set has getChildrenType() returning '' — must handle as neither container nor leaf
  • Recursive wrapping works: constructor calls new self($childPlaylist) for nested container nodes
  • PlaylistEntry (Task 4) wraps PlaylistItem with getName(), getUuid(), getType() — compatible interface
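
The three-way oneof handling and the recursive wrapping can be sketched without the generated proto classes. NodeSketch and its array input are hypothetical stand-ins for PlaylistNode and Rv\Data\Playlist; only the 'playlists' / 'items' / '' distinction is taken from the findings above.

```php
<?php

/** Hypothetical stand-in for PlaylistNode, fed by arrays instead of protobuf messages. */
final class NodeSketch
{
    /** @var list<NodeSketch> */
    private array $children = [];
    /** @var list<string> */
    private array $entries = [];
    private string $childrenType;

    public function __construct(array $data)
    {
        $this->childrenType = $data['childrenType'] ?? '';
        if ($this->childrenType === 'playlists') {
            foreach ($data['playlists'] as $child) {
                $this->children[] = new self($child); // recursive wrapping
            }
        } elseif ($this->childrenType === 'items') {
            $this->entries = $data['items'];
        }
        // '' (oneof unset): neither container nor leaf, both lists stay empty.
    }

    public function isContainer(): bool { return $this->childrenType === 'playlists'; }
    public function isLeaf(): bool { return $this->childrenType === 'items'; }
    /** @return list<NodeSketch> */
    public function getChildNodes(): array { return $this->children; }
    public function getEntryCount(): int { return count($this->entries); }
}

$root = new NodeSketch([
    'childrenType' => 'playlists',
    'playlists' => [
        ['childrenType' => 'items', 'items' => ['Header', 'Song A', 'Song B']],
        [], // oneof not set
    ],
]);
```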

Task 4 (Playlist): PlaylistEntry Wrapper Class (TDD)

Completed

  • PlaylistEntry.php wrapping Rv\Data\PlaylistItem - all 4 item types: header, presentation, placeholder, cue
  • 23 tests, 40 assertions - all pass (TDD: RED confirmed then GREEN)
  • QA scenarios verified: arrangement_name field 5, type detection

Protobuf API Findings

  • PlaylistItem.getItemType() uses whichOneof('ItemType') - returns lowercase string: header, presentation, cue, placeholder, planning_center
  • Returns empty string (not null) when no oneof is set
  • hasHeader()/hasPresentation() etc use hasOneof(N) - reliable for type checking
  • Header color: Header.getColor() returns Rv\Data\Color, Header.hasColor() checks existence
  • Color floats: getRed()/getGreen()/getBlue()/getAlpha() - protobuf floats have precision ~6 digits, use assertEqualsWithDelta in tests
  • Presentation document path: Presentation.getDocumentPath() returns Rv\Data\URL, use getAbsoluteString() for full URL
  • URL filename extraction: parse_url + basename + urldecode handles encoded spaces
  • Arrangement UUID: Presentation.getArrangement() returns UUID|null, Presentation.hasArrangement() checks existence
  • Arrangement name (field 5): Presentation.getArrangementName() returns string, empty string when not set
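
The filename extraction from a document URL might look like the sketch below. documentFilenameSketch is a hypothetical name and the URLs are made up; the parse_url + basename + urldecode chain is the one named in the findings.

```php
<?php

/** Hypothetical sketch: human-readable filename from an absolute document URL. */
function documentFilenameSketch(string $absoluteUrl): ?string
{
    $path = parse_url($absoluteUrl, PHP_URL_PATH);
    if (!is_string($path) || $path === '') {
        return null; // no usable path component
    }

    // Take the last path segment first, then decode %XX sequences.
    return urldecode(basename($path));
}

$withSpace = documentFilenameSketch('file:///Users/example/Libraries/Default/My%20Song.pro');
$withUmlaut = documentFilenameSketch('file:///Users/example/Libraries/Default/Gro%C3%9Fer%20Gott.pro');
```

Decoding after basename keeps an encoded slash (%2F) inside a filename from being mistaken for a path separator.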

Design Decisions

  • Named class PlaylistEntry (not PlaylistItem) to avoid collision with Rv\Data\PlaylistItem
  • Null safety: type-specific getters return null for wrong item types (not exceptions)
  • getArrangementName() returns null for empty string (treat empty as unset)
  • Color returned as indexed array [r, g, b, a] matching plan spec (not associative like Group.php)
  • getDocumentFilename() decodes URL-encoded characters for human-readable names

Task 6: PlaylistArchive Top-Level Wrapper (TDD)

Completed

  • PlaylistArchive.php wrapping PlaylistDocument + embedded files
  • 18 tests, 37 assertions — all pass (TDD: RED → GREEN)
  • Lazy .pro parsing with caching, file partitioning, root/child node access

Key Implementation Findings

  • PlaylistDocument root_node structure: root Playlist ("PLAYLIST") → child Playlist (actual name via PlaylistArray oneof)
  • PlaylistNode constructor handles oneof: 'playlists' → child nodes, 'items' → entries
  • Lazy parsing pattern: (new Presentation())->mergeFromString($bytes) then new Song($pres) — identical to ProFileReader but from bytes not file
  • str_ends_with(strtolower($filename), '.pro') for case-insensitive .pro detection
  • ARRAY_FILTER_USE_BOTH needed to filter by key (filename) while keeping values (bytes)
  • Constructor takes PlaylistDocument + optional array $embeddedFiles (filename => raw bytes)
  • data file from ZIP is NOT passed to constructor — it's the proto itself, already parsed
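
The file partitioning can be sketched with the two building blocks named above. partitionEmbeddedFilesSketch is a hypothetical function and the filenames are made up; the real class exposes the result through getEmbeddedProFiles() and getEmbeddedMediaFiles().

```php
<?php

/**
 * Hypothetical sketch: split embedded files (filename => raw bytes) into .pro and media.
 *
 * @param array<string, string> $embeddedFiles
 * @return array{pro: array<string, string>, media: array<string, string>}
 */
function partitionEmbeddedFilesSketch(array $embeddedFiles): array
{
    // Case-insensitive extension check.
    $isPro = static fn (string $name): bool => str_ends_with(strtolower($name), '.pro');

    // With ARRAY_FILTER_USE_BOTH the callback receives (value, key); keys are preserved.
    $pro = array_filter(
        $embeddedFiles,
        static fn ($bytes, $name): bool => $isPro((string) $name),
        ARRAY_FILTER_USE_BOTH
    );
    $media = array_filter(
        $embeddedFiles,
        static fn ($bytes, $name): bool => !$isPro((string) $name),
        ARRAY_FILTER_USE_BOTH
    );

    return ['pro' => $pro, 'media' => $media];
}

$parts = partitionEmbeddedFilesSketch([
    'Song A.pro' => 'bytes-1',
    'SONG B.PRO' => 'bytes-2',
    'background.jpg' => 'bytes-3',
]);
```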

Design Decisions

  • Named class PlaylistArchive (not PlaylistDocument) to avoid proto collision

  • getName() returns child playlist name (not root "PLAYLIST") for user-facing convenience

  • getPlaylistNode() returns null when no children (graceful handling)

  • getEmbeddedSong() returns null for non-.pro files AND missing files (both guarded)

  • Cache via $parsedSongs array — same Song instance returned on repeated calls

  • 2026-03-01 task-7 ProPlaylistReader: mirror ProFileReader guard order (is_file/filesize/file_get_contents) with playlist-specific RuntimeException messages to keep reader behavior consistent.

  • 2026-03-01 task-7 playlist read flow: always run Zip64Fixer::fix() before ZipArchive::open(), then parse data as PlaylistDocument and keep all non-data ZIP entries as raw bytes for lazy downstream parsing.

  • 2026-03-01 task-7 cleanup verification: using tempnam(..., 'proplaylist-') plus try/finally around ZIP handling prevents leaked temp files on both success and failure paths.

  • 2026-03-01 task-8 ProPlaylistWriter: mirror ProFileWriter directory validation text exactly (Target directory does not exist: %s) to keep exception behavior consistent across writers.

  • 2026-03-01 task-8 ZIP writing: adding every entry with ZipArchive::CM_STORE (data + embedded files) produces clean standard ZIPs that open with unzip -l and do not need the ZIP64 header repair that ProPresenter-written archives require.

  • 2026-03-01 task-8 cleanup: tempnam(..., 'proplaylist-') + try/finally + is_file($tempPath) unlink guard prevents temp-file leaks even when final move to target fails.

  • 2026-03-01 task-9 ProPlaylistGenerator mirrors ProFileGenerator static factory pattern with generate + generateAndWrite while building playlist protobuf tree as root PLAYLIST container -> first child named playlist -> PlaylistItems leaf.

  • 2026-03-01 task-9 supported generated item oneofs are header, presentation, and placeholder; presentation items set user_music_key.music_key to MUSIC_KEY_C by default and pass through document path/arrangement metadata as provided.

  • 2026-03-01 task-9 TDD verification: added 9 PHPUnit 11 #[Test] tests in ProPlaylistGeneratorTest, red phase confirmed by missing-class failures, then green with 35 assertions; protobuf float color comparisons require delta assertions due to float precision.
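
The temp-file cleanup pattern from the task-7 and task-8 notes can be sketched in isolation. withTempFileSketch is a hypothetical helper and the callbacks stand in for the ZIP handling; only tempnam with the 'proplaylist-' prefix, try/finally, and the is_file guard come from the notes.

```php
<?php

/** Hypothetical sketch: run $work against a temp file and always clean up afterwards. */
function withTempFileSketch(string $bytes, callable $work): mixed
{
    $tempPath = tempnam(sys_get_temp_dir(), 'proplaylist-');
    if ($tempPath === false) {
        throw new RuntimeException('Unable to create temporary file');
    }

    try {
        file_put_contents($tempPath, $bytes);
        return $work($tempPath);
    } finally {
        // Runs on success and on exceptions; the guard covers a file that was already moved.
        if (is_file($tempPath)) {
            unlink($tempPath);
        }
    }
}

// Success path: the callback sees the file, and the file is gone afterwards.
$seenPath = null;
$size = withTempFileSketch('abc', function (string $path) use (&$seenPath): int {
    $seenPath = $path;
    return filesize($path);
});

// Failure path: the exception propagates, cleanup still happens.
$failedPath = null;
try {
    withTempFileSketch('x', function (string $path) use (&$failedPath): void {
        $failedPath = $path;
        throw new RuntimeException('simulated ZIP failure');
    });
} catch (RuntimeException $e) {
    // expected in this sketch
}
```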

Task 10: parse-playlist.php CLI Tool

Completed

  • Created php/bin/parse-playlist.php executable CLI tool
  • Follows parse-song.php structure exactly (shebang, autoloader, argc check, try/catch)
  • Displays playlist metadata, entries with type-specific details, embedded file lists
  • Plain text output (no colors/ANSI codes)
  • Error handling with user-friendly messages
  • Verified with TestPlaylist.proplaylist and error scenarios

Key Implementation Findings

  • Version objects (Rv\Data\Version) have getMajorVersion(), getMinorVersion(), getPatchVersion(), getBuild() methods
  • Must call methods on Version objects, not concatenate directly (causes "Object of class Rv\Data\Version could not be converted to string" error)
  • Entry type prefixes: [H]=header, [P]=presentation, [-]=placeholder, [C]=cue
  • Header color returned as array [r,g,b,a] from getHeaderColor()
  • Presentation items show arrangement name (if set) and document path URL
  • Embedded files partitioned into .pro files and media files via getEmbeddedProFiles() and getEmbeddedMediaFiles()

Test Results

  • Scenario 1 (TestPlaylist.proplaylist): Structured output with 7 entries, 2 .pro files, 1 media file
  • Scenario 2 (nonexistent file): Error message + exit code 1
  • Scenario 3 (no arguments): Usage message + exit code 1

Design Decisions

  • Followed parse-song.php structure exactly for consistency
  • Version formatting: "major.minor.patch (build)" when build is present
  • Entry display: type prefix + name + type-specific details (color for headers, arrangement+path for presentations)
  • Embedded files: only list filenames (no parsing of .pro files)
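
The version formatting decision can be sketched with a stand-in class. VersionSketch is hypothetical and only mirrors the four getter names listed for Rv\Data\Version; treating the build as a string that is empty when unset is an assumption, and the values are illustrative.

```php
<?php

/** Hypothetical stand-in exposing the same getter names as Rv\Data\Version. */
final class VersionSketch
{
    public function __construct(
        private int $major,
        private int $minor,
        private int $patch,
        private string $build = ''
    ) {
    }

    public function getMajorVersion(): int { return $this->major; }
    public function getMinorVersion(): int { return $this->minor; }
    public function getPatchVersion(): int { return $this->patch; }
    public function getBuild(): string { return $this->build; }
}

/** Formats "major.minor.patch (build)", leaving out the build part when it is empty. */
function formatVersionSketch(VersionSketch $version): string
{
    // The object has no string conversion, so each component is read through its getter.
    $text = sprintf(
        '%d.%d.%d',
        $version->getMajorVersion(),
        $version->getMinorVersion(),
        $version->getPatchVersion()
    );
    $build = $version->getBuild();

    return $build !== '' ? sprintf('%s (%s)', $text, $build) : $text;
}

$withBuild = formatVersionSketch(new VersionSketch(7, 16, 2, '1234'));
$withoutBuild = formatVersionSketch(new VersionSketch(7, 16, 2));
```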

Task 13: AGENTS.md Update for .proplaylist Module

Date: 2026-03-01

Completed

  • Added new "ProPresenter Playlist Parser" section to AGENTS.md
  • Matched exact style of existing .pro module documentation
  • Included all required subsections:
    • Spec (file format, key features)
    • PHP Module Usage (Reader, Writer, Generator)
    • Reading a Playlist
    • Accessing Playlist Structure (entries, lazy-loading)
    • Modifying and Writing
    • Generating a New Playlist
    • CLI Tool documentation
    • Format Specification reference
    • Key Files listing

Style Consistency

  • Used same heading levels (H1 for main, H2 for sections, H3 for subsections)
  • Matched code block formatting and indentation
  • Maintained conciseness and clarity
  • Used em-dashes (—) for file descriptions, matching .pro section

Key Files Documented

  • PlaylistArchive.php (top-level wrapper)
  • PlaylistEntry.php (entry wrapper)
  • ProPlaylistReader.php (reader)
  • ProPlaylistWriter.php (writer)
  • ProPlaylistGenerator.php (generator)
  • parse-playlist.php (CLI tool)
  • pp_playlist_spec.md (format spec)

Evidence

  • Verification output saved to: .sisyphus/evidence/task-13-agents-md.txt
  • New section starts at line 186 in AGENTS.md

Task 12: Validation Tests Against Real-World Playlist Files

Key Findings

  • All 4 .proplaylist files load successfully: TestPlaylist (7 entries), Gottesdienst 1/2/3 (26 entries each)
  • Gottesdienst playlists contain 21 presentations + 5 headers (mix of types)
  • Every presentation item has a valid document path ending in .pro
  • Embedded .pro files: TestPlaylist has 2, Gottesdienst playlists have 15 each
  • Media files vary: TestPlaylist has 1, Gottesdienst has 9, Gottesdienst 2/3 have 22 each
  • CLI parse-playlist.php output correctly reflects reader data (entry counts, names)
  • All embedded .pro files parse successfully as Song objects with non-empty names
  • All entries across all files have non-empty UUIDs

Test Pattern

  • Added 7 validation test methods to existing ProPlaylistIntegrationTest.php (alongside 8 round-trip tests)

  • Used minimum thresholds (>20 entries, >10 presentations, >2 headers, >5 .pro files) instead of exact counts

  • allPlaylistFiles() helper returns all 4 required paths for loop-based testing

  • CLI test uses exec() with escapeshellarg() for safe path handling (spaces in filenames)

  • 2026-03-01 21:23:59 - Round-trip integration assertions are stable when comparing logical fields (types, arrangement names, document paths, embedded count, header RGBA) instead of raw archive bytes.