# Learnings — ProPresenter Parser

## Conventions & Patterns

(Agents will append findings here)

## Task 1: Project Scaffolding — Composer + PHPUnit + Directory Structure

### Completed

- ✅ Created PHP 8.4 project with Composer
- ✅ Configured PSR-4 autoloading for both namespaces:
  - `ProPresenter\Parser\` → `src/`
  - `Rv\Data\` → `generated/Rv/Data/`
- ✅ Installed PHPUnit 11.5.55 with google/protobuf 4.33.5
- ✅ Created phpunit.xml with strict settings
- ✅ Created SmokeTest.php that passes
- ✅ All 5 required directories created: src/, tests/, bin/, proto/, generated/

### Key Findings

- PHP 8.4.7 is available on the system
- Composer resolves dependencies cleanly (28 packages installed)
- PHPUnit 11 runs with strict mode enabled (beStrictAboutOutputDuringTests, failOnRisky, failOnWarning)
- Autoloading works correctly with both namespaces configured

### Verification Results

- Composer install: ✅ Success (28 packages)
- PHPUnit smoke test: ✅ 1 test passed
- Autoload verification: ✅ Works correctly
- Directory structure: ✅ All 5 directories present

## Task 3: RTF Plain Text Extractor (TDD)

### Completed

- ✅ RtfExtractor::toPlainText() static method — standalone, no external deps
- ✅ 11 PHPUnit tests all passing (TDD: RED → GREEN)
- ✅ Handles real ProPresenter CocoaRTF 2761 format

### Key RTF Patterns in ProPresenter

- **Format**: Always `{\rtf1\ansi\ansicpg1252\cocoartf2761 ...}`
- **Encoding**: Windows-1252 (ansicpg1252), hex escapes `\'xx` for non-ASCII
- **Soft returns**: Single backslash `\` followed by newline = line break in text
- **Text location**: After last formatting command (often `\CocoaLigature0 `), before final `}`
- **Nested groups**: `{\fonttbl ...}`, `{\colortbl ...}`, `{\*\expandedcolortbl ...}` — must be stripped
- **German chars**: `\'fc`=ü, `\'f6`=ö, `\'e4`=ä, `\'df`=ß, `\'e9`=é, `\'e8`=è
- **Unicode**: `\uNNNN?` where NNNN is decimal codepoint, `?` is ANSI fallback (skipped)
- **Stroke formatting**: Some songs have `\outl0\strokewidth-40 \strokec3` before text
- **Translation boxes**: Same RTF structure, different font size (e.g., fs80 vs fs84)

### Implementation Approach

- Character-by-character parser (not regex) — handles nested braces correctly
- Strip all `{...}` nested groups first, then process flat content
- Control words: `\word[N]` pattern, space delimiter consumed
- Non-RTF input passes through unchanged (graceful fallback)

### Testing Gotcha

- PHP single-quoted strings: `\'` = escaped quote, NOT literal backslash-quote
- Use **nowdoc** (`<<<'RTF'`) for RTF test data with hex escapes (`\'xx`)
- Regular concatenated strings work for RTF without hex escapes (soft returns `\\` are fine)

- 2026-03-01 task-2 proto import resolution: copied full `Proto7.16.2/` tree (including `google/protobuf/*.proto`) into `php/proto/`; imports already resolve with `--proto_path=./php/proto`, no path rewrites required.
- 2026-03-01 task-2 version extraction: `application_info.platform_version` from Test.pro = macOS 14.8.3; `application_info.application_version` = major 20, build 335544354.
- 2026-03-01 task-6 binary fidelity baseline: decode->encode byte round-trip currently yields `0/169` identical files (`168` non-empty from `all-songs` + `Test.pro`); first mismatches typically occur early (~byte offsets 700-3000), indicating systematic re-serialization differences rather than isolated corruption.

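
The "first mismatch" offsets in the task-6 note can be located with a plain byte comparison. The following is a minimal sketch, not the code used for the baseline: `firstMismatchOffset()` is a hypothetical helper name, and the two sample strings are stand-in bytes rather than real `.pro` content. It assumes both the original file and the re-encoded output are available as PHP strings.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: returns the offset of the first differing byte,
 * or null when both strings are byte-identical.
 * If one string is a prefix of the other, the shorter length is returned.
 */
function firstMismatchOffset(string $original, string $reencoded): ?int
{
    $shared = min(strlen($original), strlen($reencoded));

    for ($i = 0; $i < $shared; $i++) {
        if ($original[$i] !== $reencoded[$i]) {
            return $i;
        }
    }

    return strlen($original) === strlen($reencoded) ? null : $shared;
}

// Stand-in byte strings (not real ProPresenter data); they differ at offset 6.
$original  = "\x0a\x04Test\x12\x02hi";
$reencoded = "\x0a\x04Test\x1a\x02hi";
$offset    = firstMismatchOffset($original, $reencoded); // 6
```

Collecting this offset per file would show whether mismatches cluster in the same region, which is what the note uses to argue for systematic re-serialization differences.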
## Task 5: Group + Arrangement Wrapper Classes (TDD)

### Completed

- ✅ Group.php wrapping `Rv\Data\Presentation\CueGroup` — getUuid(), getName(), getColor(), getSlideUuids(), setName(), getProto()
- ✅ Arrangement.php wrapping `Rv\Data\Presentation\Arrangement` — getUuid(), getName(), getGroupUuids(), setName(), setGroupUuids(), getProto()
- ✅ 30 tests (16 Group + 14 Arrangement), 74 assertions — all pass
- ✅ TDD: RED confirmed (class not found errors) → GREEN (all pass)

### Protobuf Structure Findings

- CueGroup (field 12) has TWO parts: `group` (`Rv\Data\Group` with uuid/name/color) and `cue_identifiers` (repeated UUID = slide refs)
- Arrangement (field 11) has: uuid, name, `group_identifiers` (repeated UUID = group refs, can repeat same group)
- UUID.getString() returns the string value; UUID.setString() sets it
- Color has getRed()/getGreen()/getBlue()/getAlpha() returning floats
- Group also has hotKey, application_group_identifier, application_group_name (not exposed in wrapper — not needed for song parsing)

### Test.pro Verified Structure

- 4 groups: Verse 1 (2 slides), Verse 2 (1 slide), Chorus (1 slide), Ending (1 slide)
- 2 arrangements: 'normal' (5 group refs), 'test2' (4 group refs)
- All groups have non-empty UUIDs
- Arrangement group UUIDs reference valid group UUIDs (cross-validated in test)

## Task 4: TextElement + Slide Wrapper Classes (TDD)

### Completed

- TextElement.php wraps Graphics Element: getName(), hasText(), getRtfData(), setRtfData(), getPlainText()
- Slide.php wraps Cue: getUuid(), getTextElements(), getAllElements(), getPlainText(), hasTranslation(), getTranslation(), getCue()
- 24 tests (10 TextElement + 14 Slide), 47 assertions, all pass
- TDD: RED confirmed then GREEN (all pass)
- Integration tests verify real Test.pro data

### Protobuf Navigation Path (Confirmed)

- Cue -> getActions()[0] -> getSlide() (oneof) -> getPresentation() (oneof) -> getBaseSlide() -> getElements()[]
- Slide Element -> getElement() -> Graphics Element
- Graphics Element -> getName() (user-defined label), hasText(), getText() -> Graphics Text -> getRtfData()
- Elements WITHOUT text (shapes, media) have hasText() === false, must be filtered

### Key Design Decisions

- TextElement wraps Graphics Element (not Slide Element) for clean text-focused API
- Slide wraps Cue (not PresentationSlide) because UUID is on the Cue
- Translation = second text element (index 1); no label detection needed
- Lazy caching: textElements/allElements computed once per instance
- Test.pro path from tests: `dirname(__DIR__, 2) . '/ref/Test.pro'` (2 levels up from php/tests/)

## Task 7: Song + ProFileReader Integration (TDD)

### Completed

- ✅ Added `Song` aggregate wrapper (Presentation-level integration over Group/Slide/Arrangement)
- ✅ Added `ProFileReader::read(string): Song` with file existence and empty-file validation
- ✅ Added integration-heavy tests: `SongTest` + `ProFileReaderTest` (12 tests, 44 assertions)

### Key Implementation Findings

- Song constructor can eager-load all wrappers safely: `cue_groups` -> Group, `cues` -> Slide, `arrangements` -> Arrangement
- UUID cross-reference resolution works best with normalized uppercase lookup maps (`groupsByUuid`, `slidesByUuid`) because UUIDs are string-based
- Group/arrangement references can repeat the same UUID; resolution must preserve order and duplicates (important for repeated chorus)
- `ProFileReader` using `is_file` + `filesize` correctly handles UTF-8 paths and catches known 0-byte fixture before protobuf parsing

### Verified Against Fixtures

- Test.pro: name `Test`, 4 groups, 5 slides, 2 arrangements
- `getSlidesForGroup(Verse 1)` resolves to slide UUIDs `[5A6AF946..., A18EF896...]` with texts `Vers1.1/Vers1.2` and `Vers1.3/Vers1.4`
- `getGroupsForArrangement(normal)` resolves ordered names `[Chorus, Verse 1, Chorus, Verse 2, Chorus]`
- Diverse reads validated through ProFileReader on 6 files, including `[TRANS]` and UTF-8/non-song file names
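
The uppercase lookup-map finding from Task 7 can be illustrated without the protobuf classes. This is a rough sketch under stated assumptions, not the actual `Song` implementation: `resolveGroupNames()` is a hypothetical function, the UUID strings are made-up placeholders, and silently skipping unknown UUIDs is an illustrative choice that the notes above do not specify.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch: resolve arrangement group references to group names.
 *
 * @param array<string, string> $groupNamesByUuid UUID => group name (any letter case)
 * @param list<string>          $arrangementUuids ordered refs, may repeat a UUID
 * @return list<string> names in arrangement order, duplicates preserved
 */
function resolveGroupNames(array $groupNamesByUuid, array $arrangementUuids): array
{
    // Build the normalized (uppercase) lookup map once.
    $lookup = [];
    foreach ($groupNamesByUuid as $uuid => $name) {
        $lookup[strtoupper((string) $uuid)] = $name;
    }

    // Walk the references in order; never de-duplicate.
    $resolved = [];
    foreach ($arrangementUuids as $uuid) {
        $key = strtoupper($uuid);
        if (isset($lookup[$key])) { // assumption: unknown refs are skipped
            $resolved[] = $lookup[$key];
        }
    }

    return $resolved;
}

// Placeholder UUIDs (not taken from Test.pro), deliberately mixed case.
$groups = [
    'aaaa0000-0000-0000-0000-000000000001' => 'Verse 1',
    'AAAA0000-0000-0000-0000-000000000002' => 'Verse 2',
    'aaaa0000-0000-0000-0000-000000000003' => 'Chorus',
];
$arrangement = [
    'AAAA0000-0000-0000-0000-000000000003',
    'AAAA0000-0000-0000-0000-000000000001',
    'aaaa0000-0000-0000-0000-000000000003',
    'aaaa0000-0000-0000-0000-000000000002',
    'AAAA0000-0000-0000-0000-000000000003',
];
$names = resolveGroupNames($groups, $arrangement);
// $names is ['Chorus', 'Verse 1', 'Chorus', 'Verse 2', 'Chorus']
```

Iterating over the reference list rather than over the map is what keeps the repeated chorus in place; iterating over the map would return each group only once.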