propresenter-php/.sisyphus/notepads/propresenter-parser/learnings.md
2026-03-01 16:12:17 +01:00

Learnings — ProPresenter Parser

Conventions & Patterns

(Agents will append findings here)

Task 1: Project Scaffolding — Composer + PHPUnit + Directory Structure

Completed

  • Created PHP 8.4 project with Composer
  • Configured PSR-4 autoloading for both namespaces:
    • ProPresenter\Parser\ → src/
    • Rv\Data\ → generated/Rv/Data/
  • Installed PHPUnit 11.5.55 and google/protobuf 4.33.5
  • Created phpunit.xml with strict settings
  • Created SmokeTest.php that passes
  • All 5 required directories created: src/, tests/, bin/, proto/, generated/

Key Findings

  • PHP 8.4.7 is available on the system
  • Composer resolves dependencies cleanly (28 packages installed)
  • PHPUnit 11 runs with strict mode enabled (beStrictAboutOutputDuringTests, failOnRisky, failOnWarning)
  • Autoloading works correctly with both namespaces configured

Verification Results

  • Composer install: Success (28 packages)
  • PHPUnit smoke test: 1 test passed
  • Autoload verification: Works correctly
  • Directory structure: All 5 directories present

Task 3: RTF Plain Text Extractor (TDD)

Completed

  • RtfExtractor::toPlainText() static method — standalone, no external deps
  • 11 PHPUnit tests all passing (TDD: RED → GREEN)
  • Handles real ProPresenter CocoaRTF 2761 format

Key RTF Patterns in ProPresenter

  • Format: Always {\rtf1\ansi\ansicpg1252\cocoartf2761 ...}
  • Encoding: Windows-1252 (ansicpg1252), hex escapes \'xx for non-ASCII
  • Soft returns: Single backslash \ followed by newline = line break in text
  • Text location: After last formatting command (often \CocoaLigature0 ), before final }
  • Nested groups: {\fonttbl ...}, {\colortbl ...}, {\*\expandedcolortbl ...} — must be stripped
  • German chars: \'fc=ü, \'f6=ö, \'e4=ä, \'df=ß, \'e9=é, \'e8=è
  • Unicode: \uNNNN? where NNNN is decimal codepoint, ? is ANSI fallback (skipped)
  • Stroke formatting: Some songs have \outl0\strokewidth-40 \strokec3 before text
  • Translation boxes: Same RTF structure, different font size (e.g., fs80 vs fs84)

Implementation Approach

  • Character-by-character parser (not regex) — handles nested braces correctly
  • Strip all {...} nested groups first, then process flat content
  • Control words: \word[N] pattern, space delimiter consumed
  • Non-RTF input passes through unchanged (graceful fallback)
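A minimal sketch of the two-pass approach above, written in Python only as a language-neutral illustration. The project's RtfExtractor is PHP, and this is not its code; all function names here are hypothetical. Pass one keeps only the depth-1 content of the outer {\rtf1 ...} group. Pass two walks that flat content one character at a time: control words (letters, optional signed number, one consumed space) are dropped, \'xx is decoded as Windows-1252, backslash-newline becomes a line break, and \uN emits the code point and then skips the fallback. Honoring \ucN for the fallback count (default 1) is an assumption beyond the notes above, added because CocoaRTF can emit \uc0.

```python
def _is_letter(ch: str) -> bool:
    return ch.isascii() and ch.isalpha()


def _strip_nested_groups(rtf: str) -> str:
    """Keep only characters that sit directly inside the outer {\\rtf1 ...} group."""
    out = []
    depth = 0
    i = 0
    while i < len(rtf):
        ch = rtf[i]
        if ch == "\\" and i + 1 < len(rtf):
            # Treat an escape as a pair so \{ and \} never change the depth.
            if depth == 1:
                out.append(rtf[i:i + 2])
            i += 2
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        elif depth == 1:
            out.append(ch)
        i += 1
    return "".join(out)


def _decode_flat(flat: str) -> str:
    """Turn group-free RTF content into text, one character at a time."""
    out = []
    fallback_chars = 1  # RTF default for \ucN
    i, n = 0, len(flat)
    while i < n:
        ch = flat[i]
        if ch in "\r\n":
            i += 1  # raw line breaks in the RTF source are not text
        elif ch != "\\":
            out.append(ch)
            i += 1
        elif i + 1 < n and flat[i + 1] == "\n":
            out.append("\n")  # backslash + newline = soft return
            i += 2
        elif i + 1 < n and flat[i + 1] in "\\{}":
            out.append(flat[i + 1])  # escaped literal character
            i += 2
        elif flat.startswith("\\'", i):
            # \'xx: one Windows-1252 byte given as two hex digits
            out.append(bytes.fromhex(flat[i + 2:i + 4]).decode("cp1252", errors="replace"))
            i += 4
        elif i + 1 < n and _is_letter(flat[i + 1]):
            j = i + 1
            while j < n and _is_letter(flat[j]):
                j += 1
            word = flat[i + 1:j]
            k = j
            if k < n and (flat[k] == "-" or flat[k].isdigit()):
                k += 1
                while k < n and flat[k].isdigit():
                    k += 1
            param = flat[j:k]
            if k < n and flat[k] == " ":
                k += 1  # one space delimiter belongs to the control word
            i = k
            if word == "uc" and param.isdigit():
                fallback_chars = int(param)
            elif word == "u" and param.lstrip("-").isdigit():
                code = int(param)
                out.append(chr(code + 65536 if code < 0 else code))
                for _ in range(fallback_chars):  # skip the ANSI fallback
                    i += 4 if flat.startswith("\\'", i) else 1
        else:
            i += 2  # other control symbols are ignored in this sketch
    return "".join(out)


def to_plain_text(rtf: str) -> str:
    if not rtf.startswith("{\\rtf"):
        return rtf  # non-RTF input passes through unchanged
    return _decode_flat(_strip_nested_groups(rtf))
```

Unknown control symbols (for example \~) are simply dropped, so this is a starting point rather than a complete RTF reader.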

Testing Gotcha

  • PHP single-quoted strings: \' = escaped quote, NOT literal backslash-quote

  • Use nowdoc (<<<'RTF') for RTF test data with hex escapes (\'xx)

  • Regular concatenated strings work for RTF without hex escapes (soft returns \\ are fine)

  • 2026-03-01 task-2 proto import resolution: copied the full Proto7.16.2/ tree (including google/protobuf/*.proto) into php/proto/; imports resolve as-is with --proto_path=./php/proto, so no path rewrites are required.

  • 2026-03-01 task-2 version extraction: application_info.platform_version from Test.pro = macOS 14.8.3; application_info.application_version = major 20, build 335544354.

  • 2026-03-01 task-6 binary fidelity baseline: the decode->encode byte round-trip currently yields 0/169 byte-identical files (168 non-empty files from all-songs plus Test.pro). First mismatches typically occur early (around byte offsets 700-3000), which points to systematic re-serialization differences rather than isolated corruption.
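For tracking this baseline, a helper along these lines reports how many files survive the round trip byte for byte and where the rest first diverge. It is a Python sketch with hypothetical names; it assumes the decode->encode step already happened elsewhere and only compares the resulting byte strings.

```python
from typing import Dict, Optional, Tuple


def first_mismatch(original: bytes, reencoded: bytes) -> Optional[int]:
    """Return the first byte offset where the buffers differ, or None if identical."""
    for offset, (a, b) in enumerate(zip(original, reencoded)):
        if a != b:
            return offset
    if len(original) != len(reencoded):
        # One buffer is a strict prefix of the other.
        return min(len(original), len(reencoded))
    return None


def summarize(pairs: Dict[str, Tuple[bytes, bytes]]) -> Tuple[int, int, Dict[str, int]]:
    """Count byte-identical round trips and record the first mismatch offset of the rest."""
    identical = 0
    mismatches: Dict[str, int] = {}
    for name, (original, reencoded) in pairs.items():
        offset = first_mismatch(original, reencoded)
        if offset is None:
            identical += 1
        else:
            mismatches[name] = offset
    return identical, len(pairs), mismatches
```

Keeping the per-file offsets, not just the identical count, makes it easy to see whether a serializer change moves the first divergence later in the file.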

Task 5: Group + Arrangement Wrapper Classes (TDD)

Completed

  • Group.php wrapping Rv\Data\Presentation\CueGroup — getUuid(), getName(), getColor(), getSlideUuids(), setName(), getProto()
  • Arrangement.php wrapping Rv\Data\Presentation\Arrangement — getUuid(), getName(), getGroupUuids(), setName(), setGroupUuids(), getProto()
  • 30 tests (16 Group + 14 Arrangement), 74 assertions — all pass
  • TDD: RED confirmed (class not found errors) → GREEN (all pass)

Protobuf Structure Findings

  • CueGroup (field 12) has TWO parts: group (Rv\Data\Group with uuid/name/color) and cue_identifiers (repeated UUID = slide refs)
  • Arrangement (field 11) has: uuid, name, group_identifiers (repeated UUID = group refs, can repeat same group)
  • UUID.getString() returns the string value; UUID.setString() sets it
  • Color has getRed()/getGreen()/getBlue()/getAlpha() returning floats
  • Group also has hotKey, application_group_identifier, application_group_name (not exposed in wrapper — not needed for song parsing)

Test.pro Verified Structure

  • 4 groups: Verse 1 (2 slides), Verse 2 (1 slide), Chorus (1 slide), Ending (1 slide)
  • 2 arrangements: 'normal' (5 group refs), 'test2' (4 group refs)
  • All groups have non-empty UUIDs
  • Arrangement group UUIDs reference valid group UUIDs (cross-validated in test)

Task 4: TextElement + Slide Wrapper Classes (TDD)

Completed

  • TextElement.php wraps Graphics Element: getName(), hasText(), getRtfData(), setRtfData(), getPlainText()
  • Slide.php wraps Cue: getUuid(), getTextElements(), getAllElements(), getPlainText(), hasTranslation(), getTranslation(), getCue()
  • 24 tests (10 TextElement + 14 Slide), 47 assertions, all pass
  • TDD: RED confirmed then GREEN (all pass)
  • Integration tests verify real Test.pro data

Protobuf Navigation Path (Confirmed)

  • Cue -> getActions()[0] -> getSlide() (oneof) -> getPresentation() (oneof) -> getBaseSlide() -> getElements()[]
  • Slide Element -> getElement() -> Graphics Element
  • Graphics Element -> getName() (user-defined label), hasText(), getText() -> Graphics Text -> getRtfData()
  • Elements WITHOUT text (shapes, media) have hasText() === false, must be filtered
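The navigation path and the hasText() filter can be sketched as below. The dataclasses are hypothetical Python stand-ins, not the generated Rv\Data classes, and the getSlide() -> getPresentation() oneof chain is collapsed into a single optional base_slide field for brevity. In this sketch a cue with no actions, or without a presentation slide, yields an empty list instead of failing.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class GraphicsElement:
    """Stand-in for Graphics Element; rtf_data=None models hasText() === false."""
    name: str
    rtf_data: Optional[str] = None

    def has_text(self) -> bool:
        return self.rtf_data is not None


@dataclass
class SlideElement:
    element: GraphicsElement


@dataclass
class BaseSlide:
    elements: List[SlideElement] = field(default_factory=list)


@dataclass
class Action:
    # Collapses getSlide() -> getPresentation() -> getBaseSlide() into one optional field.
    base_slide: Optional[BaseSlide] = None


@dataclass
class Cue:
    uuid: str
    actions: List[Action] = field(default_factory=list)


def text_elements(cue: Cue) -> List[GraphicsElement]:
    """Follow actions[0] down to the elements and keep only those that carry text."""
    if not cue.actions or cue.actions[0].base_slide is None:
        return []
    graphics = [slide_element.element for slide_element in cue.actions[0].base_slide.elements]
    return [g for g in graphics if g.has_text()]
```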

Key Design Decisions

  • TextElement wraps Graphics Element (not Slide Element) for clean text-focused API
  • Slide wraps Cue (not PresentationSlide) because UUID is on the Cue
  • Translation = second text element (index 1); no label detection needed
  • Lazy caching: textElements/allElements computed once per instance
  • Test.pro path from tests: dirname(__DIR__, 2) . '/ref/Test.pro' (2 levels up from php/tests/)
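A minimal Python sketch of the lazy caching and the index-1 translation rule. The Slide class and its load_texts callable are hypothetical: the callable stands in for walking the Cue's elements, and the sketch assumes the slide's plain text comes from the first text element.

```python
from functools import cached_property
from typing import Callable, List, Optional


class Slide:
    """Hypothetical wrapper showing per-instance caching and the translation rule."""

    def __init__(self, uuid: str, load_texts: Callable[[], List[str]]):
        self.uuid = uuid
        self._load_texts = load_texts  # stands in for walking the Cue's elements

    @cached_property
    def text_elements(self) -> List[str]:
        # Computed on first access only, then stored on the instance.
        return self._load_texts()

    def plain_text(self) -> str:
        return self.text_elements[0] if self.text_elements else ""

    def has_translation(self) -> bool:
        # Translation = second text element; no label detection.
        return len(self.text_elements) > 1

    def translation(self) -> Optional[str]:
        return self.text_elements[1] if self.has_translation() else None
```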

Task 7: Song + ProFileReader Integration (TDD)

Completed

  • Added Song aggregate wrapper (Presentation-level integration over Group/Slide/Arrangement)
  • Added ProFileReader::read(string): Song with file existence and empty-file validation
  • Added integration-heavy tests: SongTest + ProFileReaderTest (12 tests, 44 assertions)
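The existence and empty-file checks can be sketched as follows, in Python rather than PHP, with os.path.isfile/os.path.getsize standing in for is_file/filesize. ProFileError and read_pro_bytes are hypothetical names, and the protobuf decoding step is left out.

```python
import os


class ProFileError(Exception):
    """Hypothetical error type for unreadable .pro files."""


def read_pro_bytes(path: str) -> bytes:
    """Validate the path, then return the raw bytes for the protobuf decoder."""
    if not os.path.isfile(path):
        raise ProFileError(f"File not found: {path}")
    if os.path.getsize(path) == 0:
        raise ProFileError(f"File is empty: {path}")
    with open(path, "rb") as fh:
        return fh.read()
```

Checking the size up front is worthwhile because a zero-length buffer is a valid protobuf encoding of a message with all fields at their defaults, so an empty file would otherwise decode "successfully" into an empty song.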

Key Implementation Findings

  • Song constructor can eager-load all wrappers safely: cue_groups -> Group, cues -> Slide, arrangements -> Arrangement
  • UUID cross-reference resolution works best with normalized uppercase lookup maps (groupsByUuid, slidesByUuid), because UUIDs are stored as plain strings and string comparison is case-sensitive
  • Group/arrangement references can repeat the same UUID; resolution must preserve order and duplicates (important for repeated chorus)
  • ProFileReader's is_file + filesize checks handle UTF-8 paths correctly and catch the known 0-byte fixture before protobuf parsing
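The normalized lookup and order-preserving resolution can be sketched in Python as below. Group, index_by_uuid and resolve_groups are hypothetical names, and skipping unknown UUIDs is an assumption of the sketch rather than documented wrapper behavior. Because the result is built by iterating over the reference list, a chorus referenced three times appears three times, in arrangement order.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Group:
    """Hypothetical stand-in for the Group wrapper."""
    uuid: str
    name: str


def index_by_uuid(groups: List[Group]) -> Dict[str, Group]:
    """Build a lookup map keyed by the upper-cased UUID string."""
    return {g.uuid.upper(): g for g in groups}


def resolve_groups(group_uuids: List[str], by_uuid: Dict[str, Group]) -> List[Group]:
    """Map arrangement references to groups, keeping order and duplicates.

    Unknown UUIDs are skipped; that policy is an assumption of this sketch.
    """
    return [by_uuid[u.upper()] for u in group_uuids if u.upper() in by_uuid]
```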

Verified Against Fixtures

  • Test.pro: name Test, 4 groups, 5 slides, 2 arrangements
  • getSlidesForGroup(Verse 1) resolves to slide UUIDs [5A6AF946..., A18EF896...] with texts Vers1.1/Vers1.2 and Vers1.3/Vers1.4
  • getGroupsForArrangement(normal) resolves ordered names [Chorus, Verse 1, Chorus, Verse 2, Chorus]
  • Reads validated through ProFileReader on 6 diverse files, including [TRANS] files and UTF-8/non-song file names