Skip to content

[file-diet] Refactor Large Go File: pkg/parser/schedule_parser.go (1084 lines) #11285

@agentic-workflows-dev

Description

@agentic-workflows-dev

Overview

The file pkg/parser/schedule_parser.go has grown to 1084 lines, significantly exceeding the healthy 800-line threshold. While it has excellent test coverage (2031 test lines, ~1.87 ratio), the file's size makes it difficult to navigate and maintain. This task involves refactoring it into smaller, focused modules with clear boundaries.

Current State

  • File: pkg/parser/schedule_parser.go
  • Size: 1084 lines
  • Test Coverage: 2031 lines in schedule_parser_test.go (~1.87 ratio)
  • Complexity: High - contains 5 distinct functional areas mixed together

Problem Analysis

The file mixes multiple concerns:

  1. Cron detection - Pattern matching for different cron types
  2. Fuzzy scattering - Complex deterministic time distribution (290+ lines in ScatterSchedule)
  3. Parser orchestration - Tokenization and high-level parsing
  4. Expression parsing - Interval and base schedule parsing
  5. Time utilities - Time parsing and conversion helpers

Refactoring Strategy

Split the file into focused modules aligned with functional boundaries:

1. schedule_cron_detection.go (~150 lines)

Functions to move:

  • IsDailyCron(cron string) bool
  • IsHourlyCron(cron string) bool
  • IsWeeklyCron(cron string) bool
  • IsFuzzyCron(cron string) bool
  • IsCronExpression(input string) bool

Responsibility: Cron pattern detection and classification

Rationale: These are pure functions that analyze cron strings. They have no dependencies on the parser state and can be tested independently.

2. schedule_fuzzy_scatter.go (~320 lines)

Functions to move:

  • ScatterSchedule(fuzzyCron, workflowIdentifier string) (string, error) (290+ lines)
  • stableHash(s string, modulo int) int

Responsibility: Deterministic fuzzy schedule time distribution

Rationale: The ScatterSchedule function is self-contained with complex logic for different fuzzy patterns (DAILY, HOURLY, WEEKLY, etc.). Extracting it reduces the main file by ~30%.

3. schedule_time_utils.go (~150 lines)

Functions to move:

  • parseTime(timeStr string) (minute string, hour string) (120 lines)
  • parseTimeToMinutes(hourStr, minuteStr string) int
  • mapWeekday(day string) string

Responsibility: Time parsing and conversion utilities

Rationale: These are helper functions used by the parser but don't depend on parser state. Can be reused by other modules.

4. schedule_parser_core.go (~250 lines)

Keep in this file:

  • ScheduleParser struct
  • ParseSchedule(input string) (cron, original, error) (entry point)
  • (p *ScheduleParser) tokenize() error
  • (p *ScheduleParser) parse() (string, error)

Responsibility: High-level parsing orchestration

Rationale: This is the main API surface. Keeping it focused on orchestration makes the flow clear.

5. schedule_parser_expressions.go (~250 lines)

Functions to move:

  • (p *ScheduleParser) parseInterval() (string, error) (150+ lines)
  • (p *ScheduleParser) parseBase() (string, error) (180+ lines)
  • (p *ScheduleParser) extractTime(startPos int) (string, error)
  • (p *ScheduleParser) extractTimeBetween(startPos, endPos int) (string, error)
  • (p *ScheduleParser) extractTimeAfter(startPos int) (string, error)

Responsibility: Schedule expression parsing (interval, base, time extraction)

Rationale: These are the detailed parsing methods that consume tokens. They depend on parser state but can be in a separate file via schedule_parser_core.go.

Implementation Plan

Step-by-step refactoring approach
  1. Create test infrastructure first

    • Run make test-unit to establish baseline
    • Ensure all existing tests pass before changes
  2. Extract utilities (no dependencies)

    • Create schedule_time_utils.go with time parsing functions
    • Create schedule_time_utils_test.go with relevant tests
    • Run tests to verify behavior unchanged
  3. Extract detection functions

    • Create schedule_cron_detection.go with Is*Cron functions
    • Create schedule_cron_detection_test.go with relevant tests
    • Run tests to verify behavior unchanged
  4. Extract fuzzy scattering

    • Create schedule_fuzzy_scatter.go with ScatterSchedule and stableHash
    • Create schedule_fuzzy_scatter_test.go with relevant tests
    • Run tests to verify behavior unchanged
  5. Split parser methods

    • Create schedule_parser_expressions.go with parsing methods
    • Move parseInterval, parseBase, extractTime* methods
    • Ensure methods can still access ScheduleParser receiver
    • Run tests to verify behavior unchanged
  6. Rename original file

    • Rename schedule_parser.go to schedule_parser_core.go
    • Keep only core orchestration (struct, ParseSchedule, tokenize, parse)
    • Run tests to verify behavior unchanged
  7. Update package documentation

    • Add file-level comments to each new file
    • Update any package-level documentation
  8. Final validation

    • Run make agent-finish to ensure all checks pass
    • Verify no breaking changes to public API
    • Review test coverage with go test -cover

Test Coverage Plan

Each new file should have corresponding test file with ≥80% coverage:

schedule_cron_detection_test.go

  • Test each Is*Cron function with valid/invalid patterns
  • Edge cases: empty strings, malformed cron, boundary values
  • Target coverage: >85%

schedule_fuzzy_scatter_test.go

  • Test ScatterSchedule with all fuzzy patterns (DAILY, HOURLY, WEEKLY, etc.)
  • Test stableHash determinism and distribution
  • Test error cases (invalid patterns, out-of-range values)
  • Target coverage: >80%

schedule_time_utils_test.go

  • Test parseTime with various time formats
  • Test parseTimeToMinutes edge cases
  • Test mapWeekday with all weekday variations
  • Target coverage: >85%

schedule_parser_expressions_test.go

  • Test parseInterval with short/long duration formats
  • Test parseBase with daily/weekly/monthly patterns
  • Test extractTime* methods with various input positions
  • Target coverage: >80%

schedule_parser_core_test.go

  • Integration tests ensuring all modules work together
  • Test ParseSchedule end-to-end with complex expressions
  • Target coverage: >85%

Implementation Guidelines

  1. Preserve Behavior: All existing tests must pass without modification
  2. Maintain Exports: Public API (ParseSchedule, IsCronExpression, etc.) unchanged
  3. Add Tests First: Create test files for new modules before moving code
  4. Incremental Changes: Split one module at a time, commit after each successful split
  5. Run Tests Frequently: Verify make test-unit passes after each module extraction
  6. Update Imports: The new files will be in the same parser package (no import changes needed)
  7. Document Boundaries: Add clear file-level comments explaining each module's responsibility

Acceptance Criteria

  • Original file split into 5 focused files
  • Each new file is under 350 lines
  • All existing tests pass (make test-unit)
  • New test files created with ≥80% coverage per module
  • No breaking changes to public API (ParseSchedule, Is*Cron functions)
  • Code passes linting (make lint)
  • Code passes formatting (make fmt)
  • Build succeeds (make build)
  • Integration tests verify end-to-end functionality

Additional Context

  • Repository Guidelines: Follow patterns in AGENTS.md
  • Code Organization: Prefer many small files grouped by functionality (per AGENTS.md)
  • Testing: Match existing test patterns in pkg/parser/*_test.go
  • Logger: Each new file should have its own logger (e.g., logger.New("parser:schedule_fuzzy_scatter"))

Estimated Impact

Priority: Medium
Effort: Medium (5-8 hours estimated)
Benefits:

  • Maintainability: Easier to locate and modify specific functionality
  • Testing: Clearer test organization and faster test execution per module
  • Complexity: Reduced cognitive load when reading/debugging
  • Reusability: Utility functions can be used by other parser components

References

AI generated by Daily File Diet

Metadata

Metadata

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions