Releases: coregx/coregex

v0.8.24: Longest() mode optimization

14 Dec 00:46
5850603

Fixed

Longest() mode performance - BoundedBacktracker now supports leftmost-longest matching (#52)

  • Root cause: BoundedBacktracker was disabled entirely in Longest() mode, forcing PikeVM fallback
  • Solution: Implemented backtrackFindLongest() that explores all branches at splits
  • Found by: Ben Hoyt (GoAWK integration testing with re.Longest())
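For readers unfamiliar with the two matching modes, the difference can be illustrated with Go's standard `regexp` package (used here instead of coregex so the example depends only on a known API). The helper name `findBoth` is hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// findBoth returns the leftmost-first and the leftmost-longest match of
// pattern in s, using the standard library's regexp package.
func findBoth(pattern, s string) (first, longest string) {
	re := regexp.MustCompile(pattern)
	first = re.FindString(s) // default mode: leftmost-first (Perl-like)

	re.Longest() // switch this Regexp to leftmost-longest (POSIX-like)
	longest = re.FindString(s)
	return first, longest
}

func main() {
	first, longest := findBoth(`a|ab`, "ab")
	// Leftmost-first stops at the first alternative that matches;
	// leftmost-longest must consider every alternative at a split.
	fmt.Printf("first=%q longest=%q\n", first, longest) // first="a" longest="ab"
}
```

This is why a backtracker in Longest() mode cannot stop at the first successful branch and has to explore all branches at each split.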

Performance (Longest() mode)

Metric               Before       After         Improvement
coregex Longest()    450 ns       133 ns        3.4x faster
Longest() overhead   +270%        +8%           Target was +10%
vs stdlib Longest()  2.4x slower  1.37x faster

Install

go get github.com/coregx/coregex@v0.8.24

Full Changelog: v0.8.23...v0.8.24

v0.8.23: Unicode char class fix

13 Dec 20:54

Critical Bug Fix

Unicode character classes now work correctly.

The Bug

Character classes with non-ASCII characters (code points 128-255) returned incorrect matches:

// Before v0.8.23:
re := coregex.MustCompile(`[föd]+`)
re.FindString("fööd") // returned "f" (wrong!)

// After v0.8.23:
re.FindString("fööd") // returns "fööd" (correct)

Root Cause

CharClassSearcher uses a 256-byte lookup table for O(1) membership testing. The guard was rune > 255, but characters such as ö (code point 246) are encoded as two bytes in UTF-8 (0xC3 0xB6), so a single-byte table lookup can never match them.

Fix

Changed check from > 255 to > 127 - only true ASCII (0-127) can use byte lookup table.
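A minimal sketch of the corrected guard, assuming the table is built from the runes of the class. The function name `buildByteTable` is hypothetical and is not taken from the coregex source.

```go
package main

import "fmt"

// buildByteTable returns a 256-entry membership table for the given runes.
// ok is false if any rune is outside ASCII: such runes occupy several bytes
// in UTF-8 and cannot be tested one byte at a time.
func buildByteTable(class []rune) (table [256]bool, ok bool) {
	for _, r := range class {
		if r > 127 { // the pre-fix guard was r > 255
			return [256]bool{}, false
		}
		table[byte(r)] = true
	}
	return table, true
}

func main() {
	fmt.Printf("% x\n", []byte("ö")) // c3 b6: two bytes for U+00F6

	_, ok := buildByteTable([]rune("föd"))
	fmt.Println(ok) // false: must fall back to a rune-aware engine

	_, ok = buildByteTable([]rune("fod"))
	fmt.Println(ok) // true: pure ASCII, byte table is safe
}
```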

Affected Patterns

Any character class containing non-ASCII: [äöü]+, [café]+, [α-ω]+, etc.

Credit

Found by Ben Hoyt during GoAWK integration testing.

Upgrade recommended for all users with internationalized patterns.

v0.8.22: Small string optimization

13 Dec 10:30
0837e6a

Small String Optimization (1.4-20x faster)

Addresses performance issues reported by @benhoyt (#29) where coregex was 2-6x slower than stdlib on small inputs (~44 bytes).

Key Optimizations

  1. Zero-allocation string-to-bytes conversion

    • stringToBytes() using unsafe.Slice (like Rust's as_bytes())
    • MatchString: 48B/op → 0B/op
  2. BoundedBacktracker for small NFA patterns

    • O(1) generation-based reset vs PikeVM's thread queues
    • 2-3x faster on small inputs
  3. Prefilter integration in NFA path
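The first optimization can be sketched as follows. The release notes name `stringToBytes()` and `unsafe.Slice`; the use of `unsafe.StringData` (Go 1.20+) and the empty-string handling are assumptions about how such a helper is typically written, not a copy of the coregex code.

```go
package main

import (
	"fmt"
	"unsafe"
)

// stringToBytes returns a []byte that aliases the memory of s without
// copying. The result must be treated as read-only: strings are immutable,
// and writing through this slice is undefined behavior.
func stringToBytes(s string) []byte {
	if s == "" {
		return nil
	}
	return unsafe.Slice(unsafe.StringData(s), len(s))
}

func main() {
	b := stringToBytes("hello")
	fmt.Println(len(b), string(b)) // 5 hello
}
```

The trade-off is safety for speed: a plain `[]byte(s)` conversion copies the data and may allocate, while the aliasing version is only sound as long as no caller mutates the slice.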

Performance Results

Pattern   stdlib  coregex  Speedup
j[a-z]+p  357ns   253ns    1.4x
\d+       1.13µs  57ns     20x
\w+       1.05µs  58ns     18x
[a-z]+    1.02µs  63ns     16x

Commits

  • perf: optimize small string matching with BoundedBacktracker (#46)

Closes #47

v0.8.21: CharClassSearcher + ByteClasses compression

12 Dec 23:10
aff7f51

What's New

Added

  • CharClassSearcher - Specialized 256-byte lookup table for simple char_class patterns (Fixes #44)

    • Patterns like [\w]+, \d+, [a-z]+ now use O(1) byte membership test
    • 23x faster than stdlib (623ms → 27ms on 6MB input with 1.3M matches)
    • 2x faster than Rust regex! (57ms → 27ms)
    • Zero allocations in hot path
  • UseCharClassSearcher strategy

    • Auto-selected for simple char_class patterns without capture groups
    • Patterns WITH captures ((\w)+) continue to use BoundedBacktracker
  • Zero-allocation Count() method
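As a rough illustration of why a lookup table makes counting cheap, the sketch below counts matches of a `class+` pattern as maximal runs of member bytes. The names `newTable` and `countRuns` are hypothetical, and the real CharClassSearcher may be structured differently.

```go
package main

import "fmt"

// newTable marks every byte in the inclusive range [lo, hi] as a member.
func newTable(lo, hi byte) *[256]bool {
	var t [256]bool
	for b := int(lo); b <= int(hi); b++ {
		t[b] = true
	}
	return &t
}

// countRuns counts non-overlapping matches of class+ in haystack, that is,
// maximal runs of member bytes. It performs one table lookup per byte and
// allocates nothing.
func countRuns(t *[256]bool, haystack []byte) int {
	n := 0
	inRun := false
	for _, b := range haystack {
		if t[b] {
			if !inRun {
				n++ // a new run starts here
				inRun = true
			}
		} else {
			inRun = false
		}
	}
	return n
}

func main() {
	digits := newTable('0', '9')
	fmt.Println(countRuns(digits, []byte("a1 22 333b"))) // 3
}
```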

Fixed

  • DFA ByteClasses compression (Rust-style optimization)

    • Compile memory for hello pattern: 1195KB → 598KB (2x reduction)
  • Removed unused reverseDFA field from Engine

    • Was creating redundant reverse DFA for ALL patterns (2x memory overhead)
  • Reverse NFA ByteClasses registration

    • Matches Rust's approach in nfa.rs
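The idea behind byte-class compression can be sketched as below: bytes that no transition in the pattern distinguishes share one equivalence class, so a DFA needs one transition column per class rather than 256. This is a generic illustration of the technique; the function `byteClasses` and its signature are hypothetical and do not reflect the coregex internals.

```go
package main

import "fmt"

// byteClasses maps each of the 256 byte values to an equivalence class.
// Two bytes share a class when no range in ranges separates them.
func byteClasses(ranges [][2]byte) (classes [256]uint8, numClasses int) {
	// boundary[b] is true when byte b is the last byte of its class.
	var boundary [256]bool
	for _, r := range ranges {
		lo, hi := r[0], r[1]
		if lo > 0 {
			boundary[lo-1] = true
		}
		boundary[hi] = true
	}
	class := 0
	for b := 0; b < 256; b++ {
		classes[b] = uint8(class)
		if boundary[b] && b != 255 {
			class++
		}
	}
	return classes, class + 1
}

func main() {
	// The literal "hello" only distinguishes the bytes h, e, l and o.
	classes, n := byteClasses([][2]byte{{'h', 'h'}, {'e', 'e'}, {'l', 'l'}, {'o', 'o'}})
	fmt.Println(n, classes['h'], classes['x']) // 9 3 8
}
```

With 9 classes instead of 256 byte values, each DFA state stores far fewer transitions, which is consistent with the compile-memory reduction reported above.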

Performance Summary

Pattern  Input              stdlib  coregex  Rust  coregex vs Rust
[\w]+    6MB, 1.3M matches  623ms   27ms     57ms  2.1x faster

Pattern             Before  After  Improvement
hello compile       1195KB  598KB  -50%
char_class runtime  180ms   109ms  -39%

Full Changelog: v0.8.20...v0.8.21

v0.8.20: UseReverseSuffixSet for multi-suffix patterns (34-385x faster)

12 Dec 15:16
8e2b20e

Highlights

New UseReverseSuffixSet strategy for patterns like .*\.(txt|log|md) where the Longest Common Suffix (LCS) is empty but multiple suffix literals are available.

🚀 Novel optimization NOT present in rust-regex - they fall back to Core strategy for these patterns!

Performance

Input  stdlib  coregex  Speedup
1KB    15.5µs  454ns    34x faster
32KB   1.95ms  5µs      384x faster
1MB    57ms    147µs    385x faster

Changes

Added

  • ReverseSuffixSetSearcher - Teddy SIMD prefilter + reverse DFA for multi-suffix patterns
  • cross_reverse algorithm - proper suffix extraction for OpConcat (rust-regex port)
  • Regression benchmarks - meta/reverse_suffix_set_bench_test.go
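A heavily simplified sketch of the multi-suffix idea, assuming a single-line input (no '\n') and a pattern of the form `.*(s1|s2|...)`. It replaces the Teddy prefilter and reverse DFA with plain `strings.LastIndex` calls, so it only illustrates the semantics, not the actual implementation. The name `findDotStarSuffix` is hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// findDotStarSuffix emulates a leftmost-first search for `.*(s1|s2|...)` on
// a single line. Because `.*` is greedy and can absorb any prefix, the match
// starts at 0 and ends after the suffix occurrence that starts furthest
// right. On equal start positions the suffix listed first wins.
func findDotStarSuffix(line string, suffixes []string) (start, end int, ok bool) {
	best := -1
	for _, suf := range suffixes {
		if i := strings.LastIndex(line, suf); i > best {
			best = i
			end = i + len(suf)
		}
	}
	if best < 0 {
		return 0, 0, false
	}
	return 0, end, true
}

func main() {
	line := "backup.log.old notes.txt"
	s, e, ok := findDotStarSuffix(line, []string{".txt", ".log", ".md"})
	fmt.Println(s, e, ok) // 0 24 true

	// Cross-check against the standard library.
	re := regexp.MustCompile(`.*\.(txt|log|md)`)
	fmt.Println(re.FindStringIndex(line)) // [0 24]
}
```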

Changed

  • Refactored selectReverseStrategy to reduce cyclomatic complexity
  • Extracted shouldUseReverseSuffixSet helper function

Files Changed

  • meta/reverse_suffix_set.go - New ReverseSuffixSetSearcher (306 lines)
  • meta/reverse_suffix_set_bench_test.go - Benchmarks (186 lines)
  • meta/strategy.go - Strategy selection logic
  • meta/meta.go - Engine integration
  • literal/extractor.go - cross_reverse for suffix extraction

Full Changelog: v0.8.19...v0.8.20

v0.8.19: FindAll ReverseSuffix optimization (87x faster)

12 Dec 12:31

Performance

FindAll with ReverseSuffix patterns is now dramatically faster:

Pattern          Operation      stdlib  coregex  Speedup
.*@example\.com  FindAll (6MB)  316ms   3.6ms    87x faster
.*@example\.com  Find (6MB)     ~300ms  <1ms     300x+ faster

Changes

  • FindAll ReverseSuffix optimization (Fixes #41)

    • FindIndicesAt() now supports UseReverseSuffix strategy
    • Added ReverseSuffixSearcher.FindAt() and FindIndicesAt() methods
  • ReverseSuffix Find() optimization

    • Use bytes.LastIndex for O(n) single-pass suffix search
    • Added matchStartZero flag: skip reverse DFA for .* prefix patterns

Install

go get github.com/coregx/coregex@v0.8.19

Full Changelog: v0.8.18...v0.8.19

v0.8.18: UseTeddy literal engine bypass

12 Dec 02:39
ec67c66

Highlights

UseTeddy Strategy (Literal Engine Bypass)

Exact literal alternations like (foo|bar|baz) now skip DFA construction entirely:

  • Compile time: 109us -> 11us (10x faster)
  • Memory: 598KB -> 19KB (31x less)
  • Inspired by Rust regex's literal engine bypass optimization
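To show why an exact literal alternation needs no automaton at all, here is a minimal sketch that answers the same query with substring searches. It uses one `strings.Index` call per literal rather than a SIMD multi-pattern scan, and the name `findLiteralAlt` is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// findLiteralAlt returns the leftmost match of any literal in lits.
// Ties on the start position go to the literal listed first, mirroring
// leftmost-first alternation semantics.
func findLiteralAlt(haystack string, lits []string) (start, end int, ok bool) {
	start = -1
	for _, lit := range lits {
		i := strings.Index(haystack, lit)
		if i >= 0 && (start < 0 || i < start) {
			start, end = i, i+len(lit)
		}
	}
	if start < 0 {
		return 0, 0, false
	}
	return start, end, true
}

func main() {
	// Equivalent in spirit to searching for (foo|bar|baz).
	s, e, ok := findLiteralAlt("xx bar foo", []string{"foo", "bar", "baz"})
	fmt.Println(s, e, ok) // 3 6 true
}
```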

Teddy Multi-Pattern Prefilter

  • Alternation patterns now use Teddy SIMD prefilter
  • (foo|bar|baz|qux): 242x faster than stdlib (was 24x slower)

Other Improvements

  • ReverseSuffix.Find(): Last-suffix algorithm for greedy semantics
  • ReverseAnchored.Find(): Zero-allocation using SearchReverse
  • BoundedBacktracker: O(1) visited tracking with generation counter
  • Single-char inner literals: Email patterns 11-42x faster
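The generation-counter technique mentioned for BoundedBacktracker can be sketched like this. The type `visitedSet` and its methods are hypothetical names for illustration; the layout of the real visited table is not described in these notes.

```go
package main

import "fmt"

// visitedSet tracks visited slots. A slot counts as visited only when its
// stamp equals the current generation, so Reset does not need to clear the
// whole slice between searches.
type visitedSet struct {
	stamp []uint32
	gen   uint32
}

func newVisitedSet(n int) *visitedSet {
	return &visitedSet{stamp: make([]uint32, n), gen: 1}
}

// Reset invalidates all marks by bumping the generation: O(1) except on
// the rare wrap-around, where the slice is cleared once.
func (v *visitedSet) Reset() {
	v.gen++
	if v.gen == 0 {
		for i := range v.stamp {
			v.stamp[i] = 0
		}
		v.gen = 1
	}
}

// Visit marks slot i and reports whether it had already been visited
// in the current generation.
func (v *visitedSet) Visit(i int) (seen bool) {
	if v.stamp[i] == v.gen {
		return true
	}
	v.stamp[i] = v.gen
	return false
}

func main() {
	v := newVisitedSet(8)
	fmt.Println(v.Visit(3)) // false: first visit
	fmt.Println(v.Visit(3)) // true: already seen
	v.Reset()
	fmt.Println(v.Visit(3)) // false: marks were invalidated
}
```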

Performance Summary

Pattern            Before       After
(foo|bar|baz|qux)  24x slower   242x faster
(a|b|c)+           1.8x slower  2.5x faster
\d+                2x slower    4.5x faster
Email pattern      -            11-42x faster

All tested patterns are now faster than Go stdlib!

Full Changelog: https://github.com/coregx/coregex/blob/main/CHANGELOG.md#0818---2025-12-12

v0.8.17: BoundedBacktracker for character class patterns

11 Dec 22:30

What's New

BoundedBacktracker Engine - New execution engine for character class patterns

Performance Improvement

  • Patterns like \d+, \w+, [a-z]+ are now 2.5x faster than stdlib
  • Previously these patterns were 2-3x slower than stdlib
  • Uses recursive backtracking with bit-vector visited tracking for O(1) lookup

How It Works

  • Automatic strategy selection via UseBoundedBacktracker in meta-engine
  • Selected when pattern has no good literals for prefiltering
  • Memory-bounded: max 256KB visited bit vector (falls back to PikeVM for larger inputs)
  • 2-5x faster than PikeVM for simple patterns
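The memory bound can be illustrated with a small eligibility check. The 256KB cap comes from the notes above; the assumption that the bit vector holds one bit per (NFA state, input position) pair is the usual design for bounded backtrackers and may not match coregex exactly. The name `fitsBacktracker` is hypothetical.

```go
package main

import "fmt"

// maxVisitedBytes is the 256KB cap on the visited bit vector.
const maxVisitedBytes = 256 * 1024

// fitsBacktracker reports whether one visited bit per (state, position)
// pair fits within the cap. The inputLen+1 factor covers the position just
// past the last byte. (Assumed layout; overflow of the product is ignored
// in this sketch.)
func fitsBacktracker(numStates, inputLen int) bool {
	bits := numStates * (inputLen + 1)
	return bits <= maxVisitedBytes*8
}

func main() {
	fmt.Println(fitsBacktracker(10, 1000))  // true: 10,010 bits
	fmt.Println(fitsBacktracker(10, 1<<20)) // false: a caller would use PikeVM instead
}
```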

Technical Details

  • New files: nfa/backtrack.go, nfa/backtrack_test.go, nfa/backtrack_bench_test.go
  • PR #38

Full Changelog: v0.8.16...v0.8.17

v0.8.16: FindAll, ReplaceAll, and character class optimizations

11 Dec 11:03

Performance Improvements

This release completes the performance optimization work from #29, addressing all remaining issues reported by @benhoyt.

Character class pattern optimization (Fixes #33)

  • Simple patterns like [0-9]+, \d+, \w+ now use NFA directly
  • Skip DFA overhead when no prefilter benefit
  • Added isSimpleCharClass() detection in strategy selection

ReplaceAll optimization (Fixes #34)

  • Pre-allocate result buffer (input + 25%)
  • Reuse matchIndices buffer across iterations (was allocating per match)
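The pre-allocation part of this change can be sketched with a literal replacement built on `bytes.Index`. Only the "input + 25%" sizing is taken from the notes; the function `replaceLiteral` is a hypothetical stand-in for the regex-driven loop, and the reuse of a match-indices buffer is not shown.

```go
package main

import (
	"bytes"
	"fmt"
)

// replaceLiteral replaces every occurrence of old in src with repl.
// The output buffer is sized once, at input length plus 25%, so that
// typical replacements do not trigger repeated growth.
func replaceLiteral(src, old, repl []byte) []byte {
	if len(old) == 0 {
		return append([]byte(nil), src...) // nothing to search for: copy
	}
	buf := make([]byte, 0, len(src)+len(src)/4)
	for {
		i := bytes.Index(src, old)
		if i < 0 {
			break
		}
		buf = append(buf, src[:i]...) // text before the match
		buf = append(buf, repl...)    // the replacement
		src = src[i+len(old):]        // continue after the match
	}
	return append(buf, src...) // trailing text after the last match
}

func main() {
	out := replaceLiteral([]byte("a-b-c"), []byte("-"), []byte("+"))
	fmt.Println(string(out)) // a+b+c
}
```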

FindAll/FindAllIndex optimization (Fixes #35)

  • Use FindIndicesAt() instead of FindAt() (avoids Match object creation)
  • Lazy allocation - only allocate when first match found
  • Pre-allocate with estimated capacity (10 matches per 1KB)
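A minimal sketch of the lazy-allocation pattern, assuming an index-returning search primitive. The `findAt` type and the toy `findDigits` matcher are hypothetical stand-ins for FindIndicesAt(); only the "allocate on first match" and "10 matches per 1KB" ideas come from the notes.

```go
package main

import "fmt"

// findAt reports the first match in b at or after position at.
type findAt func(b []byte, at int) (start, end int, ok bool)

// findAllIndex collects all non-overlapping matches as [start, end) pairs.
func findAllIndex(b []byte, find findAt) [][2]int {
	var out [][2]int // stays nil, with no allocation, when nothing matches
	for at := 0; at <= len(b); {
		s, e, ok := find(b, at)
		if !ok {
			break
		}
		if out == nil {
			// Estimate 10 matches per 1KB; the +1 avoids a zero capacity.
			out = make([][2]int, 0, 1+len(b)*10/1024)
		}
		out = append(out, [2]int{s, e})
		if e > s {
			at = e
		} else {
			at = e + 1 // step past an empty match to guarantee progress
		}
	}
	return out
}

// findDigits is a toy matcher for [0-9]+.
func findDigits(b []byte, at int) (int, int, bool) {
	for i := at; i < len(b); i++ {
		if b[i] >= '0' && b[i] <= '9' {
			j := i
			for j < len(b) && b[j] >= '0' && b[j] <= '9' {
				j++
			}
			return i, j, true
		}
	}
	return 0, 0, false
}

func main() {
	fmt.Println(findAllIndex([]byte("a1 22 333b"), findDigits)) // [[1 2] [3 5] [6 9]]
}
```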

Benchmark Results

Benchmark          Before   After   Change
Find/hello         619 ns   88 ns   -85% (~7x faster)
OnePassIsMatch     25 ns    20 ns   -19%
LazyDFARepetition  1059 ns  839 ns  -21%

Summary of v0.8.14-v0.8.16 performance work

All issues from #29 are now resolved:

  • #29: Literal patterns now ~7x faster than stdlib (was 5x slower)
  • #31: IsMatch() is zero-allocation
  • #32: FindIndices() is zero-allocation
  • #33: Character class patterns use smart strategy
  • #34: ReplaceAll optimized with buffer reuse
  • #35: FindAll optimized with lazy allocation

Full Changelog: v0.8.15...v0.8.16

v0.8.15: Zero-allocation IsMatch and FindIndices

11 Dec 10:41

Performance Improvements

This release adds zero-allocation methods for hot paths, addressing performance issues reported in #29.

Zero-allocation IsMatch() (Fixes #31)

  • PikeVM.IsMatch() returns immediately on first match without computing positions
  • 0 B/op, 0 allocs/op in hot path
  • Speedup vs stdlib: 52-1863x faster (depending on input size)

Zero-allocation FindIndices() (Fixes #32)

  • Engine.FindIndices() returns (start, end int, found bool) tuple
  • 0 B/op, 0 allocs/op - no Match object allocation
  • Used internally by Find() and FindIndex() public API

Changes

  • Find() and FindIndex() now use FindIndices() internally
  • isMatchNFA() now uses optimized PikeVM.IsMatch() instead of Search()

Benchmarks

Method               Before             After
IsMatch()            48 B/op, 1 allocs  0 B/op, 0 allocs
FindIndices() (new)  N/A                0 B/op, 0 allocs

Thanks to @benhoyt for detailed performance analysis!

Full Changelog: v0.8.14...v0.8.15