Releases: coregx/coregex
v0.8.24: Longest() mode optimization
Fixed
- Longest() mode performance - BoundedBacktracker now supports leftmost-longest matching (#52)
  - Root cause: BoundedBacktracker was disabled entirely in Longest() mode, forcing PikeVM fallback
  - Solution: Implemented `backtrackFindLongest()`, which explores all branches at splits (see the sketch below)
  - Found by: Ben Hoyt (GoAWK integration testing with `re.Longest()`)
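For readers unfamiliar with leftmost-longest semantics, here is a minimal usage sketch. It assumes the stdlib-compatible `MustCompile`/`FindString`/`Longest` API referenced elsewhere in these notes; it is an illustration, not code from the repository.

```go
package main

import (
	"fmt"

	"github.com/coregx/coregex" // assumed import path, matching the go get line below
)

func main() {
	re := coregex.MustCompile(`a|ab`)
	fmt.Println(re.FindString("ab")) // leftmost-first: "a"

	// Longest() switches to leftmost-longest (POSIX) semantics: among matches
	// starting at the same leftmost position, the longest one wins.
	re.Longest()
	fmt.Println(re.FindString("ab")) // leftmost-longest: "ab"
}
```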
Performance (Longest() mode)
| Metric | Before | After | Improvement |
|---|---|---|---|
| coregex Longest() | 450 ns | 133 ns | 3.4x faster |
| Longest() overhead | +270% | +8% | Target was +10% |
| vs stdlib Longest() | 2.4x slower | 1.37x faster | — |
Install
`go get github.com/coregx/coregex@v0.8.24`
Full Changelog: v0.8.23...v0.8.24
v0.8.23: Unicode char class fix
Critical Bug Fix
Unicode character classes now work correctly.
The Bug
Character classes with non-ASCII characters (code points 128-255) returned incorrect matches:
```go
// Before v0.8.23:
re := coregex.MustCompile(`[föd]+`)
re.FindString("fööd") // returned "f" (wrong!)

// After v0.8.23:
re.FindString("fööd") // returns "fööd" (correct)
```
Root Cause
CharClassSearcher uses a 256-byte lookup table for O(1) membership testing. The guard was `rune > 255`, but characters like ö (code point 246) are multi-byte in UTF-8 (0xC3 0xB6), so byte-based lookup fails for them.
Fix
Changed the check from `> 255` to `> 127`: only true ASCII (0-127) can use the byte lookup table.
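A minimal sketch of why the boundary is 127 rather than 255 (illustrative code, not the coregex source): code points 128-255 fit in a byte numerically, but their UTF-8 encoding is two bytes, so a byte-at-a-time table lookup never sees them as a single byte.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// canUseByteTable is a hypothetical guard mirroring the fix described above:
// only runes 0-127 are single bytes in UTF-8, so only they are safe for a
// 256-entry byte lookup table.
func canUseByteTable(r rune) bool {
	return r <= 127
}

func main() {
	buf := make([]byte, utf8.UTFMax)
	n := utf8.EncodeRune(buf, 'ö')                          // code point 246
	fmt.Printf("ö encodes as % X (%d bytes)\n", buf[:n], n) // C3 B6 (2 bytes)
	fmt.Println(canUseByteTable('d'), canUseByteTable('ö')) // true false
}
```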
Affected Patterns
Any character class containing non-ASCII characters: `[äöü]+`, `[café]+`, `[α-ω]+`, etc.
Credit
Found by Ben Hoyt during GoAWK integration testing.
Upgrade recommended for all users with internationalized patterns.
v0.8.22: Small string optimization
Small String Optimization (1.4-20x faster)
Addresses performance issues reported by @benhoyt (#29) where coregex was 2-6x slower than stdlib on small inputs (~44 bytes).
Key Optimizations
- Zero-allocation string-to-bytes conversion
  - `stringToBytes()` using `unsafe.Slice` (like Rust's `as_bytes()`) - see the sketch after this list
  - `MatchString`: 48 B/op → 0 B/op
- BoundedBacktracker for small NFA patterns
  - O(1) generation-based reset vs PikeVM's thread queues
  - 2-3x faster on small inputs
- Prefilter integration in NFA path
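A minimal sketch of the zero-allocation conversion named above, assuming Go 1.20+ for `unsafe.StringData`; this is an illustration of the technique, not the coregex source:

```go
package conv

import "unsafe"

// stringToBytes returns a []byte that aliases the string's backing array
// without copying, so MatchString-style wrappers can reuse []byte code paths
// at 0 B/op. The result must be treated as read-only: writing to it would
// violate Go's string immutability guarantee.
func stringToBytes(s string) []byte {
	if len(s) == 0 {
		return nil
	}
	return unsafe.Slice(unsafe.StringData(s), len(s))
}
```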
Performance Results
| Pattern | stdlib | coregex | Speedup |
|---|---|---|---|
| `j[a-z]+p` | 357 ns | 253 ns | 1.4x |
| `\d+` | 1.13 µs | 57 ns | 20x |
| `\w+` | 1.05 µs | 58 ns | 18x |
| `[a-z]+` | 1.02 µs | 63 ns | 16x |
Commits
- perf: optimize small string matching with BoundedBacktracker (#46)
Closes #47
v0.8.21: CharClassSearcher + ByteClasses compression
What's New
Added
- CharClassSearcher - Specialized 256-byte lookup table for simple char_class patterns (Fixes #44); see the sketch after this list
  - Patterns like `[\w]+`, `\d+`, `[a-z]+` now use an O(1) byte membership test
  - 23x faster than stdlib (623ms → 27ms on 6MB input with 1.3M matches)
  - 2x faster than Rust regex! (57ms → 27ms)
  - Zero allocations in hot path
- UseCharClassSearcher strategy
  - Auto-selected for simple char_class patterns without capture groups
  - Patterns WITH captures (`(\w)+`) continue to use BoundedBacktracker
- Zero-allocation Count() method
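A minimal sketch of the lookup-table idea behind CharClassSearcher and the zero-allocation Count() (names and logic are illustrative, not the library's implementation): build a 256-entry membership table once at compile time, then classify each haystack byte with a single indexed load.

```go
package main

import "fmt"

// buildTable marks every byte that belongs to the class, e.g. \w = [A-Za-z0-9_].
func buildTable(ranges ...[2]byte) (t [256]bool) {
	for _, r := range ranges {
		for b := int(r[0]); b <= int(r[1]); b++ {
			t[b] = true
		}
	}
	return t
}

// countRuns counts maximal runs of class bytes, i.e. matches of `[class]+` over
// an ASCII haystack: one table load per input byte, zero allocations.
func countRuns(t *[256]bool, haystack []byte) int {
	n, in := 0, false
	for _, b := range haystack {
		if t[b] {
			if !in {
				n++
				in = true
			}
		} else {
			in = false
		}
	}
	return n
}

func main() {
	word := buildTable([2]byte{'a', 'z'}, [2]byte{'A', 'Z'}, [2]byte{'0', '9'}, [2]byte{'_', '_'})
	fmt.Println(countRuns(&word, []byte("zero allocations, 23x faster"))) // 4
}
```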
Fixed
- DFA ByteClasses compression (Rust-style optimization); see the sketch after this list
  - Compile memory for the `hello` pattern: 1195KB → 598KB (2x reduction)
- Removed unused reverseDFA field from Engine
  - Was creating a redundant reverse DFA for ALL patterns (2x memory overhead)
- Reverse NFA ByteClasses registration
  - Matches Rust's approach in `nfa.rs`
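A minimal sketch of the byte-class compression idea (illustrative, not the coregex or rust-regex code): bytes that no transition in the pattern distinguishes are merged into one equivalence class, so each DFA row needs one column per class instead of 256.

```go
package main

import "fmt"

func main() {
	// For the literal pattern `hello`, only the bytes 'h', 'e', 'l', 'o' are ever
	// distinguished by a transition; every other byte behaves identically.
	var boundary [256]bool
	for _, b := range []byte("hello") {
		boundary[b] = true // b starts its own class
		if int(b)+1 < 256 {
			boundary[b+1] = true // the byte after b starts the next class
		}
	}

	// Sweep the byte space, bumping the class ID at every boundary.
	var classes [256]byte
	var next byte
	for i := 1; i < 256; i++ {
		if boundary[i] {
			next++
		}
		classes[i] = next
	}

	// A DFA transition row now needs one column per class instead of 256.
	fmt.Printf("distinct byte classes: %d (instead of 256)\n", int(next)+1) // 9
	fmt.Println(classes['h'], classes['x'], classes['?'])                   // 3 8 0: 'h' has its own class; 'x' and '?' land in wide merged classes
}
```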
Performance Summary
| Pattern | Input | stdlib | coregex | Rust | coregex vs Rust |
|---|---|---|---|---|---|
| `[\w]+` | 6MB, 1.3M matches | 623ms | 27ms | 57ms | 2.1x faster |

| Pattern | Before | After | Improvement |
|---|---|---|---|
| `hello` compile | 1195KB | 598KB | -50% |
| char_class runtime | 180ms | 109ms | -39% |
Full Changelog: v0.8.20...v0.8.21
v0.8.20: UseReverseSuffixSet for multi-suffix patterns (34-385x faster)
Highlights
New `UseReverseSuffixSet` strategy for patterns like `.*\.(txt|log|md)` where the Longest Common Suffix (LCS) is empty but multiple suffix literals are available.
🚀 A novel optimization NOT present in rust-regex, which falls back to its Core strategy for these patterns!
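The idea, in a hedged sketch (illustrative code, not ReverseSuffixSetSearcher): a multi-pattern prefilter scans for any of the suffix literals to propose a candidate match end, and a reverse pass from that point determines the match start. Plain `bytes.Index` stands in for the Teddy SIMD prefilter; the reverse DFA step is only described in comments.

```go
package main

import (
	"bytes"
	"fmt"
)

// earliestSuffixEnd reports the end offset of the earliest occurrence of any
// suffix literal. A real searcher would use a Teddy SIMD prefilter here, then
// confirm the candidate (and compute its start) with a reverse DFA.
func earliestSuffixEnd(haystack []byte, suffixes [][]byte) (end int, ok bool) {
	best := -1
	for _, s := range suffixes {
		if i := bytes.Index(haystack, s); i >= 0 {
			if e := i + len(s); best < 0 || e < best {
				best = e
			}
		}
	}
	return best, best >= 0
}

func main() {
	suffixes := [][]byte{[]byte(".txt"), []byte(".log"), []byte(".md")}
	end, ok := earliestSuffixEnd([]byte("see notes.txt and build.log"), suffixes)
	fmt.Println(end, ok) // 13 true: candidate end just after "notes.txt"
}
```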
Performance
| Input | stdlib | coregex | Speedup |
|---|---|---|---|
| 1KB | 15.5µs | 454ns | 34x faster |
| 32KB | 1.95ms | 5µs | 384x faster |
| 1MB | 57ms | 147µs | 385x faster |
Changes
Added
- ReverseSuffixSetSearcher - Teddy SIMD prefilter + reverse DFA for multi-suffix patterns
- `cross_reverse` algorithm - proper suffix extraction for OpConcat (rust-regex port)
- Regression benchmarks - `meta/reverse_suffix_set_bench_test.go`
Changed
- Refactored `selectReverseStrategy` to reduce cyclomatic complexity
- Extracted `shouldUseReverseSuffixSet` helper function
Files Changed
- `meta/reverse_suffix_set.go` - New ReverseSuffixSetSearcher (306 lines)
- `meta/reverse_suffix_set_bench_test.go` - Benchmarks (186 lines)
- `meta/strategy.go` - Strategy selection logic
- `meta/meta.go` - Engine integration
- `literal/extractor.go` - cross_reverse for suffix extraction
Full Changelog: v0.8.19...v0.8.20
v0.8.19: FindAll ReverseSuffix optimization (87x faster)
Performance
FindAll with ReverseSuffix patterns now dramatically faster:
| Pattern | Operation | stdlib | coregex | Speedup |
|---|---|---|---|---|
| `.*@example\.com` | FindAll (6MB) | 316ms | 3.6ms | 87x faster |
| `.*@example\.com` | Find (6MB) | ~300ms | <1ms | 300x+ faster |
Changes
- FindAll ReverseSuffix optimization (Fixes #41)
  - `FindIndicesAt()` now supports the `UseReverseSuffix` strategy
  - Added `ReverseSuffixSearcher.FindAt()` and `FindIndicesAt()` methods
- ReverseSuffix Find() optimization (see the sketch below)
  - Use `bytes.LastIndex` for O(n) single-pass suffix search
  - Added `matchStartZero` flag: skip the reverse DFA for `.*` prefix patterns
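A minimal sketch of the `matchStartZero` fast path (illustrative, not the ReverseSuffixSearcher code): for a pattern like `.*@example\.com` over a newline-free haystack, the greedy `.*` means the leftmost match starts at offset 0 and ends just past the last occurrence of the suffix, so a single `bytes.LastIndex` call can replace the reverse DFA scan.

```go
package main

import (
	"bytes"
	"fmt"
)

// findDotStarSuffix locates the match of `.*<suffix>` in a haystack that
// contains no newlines: start is always 0 (matchStartZero), and end is just
// past the last occurrence of the suffix.
func findDotStarSuffix(haystack, suffix []byte) (start, end int, ok bool) {
	i := bytes.LastIndex(haystack, suffix)
	if i < 0 {
		return 0, 0, false
	}
	return 0, i + len(suffix), true
}

func main() {
	h := []byte("a@example.com b@example.com trailing")
	start, end, ok := findDotStarSuffix(h, []byte("@example.com"))
	fmt.Println(start, end, ok, string(h[start:end]))
	// 0 27 true a@example.com b@example.com
}
```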
Install
`go get github.com/coregx/coregex@v0.8.19`
Full Changelog: v0.8.18...v0.8.19
v0.8.18: UseTeddy literal engine bypass
Highlights
UseTeddy Strategy (Literal Engine Bypass)
Exact literal alternations like `(foo|bar|baz)` now skip DFA construction entirely (a detection sketch follows this list):
- Compile time: 109µs → 11µs (10x faster)
- Memory: 598KB → 19KB (31x less)
- Inspired by Rust regex's literal engine bypass optimization
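A hedged sketch of how an exact literal alternation might be recognized so the regex engines can be bypassed, using the standard `regexp/syntax` parser; the detection logic and names are illustrative, not the coregex strategy code. Once recognized, the literal set can be handed straight to a multi-substring searcher (Teddy here) with no DFA built.

```go
package main

import (
	"fmt"
	"regexp/syntax"
)

// exactLiterals reports the branch strings when the parsed pattern is nothing
// but an alternation of plain literals. Note that Go's parser factors common
// literal prefixes (e.g. `bar|baz` becomes `ba[rz]`), so a production detector
// would also have to recognize that factored shape; this sketch does not.
func exactLiterals(pattern string) ([]string, bool) {
	re, err := syntax.Parse(pattern, syntax.Perl)
	if err != nil {
		return nil, false
	}
	switch re.Op {
	case syntax.OpLiteral:
		return []string{string(re.Rune)}, true
	case syntax.OpAlternate:
		lits := make([]string, 0, len(re.Sub))
		for _, sub := range re.Sub {
			if sub.Op != syntax.OpLiteral {
				return nil, false
			}
			lits = append(lits, string(sub.Rune))
		}
		return lits, true
	}
	return nil, false
}

func main() {
	fmt.Println(exactLiterals(`foo|bar|qux`)) // [foo bar qux] true
	fmt.Println(exactLiterals(`fo+o|bar`))    // [] false
}
```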
Teddy Multi-Pattern Prefilter
- Alternation patterns now use Teddy SIMD prefilter
- `(foo|bar|baz|qux)`: 242x faster than stdlib (was 24x slower)
Other Improvements
- ReverseSuffix.Find(): Last-suffix algorithm for greedy semantics
- ReverseAnchored.Find(): Zero-allocation using SearchReverse
- BoundedBacktracker: O(1) visited tracking with generation counter
- Single-char inner literals: Email patterns 11-42x faster
Performance Summary
| Pattern | Before | After |
|---|---|---|
| `(foo\|bar\|baz\|qux)` | 24x slower | 242x faster |
| `(a\|b\|c)+` | 1.8x slower | 2.5x faster |
| `\d+` | 2x slower | 4.5x faster |
| Email pattern | - | 11-42x faster |
All tested patterns now faster than Go stdlib!
Full Changelog: https://github.com/coregx/coregex/blob/main/CHANGELOG.md#0818---2025-12-12
v0.8.17: BoundedBacktracker for character class patterns
What's New
BoundedBacktracker Engine - New execution engine for character class patterns
Performance Improvement
- Patterns like `\d+`, `\w+`, `[a-z]+` are now 2.5x faster than stdlib
- Previously these patterns were 2-3x slower than stdlib
- Uses recursive backtracking with bit-vector visited tracking for O(1) lookup
How It Works
- Automatic strategy selection via `UseBoundedBacktracker` in the meta-engine
- Selected when the pattern has no good literals for prefiltering
- Memory-bounded: max 256KB visited bit vector (falls back to PikeVM for larger inputs); see the sketch below
- 2-5x faster than PikeVM for simple patterns
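A minimal sketch of bit-vector visited tracking (illustrative types, not `nfa/backtrack.go`): one bit per (NFA state, haystack position) pair answers "already tried this configuration?" in O(1), and the 256KB bound above caps the size of such a vector.

```go
package main

import "fmt"

// bitVisited tracks which (state, position) pairs a backtracker has already
// explored, using one bit per pair.
type bitVisited struct {
	bits []uint64
	cols int // haystack length + 1 positions per state
}

func newBitVisited(states, haystackLen int) *bitVisited {
	cols := haystackLen + 1
	n := states * cols
	return &bitVisited{bits: make([]uint64, (n+63)/64), cols: cols}
}

// seen marks (state, pos) as visited and reports whether it already was,
// so each configuration is expanded at most once.
func (v *bitVisited) seen(state, pos int) bool {
	idx := state*v.cols + pos
	word, mask := idx/64, uint64(1)<<(idx%64)
	if v.bits[word]&mask != 0 {
		return true
	}
	v.bits[word] |= mask
	return false
}

func main() {
	v := newBitVisited(16, 43)              // 16 states x 44 positions = 704 bits
	fmt.Println(v.seen(3, 7), v.seen(3, 7)) // false true
}
```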
Technical Details
- New files: `nfa/backtrack.go`, `nfa/backtrack_test.go`, `nfa/backtrack_bench_test.go`
- PR #38
Full Changelog: v0.8.16...v0.8.17
v0.8.16: FindAll, ReplaceAll, and character class optimizations
Performance Improvements
This release completes the performance optimization work from #29, addressing all remaining issues reported by @benhoyt.
Character class pattern optimization (Fixes #33)
- Simple patterns like `[0-9]+`, `\d+`, `\w+` now use the NFA directly
- Skip DFA overhead when there is no prefilter benefit
- Added `isSimpleCharClass()` detection in strategy selection
ReplaceAll optimization (Fixes #34)
- Pre-allocate result buffer (input + 25%)
- Reuse the `matchIndices` buffer across iterations (was allocating per match)
FindAll/FindAllIndex optimization (Fixes #35)
- Use `FindIndicesAt()` instead of `FindAt()` (avoids Match object creation)
- Lazy allocation - only allocate when the first match is found
- Pre-allocate with estimated capacity (10 matches per 1KB); see the sketch after this list
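A hedged sketch of the FindAll allocation strategy described in the list above (lazy result allocation plus a capacity estimate); the function shape and the stand-in `find` callback are illustrative, not the coregex API.

```go
package search

// findAllIndex collects non-overlapping match spans. It allocates the result
// slice only after the first match is found, sized at roughly 10 matches per
// 1KB of input, and advances past empty matches to avoid looping forever.
func findAllIndex(find func(b []byte, at int) (start, end int, ok bool), b []byte) [][]int {
	var results [][]int
	for at := 0; at <= len(b); {
		start, end, ok := find(b, at)
		if !ok {
			break
		}
		if results == nil {
			results = make([][]int, 0, 10*(len(b)/1024)+1)
		}
		results = append(results, []int{start, end})
		if end <= at {
			at++ // empty match: step forward one byte
		} else {
			at = end
		}
	}
	return results
}
```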
Benchmark Results
| Benchmark | Before | After | Change |
|---|---|---|---|
| Find/hello | 619 ns | 88 ns | -85% (~7x faster) |
| OnePassIsMatch | 25 ns | 20 ns | -19% |
| LazyDFARepetition | 1059 ns | 839 ns | -21% |
Summary of v0.8.14-v0.8.16 performance work
All issues from #29 are now resolved:
- ✅ #29: Literal patterns now ~7x faster than stdlib (was 5x slower)
- ✅ #31: `IsMatch()` is zero-allocation
- ✅ #32: `FindIndices()` is zero-allocation
- ✅ #33: Character class patterns use a smart strategy
- ✅ #34: `ReplaceAll` optimized with buffer reuse
- ✅ #35: `FindAll` optimized with lazy allocation
Full Changelog: v0.8.15...v0.8.16
v0.8.15: Zero-allocation IsMatch and FindIndices
Performance Improvements
This release adds zero-allocation methods for hot paths, addressing performance issues reported in #29.
Zero-allocation IsMatch() (Fixes #31)
- `PikeVM.IsMatch()` returns immediately on the first match without computing positions
- 0 B/op, 0 allocs/op in the hot path
- Speedup vs stdlib: 52-1863x faster (depending on input size)
Zero-allocation FindIndices() (Fixes #32)
- `Engine.FindIndices()` returns a `(start, end int, found bool)` tuple
- 0 B/op, 0 allocs/op - no Match object allocation
- Used internally by the `Find()` and `FindIndex()` public API (see the sketch below)
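A hedged illustration of the API shape, with a naive substring matcher standing in for the real engine: returning plain integers keeps the call allocation-free, and a `Find`-style method can be layered on top by slicing the input.

```go
package main

import (
	"bytes"
	"fmt"
)

// engine is a stand-in for the real matcher; FindIndices mirrors the
// (start, end int, found bool) shape described above.
type engine struct{ needle []byte }

// FindIndices reports match bounds with no Match object: 0 B/op, 0 allocs/op.
func (e *engine) FindIndices(b []byte) (start, end int, found bool) {
	i := bytes.Index(b, e.needle)
	if i < 0 {
		return 0, 0, false
	}
	return i, i + len(e.needle), true
}

// Find layers the slice-returning API on top; the result aliases the input,
// so it stays allocation-free as well.
func (e *engine) Find(b []byte) []byte {
	start, end, ok := e.FindIndices(b)
	if !ok {
		return nil
	}
	return b[start:end]
}

func main() {
	e := &engine{needle: []byte("core")}
	fmt.Printf("%s\n", e.Find([]byte("the coregex engine"))) // core
}
```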
Changes
- `Find()` and `FindIndex()` now use `FindIndices()` internally
- `isMatchNFA()` now uses the optimized `PikeVM.IsMatch()` instead of `Search()`
Benchmarks
| Method | Before | After |
|---|---|---|
| `IsMatch()` | 48 B/op, 1 allocs | 0 B/op, 0 allocs |
| `FindIndices()` (new) | N/A | 0 B/op, 0 allocs |
Thanks to @benhoyt for detailed performance analysis!
Full Changelog: v0.8.14...v0.8.15