Releases: coregx/coregex
v0.8.24: Longest() mode optimization
Fixed
- Longest() mode performance - BoundedBacktracker now supports leftmost-longest matching (#52)
  - Root cause: BoundedBacktracker was disabled entirely in Longest() mode, forcing PikeVM fallback
  - Solution: Implemented `backtrackFindLongest()`, which explores all branches at splits (see the sketch below)
  - Found by: Ben Hoyt (GoAWK integration testing with `re.Longest()`)
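For readers unfamiliar with leftmost-longest semantics, here is a minimal usage sketch. It assumes the stdlib-compatible `MustCompile`/`FindString`/`Longest` API referenced elsewhere in these notes; it is an illustration, not code from the repository.

```go
package main

import (
	"fmt"

	"github.com/coregx/coregex" // assumed import path, matching the go get line below
)

func main() {
	re := coregex.MustCompile(`a|ab`)
	fmt.Println(re.FindString("ab")) // leftmost-first: "a"

	// Longest() switches to leftmost-longest (POSIX) semantics: among matches
	// starting at the same leftmost position, the longest one wins.
	re.Longest()
	fmt.Println(re.FindString("ab")) // leftmost-longest: "ab"
}
```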
Performance (Longest() mode)
| Metric | Before | After | Improvement |
|---|---|---|---|
| coregex Longest() | 450 ns | 133 ns | 3.4x faster |
| Longest() overhead | +270% | +8% | Target was +10% |
| vs stdlib Longest() | 2.4x slower | 1.37x faster | — |
Install
`go get github.com/coregx/coregex@v0.8.24`
Full Changelog: v0.8.23...v0.8.24
v0.8.23: Unicode char class fix
Critical Bug Fix
Unicode character classes now work correctly.
The Bug
Character classes with non-ASCII characters (code points 128-255) returned incorrect matches:
```go
// Before v0.8.23:
re := coregex.MustCompile(`[föd]+`)
re.FindString("fööd") // returned "f" (wrong!)

// After v0.8.23:
re.FindString("fööd") // returns "fööd" (correct)
```
Root Cause
CharClassSearcher uses a 256-byte lookup table for O(1) membership testing. The guard was `rune > 255`, but characters like ö (code point 246) are multi-byte in UTF-8 (0xC3 0xB6), so byte-based lookup fails for them.
Fix
Changed the check from `> 255` to `> 127`: only true ASCII (0-127) can use the byte lookup table.
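A minimal sketch of why the boundary is 127 rather than 255 (illustrative code, not the coregex source): code points 128-255 fit in a byte numerically, but their UTF-8 encoding is two bytes, so a byte-at-a-time table lookup never sees them as a single byte.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// canUseByteTable is a hypothetical guard mirroring the fix described above:
// only runes 0-127 are single bytes in UTF-8, so only they are safe for a
// 256-entry byte lookup table.
func canUseByteTable(r rune) bool {
	return r <= 127
}

func main() {
	buf := make([]byte, utf8.UTFMax)
	n := utf8.EncodeRune(buf, 'ö')                          // code point 246
	fmt.Printf("ö encodes as % X (%d bytes)\n", buf[:n], n) // C3 B6 (2 bytes)
	fmt.Println(canUseByteTable('d'), canUseByteTable('ö')) // true false
}
```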
Affected Patterns
Any character class containing non-ASCII characters: `[äöü]+`, `[café]+`, `[α-ω]+`, etc.
Credit
Found by Ben Hoyt during GoAWK integration testing.
Upgrade recommended for all users with internationalized patterns.
v0.8.22: Small string optimization
Small String Optimization (1.4-20x faster)
Addresses performance issues reported by @benhoyt (#29) where coregex was 2-6x slower than stdlib on small inputs (~44 bytes).
Key Optimizations
- Zero-allocation string-to-bytes conversion
  - `stringToBytes()` using `unsafe.Slice` (like Rust's `as_bytes()`) - see the sketch after this list
  - `MatchString`: 48 B/op → 0 B/op
- BoundedBacktracker for small NFA patterns
  - O(1) generation-based reset vs PikeVM's thread queues
  - 2-3x faster on small inputs
- Prefilter integration in NFA path
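A minimal sketch of the zero-allocation conversion named above, assuming Go 1.20+ for `unsafe.StringData`; this is an illustration of the technique, not the coregex source:

```go
package conv

import "unsafe"

// stringToBytes returns a []byte that aliases the string's backing array
// without copying, so MatchString-style wrappers can reuse []byte code paths
// at 0 B/op. The result must be treated as read-only: writing to it would
// violate Go's string immutability guarantee.
func stringToBytes(s string) []byte {
	if len(s) == 0 {
		return nil
	}
	return unsafe.Slice(unsafe.StringData(s), len(s))
}
```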
Performance Results
| Pattern | stdlib | coregex | Speedup |
|---|---|---|---|
| `j[a-z]+p` | 357 ns | 253 ns | 1.4x |
| `\d+` | 1.13 µs | 57 ns | 20x |
| `\w+` | 1.05 µs | 58 ns | 18x |
| `[a-z]+` | 1.02 µs | 63 ns | 16x |
Commits
- perf: optimize small string matching with BoundedBacktracker (#46)
Closes #47
v0.8.21: CharClassSearcher + ByteClasses compression
What's New
Added
- CharClassSearcher - Specialized 256-byte lookup table for simple char_class patterns (Fixes #44); see the sketch after this list
  - Patterns like `[\w]+`, `\d+`, `[a-z]+` now use an O(1) byte membership test
  - 23x faster than stdlib (623ms → 27ms on 6MB input with 1.3M matches)
  - 2x faster than Rust regex! (57ms → 27ms)
  - Zero allocations in hot path
- UseCharClassSearcher strategy
  - Auto-selected for simple char_class patterns without capture groups
  - Patterns WITH captures (`(\w)+`) continue to use BoundedBacktracker
- Zero-allocation Count() method
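A minimal sketch of the lookup-table idea behind CharClassSearcher and the zero-allocation Count() (names and logic are illustrative, not the library's implementation): build a 256-entry membership table once at compile time, then classify each haystack byte with a single indexed load.

```go
package main

import "fmt"

// buildTable marks every byte that belongs to the class, e.g. \w = [A-Za-z0-9_].
func buildTable(ranges ...[2]byte) (t [256]bool) {
	for _, r := range ranges {
		for b := int(r[0]); b <= int(r[1]); b++ {
			t[b] = true
		}
	}
	return t
}

// countRuns counts maximal runs of class bytes, i.e. matches of `[class]+` over
// an ASCII haystack: one table load per input byte, zero allocations.
func countRuns(t *[256]bool, haystack []byte) int {
	n, in := 0, false
	for _, b := range haystack {
		if t[b] {
			if !in {
				n++
				in = true
			}
		} else {
			in = false
		}
	}
	return n
}

func main() {
	word := buildTable([2]byte{'a', 'z'}, [2]byte{'A', 'Z'}, [2]byte{'0', '9'}, [2]byte{'_', '_'})
	fmt.Println(countRuns(&word, []byte("zero allocations, 23x faster"))) // 4
}
```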
Fixed
- DFA ByteClasses compression (Rust-style optimization); see the sketch after this list
  - Compile memory for the `hello` pattern: 1195KB → 598KB (2x reduction)
- Removed unused reverseDFA field from Engine
  - Was creating a redundant reverse DFA for ALL patterns (2x memory overhead)
- Reverse NFA ByteClasses registration
  - Matches Rust's approach in `nfa.rs`
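A minimal sketch of the byte-class compression idea (illustrative, not the coregex or rust-regex code): bytes that no transition in the pattern distinguishes are merged into one equivalence class, so each DFA row needs one column per class instead of 256.

```go
package main

import "fmt"

func main() {
	// For the literal pattern `hello`, only the bytes 'h', 'e', 'l', 'o' are ever
	// distinguished by a transition; every other byte behaves identically.
	var boundary [256]bool
	for _, b := range []byte("hello") {
		boundary[b] = true // b starts its own class
		if int(b)+1 < 256 {
			boundary[b+1] = true // the byte after b starts the next class
		}
	}

	// Sweep the byte space, bumping the class ID at every boundary.
	var classes [256]byte
	var next byte
	for i := 1; i < 256; i++ {
		if boundary[i] {
			next++
		}
		classes[i] = next
	}

	// A DFA transition row now needs one column per class instead of 256.
	fmt.Printf("distinct byte classes: %d (instead of 256)\n", int(next)+1) // 9
	fmt.Println(classes['h'], classes['x'], classes['?'])                   // 3 8 0: 'h' has its own class; 'x' and '?' land in wide merged classes
}
```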
Performance Summary
| Pattern | Input | stdlib | coregex | Rust | coregex vs Rust |
|---|---|---|---|---|---|
| `[\w]+` | 6MB, 1.3M matches | 623ms | 27ms | 57ms | 2.1x faster |

| Pattern | Before | After | Improvement |
|---|---|---|---|
| `hello` compile | 1195KB | 598KB | -50% |
| char_class runtime | 180ms | 109ms | -39% |
Full Changelog: v0.8.20...v0.8.21
v0.8.20: UseReverseSuffixSet for multi-suffix patterns (34-385x faster)
Highlights
New `UseReverseSuffixSet` strategy for patterns like `.*\.(txt|log|md)` where the Longest Common Suffix (LCS) is empty but multiple suffix literals are available.
🚀 A novel optimization NOT present in rust-regex, which falls back to its Core strategy for these patterns!
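The idea, in a hedged sketch (illustrative code, not ReverseSuffixSetSearcher): a multi-pattern prefilter scans for any of the suffix literals to propose a candidate match end, and a reverse pass from that point determines the match start. Plain `bytes.Index` stands in for the Teddy SIMD prefilter; the reverse DFA step is only described in comments.

```go
package main

import (
	"bytes"
	"fmt"
)

// earliestSuffixEnd reports the end offset of the earliest occurrence of any
// suffix literal. A real searcher would use a Teddy SIMD prefilter here, then
// confirm the candidate (and compute its start) with a reverse DFA.
func earliestSuffixEnd(haystack []byte, suffixes [][]byte) (end int, ok bool) {
	best := -1
	for _, s := range suffixes {
		if i := bytes.Index(haystack, s); i >= 0 {
			if e := i + len(s); best < 0 || e < best {
				best = e
			}
		}
	}
	return best, best >= 0
}

func main() {
	suffixes := [][]byte{[]byte(".txt"), []byte(".log"), []byte(".md")}
	end, ok := earliestSuffixEnd([]byte("see notes.txt and build.log"), suffixes)
	fmt.Println(end, ok) // 13 true: candidate end just after "notes.txt"
}
```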
Performance
| Input | stdlib | coregex | Speedup |
|---|---|---|---|
| 1KB | 15.5µs | 454ns | 34x faster |
| 32KB | 1.95ms | 5µs | 384x faster |
| 1MB | 57ms | 147µs | 385x faster |
Changes
Added
- ReverseSuffixSetSearcher - Teddy SIMD prefilter + reverse DFA for multi-suffix patterns
- `cross_reverse` algorithm - proper suffix extraction for OpConcat (rust-regex port)
- Regression benchmarks - `meta/reverse_suffix_set_bench_test.go`
Changed
- Refactored `selectReverseStrategy` to reduce cyclomatic complexity
- Extracted `shouldUseReverseSuffixSet` helper function
Files Changed
- `meta/reverse_suffix_set.go` - New ReverseSuffixSetSearcher (306 lines)
- `meta/reverse_suffix_set_bench_test.go` - Benchmarks (186 lines)
- `meta/strategy.go` - Strategy selection logic
- `meta/meta.go` - Engine integration
- `literal/extractor.go` - cross_reverse for suffix extraction
Full Changelog: v0.8.19...v0.8.20
v0.8.19: FindAll ReverseSuffix optimization (87x faster)
Performance
FindAll with ReverseSuffix patterns now dramatically faster:
| Pattern | Operation | stdlib | coregex | Speedup |
|---|---|---|---|---|
| `.*@example\.com` | FindAll (6MB) | 316ms | 3.6ms | 87x faster |
| `.*@example\.com` | Find (6MB) | ~300ms | <1ms | 300x+ faster |
Changes
- FindAll ReverseSuffix optimization (Fixes #41)
  - `FindIndicesAt()` now supports the `UseReverseSuffix` strategy
  - Added `ReverseSuffixSearcher.FindAt()` and `FindIndicesAt()` methods
- ReverseSuffix Find() optimization (see the sketch below)
  - Use `bytes.LastIndex` for O(n) single-pass suffix search
  - Added `matchStartZero` flag: skip the reverse DFA for `.*` prefix patterns
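A minimal sketch of the `matchStartZero` fast path (illustrative, not the ReverseSuffixSearcher code): for a pattern like `.*@example\.com` over a newline-free haystack, the greedy `.*` means the leftmost match starts at offset 0 and ends just past the last occurrence of the suffix, so a single `bytes.LastIndex` call can replace the reverse DFA scan.

```go
package main

import (
	"bytes"
	"fmt"
)

// findDotStarSuffix locates the match of `.*<suffix>` in a haystack that
// contains no newlines: start is always 0 (matchStartZero), and end is just
// past the last occurrence of the suffix.
func findDotStarSuffix(haystack, suffix []byte) (start, end int, ok bool) {
	i := bytes.LastIndex(haystack, suffix)
	if i < 0 {
		return 0, 0, false
	}
	return 0, i + len(suffix), true
}

func main() {
	h := []byte("a@example.com b@example.com trailing")
	start, end, ok := findDotStarSuffix(h, []byte("@example.com"))
	fmt.Println(start, end, ok, string(h[start:end]))
	// 0 27 true a@example.com b@example.com
}
```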
Install
`go get github.com/coregx/coregex@v0.8.19`
Full Changelog: v0.8.18...v0.8.19
v0.8.18: UseTeddy literal engine bypass
Highlights
UseTeddy Strategy (Literal Engine Bypass)
Exact literal alternations like `(foo|bar|baz)` now skip DFA construction entirely (a detection sketch follows this list):
- Compile time: 109µs → 11µs (10x faster)
- Memory: 598KB → 19KB (31x less)
- Inspired by Rust regex's literal engine bypass optimization
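A hedged sketch of how an exact literal alternation might be recognized so the regex engines can be bypassed, using the standard `regexp/syntax` parser; the detection logic and names are illustrative, not the coregex strategy code. Once recognized, the literal set can be handed straight to a multi-substring searcher (Teddy here) with no DFA built.

```go
package main

import (
	"fmt"
	"regexp/syntax"
)

// exactLiterals reports the branch strings when the parsed pattern is nothing
// but an alternation of plain literals. Note that Go's parser factors common
// literal prefixes (e.g. `bar|baz` becomes `ba[rz]`), so a production detector
// would also have to recognize that factored shape; this sketch does not.
func exactLiterals(pattern string) ([]string, bool) {
	re, err := syntax.Parse(pattern, syntax.Perl)
	if err != nil {
		return nil, false
	}
	switch re.Op {
	case syntax.OpLiteral:
		return []string{string(re.Rune)}, true
	case syntax.OpAlternate:
		lits := make([]string, 0, len(re.Sub))
		for _, sub := range re.Sub {
			if sub.Op != syntax.OpLiteral {
				return nil, false
			}
			lits = append(lits, string(sub.Rune))
		}
		return lits, true
	}
	return nil, false
}

func main() {
	fmt.Println(exactLiterals(`foo|bar|qux`)) // [foo bar qux] true
	fmt.Println(exactLiterals(`fo+o|bar`))    // [] false
}
```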
Teddy Multi-Pattern Prefilter
- Alternation patterns now use Teddy SIMD prefilter
- `(foo|bar|baz|qux)`: 242x faster than stdlib (was 24x slower)
Other Improvements
- ReverseSuffix.Find(): Last-suffix algorithm for greedy semantics
- ReverseAnchored.Find(): Zero-allocation using SearchReverse
- BoundedBacktracker: O(1) visited tracking with generation counter
- Single-char inner literals: Email patterns 11-42x faster
Performance Summary
| Pattern | Before | After |
|---|---|---|
| `(foo\|bar\|baz\|qux)` | 24x slower | 242x faster |
| `(a\|b\|c)+` | 1.8x slower | 2.5x faster |
| `\d+` | 2x slower | 4.5x faster |
| Email pattern | - | 11-42x faster |
All tested patterns now faster than Go stdlib!
Full Changelog: https://github.com/coregx/coregex/blob/main/CHANGELOG.md#0818---2025-12-12
v0.8.17: BoundedBacktracker for character class patterns
What's New
BoundedBacktracker Engine - New execution engine for character class patterns
Performance Improvement
- Patterns like `\d+`, `\w+`, `[a-z]+` are now 2.5x faster than stdlib
- Previously these patterns were 2-3x slower than stdlib
- Uses recursive backtracking with bit-vector visited tracking for O(1) lookup
How It Works
- Automatic strategy selection via `UseBoundedBacktracker` in the meta-engine
- Selected when the pattern has no good literals for prefiltering
- Memory-bounded: max 256KB visited bit vector (falls back to PikeVM for larger inputs); see the sketch below
- 2-5x faster than PikeVM for simple patterns
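A minimal sketch of bit-vector visited tracking (illustrative types, not `nfa/backtrack.go`): one bit per (NFA state, haystack position) pair answers "already tried this configuration?" in O(1), and the 256KB bound above caps the size of such a vector.

```go
package main

import "fmt"

// bitVisited tracks which (state, position) pairs a backtracker has already
// explored, using one bit per pair.
type bitVisited struct {
	bits []uint64
	cols int // haystack length + 1 positions per state
}

func newBitVisited(states, haystackLen int) *bitVisited {
	cols := haystackLen + 1
	n := states * cols
	return &bitVisited{bits: make([]uint64, (n+63)/64), cols: cols}
}

// seen marks (state, pos) as visited and reports whether it already was,
// so each configuration is expanded at most once.
func (v *bitVisited) seen(state, pos int) bool {
	idx := state*v.cols + pos
	word, mask := idx/64, uint64(1)<<(idx%64)
	if v.bits[word]&mask != 0 {
		return true
	}
	v.bits[word] |= mask
	return false
}

func main() {
	v := newBitVisited(16, 43)              // 16 states x 44 positions = 704 bits
	fmt.Println(v.seen(3, 7), v.seen(3, 7)) // false true
}
```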
Technical Details
- New files: `nfa/backtrack.go`, `nfa/backtrack_test.go`, `nfa/backtrack_bench_test.go`
- PR #38
Full Changelog: v0.8.16...v0.8.17
v0.8.16: FindAll, ReplaceAll, and character class optimizations
Performance Improvements
This release completes the performance optimization work from #29, addressing all remaining issues reported by @benhoyt.
Character class pattern optimization (Fixes #33)
- Simple patterns like `[0-9]+`, `\d+`, `\w+` now use the NFA directly
- Skip DFA overhead when there is no prefilter benefit
- Added `isSimpleCharClass()` detection in strategy selection
ReplaceAll optimization (Fixes #34)
- Pre-allocate result buffer (input + 25%)
- Reuse the `matchIndices` buffer across iterations (was allocating per match)
FindAll/FindAllIndex optimization (Fixes #35)
- Use `FindIndicesAt()` instead of `FindAt()` (avoids Match object creation)
- Lazy allocation - only allocate when the first match is found
- Pre-allocate with estimated capacity (10 matches per 1KB); see the sketch after this list
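A hedged sketch of the FindAll allocation strategy described in the list above (lazy result allocation plus a capacity estimate); the function shape and the stand-in `find` callback are illustrative, not the coregex API.

```go
package search

// findAllIndex collects non-overlapping match spans. It allocates the result
// slice only after the first match is found, sized at roughly 10 matches per
// 1KB of input, and advances past empty matches to avoid looping forever.
func findAllIndex(find func(b []byte, at int) (start, end int, ok bool), b []byte) [][]int {
	var results [][]int
	for at := 0; at <= len(b); {
		start, end, ok := find(b, at)
		if !ok {
			break
		}
		if results == nil {
			results = make([][]int, 0, 10*(len(b)/1024)+1)
		}
		results = append(results, []int{start, end})
		if end <= at {
			at++ // empty match: step forward one byte
		} else {
			at = end
		}
	}
	return results
}
```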
Benchmark Results
| Benchmark | Before | After | Change |
|---|---|---|---|
| Find/hello | 619 ns | 88 ns | -85% (~7x faster) |
| OnePassIsMatch | 25 ns | 20 ns | -19% |
| LazyDFARepetition | 1059 ns | 839 ns | -21% |
Summary of v0.8.14-v0.8.16 performance work
All issues from #29 are now resolved:
- ✅ #29: Literal patterns now ~7x faster than stdlib (was 5x slower)
- ✅ #31: `IsMatch()` is zero-allocation
- ✅ #32: `FindIndices()` is zero-allocation
- ✅ #33: Character class patterns use a smart strategy
- ✅ #34: `ReplaceAll` optimized with buffer reuse
- ✅ #35: `FindAll` optimized with lazy allocation
Full Changelog: v0.8.15...v0.8.16
v0.8.15: Zero-allocation IsMatch and FindIndices
Performance Improvements
This release adds zero-allocation methods for hot paths, addressing performance issues reported in #29.
Zero-allocation IsMatch() (Fixes #31)
- `PikeVM.IsMatch()` returns immediately on the first match without computing positions
- 0 B/op, 0 allocs/op in the hot path
- Speedup vs stdlib: 52-1863x faster (depending on input size)
Zero-allocation FindIndices() (Fixes #32)
- `Engine.FindIndices()` returns a `(start, end int, found bool)` tuple
- 0 B/op, 0 allocs/op - no Match object allocation
- Used internally by the `Find()` and `FindIndex()` public API (see the sketch below)
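A hedged illustration of the API shape, with a naive substring matcher standing in for the real engine: returning plain integers keeps the call allocation-free, and a `Find`-style method can be layered on top by slicing the input.

```go
package main

import (
	"bytes"
	"fmt"
)

// engine is a stand-in for the real matcher; FindIndices mirrors the
// (start, end int, found bool) shape described above.
type engine struct{ needle []byte }

// FindIndices reports match bounds with no Match object: 0 B/op, 0 allocs/op.
func (e *engine) FindIndices(b []byte) (start, end int, found bool) {
	i := bytes.Index(b, e.needle)
	if i < 0 {
		return 0, 0, false
	}
	return i, i + len(e.needle), true
}

// Find layers the slice-returning API on top; the result aliases the input,
// so it stays allocation-free as well.
func (e *engine) Find(b []byte) []byte {
	start, end, ok := e.FindIndices(b)
	if !ok {
		return nil
	}
	return b[start:end]
}

func main() {
	e := &engine{needle: []byte("core")}
	fmt.Printf("%s\n", e.Find([]byte("the coregex engine"))) // core
}
```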
Changes
- `Find()` and `FindIndex()` now use `FindIndices()` internally
- `isMatchNFA()` now uses the optimized `PikeVM.IsMatch()` instead of `Search()`
Benchmarks
| Method | Before | After |
|---|---|---|
| `IsMatch()` | 48 B/op, 1 allocs | 0 B/op, 0 allocs |
| `FindIndices()` (new) | N/A | 0 B/op, 0 allocs |
Thanks to @benhoyt for detailed performance analysis!
Full Changelog: v0.8.14...v0.8.15