Documentation ¶
Index ¶
- Constants
- Variables
- func ParseFilterParams(dict *core.Dictionary) ([]core.Name, []*core.Dictionary, error) deprecated
- type ASCII85Filter
- type ASCIIHexFilter
- type CCITTFaxFilter
- type DCTFilter
- type DecompressionLimits
- type Filter
- type FlateFilter
- type JBIG2Filter
- type JPXFilter
- type LZWFilter
- type LimitedDecompressor
- type Registry
- func (r *Registry) Decode(filterName core.Name, params *core.Dictionary, reader io.Reader) (io.Reader, error)
- func (r *Registry) DecodeChain(data io.Reader, filters []core.Name, params []*core.Dictionary) (io.Reader, error)
- func (r *Registry) DecodeWithSize(filterName core.Name, params *core.Dictionary, reader io.Reader, ...) (io.Reader, error)
- func (r *Registry) Encode(filterName core.Name, params *core.Dictionary, reader io.Reader) (io.Reader, error)
- func (r *Registry) Get(name core.Name) (Filter, bool)
- func (r *Registry) Limits() DecompressionLimits
- func (r *Registry) Register(filter Filter)
- func (r *Registry) SetLimits(limits DecompressionLimits)
- type RunLengthFilter
- type UnsupportedFilterError
Constants ¶
const (
	// DefaultMaxDecompressionRatio is the maximum ratio of decompressed to compressed size.
	// A 100:1 ratio means 1KB compressed can expand to at most 100KB.
	// Most legitimate PDF streams have ratios under 20:1.
	DefaultMaxDecompressionRatio = 100
	// DefaultMaxDecompressedSize is the absolute maximum decompressed size in bytes.
	// This is a hard cap regardless of ratio. 512MB accommodates large images/fonts.
	DefaultMaxDecompressedSize = 512 * 1024 * 1024 // 512MB
	// DefaultMinCompressedSize is the minimum compressed size before ratio checking applies.
	// Very small inputs (< 1KB) may legitimately have high ratios.
	DefaultMinCompressedSize = 1024 // 1KB
)
Decompression safety limits. These protect against "zip bomb" attacks where a small compressed payload expands to an enormous size, exhausting memory.
Variables ¶
var (
	// ErrDecompressionRatioExceeded indicates the decompression ratio exceeded the limit.
	ErrDecompressionRatioExceeded = errors.New("decompression ratio exceeded maximum allowed")
	// ErrDecompressedSizeExceeded indicates the decompressed size exceeded the absolute limit.
	ErrDecompressedSizeExceeded = errors.New("decompressed size exceeded maximum allowed")
)
Decompression limit errors.
Functions ¶
func ParseFilterParams
deprecated
func ParseFilterParams(dict *core.Dictionary) ([]core.Name, []*core.Dictionary, error)
ParseFilterParams extracts filter names and decode parameters from a stream dictionary.
Deprecated: Use core.ParseFilterChain instead for consistent behavior. This function is stricter (returns errors for malformed entries) and has a behavioral difference: a single /DecodeParms dictionary applies to the FIRST filter only, whereas core.ParseFilterChain applies it to ALL filters (which is more spec-compliant).
The /Filter entry can be:
- A Name (single filter)
- An Array of Names (filter chain)
The /DecodeParms entry can be:
- Absent (returns nil params for all filters)
- A Dictionary (params for single filter or first in chain)
- An Array of Dictionaries or Nulls (parallel to filter array)
Returns the filter names and corresponding parameters in parallel arrays. If there are no filters, returns empty slices.
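The normalization rules above can be sketched as follows. This is a minimal illustration, not the package's implementation: Name, Dictionary, and splitFilters are hypothetical stand-ins for the core types, the raw /Filter and /DecodeParms values are modeled as plain Go values, and treating a length mismatch as an error is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// Name and Dictionary are hypothetical stand-ins for core.Name and core.Dictionary.
type Name string
type Dictionary map[string]any

// splitFilters normalizes raw /Filter and /DecodeParms values into parallel slices.
// A lone Dictionary is attached to the first filter only, matching the
// deprecated behavior described above.
func splitFilters(filter, parms any) ([]Name, []*Dictionary, error) {
	var names []Name
	switch f := filter.(type) {
	case nil:
		return []Name{}, []*Dictionary{}, nil
	case Name:
		names = []Name{f}
	case []any:
		for _, v := range f {
			n, ok := v.(Name)
			if !ok {
				return nil, nil, errors.New("/Filter array entry is not a name")
			}
			names = append(names, n)
		}
	default:
		return nil, nil, errors.New("/Filter must be a name or an array of names")
	}

	params := make([]*Dictionary, len(names)) // nil entry = no parameters
	switch p := parms.(type) {
	case nil:
		// Absent: every filter keeps a nil parameter dictionary.
	case Dictionary:
		if len(params) > 0 {
			params[0] = &p
		}
	case []any:
		if len(p) != len(names) {
			// Assumption: a length mismatch is reported rather than repaired.
			return nil, nil, errors.New("/DecodeParms array length differs from /Filter")
		}
		for i, v := range p {
			switch d := v.(type) {
			case nil:
				// Null entry: leave params[i] nil.
			case Dictionary:
				dict := d
				params[i] = &dict
			default:
				return nil, nil, errors.New("/DecodeParms entry is not a dictionary or null")
			}
		}
	default:
		return nil, nil, errors.New("/DecodeParms must be a dictionary or an array")
	}
	return names, params, nil
}

func main() {
	names, params, err := splitFilters(
		[]any{Name("ASCII85Decode"), Name("FlateDecode")},
		Dictionary{"Predictor": 12},
	)
	fmt.Println(names, params[0] != nil, params[1] != nil, err)
}
```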
Types ¶
type ASCII85Filter ¶
type ASCII85Filter struct{}
ASCII85Filter implements the ASCII85Decode filter. ASCII85Decode is the PDF variant of btoa (base-85) encoding. Groups of 5 ASCII characters (33-117, '!' to 'u') represent 4 bytes. Special case: 'z' represents four zero bytes. The '~>' sequence marks the end of data. Whitespace characters are ignored.
Final groups may be short:
- 2 chars decode to 1 byte
- 3 chars decode to 2 bytes
- 4 chars decode to 3 bytes
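The group arithmetic, including short final groups, can be sketched as below. This is an illustration under stated assumptions rather than the filter's actual code: decodeGroup is a hypothetical helper that handles one group at a time and leaves whitespace skipping, the 'z' shortcut, and the '~>' marker to the caller. Short groups are padded with 'u' before conversion, which is the usual way to decode them.

```go
package main

import (
	"errors"
	"fmt"
)

// decodeGroup decodes one ASCII85 group of 2 to 5 characters into 1 to 4 bytes.
// Short groups are padded with 'u' (the highest digit) before conversion and
// the surplus output bytes are dropped.
func decodeGroup(group []byte) ([]byte, error) {
	n := len(group)
	if n < 2 || n > 5 {
		return nil, errors.New("ascii85: group must have 2 to 5 characters")
	}
	padded := [5]byte{'u', 'u', 'u', 'u', 'u'}
	copy(padded[:], group)

	// Interpret the five characters as base-85 digits, most significant first.
	var v uint64
	for _, c := range padded {
		if c < '!' || c > 'u' {
			return nil, errors.New("ascii85: character outside '!'..'u'")
		}
		v = v*85 + uint64(c-'!')
	}
	if v > 0xFFFFFFFF {
		return nil, errors.New("ascii85: group value overflows 32 bits")
	}
	out := []byte{byte(v >> 24), byte(v >> 16), byte(v >> 8), byte(v)}
	return out[:n-1], nil
}

func main() {
	for _, g := range []string{"87cUR", "Ebo7"} {
		b, err := decodeGroup([]byte(g))
		fmt.Printf("%q -> %q (err=%v)\n", g, b, err)
	}
}
```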
func (*ASCII85Filter) Decode ¶
func (f *ASCII85Filter) Decode(r io.Reader, params *core.Dictionary) (io.Reader, error)
Decode decodes ASCII85-encoded data. The params dictionary is not used for ASCII85Decode.
func (*ASCII85Filter) Encode ¶
func (f *ASCII85Filter) Encode(r io.Reader, params *core.Dictionary) (io.Reader, error)
Encode is intended to encode data to ASCII85 format. It is currently a stub and will be implemented when needed.
type ASCIIHexFilter ¶
type ASCIIHexFilter struct{}
ASCIIHexFilter implements the ASCIIHexDecode filter. ASCIIHexDecode decodes data encoded in ASCII hexadecimal representation. Each pair of hexadecimal digits (0-9, A-F, a-f) represents one byte. Whitespace characters are ignored. The '>' character marks the end of data (EOD marker). If an odd number of hex digits appear before EOD, a 0 is appended.
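A minimal sketch of these decoding rules follows. decodeASCIIHex and hexVal are hypothetical names, and two behaviors are assumptions not stated above: input that ends without '>' is treated as if EOD had been reached, and any non-hex, non-whitespace character is an error.

```go
package main

import (
	"errors"
	"fmt"
)

// hexVal converts one hexadecimal digit to its value.
func hexVal(c byte) (byte, bool) {
	switch {
	case c >= '0' && c <= '9':
		return c - '0', true
	case c >= 'A' && c <= 'F':
		return c - 'A' + 10, true
	case c >= 'a' && c <= 'f':
		return c - 'a' + 10, true
	}
	return 0, false
}

// decodeASCIIHex decodes hex pairs, skips PDF whitespace, and stops at '>'.
// A trailing unpaired digit is completed with a 0 low nibble.
func decodeASCIIHex(src []byte) ([]byte, error) {
	var out []byte
	var hi byte
	haveHi := false
	for _, c := range src {
		switch c {
		case '>':
			if haveHi {
				out = append(out, hi<<4)
			}
			return out, nil
		case ' ', '\t', '\n', '\r', '\f', 0:
			continue
		}
		v, ok := hexVal(c)
		if !ok {
			return nil, errors.New("asciihex: invalid character")
		}
		if haveHi {
			out = append(out, hi<<4|v)
			haveHi = false
		} else {
			hi, haveHi = v, true
		}
	}
	// Assumption: end of input without '>' is treated like EOD.
	if haveHi {
		out = append(out, hi<<4)
	}
	return out, nil
}

func main() {
	b, err := decodeASCIIHex([]byte("48 65 6C\n6c 6F>"))
	fmt.Printf("%q %v\n", b, err)
}
```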
func (*ASCIIHexFilter) Decode ¶
func (f *ASCIIHexFilter) Decode(r io.Reader, params *core.Dictionary) (io.Reader, error)
Decode decodes ASCII hex-encoded data. The params dictionary is not used for ASCIIHexDecode.
func (*ASCIIHexFilter) Encode ¶
func (f *ASCIIHexFilter) Encode(r io.Reader, params *core.Dictionary) (io.Reader, error)
Encode is intended to encode data to ASCII hex format. It is currently a stub and will be implemented when needed.
func (*ASCIIHexFilter) Name ¶
func (f *ASCIIHexFilter) Name() core.Name
Name returns "ASCIIHexDecode".
type CCITTFaxFilter ¶
type CCITTFaxFilter struct{}
CCITTFaxFilter implements the CCITTFaxDecode filter. CCITTFaxDecode uses Group 3 or Group 4 fax compression, commonly used for bilevel (black and white) images.
Parameters from ISO 32000-2 section 7.4.6:
- K: Determines CCITT compression type (default 0)
  - K < 0: Group 4 (2D)
  - K = 0: Group 3 (1D)
  - K > 0: Mixed 1D/2D (treated as Group 3)
- Columns: Width in pixels (default 1728, standard fax width)
- Rows: Height in pixels (default 0 = determine from data)
- EndOfLine: Whether EOL codes are present (default false)
- EncodedByteAlign: Whether lines are byte-aligned (default false)
- EndOfBlock: Whether end-of-block pattern is expected (default true)
- BlackIs1: If true, 1 bits are black; otherwise 0 bits are black (default false)
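The defaults and the interpretation of K listed above might be modeled as in this sketch. The ccittParams type and its methods are hypothetical and only restate the list; rowBytes assumes the usual packing of bilevel rows at one bit per pixel, padded to a whole byte.

```go
package main

import "fmt"

// ccittParams is a hypothetical holder for the decode parameters listed above.
type ccittParams struct {
	K                int
	Columns          int
	Rows             int
	EndOfLine        bool
	EncodedByteAlign bool
	EndOfBlock       bool
	BlackIs1         bool
}

// defaultCCITTParams returns the documented defaults.
func defaultCCITTParams() ccittParams {
	return ccittParams{K: 0, Columns: 1728, Rows: 0, EndOfBlock: true}
}

// mode maps K to the compression scheme as described in the parameter list.
func (p ccittParams) mode() string {
	switch {
	case p.K < 0:
		return "Group 4 (2D)"
	case p.K == 0:
		return "Group 3 (1D)"
	default:
		return "Mixed 1D/2D (treated as Group 3)"
	}
}

// rowBytes is the size of one decoded row: one bit per pixel, rounded up to a byte.
func (p ccittParams) rowBytes() int {
	return (p.Columns + 7) / 8
}

func main() {
	p := defaultCCITTParams()
	fmt.Println(p.mode(), p.rowBytes())
}
```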
func (*CCITTFaxFilter) Decode ¶
func (f *CCITTFaxFilter) Decode(r io.Reader, params *core.Dictionary) (io.Reader, error)
Decode decompresses CCITT Group 3 or Group 4 encoded data.
func (*CCITTFaxFilter) Encode ¶
func (f *CCITTFaxFilter) Encode(r io.Reader, params *core.Dictionary) (io.Reader, error)
Encode is not yet implemented for CCITT fax.
func (*CCITTFaxFilter) Name ¶
func (f *CCITTFaxFilter) Name() core.Name
Name returns "CCITTFaxDecode".
type DCTFilter ¶
type DCTFilter struct{}
DCTFilter implements the DCTDecode filter (JPEG compression). DCTDecode uses the JPEG (DCT) compression algorithm. In PDF, DCT-encoded streams contain complete JPEG file data.
For Phase 1, this is a passthrough filter - the compressed JPEG data is returned as-is. Actual JPEG decoding can be done by the caller using the standard library's image/jpeg package if needed.
Future enhancements could decode to raw pixels for manipulation.
func (*DCTFilter) Decode ¶
Decode passes through JPEG-compressed data without modification. The params dictionary may contain JPEG-specific parameters but is currently ignored since we're doing passthrough.
To actually decode the JPEG image, the caller can use:
img, err := jpeg.Decode(reader)
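A fuller, self-contained version of that call is sketched below. Because the filter is a passthrough, the bytes it returns are still a complete JPEG file; here sampleJPEG builds a small in-memory JPEG as a stand-in for the stream data, and both helper names are hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
	"image"
	"image/jpeg"
)

// sampleJPEG builds a small in-memory JPEG to stand in for a DCTDecode stream.
func sampleJPEG(w, h int) ([]byte, error) {
	img := image.NewRGBA(image.Rect(0, 0, w, h))
	var buf bytes.Buffer
	if err := jpeg.Encode(&buf, img, nil); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// decodeDims decodes passthrough JPEG bytes and reports the pixel dimensions.
func decodeDims(data []byte) (width, height int, err error) {
	img, err := jpeg.Decode(bytes.NewReader(data))
	if err != nil {
		return 0, 0, err
	}
	b := img.Bounds()
	return b.Dx(), b.Dy(), nil
}

func main() {
	data, err := sampleJPEG(8, 4)
	if err != nil {
		fmt.Println("encode:", err)
		return
	}
	w, h, err := decodeDims(data)
	fmt.Println(w, h, err)
}
```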
type DecompressionLimits ¶
type DecompressionLimits struct {
// MaxRatio is the maximum allowed ratio of decompressed to compressed size.
// Set to 0 to disable ratio checking (not recommended).
MaxRatio int
// MaxSize is the absolute maximum decompressed size in bytes.
// Set to 0 to disable size checking (not recommended).
MaxSize int64
// MinCompressedSize is the minimum compressed size before ratio checking applies.
// Inputs smaller than this only check MaxSize, not MaxRatio.
MinCompressedSize int64
}
DecompressionLimits configures safety limits for stream decompression.
func DefaultDecompressionLimits ¶
func DefaultDecompressionLimits() DecompressionLimits
DefaultDecompressionLimits returns the default safety limits.
func NoDecompressionLimits ¶
func NoDecompressionLimits() DecompressionLimits
NoDecompressionLimits returns limits that effectively disable all checks. Use only for trusted input or testing.
type Filter ¶
type Filter interface {
// Name returns the filter's name (e.g., "FlateDecode", "ASCII85Decode").
Name() core.Name
// Decode decodes (decompresses/decrypts) stream data.
// params may be nil if no decode parameters are specified.
Decode(r io.Reader, params *core.Dictionary) (io.Reader, error)
// Encode encodes (compresses/encrypts) stream data.
// params may be nil if no encode parameters are specified.
Encode(r io.Reader, params *core.Dictionary) (io.Reader, error)
}
Filter is the interface for a single stream filter.
type FlateFilter ¶
type FlateFilter struct{}
FlateFilter implements the FlateDecode filter (zlib/deflate compression). This is the most common compression filter in PDF files.
func (*FlateFilter) Decode ¶
func (f *FlateFilter) Decode(r io.Reader, params *core.Dictionary) (io.Reader, error)
Decode decompresses zlib-compressed data. The params dictionary can contain:
- /Predictor: predictor function (1 = no predictor, 2 = TIFF, 10-15 = PNG)
- /Columns: number of samples per row
- /Colors: number of color components
- /BitsPerComponent: bits per component
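How these parameters interact for the PNG predictors can be sketched as follows. This is an illustration, not the filter's implementation: inflate, deflate, and undoPNGPredictor are hypothetical helpers, the TIFF predictor (2) is not covered, and the code assumes the standard PNG row layout in which every row starts with a filter-type byte.

```go
package main

import (
	"bytes"
	"compress/zlib"
	"errors"
	"fmt"
	"io"
)

// deflate and inflate wrap compress/zlib for in-memory round trips.
func deflate(raw []byte) ([]byte, error) {
	var buf bytes.Buffer
	zw := zlib.NewWriter(&buf)
	if _, err := zw.Write(raw); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func inflate(data []byte) ([]byte, error) {
	zr, err := zlib.NewReader(bytes.NewReader(data))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	return io.ReadAll(zr)
}

func abs(x int) int {
	if x < 0 {
		return -x
	}
	return x
}

// paeth picks whichever of left (a), up (b), upper-left (c) is closest to a+b-c.
func paeth(a, b, c int) int {
	p := a + b - c
	pa, pb, pc := abs(p-a), abs(p-b), abs(p-c)
	if pa <= pb && pa <= pc {
		return a
	}
	if pb <= pc {
		return b
	}
	return c
}

// undoPNGPredictor reverses PNG row filters (Predictor 10-15). Each input row
// is one filter-type byte followed by rowBytes bytes of filtered samples.
func undoPNGPredictor(data []byte, columns, colors, bitsPerComponent int) ([]byte, error) {
	rowBytes := (columns*colors*bitsPerComponent + 7) / 8
	bpp := (colors*bitsPerComponent + 7) / 8 // bytes per pixel, at least 1
	if rowBytes <= 0 || len(data)%(rowBytes+1) != 0 {
		return nil, errors.New("predictor: data is not a whole number of rows")
	}
	out := make([]byte, 0, len(data)/(rowBytes+1)*rowBytes)
	prev := make([]byte, rowBytes) // the row above the first row is all zeros
	for off := 0; off < len(data); off += rowBytes + 1 {
		filterType := data[off]
		cur := make([]byte, rowBytes)
		copy(cur, data[off+1:off+1+rowBytes])
		for i := range cur {
			var a, c int // left and upper-left neighbors
			if i >= bpp {
				a, c = int(cur[i-bpp]), int(prev[i-bpp])
			}
			b := int(prev[i]) // neighbor directly above
			switch filterType {
			case 0: // None
			case 1: // Sub
				cur[i] += byte(a)
			case 2: // Up
				cur[i] += byte(b)
			case 3: // Average
				cur[i] += byte((a + b) / 2)
			case 4: // Paeth
				cur[i] += byte(paeth(a, b, c))
			default:
				return nil, errors.New("predictor: unknown PNG filter type")
			}
		}
		out = append(out, cur...)
		prev = cur
	}
	return out, nil
}

func main() {
	// Three rows of three 8-bit gray samples, tagged None, Up, and Sub.
	filtered := []byte{0, 1, 2, 3, 2, 1, 1, 1, 1, 5, 1, 1}
	z, _ := deflate(filtered)
	raw, _ := inflate(z)
	rows, err := undoPNGPredictor(raw, 3, 1, 8)
	fmt.Println(rows, err)
}
```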
func (*FlateFilter) Encode ¶
func (f *FlateFilter) Encode(r io.Reader, params *core.Dictionary) (io.Reader, error)
Encode compresses data using zlib. The params dictionary can contain:
- /Predictor: predictor function (1 = no predictor, 2 = TIFF, 10-15 = PNG)
- /Columns: number of samples per row
- /Colors: number of color components
- /BitsPerComponent: bits per component
type JBIG2Filter ¶
type JBIG2Filter struct{}
JBIG2Filter implements the JBIG2Decode filter. JBIG2Decode is an efficient compression method for bilevel images, offering better compression than CCITT fax for most content.
This is a stub for Phase 1. Full implementation requires external JBIG2 library support (not widely available in Go).
func (*JBIG2Filter) Decode ¶
func (f *JBIG2Filter) Decode(r io.Reader, params *core.Dictionary) (io.Reader, error)
Decode is not yet implemented for JBIG2.
func (*JBIG2Filter) Encode ¶
func (f *JBIG2Filter) Encode(r io.Reader, params *core.Dictionary) (io.Reader, error)
Encode is not yet implemented for JBIG2.
type JPXFilter ¶
type JPXFilter struct{}
JPXFilter implements the JPXDecode filter (JPEG2000 compression). JPXDecode uses JPEG2000 compression, which provides better compression ratios and quality than standard JPEG (DCT).
This is a stub for Phase 1. Full implementation requires JPEG2000 library support (not available in Go standard library).
type LZWFilter ¶
type LZWFilter struct{}
LZWFilter implements the LZWDecode filter. LZW (Lempel-Ziv-Welch) compression was common in older PDFs but has largely been replaced by FlateDecode. It uses a dictionary-based compression algorithm.
func (*LZWFilter) Decode ¶
Decode decompresses LZW-compressed data. The params dictionary can contain:
- /EarlyChange: 0 or 1 (default 1)
  - 1 = increase code size early (PDF/TIFF style, MSB-first)
  - 0 = increase code size late (GIF style, LSB-first)
- /Predictor: PNG or TIFF predictor function (same as FlateDecode)
- /Columns: number of samples per row (for predictor)
- /Colors: number of color components (for predictor)
- /BitsPerComponent: bits per component (for predictor)
PDF LZW differs from GIF LZW in several ways:
- MSB-first bit order (GIF is LSB-first)
- Early code size increase by default (GIF is late)
- Initial literal width is 8 bits
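The MSB-first bit order can be illustrated with a small code reader. This sketch only unpacks fixed-width codes; it does not grow the code width or rebuild the LZW dictionary, and msbReader and readCodes are hypothetical names. With 8-bit literals, the first codes are 9 bits wide, code 256 clears the table, and code 257 marks end of data.

```go
package main

import "fmt"

// msbReader pulls fixed-width codes from a byte slice, most significant bit first.
type msbReader struct {
	data  []byte
	pos   int
	acc   uint32 // bit accumulator; the low nbits bits are still unread
	nbits uint
}

// next returns the next width-bit code (9 to 12 bits for LZW), or false when
// fewer than width bits remain.
func (r *msbReader) next(width uint) (uint16, bool) {
	for r.nbits < width {
		if r.pos >= len(r.data) {
			return 0, false
		}
		r.acc = r.acc<<8 | uint32(r.data[r.pos])
		r.pos++
		r.nbits += 8
	}
	r.nbits -= width
	code := (r.acc >> r.nbits) & (1<<width - 1)
	return uint16(code), true
}

// readCodes unpacks every complete width-bit code in data.
func readCodes(data []byte, width uint) []uint16 {
	r := &msbReader{data: data}
	var codes []uint16
	for {
		c, ok := r.next(width)
		if !ok {
			return codes
		}
		codes = append(codes, c)
	}
}

func main() {
	// Nine bytes holding eight 9-bit codes, starting with the clear code (256)
	// and ending with the EOD code (257).
	packed := []byte{0x80, 0x0B, 0x60, 0x50, 0x22, 0x0C, 0x0C, 0x85, 0x01}
	fmt.Println(readCodes(packed, 9))
}
```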
type LimitedDecompressor ¶
type LimitedDecompressor struct {
// contains filtered or unexported fields
}
LimitedDecompressor wraps a decompressing reader with safety limits. It tracks bytes read and enforces ratio and size limits.
func NewLimitedDecompressor ¶
func NewLimitedDecompressor(r io.Reader, compressedSize int64, limits DecompressionLimits) *LimitedDecompressor
NewLimitedDecompressor creates a new limited decompressor. compressedSize should be the size of the compressed input (0 if unknown). If compressedSize is 0, only the absolute size limit is enforced.
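The enforcement described above can be sketched as a counting io.Reader. This is a simplified model, not the package's implementation: limits, limitedReader, and the two error values are hypothetical stand-ins for DecompressionLimits, LimitedDecompressor, and the exported errors, and reporting the limit error from the Read call that crosses the threshold is an assumption.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// Hypothetical stand-ins for the package's exported errors and limits type.
var (
	errRatioExceeded = errors.New("decompression ratio exceeded maximum allowed")
	errSizeExceeded  = errors.New("decompressed size exceeded maximum allowed")
)

type limits struct {
	MaxRatio          int
	MaxSize           int64
	MinCompressedSize int64
}

// limitedReader counts decompressed bytes and fails once a limit is crossed.
type limitedReader struct {
	r              io.Reader
	compressedSize int64 // 0 means unknown: only MaxSize is enforced
	lim            limits
	read           int64
}

func (l *limitedReader) Read(p []byte) (int, error) {
	n, err := l.r.Read(p)
	l.read += int64(n)
	if l.lim.MaxSize > 0 && l.read > l.lim.MaxSize {
		return n, errSizeExceeded
	}
	ratioApplies := l.lim.MaxRatio > 0 && l.compressedSize > 0 &&
		l.compressedSize >= l.lim.MinCompressedSize
	if ratioApplies && l.read > l.compressedSize*int64(l.lim.MaxRatio) {
		return n, errRatioExceeded
	}
	return n, err
}

// zeroReader yields an endless stream of zero bytes, standing in for a decompressor.
type zeroReader struct{}

func (zeroReader) Read(p []byte) (int, error) {
	for i := range p {
		p[i] = 0
	}
	return len(p), nil
}

// drainZeros pushes n zero bytes through a limitedReader and reports the first error.
func drainZeros(n, compressedSize int64, lim limits) error {
	lr := &limitedReader{
		r:              io.LimitReader(zeroReader{}, n),
		compressedSize: compressedSize,
		lim:            lim,
	}
	_, err := io.Copy(io.Discard, lr)
	return err
}

func main() {
	lim := limits{MaxRatio: 100, MaxSize: 1 << 20, MinCompressedSize: 1024}
	fmt.Println(drainZeros(200000, 1024, lim)) // ratio check applies
	fmt.Println(drainZeros(200000, 100, lim))  // below MinCompressedSize: size check only
	fmt.Println(drainZeros(2<<20, 0, lim))     // unknown compressed size: size check only
}
```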
func (*LimitedDecompressor) BytesRead ¶
func (l *LimitedDecompressor) BytesRead() int64
BytesRead returns the total bytes read so far.
func (*LimitedDecompressor) Close ¶
func (l *LimitedDecompressor) Close() error
Close closes the underlying reader if it implements io.Closer.
type Registry ¶
type Registry struct {
// contains filtered or unexported fields
}
Registry implements core.FilterRegistry for encoding and decoding stream filters. It maintains a map of filter names to filter implementations.
func NewDefaultRegistry ¶
func NewDefaultRegistry() *Registry
NewDefaultRegistry creates a registry with all standard PDF filters registered.
func NewRegistry ¶
func NewRegistry() *Registry
NewRegistry creates a new empty filter registry with default limits.
func NewRegistryWithLimits ¶
func NewRegistryWithLimits(limits DecompressionLimits) *Registry
NewRegistryWithLimits creates a registry with custom decompression limits.
func (*Registry) Decode ¶
func (r *Registry) Decode(filterName core.Name, params *core.Dictionary, reader io.Reader) (io.Reader, error)
Decode applies the named filter to decode stream data. The decoded output is wrapped with decompression limits for safety. Returns an error if the filter is not registered.
func (*Registry) DecodeChain ¶
func (r *Registry) DecodeChain(data io.Reader, filters []core.Name, params []*core.Dictionary) (io.Reader, error)
DecodeChain applies a chain of filters to decode stream data. Filters are applied in order (first filter in the array is applied first). The params array should be parallel to the filters array, with nil entries allowed.
Note: For stream decoding with size-aware decompression limits, use core.Stream.Decode() or core.Stream.DecodeReader() instead, which automatically handle filter parsing and apply ratio-based limits.
Returns an error if any filter is not registered or if decoding fails.
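The chaining behavior can be sketched with standard-library decoders standing in for the registered filters. Everything named here is hypothetical: decodeFunc simplifies Filter.Decode by dropping the params argument, the registry map only loosely mirrors Registry, and hex.NewDecoder is a rough substitute for ASCIIHexDecode (it does not skip whitespace or stop at '>').

```go
package main

import (
	"bytes"
	"compress/zlib"
	"encoding/hex"
	"fmt"
	"io"
)

// decodeFunc is a simplified stand-in for Filter.Decode (no params).
type decodeFunc func(io.Reader) (io.Reader, error)

// registry maps filter names to decoders.
var registry = map[string]decodeFunc{
	"ASCIIHexDecode": func(r io.Reader) (io.Reader, error) {
		return hex.NewDecoder(r), nil
	},
	"FlateDecode": func(r io.Reader) (io.Reader, error) {
		zr, err := zlib.NewReader(r)
		if err != nil {
			return nil, err
		}
		return zr, nil
	},
}

// decodeChain wraps data with each filter in order: the first name is applied first.
func decodeChain(data io.Reader, filters []string) (io.Reader, error) {
	r := data
	for _, name := range filters {
		dec, ok := registry[name]
		if !ok {
			return nil, fmt.Errorf("unsupported filter %q", name)
		}
		next, err := dec(r)
		if err != nil {
			return nil, fmt.Errorf("filter %q: %w", name, err)
		}
		r = next
	}
	return r, nil
}

// encodeSample compresses text with zlib and then hex-encodes the result,
// so decoding needs ASCIIHexDecode followed by FlateDecode.
func encodeSample(text string) ([]byte, error) {
	var buf bytes.Buffer
	zw := zlib.NewWriter(&buf)
	if _, err := zw.Write([]byte(text)); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return []byte(hex.EncodeToString(buf.Bytes())), nil
}

// decodeAll runs the chain and reads the decoded output into a string.
func decodeAll(data []byte, filters []string) (string, error) {
	r, err := decodeChain(bytes.NewReader(data), filters)
	if err != nil {
		return "", err
	}
	out, err := io.ReadAll(r)
	return string(out), err
}

func main() {
	enc, err := encodeSample("hello, stream")
	if err != nil {
		fmt.Println("encode:", err)
		return
	}
	fmt.Println(decodeAll(enc, []string{"ASCIIHexDecode", "FlateDecode"}))
}
```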
func (*Registry) DecodeWithSize ¶
func (r *Registry) DecodeWithSize(filterName core.Name, params *core.Dictionary, reader io.Reader, compressedSize int64) (io.Reader, error)
DecodeWithSize applies the named filter with known compressed size for ratio checking.
func (*Registry) Encode ¶
func (r *Registry) Encode(filterName core.Name, params *core.Dictionary, reader io.Reader) (io.Reader, error)
Encode applies the named filter to encode stream data. Returns an error if the filter is not registered.
func (*Registry) Get ¶
func (r *Registry) Get(name core.Name) (Filter, bool)
Get retrieves a filter by name. Returns the filter and true if found, nil and false otherwise.
func (*Registry) Limits ¶
func (r *Registry) Limits() DecompressionLimits
Limits returns the current decompression limits.
func (*Registry) Register ¶
func (r *Registry) Register(filter Filter)
Register adds a filter to the registry. If a filter with the same name already exists, it is replaced.
func (*Registry) SetLimits ¶
func (r *Registry) SetLimits(limits DecompressionLimits)
SetLimits updates the decompression limits.
type RunLengthFilter ¶
type RunLengthFilter struct{}
RunLengthFilter implements the RunLengthDecode filter. RunLengthDecode uses a simple run-length compression algorithm:
- Length byte 0-127: Copy the next (length + 1) bytes literally
- Length byte 128: End of data (EOD) marker
- Length byte 129-255: Repeat the next byte (257 - length) times
This is a simple and fast compression scheme for data with many repeated values (e.g., bilevel fax images).
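The three cases above translate into a short decoding loop. The sketch below is illustrative only: decodeRunLength is a hypothetical name, and accepting input that ends without an EOD byte is an assumption.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

// decodeRunLength expands run-length encoded data up to the EOD marker (128).
func decodeRunLength(src []byte) ([]byte, error) {
	var out []byte
	for i := 0; i < len(src); {
		n := int(src[i])
		i++
		switch {
		case n == 128:
			// EOD: anything after the marker is ignored.
			return out, nil
		case n < 128:
			// Literal run: copy the next n+1 bytes.
			count := n + 1
			if i+count > len(src) {
				return nil, errors.New("runlength: truncated literal run")
			}
			out = append(out, src[i:i+count]...)
			i += count
		default:
			// Repeat run: the next byte appears 257-n times.
			if i >= len(src) {
				return nil, errors.New("runlength: truncated repeat run")
			}
			out = append(out, bytes.Repeat([]byte{src[i]}, 257-n)...)
			i++
		}
	}
	// Assumption: input that ends without an EOD byte is accepted as complete.
	return out, nil
}

func main() {
	// Literal "abc", then 'x' repeated 257-254 = 3 times, then EOD.
	out, err := decodeRunLength([]byte{2, 'a', 'b', 'c', 254, 'x', 128})
	fmt.Printf("%q %v\n", out, err)
}
```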
func (*RunLengthFilter) Decode ¶
func (f *RunLengthFilter) Decode(r io.Reader, params *core.Dictionary) (io.Reader, error)
Decode decodes run-length encoded data. The params dictionary is not used for RunLengthDecode.
func (*RunLengthFilter) Encode ¶
func (f *RunLengthFilter) Encode(r io.Reader, params *core.Dictionary) (io.Reader, error)
Encode would encode data using run-length compression. For now, this is a stub that will be implemented when needed.
func (*RunLengthFilter) Name ¶
func (f *RunLengthFilter) Name() core.Name
Name returns "RunLengthDecode".
type UnsupportedFilterError ¶
func (*UnsupportedFilterError) Error ¶
func (e *UnsupportedFilterError) Error() string