package filter
v0.2.0
Warning

This package is not in the latest version of its module.

Published: Dec 31, 2025 | License: Apache-2.0 | Imports: 8 | Imported by: 0

Documentation

Constants

const (
	// DefaultMaxDecompressionRatio is the maximum ratio of decompressed to compressed size.
	// A 100:1 ratio means 1KB compressed can expand to at most 100KB.
	// Most legitimate PDF streams have ratios under 20:1.
	DefaultMaxDecompressionRatio = 100

	// DefaultMaxDecompressedSize is the absolute maximum decompressed size in bytes.
	// This is a hard cap regardless of ratio. 512MB accommodates large images/fonts.
	DefaultMaxDecompressedSize = 512 * 1024 * 1024 // 512MB

	// DefaultMinCompressedSize is the minimum compressed size before ratio checking applies.
	// Very small inputs (< 1KB) may legitimately have high ratios.
	DefaultMinCompressedSize = 1024 // 1KB
)

Decompression safety limits. These protect against "zip bomb" attacks where a small compressed payload expands to an enormous size, exhausting memory.
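
As a rough illustration of how the three constants interact, the sketch below computes the largest decompressed size that would be accepted for a given compressed size. The function effectiveCap and the lowercase constants are hypothetical stand-ins that mirror the documented defaults; the package's actual enforcement code may differ.

```go
package main

import "fmt"

// Local copies of the documented defaults (stand-ins, not the package's identifiers).
const (
	maxRatio                = 100
	maxSize           int64 = 512 * 1024 * 1024
	minCompressedSize int64 = 1024
)

// effectiveCap returns the largest decompressed size allowed for an input of
// compressedSize bytes under the assumed rules: inputs below minCompressedSize
// (including an unknown size of 0) are bounded only by maxSize, while larger
// inputs are bounded by the smaller of compressedSize*maxRatio and maxSize.
func effectiveCap(compressedSize int64) int64 {
	if compressedSize < minCompressedSize {
		return maxSize
	}
	if limit := compressedSize * maxRatio; limit < maxSize {
		return limit
	}
	return maxSize
}

func main() {
	fmt.Println(effectiveCap(512))      // small input: only the absolute cap applies
	fmt.Println(effectiveCap(2048))     // ratio limit: 2048 * 100
	fmt.Println(effectiveCap(16 << 20)) // ratio limit would exceed the absolute cap
}
```

Under these assumptions a 2 KB stream may expand to 204800 bytes, while a 16 MB stream is held to the 512 MB hard cap rather than 1600 MB.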

Variables

var (
	// ErrDecompressionRatioExceeded indicates the decompression ratio exceeded the limit.
	ErrDecompressionRatioExceeded = errors.New("decompression ratio exceeded maximum allowed")

	// ErrDecompressedSizeExceeded indicates the decompressed size exceeded the absolute limit.
	ErrDecompressedSizeExceeded = errors.New("decompressed size exceeded maximum allowed")
)

Decompression limit errors.

Functions

func ParseFilterParams deprecated

func ParseFilterParams(dict *core.Dictionary) ([]core.Name, []*core.Dictionary, error)

ParseFilterParams extracts filter names and decode parameters from a stream dictionary.

Deprecated: Use core.ParseFilterChain instead for consistent behavior. This function is stricter (returns errors for malformed entries) and has a behavioral difference: a single /DecodeParms dictionary applies to the FIRST filter only, whereas core.ParseFilterChain applies it to ALL filters (which is more spec-compliant).

The /Filter entry can be:

  • A Name (single filter)
  • An Array of Names (filter chain)

The /DecodeParms entry can be:

  • Absent (returns nil params for all filters)
  • A Dictionary (params for single filter or first in chain)
  • An Array of Dictionaries or Nulls (parallel to filter array)

Returns the filter names and corresponding parameters in parallel arrays. If there are no filters, returns empty slices.
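
The behavioral difference described in the deprecation notice can be modeled with a small sketch. The params type and the functions expandFirstOnly and expandAll are hypothetical stand-ins (a plain map replaces *core.Dictionary); they only show how a single /DecodeParms dictionary might be spread across a filter chain and are not taken from either parser.

```go
package main

import "fmt"

// params is a stand-in for a decode-parameters dictionary.
type params map[string]int

// expandFirstOnly models the behavior documented for ParseFilterParams:
// a single dictionary is attached to the first filter, the rest get nil.
func expandFirstOnly(numFilters int, p params) []params {
	out := make([]params, numFilters)
	if numFilters > 0 {
		out[0] = p
	}
	return out
}

// expandAll models the behavior documented for core.ParseFilterChain:
// the same dictionary is attached to every filter in the chain.
func expandAll(numFilters int, p params) []params {
	out := make([]params, numFilters)
	for i := range out {
		out[i] = p
	}
	return out
}

func main() {
	p := params{"Predictor": 12}
	first := expandFirstOnly(2, p)
	all := expandAll(2, p)
	// Reports whether the second filter received no parameters in each model.
	fmt.Println(first[1] == nil, all[1] == nil)
}
```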

Types

type ASCII85Filter

type ASCII85Filter struct{}

ASCII85Filter implements the ASCII85Decode filter. ASCII85Decode is the PDF variant of btoa (base-85) encoding. Groups of 5 ASCII characters (33-117, '!' to 'u') represent 4 bytes. Special case: 'z' represents four zero bytes. The '~>' sequence marks the end of data. Whitespace characters are ignored.

Final groups may be short:

  • 2 chars decode to 1 byte
  • 3 chars decode to 2 bytes
  • 4 chars decode to 3 bytes
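
The group arithmetic above can be sketched as follows. decodeGroup is a hypothetical helper written for illustration, not the filter's actual code; it assumes the caller has already stripped whitespace, expanded 'z', and stopped at '~>'.

```go
package main

import (
	"errors"
	"fmt"
)

// decodeGroup decodes one ASCII85 group of 2 to 5 characters in the range
// '!'..'u'. Short final groups are padded with 'u' and yield len(group)-1 bytes.
func decodeGroup(group []byte) ([]byte, error) {
	n := len(group)
	if n < 2 || n > 5 {
		return nil, errors.New("ascii85: group must have 2 to 5 characters")
	}
	var v uint64
	for i := 0; i < 5; i++ {
		c := byte('u') // padding for short final groups
		if i < n {
			c = group[i]
		}
		if c < '!' || c > 'u' {
			return nil, errors.New("ascii85: character out of range")
		}
		v = v*85 + uint64(c-'!')
	}
	if v > 0xFFFFFFFF {
		return nil, errors.New("ascii85: group value overflows 32 bits")
	}
	// The 32-bit value is emitted big-endian; a short group keeps n-1 bytes.
	out := []byte{byte(v >> 24), byte(v >> 16), byte(v >> 8), byte(v)}
	return out[:n-1], nil
}

func main() {
	full, _ := decodeGroup([]byte("9jqo^"))
	short, _ := decodeGroup([]byte("9j"))
	fmt.Printf("%q %q\n", full, short) // prints "Man " "M"
}
```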

func (*ASCII85Filter) Decode

func (f *ASCII85Filter) Decode(r io.Reader, params *core.Dictionary) (io.Reader, error)

Decode decodes ASCII85-encoded data. The params dictionary is not used for ASCII85Decode.

func (*ASCII85Filter) Encode

func (f *ASCII85Filter) Encode(r io.Reader, params *core.Dictionary) (io.Reader, error)

Encode encodes data to ASCII85 format. For now, this is a stub that will be implemented when needed.

func (*ASCII85Filter) Name

func (f *ASCII85Filter) Name() core.Name

Name returns "ASCII85Decode".

type ASCIIHexFilter

type ASCIIHexFilter struct{}

ASCIIHexFilter implements the ASCIIHexDecode filter. ASCIIHexDecode decodes data encoded in ASCII hexadecimal representation. Each pair of hexadecimal digits (0-9, A-F, a-f) represents one byte. Whitespace characters are ignored. The '>' character marks the end of data (EOD marker). If an odd number of hex digits appears before EOD, the final digit is treated as if it were followed by 0.
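
A minimal sketch of these decoding rules is shown below. decodeASCIIHex and hexVal are hypothetical helpers that operate on a byte slice rather than an io.Reader; they illustrate the rules and are not the filter's implementation.

```go
package main

import (
	"errors"
	"fmt"
)

// hexVal returns the value of a hexadecimal digit, or -1 if c is not one.
func hexVal(c byte) int {
	switch {
	case c >= '0' && c <= '9':
		return int(c - '0')
	case c >= 'a' && c <= 'f':
		return int(c-'a') + 10
	case c >= 'A' && c <= 'F':
		return int(c-'A') + 10
	}
	return -1
}

// decodeASCIIHex skips whitespace, stops at '>', and pads a trailing odd
// digit with a zero low nibble.
func decodeASCIIHex(src []byte) ([]byte, error) {
	var out []byte
	hi := -1 // pending high nibble, or -1 if none
	for _, c := range src {
		if c == '>' {
			break // EOD marker: ignore everything after it
		}
		switch c {
		case ' ', '\t', '\n', '\r', '\f', 0:
			continue
		}
		v := hexVal(c)
		if v < 0 {
			return nil, errors.New("asciihex: invalid character")
		}
		if hi < 0 {
			hi = v
		} else {
			out = append(out, byte(hi<<4|v))
			hi = -1
		}
	}
	if hi >= 0 {
		out = append(out, byte(hi<<4)) // odd digit count: assume a trailing 0
	}
	return out, nil
}

func main() {
	out, _ := decodeASCIIHex([]byte("48 65 6C6c\n6F>"))
	fmt.Printf("%s\n", out) // prints Hello
}
```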

func (*ASCIIHexFilter) Decode

func (f *ASCIIHexFilter) Decode(r io.Reader, params *core.Dictionary) (io.Reader, error)

Decode decodes ASCII hex-encoded data. The params dictionary is not used for ASCIIHexDecode.

func (*ASCIIHexFilter) Encode

func (f *ASCIIHexFilter) Encode(r io.Reader, params *core.Dictionary) (io.Reader, error)

Encode encodes data to ASCII hex format. For now, this is a stub that will be implemented when needed.

func (*ASCIIHexFilter) Name

func (f *ASCIIHexFilter) Name() core.Name

Name returns "ASCIIHexDecode".

type CCITTFaxFilter

type CCITTFaxFilter struct{}

CCITTFaxFilter implements the CCITTFaxDecode filter. CCITTFaxDecode uses Group 3 or Group 4 fax compression, commonly used for bilevel (black and white) images.

Parameters from ISO 32000-2 section 7.4.6:

  • K: Determines the CCITT compression type (default 0)
      • K < 0: Group 4 (2D)
      • K = 0: Group 3 (1D)
      • K > 0: Mixed 1D/2D (treated as Group 3)
  • Columns: Width in pixels (default 1728, standard fax width)
  • Rows: Height in pixels (default 0 = determine from data)
  • EndOfLine: Whether EOL codes are present (default false)
  • EncodedByteAlign: Whether lines are byte-aligned (default false)
  • EndOfBlock: Whether end-of-block pattern is expected (default true)
  • BlackIs1: If true, 1 bits are black; otherwise 0 bits are black (default false)
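
For orientation, the parameter list above could be captured in a small struct with the documented defaults. ccittParams, defaultCCITTParams, and scheme are hypothetical names used only in this sketch; how the filter stores its parameters internally is not documented here.

```go
package main

import "fmt"

// ccittParams mirrors the documented decode parameters (illustrative only).
type ccittParams struct {
	K                int
	Columns          int
	Rows             int
	EndOfLine        bool
	EncodedByteAlign bool
	EndOfBlock       bool
	BlackIs1         bool
}

// defaultCCITTParams returns the defaults listed in the documentation.
func defaultCCITTParams() ccittParams {
	return ccittParams{K: 0, Columns: 1728, Rows: 0, EndOfBlock: true}
}

// scheme maps K to the compression scheme as described above.
func (p ccittParams) scheme() string {
	switch {
	case p.K < 0:
		return "Group 4 (2D)"
	case p.K == 0:
		return "Group 3 (1D)"
	default:
		return "Mixed 1D/2D (treated as Group 3)"
	}
}

func main() {
	p := defaultCCITTParams()
	fmt.Println(p.Columns, p.scheme())
	p.K = -1
	fmt.Println(p.scheme())
}
```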

func (*CCITTFaxFilter) Decode

func (f *CCITTFaxFilter) Decode(r io.Reader, params *core.Dictionary) (io.Reader, error)

Decode decompresses CCITT Group 3 or Group 4 encoded data.

func (*CCITTFaxFilter) Encode

func (f *CCITTFaxFilter) Encode(r io.Reader, params *core.Dictionary) (io.Reader, error)

Encode is not yet implemented for CCITT fax.

func (*CCITTFaxFilter) Name

func (f *CCITTFaxFilter) Name() core.Name

Name returns "CCITTFaxDecode".

type DCTFilter

type DCTFilter struct{}

DCTFilter implements the DCTDecode filter (JPEG compression). DCTDecode uses the JPEG (DCT) compression algorithm. In PDF, DCT-encoded streams contain complete JPEG file data.

For Phase 1, this is a passthrough filter - the compressed JPEG data is returned as-is. Actual JPEG decoding can be done by the caller using the standard library's image/jpeg package if needed.

Future enhancements could decode to raw pixels for manipulation.

func (*DCTFilter) Decode

func (f *DCTFilter) Decode(r io.Reader, params *core.Dictionary) (io.Reader, error)

Decode passes through JPEG-compressed data without modification. The params dictionary may contain JPEG-specific parameters but is currently ignored since we're doing passthrough.

To actually decode the JPEG image, the caller can use:

img, err := jpeg.Decode(reader)

func (*DCTFilter) Encode

func (f *DCTFilter) Encode(r io.Reader, params *core.Dictionary) (io.Reader, error)

Encode would compress data using JPEG compression. For now, this is a stub that will be implemented when needed.

func (*DCTFilter) Name

func (f *DCTFilter) Name() core.Name

Name returns "DCTDecode".

type DecompressionLimits

type DecompressionLimits struct {
	// MaxRatio is the maximum allowed ratio of decompressed to compressed size.
	// Set to 0 to disable ratio checking (not recommended).
	MaxRatio int

	// MaxSize is the absolute maximum decompressed size in bytes.
	// Set to 0 to disable size checking (not recommended).
	MaxSize int64

	// MinCompressedSize is the minimum compressed size before ratio checking applies.
	// Inputs smaller than this only check MaxSize, not MaxRatio.
	MinCompressedSize int64
}

DecompressionLimits configures safety limits for stream decompression.

func DefaultDecompressionLimits

func DefaultDecompressionLimits() DecompressionLimits

DefaultDecompressionLimits returns the default safety limits.

func NoDecompressionLimits

func NoDecompressionLimits() DecompressionLimits

NoDecompressionLimits returns limits that effectively disable all checks. Use only for trusted input or testing.

type Filter

type Filter interface {
	// Name returns the filter's name (e.g., "FlateDecode", "ASCII85Decode").
	Name() core.Name

	// Decode decodes (decompresses/decrypts) stream data.
	// params may be nil if no decode parameters are specified.
	Decode(r io.Reader, params *core.Dictionary) (io.Reader, error)

	// Encode encodes (compresses/encrypts) stream data.
	// params may be nil if no encode parameters are specified.
	Encode(r io.Reader, params *core.Dictionary) (io.Reader, error)
}

Filter is the interface for a single stream filter.

type FlateFilter

type FlateFilter struct{}

FlateFilter implements the FlateDecode filter (zlib/deflate compression). This is the most common compression filter in PDF files.

func (*FlateFilter) Decode

func (f *FlateFilter) Decode(r io.Reader, params *core.Dictionary) (io.Reader, error)

Decode decompresses zlib-compressed data. The params dictionary can contain:

  • /Predictor: PNG or TIFF predictor function (1 = no predictor, 2 = TIFF, 10-15 = PNG)
  • /Columns: number of samples per row
  • /Colors: number of color components
  • /BitsPerComponent: bits per component

func (*FlateFilter) Encode

func (f *FlateFilter) Encode(r io.Reader, params *core.Dictionary) (io.Reader, error)

Encode compresses data using zlib. The params dictionary can contain:

  • /Predictor: PNG or TIFF predictor function (1 = no predictor, 2 = TIFF, 10-15 = PNG)
  • /Columns: number of samples per row
  • /Colors: number of color components
  • /BitsPerComponent: bits per component

func (*FlateFilter) Name

func (f *FlateFilter) Name() core.Name

Name returns "FlateDecode".

type JBIG2Filter

type JBIG2Filter struct{}

JBIG2Filter implements the JBIG2Decode filter. JBIG2Decode is an efficient compression method for bilevel images, offering better compression than CCITT fax for most content.

This is a stub for Phase 1. Full implementation requires external JBIG2 library support (not widely available in Go).

func (*JBIG2Filter) Decode

func (f *JBIG2Filter) Decode(r io.Reader, params *core.Dictionary) (io.Reader, error)

Decode is not yet implemented for JBIG2.

func (*JBIG2Filter) Encode

func (f *JBIG2Filter) Encode(r io.Reader, params *core.Dictionary) (io.Reader, error)

Encode is not yet implemented for JBIG2.

func (*JBIG2Filter) Name

func (f *JBIG2Filter) Name() core.Name

Name returns "JBIG2Decode".

type JPXFilter

type JPXFilter struct{}

JPXFilter implements the JPXDecode filter (JPEG2000 compression). JPXDecode uses JPEG2000 compression, which provides better compression ratios and quality than standard JPEG (DCT).

This is a stub for Phase 1. Full implementation requires JPEG2000 library support (not available in Go standard library).

func (*JPXFilter) Decode

func (f *JPXFilter) Decode(r io.Reader, params *core.Dictionary) (io.Reader, error)

Decode is not yet implemented for JPEG2000.

func (*JPXFilter) Encode

func (f *JPXFilter) Encode(r io.Reader, params *core.Dictionary) (io.Reader, error)

Encode is not yet implemented for JPEG2000.

func (*JPXFilter) Name

func (f *JPXFilter) Name() core.Name

Name returns "JPXDecode".

type LZWFilter

type LZWFilter struct{}

LZWFilter implements the LZWDecode filter. LZW (Lempel-Ziv-Welch) compression was common in older PDFs but has largely been replaced by FlateDecode. It uses a dictionary-based compression algorithm.

func (*LZWFilter) Decode

func (f *LZWFilter) Decode(r io.Reader, params *core.Dictionary) (io.Reader, error)

Decode decompresses LZW-compressed data. The params dictionary can contain:

  • /EarlyChange: 0 or 1 (default 1)
      • 1 = increase code size early (PDF/TIFF style, MSB-first)
      • 0 = increase code size late (GIF style, LSB-first)
  • /Predictor: PNG or TIFF predictor function (same as FlateDecode)
  • /Columns: number of samples per row (for predictor)
  • /Colors: number of color components (for predictor)
  • /BitsPerComponent: bits per component (for predictor)

PDF LZW differs from GIF LZW in several ways:

  • MSB-first bit order (GIF is LSB-first)
  • Early code size increase by default (GIF is late)
  • Initial literal width is 8 bits

func (*LZWFilter) Encode

func (f *LZWFilter) Encode(r io.Reader, params *core.Dictionary) (io.Reader, error)

Encode compresses data using LZW. For now, this is a stub that will be implemented when needed.

func (*LZWFilter) Name

func (f *LZWFilter) Name() core.Name

Name returns "LZWDecode".

type LimitedDecompressor

type LimitedDecompressor struct {
	// contains filtered or unexported fields
}

LimitedDecompressor wraps a decompressing reader with safety limits. It tracks bytes read and enforces ratio and size limits.

func NewLimitedDecompressor

func NewLimitedDecompressor(r io.Reader, compressedSize int64, limits DecompressionLimits) *LimitedDecompressor

NewLimitedDecompressor creates a new limited decompressor. compressedSize should be the size of the compressed input (0 if unknown). If compressedSize is 0, only the absolute size limit is enforced.

func (*LimitedDecompressor) BytesRead

func (l *LimitedDecompressor) BytesRead() int64

BytesRead returns the total bytes read so far.

func (*LimitedDecompressor) Close

func (l *LimitedDecompressor) Close() error

Close closes the underlying reader if it implements io.Closer.

func (*LimitedDecompressor) Read

func (l *LimitedDecompressor) Read(p []byte) (int, error)

Read implements io.Reader with decompression limit enforcement.

type Registry

type Registry struct {
	// contains filtered or unexported fields
}

Registry implements core.FilterRegistry for encoding and decoding stream filters. It maintains a map of filter names to filter implementations.

func NewDefaultRegistry

func NewDefaultRegistry() *Registry

NewDefaultRegistry creates a registry with all standard PDF filters registered.

func NewRegistry

func NewRegistry() *Registry

NewRegistry creates a new empty filter registry with default limits.

func NewRegistryWithLimits

func NewRegistryWithLimits(limits DecompressionLimits) *Registry

NewRegistryWithLimits creates a registry with custom decompression limits.

func (*Registry) Decode

func (r *Registry) Decode(filterName core.Name, params *core.Dictionary, reader io.Reader) (io.Reader, error)

Decode applies the named filter to decode stream data. The decoded output is wrapped with decompression limits for safety. Returns an error if the filter is not registered.

func (*Registry) DecodeChain

func (r *Registry) DecodeChain(data io.Reader, filters []core.Name, params []*core.Dictionary) (io.Reader, error)

DecodeChain applies a chain of filters to decode stream data. Filters are applied in order (first filter in the array is applied first). The params array should be parallel to the filters array, with nil entries allowed.

Note: For stream decoding with size-aware decompression limits, use core.Stream.Decode() or core.Stream.DecodeReader() instead, which automatically handle filter parsing and apply ratio-based limits.

Returns an error if any filter is not registered or if decoding fails.

func (*Registry) DecodeWithSize

func (r *Registry) DecodeWithSize(filterName core.Name, params *core.Dictionary, reader io.Reader, compressedSize int64) (io.Reader, error)

DecodeWithSize applies the named filter with known compressed size for ratio checking.

func (*Registry) Encode

func (r *Registry) Encode(filterName core.Name, params *core.Dictionary, reader io.Reader) (io.Reader, error)

Encode applies the named filter to encode stream data. Returns an error if the filter is not registered.

func (*Registry) Get

func (r *Registry) Get(name core.Name) (Filter, bool)

Get retrieves a filter by name. Returns the filter and true if found, nil and false otherwise.

func (*Registry) Limits

func (r *Registry) Limits() DecompressionLimits

Limits returns the current decompression limits.

func (*Registry) Register

func (r *Registry) Register(filter Filter)

Register adds a filter to the registry. If a filter with the same name already exists, it is replaced.

func (*Registry) SetLimits

func (r *Registry) SetLimits(limits DecompressionLimits)

SetLimits updates the decompression limits.

type RunLengthFilter

type RunLengthFilter struct{}

RunLengthFilter implements the RunLengthDecode filter. RunLengthDecode uses a simple run-length compression algorithm:

  • Byte 0-127: Copy the next (byte+1) bytes literally
  • Byte 128: End of data (EOD) marker
  • Byte 129-255: Repeat the next byte (257-byte) times

This is a simple and fast compression scheme for data with many repeated values (e.g., bilevel fax images).
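
The three length-byte cases can be sketched as a small decoder. decodeRunLength is a hypothetical helper working on a byte slice rather than an io.Reader, and returning the data decoded so far when input ends without an EOD marker is a choice made for this sketch.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

var errTruncated = errors.New("runlength: truncated input")

// decodeRunLength expands run-length encoded data until the EOD marker (128).
func decodeRunLength(src []byte) ([]byte, error) {
	var out []byte
	for i := 0; i < len(src); {
		n := int(src[i])
		i++
		switch {
		case n == 128: // EOD marker
			return out, nil
		case n < 128: // copy the next n+1 bytes literally
			end := i + n + 1
			if end > len(src) {
				return nil, errTruncated
			}
			out = append(out, src[i:end]...)
			i = end
		default: // repeat the next byte 257-n times
			if i >= len(src) {
				return nil, errTruncated
			}
			out = append(out, bytes.Repeat(src[i:i+1], 257-n)...)
			i++
		}
	}
	return out, nil // input ended without an EOD marker
}

func main() {
	encoded := []byte{2, 'a', 'b', 'c', 254, 'x', 128}
	out, _ := decodeRunLength(encoded)
	fmt.Printf("%s\n", out) // prints abcxxx
}
```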

func (*RunLengthFilter) Decode

func (f *RunLengthFilter) Decode(r io.Reader, params *core.Dictionary) (io.Reader, error)

Decode decodes run-length encoded data. The params dictionary is not used for RunLengthDecode.

func (*RunLengthFilter) Encode

func (f *RunLengthFilter) Encode(r io.Reader, params *core.Dictionary) (io.Reader, error)

Encode would encode data using run-length compression. For now, this is a stub that will be implemented when needed.

func (*RunLengthFilter) Name

func (f *RunLengthFilter) Name() core.Name

Name returns "RunLengthDecode".

type UnsupportedFilterError

type UnsupportedFilterError struct {
	Filter     string
	Reason     string
	Suggestion string
}

func (*UnsupportedFilterError) Error

func (e *UnsupportedFilterError) Error() string