
Filters

This document specifies the filter methods used in 7z archives for pre-processing data before compression.

Overview

Filters transform data to improve subsequent compression. They do not reduce size themselves but make patterns more compressible.

Typical usage: Data → [Filter] → [Compressor] → Compressed

Filter Types

| Type | Purpose |
|------|---------|
| BCJ (Branch/Call/Jump) | Transform relative addresses to absolute |
| Delta | Replace each byte with its difference from the byte a fixed distance earlier |

Delta Filter

Method ID: 0x03
Properties: 1 byte (optional)
Support: Mandatory

Transforms data by storing differences between bytes at fixed intervals.

Delta Properties

| Byte | Description |
|------|-------------|
| 0 | Delta distance (1-256, default 1) |

Encoding: Property byte = distance - 1

| Property | Distance |
|----------|----------|
| 0x00 | 1 (default) |
| 0x01 | 2 |
| 0xFF | 256 |
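The property mapping above can be sketched as a pair of helpers. This is a minimal illustration, not a reference implementation; the function names are hypothetical, and treating an empty property blob as "absent" is an assumption based on the property being optional.

```python
def delta_distance_from_properties(props: bytes) -> int:
    """Return the delta distance described by a coder's property bytes."""
    if len(props) == 0:
        # Property absent: fall back to the default distance of 1.
        return 1
    if len(props) != 1:
        raise ValueError("Delta filter expects at most 1 property byte")
    # Property byte = distance - 1, so 0x00 -> 1 and 0xFF -> 256.
    return props[0] + 1


def delta_properties_from_distance(distance: int) -> bytes:
    """Encode a delta distance (1..256) as the single property byte."""
    if not 1 <= distance <= 256:
        raise ValueError("distance must be in 1..256")
    return bytes([distance - 1])
```

Storing `distance - 1` lets a single byte cover the full 1..256 range, since a distance of 0 would be meaningless.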

Delta Algorithm

Encoding:

output[i] = input[i] - input[i - distance]

Decoding:

output[i] = input[i] + output[i - distance]

For i < distance, treat the missing earlier value as 0. All arithmetic is on unsigned bytes and wraps modulo 256.
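The two formulas can be sketched in Python as follows. The function names are illustrative, and the sketch assumes byte arithmetic wraps modulo 256; it processes a whole buffer at once, whereas a streaming implementation would carry the last `distance` bytes between calls.

```python
def delta_encode(data: bytes, distance: int = 1) -> bytes:
    """Replace each byte with its difference from the byte `distance` earlier."""
    out = bytearray(len(data))
    for i, value in enumerate(data):
        # Bytes before the start of the buffer are treated as 0.
        previous = data[i - distance] if i >= distance else 0
        out[i] = (value - previous) & 0xFF
    return bytes(out)


def delta_decode(data: bytes, distance: int = 1) -> bytes:
    """Invert delta_encode by adding back the already-decoded earlier byte."""
    out = bytearray(len(data))
    for i, value in enumerate(data):
        # Decoding reads from the output, not the input.
        previous = out[i - distance] if i >= distance else 0
        out[i] = (value + previous) & 0xFF
    return bytes(out)
```

For example, with `distance=2` the interleaved ramp `10, 20, 12, 22, 14, 24` encodes to `10, 20, 2, 2, 2, 2`, which a general-purpose compressor handles far better than the original.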

Use Cases

  • Audio samples (distance = bytes per sample, multiplied by the channel count for interleaved data)
  • Multi-channel data
  • Gradual value changes

BCJ Filters

BCJ (Branch/Call/Jump) filters transform relative addresses in executable code to absolute addresses. This improves compression because every call or jump to the same target then carries the same operand bytes regardless of where the instruction sits, so repeated targets become repeated byte patterns.

Common BCJ Properties

Most BCJ filters accept an optional 4-byte property:

| Bytes | Description |
|-------|-------------|
| 0-3 | Start offset for address calculation (UINT32, little-endian) |

Property presence:

  • If HasProperties flag is clear or PropertiesSize is 0: start offset = 0
  • If present: 4 bytes interpreted as UINT32 little-endian start offset

Zero-length properties (HasProperties set but PropertiesSize = 0) are equivalent to absent properties.
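The property rules above can be sketched as a small parser. The function and parameter names are hypothetical, and rejecting any length other than 0 or 4 is an assumption; the text does not say how other sizes should be handled.

```python
import struct


def bcj_start_offset(has_properties: bool, props: bytes) -> int:
    """Return the BCJ start offset implied by a coder's properties."""
    # Absent and zero-length properties both mean a start offset of 0.
    if not has_properties or len(props) == 0:
        return 0
    if len(props) != 4:
        # Assumption: any other size is treated as malformed.
        raise ValueError("BCJ properties must be exactly 4 bytes when present")
    # 4 bytes interpreted as an unsigned 32-bit little-endian integer.
    return struct.unpack("<I", props)[0]
```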

x86/x64 BCJ

Method ID: 0x03 0x03 0x01 0x03
Alternate ID: 0x04 (simple form)
Support: Mandatory

Transforms x86/x64 CALL and JMP instructions.

Targeted instructions:

  • E8 (CALL rel32)
  • E9 (JMP rel32)

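Alignment: 1 byte (x86 instructions are not aligned, so every byte position is examined; each targeted instruction is 5 bytes: opcode + 4-byte displacement)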

x86 BCJ2

Method ID: 0x03 0x03 0x01 0x1B
Support: Optional (decompression only for some implementations)

Advanced x86 filter that separates branch target addresses into dedicated streams for better compression.

Input streams: 4 (during decompression)
Output streams: 1

BCJ2 Stream Definition (Normative)

| Stream Index | Content | Encoding |
|--------------|---------|----------|
| 0 | Main data with address placeholders | Raw bytes |
| 1 | CALL (E8) absolute target addresses | Big-endian UINT32 sequence |
| 2 | JMP (E9) absolute target addresses | Big-endian UINT32 sequence |
| 3 | Selector stream | Range coder (selects CALL/JMP/neither) |

Stream ordering requirement: Stream indices 0, 1, 2, 3 MUST appear in ascending order in the folder's PackStreamIndex array. Implementations MUST NOT reorder these streams. During decompression, the BCJ2 decoder reads from all 4 input streams simultaneously and produces a single output stream with reconstructed relative addresses.

BCJ2 provides better compression than BCJ but is more complex to implement.

ARM BCJ

Method ID: 0x03 0x03 0x05 0x01
Alternate ID: 0x07
Support: Recommended

Transforms ARM (32-bit) branch instructions.

Alignment: 4 bytes

ARM64 BCJ

Method ID: 0x0A
Support: Recommended

Transforms ARM64 (AArch64) branch instructions.

Targeted instructions:

  • BL (branch with link)
  • B (unconditional branch)

Alignment: 4 bytes

ARM Thumb BCJ

Method ID: 0x03 0x03 0x07 0x01
Alternate ID: 0x08
Support: Optional

Transforms ARM Thumb mode instructions.

Alignment: 2 bytes

PowerPC BCJ

Method ID: 0x03 0x03 0x02 0x05
Alternate ID: 0x05
Support: Optional

Transforms PowerPC branch instructions.

Alignment: 4 bytes

IA-64 BCJ

Method ID: 0x03 0x03 0x04 0x01
Alternate ID: 0x06
Support: Optional

Transforms Intel IA-64 (Itanium) branch instructions.

Alignment: 16 bytes

SPARC BCJ

Method ID: 0x03 0x03 0x08 0x05
Alternate ID: 0x09
Support: Optional

Transforms SPARC branch instructions.

Alignment: 4 bytes

RISC-V BCJ

Method ID: 0x0B
Support: Optional

Transforms RISC-V branch and jump instructions.

Alignment: 2 bytes (compressed) or 4 bytes (standard)

Filter Summary Table

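| Filter | Method ID | Properties | Alignment (bytes) |
|--------|-----------|------------|-------------------|
| Delta | 03 | 1 byte (optional) | N/A |
| BCJ x86 | 03 03 01 03 | 4 bytes (optional) | 1 |
| BCJ2 x86 | 03 03 01 1B | None | Complex |
| BCJ PPC | 03 03 02 05 | 4 bytes (optional) | 4 |
| BCJ IA64 | 03 03 04 01 | 4 bytes (optional) | 16 |
| BCJ ARM | 03 03 05 01 | 4 bytes (optional) | 4 |
| BCJ ARMT | 03 03 07 01 | 4 bytes (optional) | 2 |
| BCJ SPARC | 03 03 08 05 | 4 bytes (optional) | 4 |
| ARM64 | 0A | 4 bytes (optional) | 4 |
| RISC-V | 0B | 4 bytes (optional) | 2/4 |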

Alternate (Short) Method IDs

Some filters have short alternate IDs for backward compatibility:

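| Short ID | Full ID | Filter |
|----------|---------|--------|
| 04 | 03 03 01 03 | BCJ x86 |
| 05 | 03 03 02 05 | BCJ PPC |
| 06 | 03 03 04 01 | BCJ IA64 |
| 07 | 03 03 05 01 | BCJ ARM |
| 08 | 03 03 07 01 | BCJ ARMT |
| 09 | 03 03 08 05 | BCJ SPARC |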

Implementations MUST accept both forms when reading.

Writing guidance: Writers SHOULD use the short form for maximum compatibility with older tools. The short form is universally supported; the long form may not be recognized by all implementations.
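One way a reader might accept both forms is to map short IDs onto their full equivalents before looking up the filter. The sketch below assumes method IDs are handled as raw byte strings; the table and function names are illustrative, and IDs with only one form (such as ARM64 `0A`) pass through unchanged.

```python
# Short ID -> full ID, taken from the table above.
SHORT_TO_FULL = {
    bytes([0x04]): bytes([0x03, 0x03, 0x01, 0x03]),  # BCJ x86
    bytes([0x05]): bytes([0x03, 0x03, 0x02, 0x05]),  # BCJ PPC
    bytes([0x06]): bytes([0x03, 0x03, 0x04, 0x01]),  # BCJ IA64
    bytes([0x07]): bytes([0x03, 0x03, 0x05, 0x01]),  # BCJ ARM
    bytes([0x08]): bytes([0x03, 0x03, 0x07, 0x01]),  # BCJ ARMT
    bytes([0x09]): bytes([0x03, 0x03, 0x08, 0x05]),  # BCJ SPARC
}


def normalize_method_id(method_id: bytes) -> bytes:
    """Return the full-form ID for a short BCJ ID; other IDs are unchanged."""
    return SHORT_TO_FULL.get(method_id, method_id)
```

Normalizing once at parse time means the rest of a reader only has to dispatch on a single form per filter.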

Filter Chaining

Filters are typically chained with compressors:

Common Chains

Executables (x86):

Data → [BCJ x86] → [LZMA2] → Compressed

Multi-channel audio:

Data → [Delta, distance=2] → [LZMA2] → Compressed

Coder Order in Folder

In folder definitions, coders are listed in decompression order (reverse of compression):

Compression: Input → BCJ → LZMA2 → Output
Folder coders: [LZMA2, BCJ]
Bind pairs: BCJ input binds to LZMA2 output

BCJ Algorithm Details

x86 BCJ Transformation

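The filter scans every byte position, because x86 instructions are not aligned. The pseudocode below is simplified: the 7-Zip reference filter additionally converts an operand only when its most significant byte is 0x00 or 0xFF, and it tracks recently seen E8/E9 bytes to reduce false positives. Here start_offset is the optional BCJ property (default 0), and arithmetic wraps modulo 2^32.

Encoding (compression):

i = 0
while i + 5 <= length:
    if byte[i] == 0xE8 or byte[i] == 0xE9:
        offset = read_u32_le(i + 1)
        absolute = (start_offset + i + 5 + offset) mod 2^32
        write_u32_le(i + 1, absolute)
        i = i + 5    # skip the operand just converted
    else:
        i = i + 1

Decoding (decompression):

i = 0
while i + 5 <= length:
    if byte[i] == 0xE8 or byte[i] == 0xE9:
        absolute = read_u32_le(i + 1)
        offset = (absolute - (start_offset + i + 5)) mod 2^32
        write_u32_le(i + 1, offset)
        i = i + 5    # skip the operand just converted
    else:
        i = i + 1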
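A runnable sketch of the core idea follows. It is deliberately simplified and is an assumption-laden illustration, not a byte-compatible reimplementation of any production x86 filter: it converts every E8/E9 it meets at any byte position, applies no false-positive heuristics, and works on a single in-memory buffer. The function names and the `start_offset` parameter are illustrative.

```python
import struct

MASK32 = 0xFFFFFFFF


def _x86_bcj(data: bytes, start_offset: int, encoding: bool) -> bytes:
    buf = bytearray(data)
    i = 0
    # A full instruction (opcode + 4-byte operand) must fit in the buffer.
    while i + 5 <= len(buf):
        if buf[i] in (0xE8, 0xE9):
            (value,) = struct.unpack_from("<I", buf, i + 1)
            # Address of the byte following the instruction.
            next_ip = (start_offset + i + 5) & MASK32
            if encoding:
                value = (value + next_ip) & MASK32  # relative -> absolute
            else:
                value = (value - next_ip) & MASK32  # absolute -> relative
            struct.pack_into("<I", buf, i + 1, value)
            i += 5  # skip the operand so it is never rescanned
        else:
            i += 1
    return bytes(buf)


def x86_bcj_encode(data: bytes, start_offset: int = 0) -> bytes:
    return _x86_bcj(data, start_offset, encoding=True)


def x86_bcj_decode(data: bytes, start_offset: int = 0) -> bytes:
    return _x86_bcj(data, start_offset, encoding=False)
```

As a usage example, two calls to the same target from different positions, `E8 00 00 00 00`, `90`, `E8 FA FF FF FF`, encode to `E8 05 00 00 00`, `90`, `E8 05 00 00 00`: the operands become identical, which is what makes the filtered stream easier to compress. Skipping five bytes after each conversion keeps the encoder and decoder visiting the same positions, so the round trip is exact.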

State Management

BCJ filters maintain state (current position) for address calculation. When used in solid archives:

  • State continues across file boundaries
  • Or state resets at each file (implementation-defined)

Filter Selection Guidance

| Content Type | Recommended Filter |
|--------------|--------------------|
| x86/x64 executables | BCJ x86 |
| ARM executables | BCJ ARM or ARM64 |
| Audio (16-bit stereo) | Delta, distance=4 |
| Audio (16-bit mono) | Delta, distance=2 |
| Generic binary | None |
| Text | None |
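The two audio rows follow the same rule: the delta distance equals the size of one interleaved sample frame. A small helper, with a hypothetical name and assuming uncompressed PCM with whole-byte samples, could compute it:

```python
def pcm_delta_distance(bits_per_sample: int, channels: int) -> int:
    """Delta distance for interleaved PCM: bytes per sample times channels."""
    if bits_per_sample <= 0 or bits_per_sample % 8 != 0:
        raise ValueError("bits_per_sample must be a positive multiple of 8")
    if channels <= 0:
        raise ValueError("channels must be positive")
    distance = (bits_per_sample // 8) * channels
    # The Delta filter can only express distances from 1 to 256.
    if distance > 256:
        raise ValueError("frame size exceeds the maximum delta distance of 256")
    return distance
```

With this rule each byte is compared against the byte in the same position of the previous frame, so 16-bit stereo gives 4 and 16-bit mono gives 2, matching the table.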

Implementation Notes

Filter Detection

Filters are transparent to data integrity: after decompression, data stored with a filter is byte-identical to the same data stored without one.

Performance

  • BCJ filters are very fast (simple byte scanning)
  • Delta filter is extremely fast (single subtraction per byte)
  • Filtering overhead is negligible compared to compression

Reversibility

All filters are reversible. Applying encode then decode (or vice versa) produces the original data.


Released under MIT OR Apache-2.0 License