Implementation details

This page covers performance considerations, caching mechanisms, precompilation, and other implementation details of Semisimple.jl.

Caching

Semisimple.jl uses several internal caches to avoid recomputing expensive results. Understanding these caches is important for benchmarking and memory management.

Available caches

Semisimple.jl maintains ten internal caches. Six are unbounded Dict caches for small singletons and lookup tables; four are bounded LRU caches (from LRUCache.jl) whose total memory budget is configurable at runtime via configure_caches!.

CacheVariableTypePurpose
Root systemSemisimple._root_system_cacheDictSingleton RootSystem instances per Dynkin type
Positive roots setSemisimple._positive_roots_set_cacheDictFast is_positive_root lookup sets
Longest Weyl elementSemisimple._longest_element_cacheDictCached longest element w₀ per Dynkin type
Coset representativesSemisimple._coset_reps_cacheDictWeyl orbit coset reps for exceptional types
Dominant character (type)Semisimple._dominant_character_type_cacheDictType-level Freudenthal intermediates
Weyl dimension dataSemisimple._weyl_dimension_data_cacheDictDimension formula denominator and scaled roots
Dominant characterSemisimple._dominant_character_cacheLRUDominant weight multiplicities from Freudenthal's formula
Tensor productSemisimple._tensor_cacheLRUTensor product decompositions
Symmetric powerSemisimple._symmetric_power_cacheLRUSymmetric power decompositions
Exterior powerSemisimple._exterior_power_cacheLRUExterior power decompositions

The six Dict caches are unbounded and persist for the lifetime of the Julia session. The four LRU caches have a configurable memory budget (default: 25 % of system RAM, minimum 256 MiB) and automatically evict least-recently-used entries when the budget is exceeded.

Why the dominant character cache matters

The dominant character cache is the main performance lever for downstream operations. Tensor products, symmetric/exterior powers, and plethysms call dominant_character repeatedly for the same highest weights.

Inspecting caches

Use cache_info to get a snapshot of cache occupancy:

using Semisimple

# Snapshot before any work
info = cache_info()
println("Tensor cache: ", info.tensor.length, " entries (max ", info.tensor.maxsize, " bytes)")

# Populate some caches by doing computations
ω₁ = fundamental_weight(TypeE{8}, 1)
freudenthal_formula(ω₁)
tensor_product(ω₁, ω₁)

# Snapshot after
info = cache_info()
println("Dominant character cache: ", info.dominant_character.length, " entries")
println("Tensor cache: ", info.tensor.length, " entries")

Clearing caches

Use clear_caches! (or its alias clear_all_caches!) to empty every cache at once:

using Semisimple

# Do some computations
ω₁ = fundamental_weight(TypeA{2}, 1)
tensor_product(ω₁, ω₁)
freudenthal_formula(ω₁)
symmetric_power(ω₁, 3)

# Clear everything
clear_caches!()

This is particularly useful for:

  • Benchmarking cold-start performance — measure how long operations take without cached results
  • Memory management — free memory after large computations (e.g., after computing many E₈ tensor products)
  • Reproducible testing — ensure tests start from a clean state

Individual cache variables have underscored names and are internal implementation details. Prefer the public clear_caches! and configure_caches! APIs.

Configuring cache budgets

Use configure_caches! to resize the LRU caches at runtime. The budget (in bytes) controls the total memory envelope; the four fraction arguments determine how it is divided:

using Semisimple

# Give caches 512 MiB total
configure_caches!(budget = 512 * 1024^2)

# Custom split: 50 % tensor, 30 % dominant, 10 % each for Sym/⋀
configure_caches!(
  budget = 512 * 1024^2,
  dominant_frac = 0.30,
  tensor_frac = 0.50,
  sym_power_frac = 0.10,
  ext_power_frac = 0.10,
)

Default fractions: dominant 30 %, tensor 40 %, symmetric 15 %, exterior 15 %. The default total budget is 25 % of system RAM (minimum 256 MiB). These defaults can also be set persistently via Julia's Preferences.jl (keys: cache_budget, dominant_frac, tensor_frac, sym_power_frac, ext_power_frac).

Cache invalidation

Caches are never invalidated by code changes — all cached functions are pure (same inputs always produce same outputs). However, cached entries can disappear in three ways:

  • You explicitly clear a cache (via clear_caches! or empty!(...))
  • An LRU cache evicts least-recently-used entries when its memory budget is exceeded
  • Your Julia session ends

Automatic eviction only affects the four bounded LRU caches. The six unbounded Dict caches persist until cleared or session end.

This design is safe because:

  • Dynkin types are immutable compile-time constants
  • Weights are immutable SVector objects
  • All cached functions are pure — re-computing an evicted entry always gives the same result

Precompilation

Semisimple.jl precompiles many commonly-used methods to reduce first-call latency. When you load the package with using Semisimple, the precompilation work has already been done.

What gets precompiled

The package precompiles the following operations for all simple Dynkin types up to rank 9 (plus the exceptional types):

Dynkin types precompiled:

  • TypeA{1} through TypeA{9}
  • TypeB{2} through TypeB{9}
  • TypeC{2} through TypeC{9}
  • TypeD{4} through TypeD{9}
  • TypeE{6}, TypeE{7}, TypeE{8}
  • TypeF4
  • TypeG2

Operations precompiled:

  • cartan_matrix, cartan_symmetrizer, cartan_bilinear_form, cartan_matrix_inverse
  • _make_root_system (internal root system construction)
  • _weyl_denominator, _weyl_dim_scaled_roots (Weyl dimension formula internals)
  • degree (representation dimension)
  • conjugate_dominant_weight (dominant weight conjugation)
  • conjugate_dominant_weight_with_length (Borel–Weil–Bott hot path)
  • weyl_orbit (Weyl orbit generation)
  • Weyl group actions (* operator for roots and weights with Weyl elements)
  • freudenthal_formula (weight multiplicities)
  • dot_reduce (weight normalization)
  • lr_tensor_product (Littlewood–Richardson rule for Type A)

Why precompilation matters

Without precompilation, the first call to a method triggers just-in-time (JIT) compilation, which can take hundreds of milliseconds. With precompilation, these methods are ready to use immediately:

using Semisimple

# First call is fast due to precompilation
@time degree(fundamental_weight(TypeE{8}, 1))  # ~0.001s

# Without precompilation, this would take ~0.5s for the first call

What is NOT precompiled

Operations involving:

  • Product Dynkin types (e.g., ProductDynkinType{Tuple{TypeA{2}, TypeB{3}}})
  • Rank ≥ 10 simple types (e.g., TypeA{15})
  • Specific high-dimensional computations (e.g., tensor_product(ω₇, ω₇) for E₈)

These will experience first-call latency but will be fast on subsequent calls (after JIT compilation).

Performance characteristics

Compile-time vs. run-time

Semisimple.jl leverages Julia's type system and @generated functions to move many computations to compile time:

Compile-Time (Type-Level)Run-Time
Dynkin type classificationWeight coordinate values
Rank of Dynkin typeWeight lattice arithmetic
Cartan matrix entriesWeyl orbit traversal
Root system enumerationFreudenthal recursion
Weyl denominator productCharacter multiplication

This means that cartan_matrix(TypeE{8}) produces a compile-time constant SMatrix that is embedded directly into your compiled code — there's no matrix allocation at runtime.

Memory usage

OperationMemory FootprintNotes
RootSystem{TypeE{8}}~15 KBSingleton, cached per type
WeightLatticeElem8R bytesR = rank; stored as SVector{R,Int}
WeylGroupElem~40 + L bytesWord stored as Vector{UInt8}; L = word length
WeylCharacter~24 + 40N bytesN = number of terms in the character
Freudenthal cache (E₈ adjoint)~40 KB3,875 weight multiplicities

For large-scale computations (e.g., thousands of E₈ tensor products), the character-related LRU caches will automatically evict old entries once their memory budget is reached. Use configure_caches! to increase the budget, or clear_caches! to free memory immediately.

Asymptotic complexity

OperationTime ComplexityNotes
degree(λ)O(N²)N = number of positive roots
freudenthal_formula(λ)O(M·N)M =
tensor_product(λ, μ) (BK)O(M·W·d)W = Weyl group order, d = dim V(smaller weight)
tensor_product(λ, μ) (LR, Type A)O(n³)n = max(
symmetric_power(λ, k)O(k²·T)T = cost of one tensor product
weyl_orbit(λ)O(W·R·R)W = orbit size ≤ Weyl order, R = rank

For reproducible performance measurements, see the benchmark scripts in benchmark/.

Type stability

Semisimple.jl is designed for complete type stability:

using Semisimple

ω₁ = fundamental_weight(TypeE{8}, 1)
typeof(ω₁)  # WeightLatticeElem{TypeE{8}, 8} — concrete type

ch = freudenthal_formula(ω₁)
typeof(ch)  # Dict{SVector{8, Int64}, Int64} — concrete type

result = tensor_product(ω₁, ω₁)
typeof(result)  # WeylCharacter{TypeE{8}, 8} — concrete type

All public APIs return concrete types, enabling aggressive compiler optimizations. There are no type instabilities in hot paths.

Numerical precision

All computations use exact integer arithmetic — there are no floating-point operations:

  • Weights are SVector{R, Int} — exact integer vectors
  • Multiplicities are Int — exact integer counts
  • Dimensions are computed exactly (Weyl dimension formula uses BigInt for large products)
  • Inner products use scaled integer forms to avoid division

This means:

  • No numerical stability concerns — safe for arbitrarily large representations
  • Overflow protection — dimension computations automatically promote to BigInt when needed

Example:

julia> ω7 = fundamental_weight(TypeE{8}, 7);

julia> degree(ω7)  # 147,250 × 2⁶⁰ — too large for Int64
170141183460469137866240

julia> typeof(degree(ω7))
BigInt

Thread safety

Some caches are not thread-safe

The small Dict singleton/type-data caches are protected by locks where they are populated through public APIs. The bounded LRU character caches are not synchronized, so concurrent cache-populating calls such as dominant_character, tensor_product, symmetric_power, or exterior_power can race.

Safe: Using Semisimple.jl from a single thread (the default)

Safe: Read-only operations from multiple threads after warming up caches

Unsafe: Calling LRU-cache-populating representation operations from multiple threads simultaneously

If you need parallel computation, populate caches in a single-threaded warm-up phase, then perform read-only operations in parallel.

Comparison with LiE

Semisimple.jl reimplements many algorithms from the LiE computer algebra system. Key differences:

AspectLiE (C)Semisimple.jl (Julia)
LanguageC (CWEB literate programming)Julia (pure Julia)
Type systemRuntime group structsCompile-time Dynkin type parameters
Cartan matricesRuntime matrix allocationCompile-time SMatrix constants
CachingPermanent "long-life" objectsBounded LRU caches + Dict singletons
Hot performanceFast (compiled C)Fast (JIT-compiled, with caching)
Cold performanceInstant (no compilation)Slow first call (JIT overhead)

For hot operations (cached, precompiled), Semisimple.jl matches or exceeds LiE's performance. For cold operations, LiE is faster due to no JIT compilation delay.

Implementation philosophy

Semisimple.jl follows these design principles:

  1. Type-level dispatch — Use Julia's type system to specialize code for each Dynkin type
  2. Compile-time constants — Leverage @generated functions to embed mathematical data
  3. Immutability — All core types are immutable for thread safety and optimization
  4. Caching — Trade memory for speed by memoizing expensive computations
  5. Minimal dependencies — StaticArrays.jl, LRUCache.jl, PrecompileTools.jl, Preferences.jl, and LinearAlgebra (stdlib)
  6. Pure Julia — No C/Fortran, enabling introspection and compilation to other targets

These principles enable aggressive compiler optimizations while maintaining mathematical rigor.

Weyl orbit traversal

Weyl orbits are computed by the internal module Weylloop.jl, which implements LiE-style systematic orbit traversal. Rather than a hash-set BFS that scales with orbit size, it converts weight coordinates to the ε-basis where classical Weyl subgroups act as permutations (type A) or permutations + sign flips (types B/C/D). Orbits are enumerated via lexicographic permutation generation and Gray-code sign flips, eliminating the $O(|\text{orbit}|)$ hash-set overhead that would otherwise dominate for large orbits (e.g., E₈ orbits with millions of elements). For exceptional types, precomputed coset representatives reduce the problem to the classical case.

API reference

Semisimple.clear_all_caches!Function
clear_all_caches!()

Clear all internal caches used by Semisimple.jl. Alias for clear_caches!.

Examples

julia> using Semisimple

julia> clear_all_caches!()
source
Semisimple.clear_caches!Function
clear_caches!()

Empty every internal cache in Semisimple.jl (both bounded Dict caches and LRU caches).

Examples

julia> using Semisimple

julia> ω1 = fundamental_weight(TypeA{2}, 1);

julia> tensor_product(ω1, ω1);  # populates caches

julia> clear_caches!()
source
Semisimple.configure_caches!Function
configure_caches!(; budget=nothing, dominant_frac=nothing, tensor_frac=nothing,
                    sym_power_frac=nothing, ext_power_frac=nothing)

Resize the LRU caches at runtime. Unspecified keyword arguments retain their current values. The budget is in bytes and controls the total memory envelope; the four fraction arguments determine how it is divided among the caches (they need not sum to 1 — each cache is sized independently as budget * frac).

Examples

julia> using Semisimple

julia> configure_caches!(budget = 512 * 1024^2)  # 512 MiB total
source
Semisimple.cache_infoFunction
cache_info() -> NamedTuple

Return a snapshot of the current cache occupancy. Each entry is a NamedTuple with fields length (number of entries) and maxsize (capacity in bytes).

Examples

julia> using Semisimple

julia> info = cache_info();

julia> info.tensor.length >= 0
true
source

Internals reference

These are internal functions not part of the public API. They are documented here for contributors and advanced users.

Root system internals

Semisimple._root_system_cacheConstant
RootSystem(::Type{DT}) -> RootSystem{DT,R,N}

Return the root system for Dynkin type DT. A single instance is cached per Dynkin type, with small ranks using fully generated literals and larger ranks using a compact runtime builder.

Examples

julia> using Semisimple

julia> RootSystem(TypeA{2}) === RootSystem(TypeA{2})
true
source
Semisimple._compute_positive_roots_and_reflectionsFunction

Compute positive roots, coroots, and the reflection table from a Cartan matrix.

This uses the standard algorithm: start with simple roots and iteratively apply simple reflections to discover new positive roots.

Returns:

  • pos_roots::Vector{SVector{R,Int}} — positive roots in simple root coordinates
  • pos_coroots::Vector{SVector{R,Int}} — positive coroots
  • refl::Matrix{UInt} — reflection table
source
Semisimple._make_root_system_runtimeFunction
_make_root_system_runtime(::Type{DT}) -> RootSystem{DT,R,N}

Compact runtime builder for root systems. This is used directly for higher ranks and via a small generated wrapper for larger mid-rank types to avoid emitting enormous tuple literals into method bodies.

source

Weyl group internals

Semisimple._explain_rmulFunction

Internal: determines what right-multiplication by s does to word x.

Returns (insert::Bool, position::Int, letter::UInt8):

  • if insert=true: insert letter at position
  • if insert=false: delete the element at position
source
Semisimple.weylloopFunction
weylloop(action!, ::Type{DT}, v::AbstractVector{<:Integer})

Call action!(w) once for each weight w in the Weyl orbit of v, where v and w are in the fundamental weight (ω) basis.

Uses LiE's ε-basis algorithm: converts to ε-coordinates where a classical subgroup acts by permutations (type A) or permutations + sign changes (types B/C/D), then enumerates orbits systematically via lexicographic permutation generation and Gray-code sign flips — with no hash set or BFS.

For exceptional types, coset representatives W / W_classical are precomputed as matrices.

All transforms dispatch on the Dynkin type, enabling the compiler to inline them and unroll fixed-size loops. The hot per-orbit workspace uses stack-allocated MVector from StaticArrays; the deduplicated suborbit representatives still live in a small heap vector because their count depends on the Dynkin type and the input weight.

action! receives a mutable workspace vector; it must NOT hold a reference to this vector across calls (copy if needed).

source

Cache internals

Semisimple._apply_cache_preferences!Function
_apply_cache_preferences!()

Read Preferences-based cache settings and resize caches accordingly. Called from __init__() so that user preferences take effect on load.

Recognised preferences (set via Preferences.set_preferences!):

KeyTypeDefault
cache_budgetInt25% of RAM (≥256 MiB)
dominant_fracFloat640.30
tensor_fracFloat640.40
sym_power_fracFloat640.15
ext_power_fracFloat640.15
source

Character internals

Semisimple.dot_reduceFunction
dot_reduce(λ::WeightLatticeElem{DT,R}) -> Tuple{Int, WeightLatticeElem{DT,R}}

Compute the "dot-action reduction" of λ:

Return (ε, μ) where:

  • μ is the dominant weight such that w ⋅ λ = μ under the dot action w ⋅ λ = w(λ + ρ) - ρ for some Weyl group element w
  • ε = (-1)^{ℓ(w)} is the sign of w, or ε = 0 if λ + ρ is singular (lies on a Weyl chamber wall)

This is the key ingredient in the Brauer–Klimyk algorithm.

source
Semisimple.brauer_klimykFunction
brauer_klimyk(char::Dict{SVector{R,Int}, <:Integer}, μ::WeightLatticeElem{DT,R}) -> WeylCharacter{DT,R}

Tensor the representation with weight multiplicities char (as from freudenthal_formula) with the irreducible representation $\mathrm{V}(μ)$, using the Brauer–Klimyk formula:

$\mathrm{V} \otimes \mathrm{V}(μ) = \sum_{\text{weights } λ \text{ of } \mathrm{V}} m(λ) \cdot ε(λ+μ) \cdot \mathrm{V}(ν(λ+μ))$

where (ε, ν) = dot_reduce(μ + λ). Accepts BigInt-valued multiplicities (as returned by freudenthal_formula); the resulting irreducible multiplicities are stored as Int in the returned WeylCharacter, so this will throw InexactError if any of them exceeds typemax(Int64).

source
Semisimple._brauer_klimyk_dominantFunction
_brauer_klimyk_dominant(dom_char::Dict{SVector{R,Int}, <:Integer}, μ::WeightLatticeElem{DT,R}) -> WeylCharacter{DT,R}

Like brauer_klimyk, but takes a dominant-only character dict (as returned by dominant_character) and expands Weyl orbits on-the-fly using weylloop. This avoids materializing the full weight system and eliminates hash-set overhead for large orbits.

The Brauer–Klimyk formula is: $\mathrm{V} \otimes \mathrm{V}(μ) = \sum_{\text{dom. wts } λ_d} m(λ_d) \sum_{w \in W \cdot λ_d} ε(w(λ_d)+μ) \cdot \mathrm{V}(ν(w(λ_d)+μ))$

Accepts BigInt-valued multiplicities; the resulting irreducible multiplicities are stored as Int in the returned WeylCharacter and will throw InexactError if any of them exceeds typemax(Int64).

source
Semisimple._vdecompFunction
_vdecomp(::Type{DT}, dom_char::Dict{SVector{R,Int},<:Integer}) -> WeylCharacter{DT,R}

Virtual decomposition: given a character as a dict of weight multiplicities (dominant weights only, as produced by Adams operators), decompose into irreducibles using the Weyl orbit / alternating-dominant method.

This is the analogue of LiE's Vdecomp function.

source
Semisimple._tensor_charactersFunction
_tensor_characters(V::WeylCharacter{DT,R}, W::WeylCharacter{DT,R}) -> WeylCharacter{DT,R}

Tensor product of two virtual characters (each decomposed into irreducibles).

source

Littlewood–Richardson internals (type A)

Semisimple._weight_to_partitionFunction
_weight_to_partition(λ::WeightLatticeElem{TypeA{N},N}) -> Vector{Int}

Convert a dominant weight in fundamental weight coordinates to a partition with $N+1$ parts (for $\mathrm{GL}_{N+1}$).

For $\mathrm{A}_N$, the dominant weight $λ = (λ_1, …, λ_N)$ in the fundamental weight basis corresponds to the partition $μ = (μ_1 ≥ μ_2 ≥ ⋯ ≥ μ_N ≥ 0)$ where $μ_i = λ_i + λ_{i+1} + ⋯ + λ_N$ (partial sums from right to left), with $μ_{N+1} = 0$.

source
Semisimple._partition_to_weightFunction
_partition_to_weight(::Type{TypeA{N}}, p::Vector{Int}) -> WeightLatticeElem{TypeA{N},N}

Convert a partition back to a dominant weight in the fundamental weight basis for $\mathrm{SL}_{N+1}$. First reduces the partition by subtracting the minimum part (to pass from $\mathrm{GL}$ to $\mathrm{SL}$), then computes successive differences: $λ_i = μ_i - μ_{i+1}$.

source
Semisimple._lr_coefficientsFunction
_lr_coefficients(α::Vector{<:Integer}, β::Vector{<:Integer}, n::Integer) -> Dict{Vector{Int}, Int}

Compute all Littlewood–Richardson coefficients $c^ν_{αβ}$ for partitions α and β, where partitions have at most n parts.

Returns a dictionary mapping each partition ν (as Vector{Int}) to the LR coefficient $c^ν_{αβ}$.

The algorithm fills the skew shape $ν / α$ with content β row by row, enforcing:

  1. Semistandard: entries weakly increase along rows, strictly increase down columns.
  2. Ballot (lattice word) condition: reading the filling right-to-left, top-to-bottom, at every prefix the count of j ≥ count of j+1.

We enumerate valid fillings recursively row by row, which implicitly determines the partition ν.

source
Semisimple._n_tableauxFunction
_n_tableaux(λ::Vector{<:Integer}, l::Integer) -> BigInt

Compute the number of standard Young tableaux of shape $λ$ using the hook-length formula:

$f^λ = \frac{n!}{\prod_{\text{boxes}} h(b)}$

l is the number of (nonzero) rows.

source

Plethysm internals (Murnaghan–Nakayama)

Semisimple._classordFunction
_classord(κ::Vector{Int}) -> BigInt

Compute the size of the conjugacy class in $S_n$ corresponding to cycle type $κ$:

$|\mathrm{Cl}(κ)| = \frac{n!}{\prod_{k>0} k^{c_k(κ)} \cdot c_k(κ)!}$

where $c_k(κ)$ is the number of parts of $κ$ equal to $k$. The parts of κ must be in weakly decreasing order.

source
Semisimple._mn_char_valFunction
_mn_char_val(λ::Vector{Int}, μ::Vector{Int}) -> BigInt

Compute the irreducible $S_n$-character value $χ^λ(μ)$ using the Murnaghan–Nakayama rule.

Both λ and μ are partitions of $n$ with parts in weakly decreasing order.

The algorithm represents the Young diagram of $λ$ as a Maya diagram (edge sequence of horizontal/vertical steps), then recursively removes rim hooks of the sizes given by the parts of $μ$ (largest first).

source
Semisimple._mn_recurse!Function

Recursive Murnaghan–Nakayama: remove rim hooks of sizes μ[i], μ[i+1], … from the Maya diagram edge, tracking leg-length parity in k.

source