Implementation details

This page covers performance considerations, caching mechanisms, precompilation, and other implementation details of Semisimple.jl.

Caching

Semisimple.jl uses several internal caches to avoid recomputing expensive results. Understanding these caches is important for benchmarking and memory management.

Semisimple.jl maintains ten internal caches. Six are unbounded Dict caches for small singletons and lookup tables; four are bounded LRU caches (from LRUCache.jl) whose total memory budget is configurable at runtime via configure_caches!.

Cache	Variable	Type	Purpose
Root system	`Semisimple._root_system_cache`	`Dict`	Singleton `RootSystem` instances per Dynkin type
Positive roots set	`Semisimple._positive_roots_set_cache`	`Dict`	Fast `is_positive_root` lookup sets
Longest Weyl element	`Semisimple._longest_element_cache`	`Dict`	Cached longest element `w₀` per Dynkin type
Coset representatives	`Semisimple._coset_reps_cache`	`Dict`	Weyl orbit coset reps for exceptional types
Dominant character (type)	`Semisimple._dominant_character_type_cache`	`Dict`	Type-level Freudenthal intermediates
Weyl dimension data	`Semisimple._weyl_dimension_data_cache`	`Dict`	Dimension formula denominator and scaled roots
Dominant character	`Semisimple._dominant_character_cache`	`LRU`	Dominant weight multiplicities from Freudenthal's formula
Tensor product	`Semisimple._tensor_cache`	`LRU`	Tensor product decompositions
Symmetric power	`Semisimple._symmetric_power_cache`	`LRU`	Symmetric power decompositions
Exterior power	`Semisimple._exterior_power_cache`	`LRU`	Exterior power decompositions

The six Dict caches are unbounded and persist for the lifetime of the Julia session. The four LRU caches have a configurable memory budget (default: 25 % of system RAM, minimum 256 MiB) and automatically evict least-recently-used entries when the budget is exceeded.
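The following is a minimal sketch of byte-budgeted LRU eviction in plain Julia. It is not LRUCache.jl's implementation. `BudgetLRU` and its fields are hypothetical names used only for illustration. Each entry's cost comes from a user-supplied function (by default `Base.summarysize`). Whenever the total cost exceeds the budget, the least-recently-used entries are dropped.

```julia
# Hypothetical byte-budgeted LRU cache (illustration only, not LRUCache.jl).
mutable struct BudgetLRU{K,V}
    data::Dict{K,Tuple{V,Int}}  # key => (value, tick of last use)
    budget::Int                 # maximum total cost, in bytes
    used::Int                   # current total cost, in bytes
    tick::Int                   # logical clock, bumped on every access
    cost::Function              # value -> cost in bytes
end

BudgetLRU{K,V}(budget::Integer; cost::Function = Base.summarysize) where {K,V} =
    BudgetLRU{K,V}(Dict{K,Tuple{V,Int}}(), Int(budget), 0, 0, cost)

# Return the cached value for `key`, computing it with `f()` on a miss.
function Base.get!(f::Function, c::BudgetLRU{K,V}, key::K) where {K,V}
    c.tick += 1
    if haskey(c.data, key)
        v = c.data[key][1]
        c.data[key] = (v, c.tick)            # a hit refreshes recency
        return v
    end
    v = f()::V
    c.data[key] = (v, c.tick)
    c.used += c.cost(v)
    # Evict least-recently-used entries until the budget is respected.
    # (A linear scan keeps the sketch short; production LRU caches avoid it.)
    while c.used > c.budget && length(c.data) > 1
        oldest = argmin(k -> c.data[k][2], keys(c.data))
        c.used -= c.cost(c.data[oldest][1])
        delete!(c.data, oldest)
    end
    return v
end

# Usage: 32-byte values (4 × Int64) in a 64-byte budget, so two entries fit.
c = BudgetLRU{Int,Vector{Int}}(64; cost = sizeof)
get!(() -> fill(1, 4), c, 1)
get!(() -> fill(2, 4), c, 2)
get!(() -> fill(1, 4), c, 1)   # hit: key 1 becomes the most recently used
get!(() -> fill(3, 4), c, 3)   # over budget: evicts key 2, the LRU entry
```

All cached Semisimple.jl functions are pure, so recomputing an evicted key with the same `f` reproduces the evicted value exactly. Eviction only costs time.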

Why the dominant character cache matters

The dominant character cache is the main performance lever for downstream operations. Tensor products, symmetric/exterior powers, and plethysms call dominant_character repeatedly for the same highest weights.
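A toy memoized function that counts cache misses shows the payoff. `expensive_character` is a hypothetical stand-in for a costly, pure computation keyed by a highest weight stored as a tuple. A "downstream" loop requests the same three highest weights 300 times, yet only three real computations occur.

```julia
# Hypothetical stand-in for an expensive, pure computation keyed by a highest weight.
const CHAR_MISSES = Ref(0)
const CHAR_MEMO = Dict{NTuple{2,Int},BigInt}()

function expensive_character(λ::NTuple{2,Int})
    get!(CHAR_MEMO, λ) do
        CHAR_MISSES[] += 1                                     # runs only on a miss
        sum(big(k)^2 for k in 1:(10_000 * (λ[1] + λ[2] + 1)))  # deliberately slow
    end
end

# A "downstream" workload that keeps revisiting the same three highest weights.
highest_weights = [(1, 0), (0, 1), (1, 1)]
total = sum(expensive_character(highest_weights[mod1(i, 3)]) for i in 1:300)
println("misses: ", CHAR_MISSES[], " for 300 calls")
```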

Inspecting caches

Use cache_info to get a snapshot of cache occupancy:

using Semisimple

# Snapshot before any work
info = cache_info()
println("Tensor cache: ", info.tensor.length, " entries (max ", info.tensor.maxsize, " bytes)")

# Populate some caches by doing computations
ω₁ = fundamental_weight(TypeE{8}, 1)
freudenthal_formula(ω₁)
tensor_product(ω₁, ω₁)

# Snapshot after
info = cache_info()
println("Dominant character cache: ", info.dominant_character.length, " entries")
println("Tensor cache: ", info.tensor.length, " entries")

Clearing caches

Use clear_caches! (or its alias clear_all_caches!) to empty every cache at once:

using Semisimple

# Do some computations
ω₁ = fundamental_weight(TypeA{2}, 1)
tensor_product(ω₁, ω₁)
freudenthal_formula(ω₁)
symmetric_power(ω₁, 3)

# Clear everything
clear_caches!()

This is particularly useful for:

Benchmarking cold-start performance — measure how long operations take without cached results
Memory management — free memory after large computations (e.g., after computing many E₈ tensor products)
Reproducible testing — ensure tests start from a clean state

Individual cache variables have underscored names and are internal implementation details. Prefer the public clear_caches! and configure_caches! APIs.

Configuring cache budgets

Use configure_caches! to resize the LRU caches at runtime. The budget (in bytes) controls the total memory envelope; the four fraction arguments determine how it is divided:

using Semisimple

# Give caches 512 MiB total
configure_caches!(budget = 512 * 1024^2)

# Custom split: 50 % tensor, 30 % dominant, 10 % each for Sym/⋀
configure_caches!(
  budget = 512 * 1024^2,
  dominant_frac = 0.30,
  tensor_frac = 0.50,
  sym_power_frac = 0.10,
  ext_power_frac = 0.10,
)

Default fractions: dominant 30 %, tensor 40 %, symmetric 15 %, exterior 15 %. The default total budget is 25 % of system RAM (minimum 256 MiB). These defaults can also be set persistently via Julia's Preferences.jl (keys: cache_budget, dominant_frac, tensor_frac, sym_power_frac, ext_power_frac).
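The arithmetic behind these defaults takes only a few lines. `default_budget` and `split_budget` below are hypothetical helpers, not Semisimple.jl API. They mirror the rules stated above: a budget of 25 % of `Sys.total_memory()` with a 256 MiB floor, divided among the four LRU caches by fraction. The argument checks are assumptions of this sketch.

```julia
# Hypothetical helpers mirroring the documented defaults (not Semisimple.jl API).
const MIN_BUDGET = 256 * 1024^2   # 256 MiB floor

default_budget() = max(Int(Sys.total_memory() ÷ 4), MIN_BUDGET)

function split_budget(budget::Integer;
                      dominant_frac = 0.30, tensor_frac = 0.40,
                      sym_power_frac = 0.15, ext_power_frac = 0.15)
    fracs = (dominant = dominant_frac, tensor = tensor_frac,
             sym_power = sym_power_frac, ext_power = ext_power_frac)
    all(>=(0), fracs) || throw(ArgumentError("fractions must be non-negative"))
    sum(fracs) <= 1 + 1e-9 || throw(ArgumentError("fractions must sum to at most 1"))
    # Round down so the per-cache budgets never exceed the total.
    return map(f -> floor(Int, budget * f), fracs)
end

split_budget(512 * 1024^2)                      # default 30/40/15/15 split
split_budget(512 * 1024^2; dominant_frac = 0.30, tensor_frac = 0.50,
             sym_power_frac = 0.10, ext_power_frac = 0.10)
```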

Cache invalidation

Caches are never invalidated automatically, not even by code changes. This is safe because all cached functions are pure (the same inputs always produce the same outputs). Cached entries can, however, disappear in three ways:

You explicitly clear a cache (via clear_caches! or empty!(...))
An LRU cache evicts least-recently-used entries when its memory budget is exceeded
Your Julia session ends

Automatic eviction only affects the four bounded LRU caches. The six unbounded Dict caches persist until cleared or session end.

This design is safe because:

Dynkin types are immutable compile-time constants
Weights are immutable SVector objects
All cached functions are pure — re-computing an evicted entry always gives the same result

Precompilation

Semisimple.jl precompiles many commonly-used methods to reduce first-call latency. When you load the package with using Semisimple, the precompilation work has already been done.

What gets precompiled

The package precompiles the following operations for all simple Dynkin types up to rank 10 (plus the exceptional types):

Dynkin types precompiled:

TypeA{1} through TypeA{10}
TypeB{2} through TypeB{10}
TypeC{2} through TypeC{10}
TypeD{3} through TypeD{10}
TypeE{6}, TypeE{7}, TypeE{8}
TypeF4
TypeG2

Operations precompiled:

cartan_matrix, cartan_symmetrizer, cartan_bilinear_form, cartan_matrix_inverse
_make_root_system (internal root system construction)
_weyl_denominator, _weyl_dim_scaled_roots (Weyl dimension formula internals)
degree (representation dimension)
conjugate_dominant_weight (dominant weight conjugation)
conjugate_dominant_weight_with_length (Borel–Weil–Bott hot path)
weyl_orbit (Weyl orbit generation)
Weyl group actions (* operator for roots and weights with Weyl elements)
freudenthal_formula (weight multiplicities)
dot_reduce (weight normalization)
tensor_product (Brauer–Klimyk; Littlewood–Richardson rule for Type A)

Keeping precompilation affordable

Covering 41 Dynkin types would be very expensive if every method were compiled separately per type. Instead, the heavy numeric kernels (the Freudenthal recursion, the Weyl dimension formula, dominant-weight enumeration, and the Weyl-orbit traversal) are parametrized by the rank only, so for example A₇, B₇, C₇, D₇, and E₇ all share a single compiled kernel. Only thin per-type wrappers remain, which keeps both precompilation time and the package-image size in check.
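The following sketch illustrates the split between thin per-type wrappers and rank-parametrized kernels. `TypeA`, `TypeD`, `cartan`, `reflect`, and `dominant_conjugate` are hypothetical stand-ins, not Semisimple.jl's definitions, and only simply-laced (symmetric) Cartan matrices are handled. The kernel sees only a plain `Matrix{Int}` and an `NTuple{R,Int}` weight, so Julia specializes it on the rank `R` alone: `TypeA{4}` and `TypeD{4}` reuse one compiled kernel. The kernel applies simple reflections to a weight in the fundamental-weight basis until the weight is dominant, counting the reflections as it goes.

```julia
# Hypothetical Dynkin type tags (stand-ins for Semisimple.jl's types).
struct TypeA{N} end
struct TypeD{N} end

# Per-type data: simply-laced (symmetric) Cartan matrices.
cartan(::Type{TypeA{N}}) where {N} =
    [i == j ? 2 : abs(i - j) == 1 ? -1 : 0 for i in 1:N, j in 1:N]

function cartan(::Type{TypeD{N}}) where {N}
    A = cartan(TypeA{N})
    # D_N: node N hangs off node N-2 instead of node N-1.
    A[N-1, N] = A[N, N-1] = 0
    A[N-2, N] = A[N, N-2] = -1
    return A
end

# Simple reflection s_i in the fundamental-weight basis: λ - λ_i α_i,
# where row i of the (symmetric) Cartan matrix gives α_i in that basis.
reflect(A::Matrix{Int}, λ::NTuple{R,Int}, i::Int) where {R} =
    ntuple(j -> λ[j] - λ[i] * A[i, j], Val(R))

# Rank-parametrized kernel: specialized on R only, never on the Dynkin type.
function dominant_conjugate(A::Matrix{Int}, λ::NTuple{R,Int}) where {R}
    len = 0
    while true
        i = findfirst(<(0), λ)
        i === nothing && return (λ, len)
        λ = reflect(A, λ, i)
        len += 1
    end
end

# Thin per-type wrapper: fetches the type's data and calls the shared kernel.
dominant_conjugate(::Type{T}, λ::NTuple{R,Int}) where {T,R} =
    dominant_conjugate(cartan(T), λ)

dominant_conjugate(TypeA{2}, (-1, 0))         # ((0, 1), 2)
dominant_conjugate(TypeD{4}, (0, -1, 0, 0))   # same R = 4 kernel as TypeA{4}
```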

To skip the precompile workload entirely (e.g. during development or in CI jobs that never call into the numeric routines), set the package preference:

using Preferences, Semisimple
set_preferences!(Semisimple, "precompile_workload" => false)  # takes effect at the next precompilation (restart Julia)

The package then precompiles in a couple of seconds; all functionality remains available, with compilation happening lazily at first use.

Why precompilation matters

Without precompilation, the first call to a method triggers just-in-time (JIT) compilation, which can take hundreds of milliseconds. With precompilation, these methods are ready to use immediately:

using Semisimple

# First call is fast due to precompilation
@time degree(fundamental_weight(TypeE{8}, 1))  # ~0.001s

# Without precompilation, this would take ~0.5s for the first call

What is NOT precompiled

Operations involving:

Product Dynkin types (e.g., ProductDynkinType{Tuple{TypeA{2}, TypeB{3}}})
Rank ≥ 11 simple types (e.g., TypeA{15})

These will experience first-call latency but will be fast on subsequent calls (after JIT compilation). Because the numeric kernels are shared per rank, the first-call latency for an uncovered type is limited to its thin wrappers.

Performance characteristics

Compile-time vs. run-time

Semisimple.jl leverages Julia's type system and @generated functions to move selected computations to compile time, while larger derived data is computed once at runtime and cached per Dynkin type (keeping precompilation cheap):

Compile time (type level)	Run time (computed once, cached per type)	Run time (every call)
Dynkin type classification	Root system enumeration	Weight coordinate values
Rank of Dynkin type	Weyl denominator product	Weight lattice arithmetic
Cartan matrix entries	Reflection tables	Weyl orbit traversal
Cartan symmetrizer, bilinear forms	Coset representatives	Freudenthal recursion
		Character multiplication

This means that cartan_matrix(TypeE{8}) produces a compile-time constant SMatrix that is embedded directly into your compiled code — there's no matrix allocation at runtime. Root systems, by contrast, are built by a single shared routine on first use and memoized, so the construction cost (microseconds) is paid once per session.
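This sketch shows how an `@generated` function can turn Cartan data into a compile-time constant. `TypeA`, `cartan_entries`, and `cartan_entry` are hypothetical. To stay dependency-free, the sketch returns a plain tuple of column-major entries rather than an `SMatrix`. The loop runs in the generator, at compile time, and the generated method body is just a tuple literal.

```julia
# Hypothetical type tag standing in for Semisimple.jl's TypeA{N}.
struct TypeA{N} end

# The generator body runs at compile time and sees only the type parameter N.
@generated function cartan_entries(::Type{TypeA{N}}) where {N}
    entries = Int[]
    for j in 1:N, i in 1:N                      # column-major order
        push!(entries, i == j ? 2 : (abs(i - j) == 1 ? -1 : 0))
    end
    # The returned expression is a tuple literal; each call yields this constant.
    return Expr(:tuple, entries...)
end

# Convenience accessor: entry (i, j) of the N×N Cartan matrix.
cartan_entry(::Type{TypeA{N}}, i::Int, j::Int) where {N} =
    cartan_entries(TypeA{N})[(j - 1) * N + i]

cartan_entries(TypeA{3})   # (2, -1, 0, -1, 2, -1, 0, -1, 2)
```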

Memory usage

Operation	Memory Footprint	Notes
`RootSystem{TypeE{8}}`	~15 KB	Singleton, cached per type
`WeightLatticeElem`	8R bytes	R = rank; stored as `SVector{R,Int}`
`WeylGroupElem`	~40 + L bytes	Word stored as `Vector{UInt8}`; L = word length
`WeylCharacter`	~24 + 40N bytes	N = number of terms in the character
Freudenthal cache (E₈, 3,875-dimensional irrep)	~40 KB	Weight multiplicities summing to 3,875

For large-scale computations (e.g., thousands of E₈ tensor products), the character-related LRU caches will automatically evict old entries once their memory budget is reached. Use configure_caches! to increase the budget, or clear_caches! to free memory immediately.
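The `8R bytes` figure follows from weights being isbits vectors of `Int`. The sketch below checks this with `NTuple{R,Int}`, a dependency-free stand-in with the same memory layout as `SVector{R,Int}`. It assumes a 64-bit build, where `Int === Int64`. `weight_bytes` is a hypothetical helper.

```julia
# Weights modelled as NTuple{R,Int}, which has the same layout as SVector{R,Int}.
# (Assumes a 64-bit build, where Int === Int64.)
weight_bytes(R::Int) = sizeof(NTuple{R,Int})

for R in (2, 8)
    println("rank $R weight: ", weight_bytes(R), " bytes, isbits = ", isbitstype(NTuple{R,Int}))
end

# isbits weights are stored inline in arrays, with no per-weight heap boxes.
ws = Vector{NTuple{8,Int}}(undef, 1000)
println("payload of 1000 rank-8 weights: ", sizeof(ws), " bytes")
```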

Asymptotic complexity

Operation	Time Complexity	Notes
`degree(λ)`	O(N²)	N = number of positive roots
`freudenthal_formula(λ)`	O(M·N)	M = number of weights of V(λ)
`tensor_product(λ, μ)` (BK)	O(M·W·d)	W = Weyl group order, d = dim V(smaller weight)
`tensor_product(λ, μ)` (LR, Type A)	O(n³)	n = max(|λ|, |μ|), the partition sizes
`symmetric_power(λ, k)`	O(k²·T)	T = cost of one tensor product
`weyl_orbit(λ)`	O(W·R·R)	W = orbit size ≤ Weyl order, R = rank
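The next sketch computes two of this table's parameters for the classical families: the number of positive roots N and the Weyl group order W. It uses the standard closed formulas: for $A_n$, $N = n(n+1)/2$ and $|W| = (n+1)!$; for $B_n$ and $C_n$, $N = n^2$ and $|W| = 2^n n!$; for $D_n$, $N = n(n-1)$ and $|W| = 2^{n-1} n!$. The helper names are hypothetical and not part of Semisimple.jl. For comparison, E₈ has N = 120 and |W| = 696,729,600.

```julia
# Hypothetical helpers: closed formulas for the classical families.
function npositive_roots(family::Symbol, n::Int)
    family === :A && return n * (n + 1) ÷ 2
    family in (:B, :C) && return n^2
    family === :D && return n * (n - 1)
    throw(ArgumentError("unknown family $family"))
end

function weyl_order(family::Symbol, n::Int)
    family === :A && return factorial(big(n + 1))
    family in (:B, :C) && return big(2)^n * factorial(big(n))
    family === :D && return big(2)^(n - 1) * factorial(big(n))
    throw(ArgumentError("unknown family $family"))
end

for (fam, n) in ((:A, 7), (:B, 7), (:D, 7), (:A, 10))
    println(fam, n, ": N = ", npositive_roots(fam, n), ", |W| = ", weyl_order(fam, n))
end
```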

For reproducible performance measurements, see the benchmark scripts in benchmark/.

Type stability

Semisimple.jl is designed for complete type stability:

using Semisimple

ω₁ = fundamental_weight(TypeE{8}, 1)
typeof(ω₁)  # WeightLatticeElem{TypeE{8}, 8} — concrete type

ch = freudenthal_formula(ω₁)
typeof(ch)  # Dict{SVector{8, Int64}, BigInt} — concrete type

result = tensor_product(ω₁, ω₁)
typeof(result)  # WeylCharacter{TypeE{8}, 8} — concrete type

All public APIs return concrete types, enabling aggressive compiler optimizations. There are no type instabilities in hot paths.
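To check type stability in your own code, you can use `Test.@inferred` from the Test standard library, which throws if the inferred return type differs from the actual one. `Base.return_types` gives a quick look at what inference produced. The two functions below are hypothetical examples, one stable and one deliberately unstable. They are not Semisimple.jl code.

```julia
using Test

# Stable: the result type depends only on the argument types.
weight_add(λ::NTuple{R,Int}, μ::NTuple{R,Int}) where {R} =
    ntuple(i -> λ[i] + μ[i], Val(R))

# Unstable: the result type depends on a runtime value.
height_or_nothing(λ::NTuple{R,Int}) where {R} = sum(λ) > 0 ? sum(λ) : nothing

@inferred weight_add((1, 0), (0, 1))   # passes and returns (1, 1)

# Inferred return types: concrete for the first, a Union for the second.
Base.return_types(weight_add, (NTuple{2,Int}, NTuple{2,Int}))
Base.return_types(height_or_nothing, (NTuple{2,Int},))
```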

Numerical precision

All computations use exact integer arithmetic — there are no floating-point operations:

Weights are SVector{R, Int} — exact integer vectors
Irreducible multiplicities in a WeylCharacter are Int; weight multiplicities from the Freudenthal recursion are BigInt (they can exceed typemax(Int64) for large representations)
Dimensions are computed exactly (the Weyl dimension formula works in BigInt)
Inner products use scaled integer forms to avoid division

This means:

No numerical stability concerns — safe for arbitrarily large representations
No overflow in dimensions or weight multiplicities, which are BigInt throughout

Example: the irreducible representation of E₈ with highest weight ρ has dimension $2^{120}$ (one factor of 2 for each of E₈'s 120 positive roots), far beyond typemax(Int64) $\approx 9.2 \times 10^{18}$:

julia> using Semisimple

julia> ρ = weyl_vector(TypeE{8});

julia> degree(ρ)
1329227995784915872903807060280344576

julia> degree(ρ) == big(2)^120
true
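The power of two follows from the Weyl dimension formula $\dim V(\lambda) = \prod_{\alpha > 0} \frac{(\lambda + \rho, \alpha^\vee)}{(\rho, \alpha^\vee)}$. Setting λ = ρ makes every factor equal to 2, so $\dim V(\rho) = 2^N$, where N is the number of positive roots. The sketch below evaluates the formula exactly in `BigInt` for type A only, where the positive roots are $\varepsilon_i - \varepsilon_j$ with $i < j$. `weyl_dim_A` is a hypothetical helper, not Semisimple.jl API. Type A₁₅ also has 120 positive roots, so it yields the same $2^{120}$, whereas plain `Int64` arithmetic silently wraps around.

```julia
# Hypothetical helper: Weyl dimension formula for type A_n, exact in BigInt.
# λ holds Dynkin labels (coordinates in the fundamental-weight basis).
function weyl_dim_A(λ::AbstractVector{<:Integer})
    n = length(λ)
    num = big(1)
    den = big(1)
    for i in 1:n, j in (i + 1):(n + 1)            # positive roots ε_i - ε_j
        num *= sum(λ[k] + 1 for k in i:(j - 1))   # (λ + ρ, α^∨)
        den *= j - i                              # (ρ, α^∨)
    end
    q, r = divrem(num, den)
    @assert iszero(r)                             # the formula always gives an integer
    return q
end

weyl_dim_A([1, 0])                         # 3: defining representation of sl₃
weyl_dim_A([1, 1])                         # 8: adjoint representation of sl₃
weyl_dim_A(ones(Int, 15)) == big(2)^120    # true: ρ for A₁₅
2^120                                      # 0: Int64 arithmetic wraps around
```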

Thread safety

Cache thread-safety guarantees are limited

Writes to the small Dict singleton and type-data caches are protected by locks, but lookups do not take the lock, so a reader racing a first-time writer is unsynchronized. The bounded LRU character caches are internally locked by LRUCache.jl, so concurrent cache-populating calls such as dominant_character or tensor_product cannot corrupt them; at worst, two threads compute the same uncached entry twice. Cached dictionaries are returned by reference and must not be mutated.

Safe: Using Semisimple.jl from a single thread (the default)

Safe: Read-only operations from multiple threads after warming up caches

Unsafe: Calling clear_caches! or configure_caches! while other threads are computing

The recommended pattern for parallel computation remains: populate caches in a single-threaded warm-up phase, then perform read-only operations in parallel.
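The warm-up pattern can be sketched with a plain `Dict` guarded by a `ReentrantLock`, mirroring the locking scheme described above: writes are locked and lookups are not. `cached_factorial`, `FACT_MEMO`, and `FACT_LOCK` are hypothetical names. The single-threaded warm-up inserts every key in advance, so the parallel loop only ever takes the unlocked read path, and that is what makes it safe. Start Julia with several threads (e.g. `julia --threads=4`) to run the loop in parallel. It also runs correctly on one thread.

```julia
using Base.Threads: @threads

# Hypothetical memo table standing in for one of the package's Dict caches.
const FACT_MEMO = Dict{Int,BigInt}()
const FACT_LOCK = ReentrantLock()

function cached_factorial(n::Int)
    # Unlocked lookup: only safe while no other thread is inserting.
    v = get(FACT_MEMO, n, nothing)
    v === nothing || return v
    # Writes are serialized by the lock.
    lock(FACT_LOCK) do
        get!(() -> factorial(big(n)), FACT_MEMO, n)
    end
end

inputs = collect(1:200)

# Phase 1 (single thread): populate every entry the parallel phase will need.
foreach(cached_factorial, inputs)

# Phase 2 (parallel): every call now hits the cache, so threads only read.
results = Vector{BigInt}(undef, length(inputs))
@threads for i in eachindex(inputs)
    results[i] = cached_factorial(inputs[i])
end
```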

Comparison with LiE

Semisimple.jl reimplements many algorithms from the LiE computer algebra system. Key differences:

Aspect	LiE (C)	Semisimple.jl (Julia)
Language	C (CWEB literate programming)	Julia (pure Julia)
Type system	Runtime `group` structs	Compile-time Dynkin type parameters
Cartan matrices	Runtime matrix allocation	Compile-time `SMatrix` constants
Caching	Permanent "long-life" objects	Bounded `LRU` caches + `Dict` singletons
Hot performance	Fast (compiled C)	Fast (JIT-compiled, with caching)
Cold performance	Instant (no compilation)	First-call JIT latency (reduced by precompilation for covered types)

For hot operations (cached, precompiled), Semisimple.jl matches or exceeds LiE's performance. For cold operations, LiE is faster because it incurs no JIT compilation delay.

Implementation philosophy

Semisimple.jl follows these design principles:

Type-level dispatch at the API surface — Use Julia's type system to specialize the user-facing layer for each Dynkin type, while the heavy numeric kernels are shared per rank so they are compiled only once for e.g. A₇/B₇/C₇/D₇/E₇
Compile-time constants where they pay — @generated Cartan data; larger derived data (root systems, Weyl dimension data) is computed once at runtime and memoized
Immutability — All core types are immutable for thread safety and optimization
Caching — Trade memory for speed by memoizing expensive computations
Minimal dependencies — StaticArrays.jl, LRUCache.jl, PrecompileTools.jl, Preferences.jl, and LinearAlgebra (stdlib)
Pure Julia — No C/Fortran, enabling introspection and compilation to other targets

These principles enable aggressive compiler optimizations while maintaining mathematical rigor.

Weyl orbit traversal

Weyl orbits are computed by the internal module Weylloop.jl, which implements LiE-style systematic orbit traversal. A breadth-first search keeps a hash set of visited weights whose cost grows with the orbit. Weylloop.jl avoids this by converting weight coordinates to the ε-basis, where classical Weyl subgroups act as permutations (type A), as permutations combined with sign flips (types B and C), or as permutations combined with an even number of sign flips (type D). Orbits are then enumerated via lexicographic permutation generation and Gray-code sign flips. This eliminates the $O(|\text{orbit}|)$ hash-set overhead that would otherwise dominate for large orbits (e.g., E₈ orbits with millions of elements). For exceptional types, precomputed coset representatives reduce the problem to the classical case.
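The type A and type B/C parts of this idea can be sketched in plain Julia. This is not Weylloop.jl's code, the function names are hypothetical, and type D is omitted. For $A_n$, Dynkin labels $(\lambda_1, \dots, \lambda_n)$ map to ε-coordinates $\varepsilon_k = \lambda_k + \dots + \lambda_n$, with $\varepsilon_{n+1} = 0$. The classic next-lexicographic-permutation step produces each distinct permutation exactly once, even with repeated entries, so no visited set is needed. For types B and C, every permutation of the absolute values is combined with all sign patterns on its nonzero entries. The patterns are visited in Gray-code order, so consecutive vectors differ by a single sign flip.

```julia
# A_n: Dynkin labels (λ₁, …, λₙ) -> ε-coordinates with ε_{n+1} = 0.
labels_to_eps_A(λ::Vector{Int}) = [sum(λ[k:end]; init = 0) for k in 1:length(λ)+1]

# Next lexicographic permutation in place; returns false after the last one.
# Handles repeated entries, so each distinct permutation appears exactly once.
function next_perm!(v::AbstractVector)
    i = findlast(k -> v[k] < v[k + 1], 1:length(v)-1)
    i === nothing && return false
    j = findlast(k -> v[k] > v[i], eachindex(v))
    v[i], v[j] = v[j], v[i]
    reverse!(v, i + 1, length(v))
    return true
end

# Type A: the Weyl group permutes ε-coordinates; no hash set is needed.
function orbit_A(eps::Vector{Int})
    v = sort(eps)
    orbit = [copy(v)]
    while next_perm!(v)
        push!(orbit, copy(v))
    end
    return orbit
end

# Types B/C: signed permutations. For each distinct permutation of |ε|,
# walk the sign patterns of its nonzero entries in Gray-code order.
function orbit_BC(eps::Vector{Int})
    orbit = Vector{Vector{Int}}()
    for p in orbit_A(abs.(eps))
        nz = findall(!iszero, p)
        w = copy(p)
        push!(orbit, copy(w))
        for t in 1:(2^length(nz) - 1)
            k = nz[trailing_zeros(t) + 1]   # Gray code: flip exactly one sign
            w[k] = -w[k]
            push!(orbit, copy(w))
        end
    end
    return orbit
end

orbit_A(labels_to_eps_A([1, 0]))   # orbit of ω₁ in A₂, in ε-coordinates
orbit_BC([2, 1])                   # orbit of (2, 1) under W(B₂)
```

For easy testing, the sketch materializes each orbit as a vector. A traversal in the spirit described above would instead visit each element, for example through a callback, without storing the whole orbit.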

API reference

Semisimple.clear_all_caches! — Function

clear_all_caches!()

Clear all internal caches used by Semisimple.jl. Alias for clear_caches!.

Examples

julia> using Semisimple

julia> clear_all_caches!()

Cache preference keys (read via Preferences.jl; see Configuring cache budgets):

Key	Type	Default
`cache_budget`	`Int` (bytes)	25 % of RAM (≥ 256 MiB)
`dominant_frac`	`Float64`	0.30
`tensor_frac`	`Float64`	0.40
`sym_power_frac`	`Float64`	0.15
`ext_power_frac`	`Float64`	0.15