Training

Loss Functions

ParametricDFT.MSELossType

MSE loss with top-k truncation: ||x - T⁻¹(truncate(T(x), k))||². Field k is the number of kept coefficients.

source
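
To make the objective concrete, here is a self-contained illustration (not package code): an ordinary FFT stands in for the learned transform T, and the magnitude-based truncation is written inline, mirroring topk_truncate documented below.

using FFTW, LinearAlgebra

# Illustration of ‖x − T⁻¹(truncate(T(x), k))‖² with fft/ifft standing in for T.
x = rand(8, 8)
k = 10
X = fft(x)                                         # forward transform T(x)
thresh = partialsort(vec(abs.(X)), k; rev=true)    # k-th largest coefficient magnitude
Xk = X .* (abs.(X) .>= thresh)                     # zero all but the k largest coefficients
mse = norm(x .- real(ifft(Xk)))^2                  # reconstruction error the loss measures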
ParametricDFT._topk_maskMethod
_topk_mask(x::AbstractMatrix, k::Int) -> BitMatrix

Compute a boolean mask selecting the k elements of x with the largest absolute values. Uses quickselect (O(n) average) to find the threshold magnitude, then a broadcast comparison.

source
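
A minimal sketch of the idea, using Base's partialsort in place of whatever quickselect routine the package uses; names are illustrative, not the package's implementation.

# Quickselect finds the k-th largest magnitude in O(n) average time,
# then a broadcast comparison builds the mask.
function topk_mask_sketch(x::AbstractMatrix, k::Int)
    thresh = partialsort(vec(abs.(x)), k; rev=true)   # k-th largest |entry|
    return abs.(x) .>= thresh                         # BitMatrix; ties at the threshold may keep a few extra entries
end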
ParametricDFT.batched_topk_truncateMethod
batched_topk_truncate(x_batched::AbstractArray{T,N}, m::Int, n::Int, k::Integer)

Apply per-image top-k truncation to a batched frequency-domain tensor of shape (2, 2, …, 2, B) (with m + n qubit dims). Returns a tensor of the same shape with all but the k largest-magnitude entries of each image zeroed.

The mask is content-dependent — each image can keep a different set of coefficients — so on CPU this falls back to a per-image loop. GPU specializations in ext/CUDAExt.jl compute all B masks in a single sort call.

source
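
A sketch of the CPU fallback described above; it only illustrates the per-image loop, not the package's actual code.

# Each image gets its own content-dependent mask, so truncation is applied
# in a loop over the batch (last) dimension.
function batched_topk_truncate_sketch(x_batched::AbstractArray, m::Int, n::Int, k::Integer)
    B   = size(x_batched)[end]
    mat = reshape(x_batched, 2^m * 2^n, B)            # flatten the qubit dims per image
    out = zeros(eltype(mat), size(mat))
    for b in 1:B
        col    = view(mat, :, b)
        thresh = partialsort(abs.(col), k; rev=true)  # k-th largest magnitude of this image
        keep   = abs.(col) .>= thresh
        out[keep, b] .= col[keep]
    end
    return reshape(out, size(x_batched))
end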
ParametricDFT.loss_functionMethod
loss_function(tensors, m, n, optcode, pic::AbstractMatrix, loss; inverse_code=nothing)

Compute loss for a single image pic (2^m x 2^n) under the given circuit parameters.

source
ParametricDFT.loss_functionMethod
loss_function(tensors, m, n, optcode, pics::Vector{<:AbstractMatrix}, loss; inverse_code=nothing, batched_optcode=nothing)

Average loss over a batch of images. Uses batched einsum if batched_optcode is provided.

source
ParametricDFT.topk_truncateMethod
topk_truncate(x::AbstractMatrix, k::Integer)

Magnitude-based top-k truncation: keeps the k coefficients with the largest absolute values, zeroing the rest. This is basis-agnostic — it does not assume any particular frequency layout.

source
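
For example (assuming ParametricDFT is loaded; the result is shown as a comment):

using ParametricDFT

x = [3.0 -0.1; 0.5 -2.0]
ParametricDFT.topk_truncate(x, 2)   # keeps 3.0 and -2.0, zeros the two smaller entries
# 2×2 Matrix{Float64}:
#  3.0   0.0
#  0.0  -2.0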

Manifolds

ParametricDFT._make_identity_batchMethod
_make_identity_batch(::Type{T}, d::Int, n::Int) -> Array{T,3}

Create a (d, d, n) array of identity matrices. Used by optimizers for Cayley retraction pre-allocation and as a fallback in retract(::UnitaryManifold, ...).

source
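
A one-line sketch of the helper's effect (assumption: plain replication of a dense identity; not the package code).

using LinearAlgebra

# A (d, d, n) stack of identity matrices of element type T.
make_identity_batch_sketch(::Type{T}, d::Int, n::Int) where {T} =
    repeat(reshape(Matrix{T}(I, d, d), d, d, 1), 1, 1, n)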
ParametricDFT.batched_invMethod
batched_inv(A::AbstractArray{T,3})

Batched matrix inverse: C[:,:,k] = inv(A[:,:,k]) for each slice k. Uses LU factorization for general matrices.

source
ParametricDFT.batched_matmulMethod
batched_matmul(A::AbstractArray{T,3}, B::AbstractArray{T,3})

Batched matrix multiply: C[:,:,k] = A[:,:,k] * B[:,:,k] for each slice k.

source
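
A slice-by-slice reference sketch of the semantics above; the package, and especially its GPU extension, may use a fused batched kernel instead.

using LinearAlgebra

# C[:,:,k] = A[:,:,k] * B[:,:,k] for each slice k.
function batched_matmul_sketch(A::AbstractArray{T,3}, B::AbstractArray{T,3}) where {T}
    C = similar(A, size(A, 1), size(B, 2), size(A, 3))
    for k in axes(A, 3)
        mul!(view(C, :, :, k), view(A, :, :, k), view(B, :, :, k))
    end
    return C
end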
ParametricDFT.projectFunction
project(m::AbstractRiemannianManifold, points, euclidean_grads)

Project Euclidean gradients onto the tangent space at points. Batched over last dim.

source
ParametricDFT.retractFunction
retract(m::AbstractRiemannianManifold, points, tangent_vec, α)

Retract from points along tangent_vec with step size α. Batched over last dim.

source
ParametricDFT.retractMethod

Batched Cayley retraction on U(n): (I - α/2·W)⁻¹(I + α/2·W)·U where W = Ξ·U'. Pass I_batch to reuse a pre-allocated identity tensor and avoid repeated allocations.

source
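
A single-matrix sketch of the quoted formula (the package version operates on (d, d, B) batches and can reuse I_batch).

using LinearAlgebra

# Cayley retraction on U(n) for one point U and tangent vector Ξ:
#   W = Ξ·U',   retract = (I - α/2·W)⁻¹ (I + α/2·W) U
function cayley_retract_sketch(U::AbstractMatrix, Ξ::AbstractMatrix, α::Real)
    W  = Ξ * U'
    Id = Matrix{eltype(U)}(I, size(U)...)
    return (Id - (α / 2) * W) \ ((Id + (α / 2) * W) * U)
end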
ParametricDFT.transportFunction
transport(m::AbstractRiemannianManifold, old_points, new_points, vec)

Parallel transport vec from old_points to new_points. Batched over last dim.

source

Optimizers

ParametricDFT.OptimizationStateType
OptimizationState{ET, RT}

Bundles shared loop state built by _common_setup. Holds manifold groupings, batched point/gradient buffers, the identity-batch cache for Cayley retraction, and a per-tensor Euclidean-gradient buffer that is reused across iterations.

source
ParametricDFT._batched_projectMethod
_batched_project(manifold_groups, point_batches, grad_buf_batches, euclidean_grads)

Batched Riemannian projection. Returns (rg_batches, grad_norm).

source
ParametricDFT._common_setupMethod
_common_setup(tensors)

Build an OptimizationState from the initial tensor list. Groups tensors by manifold, stacks into batched arrays, allocates gradient buffers, and creates identity-batch caches for UnitaryManifold groups via _make_identity_batch.

source
ParametricDFT._compute_gradients!Method
_compute_gradients!(buf, grad_fn, tensors)

Compute Euclidean gradients via grad_fn, writing into the pre-allocated buf::Vector{AbstractMatrix} (typically state.euclidean_grads_buf) to avoid per-iteration wrapper allocation. Returns buf on success, nothing on NaN/Inf (after logging which tensor carried the non-finite value).

source
ParametricDFT._compute_gradientsMethod
_compute_gradients(grad_fn, tensors)

Allocating wrapper used outside the main optimization loop. Prefer _compute_gradients! inside the loop, where a buffer is already available.

source
ParametricDFT._init_optimizer_stateMethod
_init_optimizer_state(opt::AbstractRiemannianOptimizer, state::OptimizationState)

Initialize per-optimizer state. Returns nothing for GD and a NamedTuple of moment/direction buffers for Adam.

source
ParametricDFT._optimization_loopMethod
_optimization_loop(opt, tensors, loss_fn, grad_fn; max_iter, tol, loss_trace)

Shared optimization loop. Handles setup, gradient evaluation, and convergence checking in one place, then calls _update_step! each iteration for per-optimizer behavior. Returns the optimized tensor vector.

source
ParametricDFT._update_step!Method
_update_step!(opt, state, rg_batches, loss_fn, grad_norm_sq, opt_state, iter; cached_loss)

Per-optimizer update dispatch. Returns cached_loss::RT (NaN when not evaluated).

  • RiemannianGD: Armijo backtracking line search (a generic sketch follows this entry); evaluates the loss multiple times. Returns the accepted candidate loss, or RT(NaN) after exhausting line-search steps.
  • RiemannianAdam: moment update + retract, does not evaluate loss. Returns RT(NaN).
source
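
A generic sketch of the Armijo backtracking used by RiemannianGD, with the loss and retraction passed in as closures; the function name and keyword defaults are illustrative, not the package's internals.

# Accepts a descent direction and its squared gradient norm; shrinks the step until
# the Armijo sufficient-decrease condition holds, mirroring the behavior described above.
function armijo_step_sketch(loss, retract, points, direction, grad_norm_sq;
                            α0=1.0, shrink=0.5, c=1e-4, max_backtracks=20)
    f0 = loss(points)
    α  = α0
    for _ in 1:max_backtracks
        candidate = retract(points, direction, α)
        f = loss(candidate)
        if f <= f0 - c * α * grad_norm_sq      # sufficient decrease relative to ‖grad‖²
            return candidate, f                # accepted step and its loss
        end
        α *= shrink                            # shrink the step and retry
    end
    return points, NaN                         # line search exhausted
end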
ParametricDFT.optimize!Method
optimize!(opt::AbstractRiemannianOptimizer, tensors, loss_fn, grad_fn; max_iter=100, tol=1e-6, loss_trace=nothing)

Run Riemannian optimization on circuit tensors. Dispatches to _optimization_loop which uses per-optimizer hooks (_init_optimizer_state, _update_step!). Returns optimized tensors.

When loss_trace::Vector{Float64} is provided, per-iteration losses are appended to it.

source

Training Pipeline

ParametricDFT._cosine_with_warmupMethod
_cosine_with_warmup(step, total_steps; warmup_frac, lr_peak, lr_final)

Linear warmup followed by cosine decay. step is the 0-indexed global step; warmup_frac ∈ (0, 1) sets the fraction of total_steps spent in warmup.

source
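
A generic sketch of such a schedule; the argument names follow the docstring, while the default values are purely illustrative.

# Linear warmup to lr_peak over warmup_frac of the run, then cosine decay to lr_final.
function cosine_with_warmup_sketch(step, total_steps; warmup_frac=0.1, lr_peak=1e-3, lr_final=1e-5)
    warmup_steps = max(1, round(Int, warmup_frac * total_steps))
    if step < warmup_steps
        return lr_peak * (step + 1) / warmup_steps                  # linear warmup (step is 0-indexed)
    end
    t = (step - warmup_steps) / max(1, total_steps - warmup_steps)  # decay progress in [0, 1]
    return lr_final + (lr_peak - lr_final) * (1 + cos(π * t)) / 2   # cosine decay to lr_final
end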
ParametricDFT._train_basis_coreMethod

Core training loop shared by all basis types. Returns (final_tensors, best_val_loss, train_losses, val_losses, step_train_losses). Uses optimize! from optimizers.jl for all optimization (GPU and CPU). Accepts the optimizers RiemannianGD() and RiemannianAdam(), or the symbols :gradient_descent and :adam.

source
ParametricDFT.train_basisMethod
train_basis(::Type{B}, dataset; m, n, loss, epochs, steps_per_image,
            optimizer, batch_size, device, ...)

Train any AbstractSparseBasis subtype on images. Returns (basis, history). Basis-specific kwargs (e.g. phases, entangle_phases) are forwarded to _init_circuit and _build_basis.

source

Einsum Cache

ParametricDFT.optimize_code_cachedFunction
optimize_code_cached(flat_code, size_dict, optimizer=TreeSA())

Like optimize_code(flat_code, size_dict, optimizer) but caches the result to disk. On cache hit, returns immediately without running the optimizer.

source
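
A hedged usage sketch, assuming flat_code is an OMEinsum einsum specification as accepted by optimize_code; OMEinsum's ein string macro, uniformsize, and TreeSA are used only for illustration.

using OMEinsum, ParametricDFT

# The first call runs TreeSA and writes the result to the on-disk cache; later calls
# with the same code and size dictionary return the cached contraction order.
code      = ein"ij,jk,kl->il"
size_dict = uniformsize(code, 2)
optcode   = ParametricDFT.optimize_code_cached(code, size_dict, TreeSA())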