HomeCompanion pagesFractal Learning

Fractal Learning

Inverse iterated-function-system fitting: gradient-optimize a small alphabet of affine maps so the depth-N orbit of the origin matches a target point cloud under a Chamfer metric.

Open the interactive lab

Affine IFS Point-Set Fitting

Overview

We want to approximate a target point cloud in $\mathbb{R}^d$ by the orbit of the origin under all length-$N$ words in a small alphabet of affine transforms. The transforms are then optimized (e.g. via gradient descent) so that the generated orbit matches the target as closely as possible under a nearest-neighbor metric.

This is closely related to the inverse problem for iterated function systems (IFS), but restricted to a fixed depth $N$ and with an explicit (commutative-style) enumeration of all transform compositions.


1. Setup

1.1 Transforms

Let $K \in \mathbb{N}$ be the number of affine transforms. Each transform $T_k : \mathbb{R}^d \to \mathbb{R}^d$ has the form

$$ T_k(x) ;=; A_k, x ;+; b_k, \qquad A_k \in \mathbb{R}^{d \times d},; b_k \in \mathbb{R}^d. $$

Collect all parameters into

$$ \theta ;=; {(A_k, b_k)}_{k=1}^{K} ;\in; \mathbb{R}^{K(d^2 + d)}. $$

1.2 Generated point set

Fix a depth $N \in \mathbb{N}$ ("number of transforms applied"). For a multi-index (word) $w = (k_1, k_2, \dots, k_N) \in {1,\dots,K}^N$ define

$$ P_w(\theta) ;=; \bigl(T_{k_N} \circ T_{k_{N-1}} \circ \cdots \circ T_{k_1}\bigr)(0). $$

The full generated point set is

$$ \mathcal{P}_N(\theta) ;=; \bigl{, P_w(\theta) ;:; w \in \mathcal{W} ,\bigr}, $$

where $\mathcal{W}$ is the indexing set of words.

Two flavors of enumeration

  1. Ordered words (non-commutative): $\mathcal{W} = {1,\dots,K}^N$, $|\mathcal{W}| = K^N$.

  2. Commutative / multiset words: only the count of each transform matters, not the order. Then $$ \mathcal{W} ;=; \Bigl{ (n_1,\dots,n_K) \in \mathbb{Z}_{\ge 0}^K ;:; \sum_k n_k = N \Bigr}, \qquad |\mathcal{W}| = \binom{N + K - 1}{K - 1}. $$ This is exact when the $T_k$ commute (e.g. pure translations, or commuting diagonal maps); otherwise it is an approximation that collapses orbits by transform-multiset.

The idea.md note says "every possible combination of transforms (commutative)", which corresponds to flavor (2).

1.3 Target point set

Let

$$ \mathcal{Q} ;=; {q_1, \dots, q_M} \subset \mathbb{R}^d $$

be a given target point cloud.


2. Loss function

We measure the discrepancy between $\mathcal{P}_N(\theta)$ and $\mathcal{Q}$ via a (squared) nearest-neighbor distance.

Define the one-sided Chamfer terms:

$$ D_{P \to Q}(\theta) ;=; \frac{1}{|\mathcal{P}N|} \sum{p \in \mathcal{P}N(\theta)} \min{q \in \mathcal{Q}} ; |p - q|_2^2, $$

$$ D_{Q \to P}(\theta) ;=; \frac{1}{|\mathcal{Q}|} \sum_{q \in \mathcal{Q}} \min_{p \in \mathcal{P}_N(\theta)} ; |p - q|_2^2. $$

The symmetric Chamfer loss is

$$ \mathcal{L}(\theta) ;=; \alpha, D_{P \to Q}(\theta) ;+; \beta, D_{Q \to P}(\theta), \qquad \alpha, \beta \ge 0. $$

Common choice: $\alpha = \beta = 1$. Using only $D_{P \to Q}$ ("forward" only) tends to let the model collapse onto a subset of $\mathcal{Q}$; using only $D_{Q \to P}$ ("backward" only) tends to leave stray generated points. The symmetric form is preferred.


3. Optimization problem

$$ \theta^\star ;=; \arg\min_{\theta} ; \mathcal{L}(\theta). $$

3.1 Optional constraints / regularizers

To keep the system well-posed and the orbit bounded, we may add:


4. Algorithm

Inputs:
    target points  Q      (M x d)
    num transforms K
    depth          N
    learning rate  eta
    iterations     T

Initialize:
    For k = 1..K:
        A_k <- small random matrix (e.g. 0.5 * I + noise)
        b_k <- small random vector

Enumerate words W:
    if commutative:
        W = all (n_1,...,n_K) with sum n_i = N         # multisets
    else:
        W = all (k_1,...,k_N) in {1..K}^N               # ordered

For t = 1..T:
    # Forward pass: build P_N(theta)
    P <- []
    for w in W:
        x <- 0
        for k_i in w:                                   # ordered case
            x <- A_{k_i} x + b_{k_i}
        P.append(x)

    # Loss
    For each p in P:    d_p <- min_q ||p - q||^2
    For each q in Q:    d_q <- min_p ||p - q||^2
    L <- mean(d_p) + mean(d_q)  (+ regularizers)

    # Backward pass
    Compute grad_theta L      (autodiff)
    theta <- theta - eta * grad_theta L

Return theta, P

4.1 Efficient commutative enumeration

In the commutative case, the orbit can be built incrementally:

Let $\mathcal{P}_N^{(c)} = {, T_1^{n_1} \cdots T_K^{n_K}(0) : \sum n_i = N ,}$ (interpreting the product in some fixed order, which is exact iff the $T_k$ commute). Then

$$ \mathcal{P}n^{(c)} ;=; \bigcup{k=1}^{K} T_k\bigl(\mathcal{P}_{n-1}^{(c)}\bigr), $$

with duplicates removed by multiset index. Implementation: index each point by $(n_1,\dots,n_K)$, compute it from any parent $(n_1,\dots,n_k-1,\dots,n_K)$ via $T_k$, and cache.

4.2 Nearest-neighbor computation


5. Notes and caveats