Inverse iterated-function-system fitting: gradient-optimize a small alphabet of affine maps so the depth-N orbit of the origin matches a target point cloud under a Chamfer metric.

Affine IFS Point-Set Fitting

Overview

We want to approximate a target point cloud in $\mathbb{R}^d$ by the orbit of the origin under all length-$N$ words in a small alphabet of affine transforms. The transforms are then optimized (e.g. via gradient descent) so that the generated orbit matches the target as closely as possible under a nearest-neighbor metric.

This is closely related to the inverse problem for iterated function systems (IFS), but restricted to a fixed depth $N$ and with an explicit (commutative-style) enumeration of all transform compositions.

1. Setup

1.1 Transforms

Let $K \in \mathbb{N}$ be the number of affine transforms. Each transform $T_k : \mathbb{R}^d \to \mathbb{R}^d$ has the form

$$ T_k(x) ;=; A_k, x ;+; b_k, \qquad A_k \in \mathbb{R}^{d \times d},; b_k \in \mathbb{R}^d. $$

Collect all parameters into

$$ \theta ;=; {(A_k, b_k)}_{k=1}^{K} ;\in; \mathbb{R}^{K(d^2 + d)}. $$

1.2 Generated point set

Fix a depth $N \in \mathbb{N}$ ("number of transforms applied"). For a multi-index (word) $w = (k_1, k_2, \dots, k_N) \in {1,\dots,K}^N$ define

$$ P_w(\theta) ;=; \bigl(T_{k_N} \circ T_{k_{N-1}} \circ \cdots \circ T_{k_1}\bigr)(0). $$

The full generated point set is

$$ \mathcal{P}_N(\theta) ;=; \bigl{, P_w(\theta) ;:; w \in \mathcal{W} ,\bigr}, $$

where $\mathcal{W}$ is the indexing set of words.

Two flavors of enumeration

Ordered words (non-commutative): $\mathcal{W} = {1,\dots,K}^N$, $|\mathcal{W}| = K^N$.
Commutative / multiset words: only the count of each transform matters, not the order. Then $$ \mathcal{W} ;=; \Bigl{ (n_1,\dots,n_K) \in \mathbb{Z}_{\ge 0}^K ;:; \sum_k n_k = N \Bigr}, \qquad |\mathcal{W}| = \binom{N + K - 1}{K - 1}. $$ This is exact when the $T_k$ commute (e.g. pure translations, or commuting diagonal maps); otherwise it is an approximation that collapses orbits by transform-multiset.

The idea.md note says "every possible combination of transforms (commutative)", which corresponds to flavor (2).

1.3 Target point set

Let

$$ \mathcal{Q} ;=; {q_1, \dots, q_M} \subset \mathbb{R}^d $$

be a given target point cloud.

2. Loss function

We measure the discrepancy between $\mathcal{P}_N(\theta)$ and $\mathcal{Q}$ via a (squared) nearest-neighbor distance.

Define the one-sided Chamfer terms:

$$ D_{P \to Q}(\theta) ;=; \frac{1}{|\mathcal{P}N|} \sum{p \in \mathcal{P}N(\theta)} \min{q \in \mathcal{Q}} ; |p - q|_2^2, $$

$$ D_{Q \to P}(\theta) ;=; \frac{1}{|\mathcal{Q}|} \sum_{q \in \mathcal{Q}} \min_{p \in \mathcal{P}_N(\theta)} ; |p - q|_2^2. $$

The symmetric Chamfer loss is

$$ \mathcal{L}(\theta) ;=; \alpha, D_{P \to Q}(\theta) ;+; \beta, D_{Q \to P}(\theta), \qquad \alpha, \beta \ge 0. $$

Common choice: $\alpha = \beta = 1$. Using only $D_{P \to Q}$ ("forward" only) tends to let the model collapse onto a subset of $\mathcal{Q}$; using only $D_{Q \to P}$ ("backward" only) tends to leave stray generated points. The symmetric form is preferred.

3. Optimization problem

$$ \theta^\star ;=; \arg\min_{\theta} ; \mathcal{L}(\theta). $$

3.1 Optional constraints / regularizers

To keep the system well-posed and the orbit bounded, we may add:

Contractivity: $|A_k|_2 \le 1 - \varepsilon$ for all $k$ (or a soft penalty $\sum_k \max(0, |A_k|_2 - (1-\varepsilon))^2$).
Parameter regularization: $\lambda_A \sum_k |A_k|_F^2 + \lambda_b \sum_k |b_k|_2^2$.
Diversity: penalize pairs $(T_i, T_j)$ that are too similar, e.g. $-\mu \sum_{i<j} |(A_i,b_i) - (A_j,b_j)|^2$ clipped.

4. Algorithm

Inputs:
    target points  Q      (M x d)
    num transforms K
    depth          N
    learning rate  eta
    iterations     T

Initialize:
    For k = 1..K:
        A_k <- small random matrix (e.g. 0.5 * I + noise)
        b_k <- small random vector

Enumerate words W:
    if commutative:
        W = all (n_1,...,n_K) with sum n_i = N         # multisets
    else:
        W = all (k_1,...,k_N) in {1..K}^N               # ordered

For t = 1..T:
    # Forward pass: build P_N(theta)
    P <- []
    for w in W:
        x <- 0
        for k_i in w:                                   # ordered case
            x <- A_{k_i} x + b_{k_i}
        P.append(x)

    # Loss
    For each p in P:    d_p <- min_q ||p - q||^2
    For each q in Q:    d_q <- min_p ||p - q||^2
    L <- mean(d_p) + mean(d_q)  (+ regularizers)

    # Backward pass
    Compute grad_theta L      (autodiff)
    theta <- theta - eta * grad_theta L

Return theta, P

4.1 Efficient commutative enumeration

In the commutative case, the orbit can be built incrementally:

Let $\mathcal{P}_N^{(c)} = {, T_1^{n_1} \cdots T_K^{n_K}(0) : \sum n_i = N ,}$ (interpreting the product in some fixed order, which is exact iff the $T_k$ commute). Then

$$ \mathcal{P}n^{(c)} ;=; \bigcup{k=1}^{K} T_k\bigl(\mathcal{P}_{n-1}^{(c)}\bigr), $$

with duplicates removed by multiset index. Implementation: index each point by $(n_1,\dots,n_K)$, compute it from any parent $(n_1,\dots,n_k-1,\dots,n_K)$ via $T_k$, and cache.

4.2 Nearest-neighbor computation

For small $|\mathcal{P}_N|, |\mathcal{Q}|$: full pairwise distance matrix, $O(|\mathcal{P}_N|,|\mathcal{Q}|,d)$, fully differentiable.
For larger sets: KD-tree / approximate NN for the index lookup, then recompute the chosen distances exactly so gradients still flow.

5. Notes and caveats

Commutativity assumption. Generic affine maps do not commute, so "every possible combination (commutative)" is only literally correct for commuting families (e.g. all translations, or all maps sharing a common eigenbasis). For general $T_k$, flavor (2) should be understood as "index orbits by transform-multiset and pick a canonical application order" — an explicit modeling choice.
Size blow-up. $|\mathcal{P}_N| = K^N$ (ordered) or $\binom{N+K-1}{K-1}$ (commutative). Both grow fast; keep $K, N$ small or subsample words.
Non-convexity. The loss is highly non-convex; use multiple random restarts, annealing of $N$ (start small, grow), or curriculum on $\mathcal{Q}$.
Relation to IFS. As $N \to \infty$ with contractive $T_k$, the orbit of $0$ densifies in the attractor of the IFS ${T_k}$. So this method is effectively fitting an IFS attractor to $\mathcal{Q}$, truncated at depth $N$.

Fractal Learning