The math · in math notation
What the team has proved
Every theorem the team has formalized, in standard mathematical notation with diagrams. No code — just math, the way you'd see it in a textbook.
1. The composition idea — where matrices come from
The textbook starts here: a linear map $T : V \to W$ between vector spaces is an arrow, and arrows compose. If $S : W \to U$ is a second linear map, you can do $T$ first and then $S$, producing a new arrow $S \circ T : V \to U$. That's all.
Now choose bases: $b_V = \{b_1, \ldots, b_n\}$ for $V$, $b_W = \{w_1, \ldots, w_p\}$ for $W$, $b_U = \{u_1, \ldots, u_m\}$ for $U$. A linear map $T$ is fully determined by what it does to the basis vectors: each $T(b_j)$ is a vector in $W$, expressible as a column of $p$ scalars in $b_W$. Write those columns side-by-side and you have a $p \times n$ matrix: $$[T]_{b_W \leftarrow b_V} \in \R^{p \times n}, \qquad \text{column } j = \text{coordinates of } T(b_j) \text{ in } b_W.$$ Same for $S$: an $m \times p$ matrix. The question is then: what matrix represents $S \circ T$, and what's its formula in terms of the matrices of $S$ and $T$?
2. Why matrix multiplication is the formula it is
This is the load-bearing derivation. Write $B = [T]_{b_W \leftarrow b_V}$ and $A = [S]_{b_U \leftarrow b_W}$, and compute $(S \circ T)(b_j)$ in two steps, expanding through the bases:
$$ \begin{aligned} (S \circ T)(b_j) &= S\bigl(T(b_j)\bigr) \\ &= S\!\left(\textstyle\sum_k B_{kj}\, w_k\right) && \text{[expand $T(b_j)$ in $b_W$]} \\ &= \textstyle\sum_k B_{kj}\, S(w_k) && \text{[linearity of $S$]} \\ &= \textstyle\sum_k B_{kj}\, \bigl(\sum_i A_{ik}\, u_i\bigr) && \text{[expand $S(w_k)$ in $b_U$]} \\ &= \textstyle\sum_i \!\left(\sum_k A_{ik}\, B_{kj}\right) u_i. && \text{[swap the sums]} \end{aligned} $$
The coefficient of $u_i$ is the $(i, j)$ entry of the matrix of $S \circ T$. So the matrix of the composite is the matrix whose $(i, j)$ entry is $\sum_k A_{ik} B_{kj}$. That's the row-times-column formula. It is forced.
For any choice of bases $b_V$, $b_W$, $b_U$ and linear maps $T : V \to W$, $S : W \to U$: $$\bigl[\, S \circ T \,\bigr]_{b_U \leftarrow b_V} \;=\; \bigl[\, S \,\bigr]_{b_U \leftarrow b_W} \cdot \bigl[\, T \,\bigr]_{b_W \leftarrow b_V}.$$ Equivalently, entry-wise: $$\bigl(\,[S \circ T]\,\bigr)_{ij} \;=\; \sum_{k=1}^{p} A_{ik}\, B_{kj}.$$
A 2×2 worked product, with the row–column highlight
To see the formula in action, take $A = \begin{pmatrix} 2 & 3 \\ 1 & 4 \end{pmatrix}$ and $B = \begin{pmatrix} 5 & 6 \\ 7 & 8 \end{pmatrix}$. The $(1, 2)$ entry of $AB$ is row 1 of $A$ dotted with column 2 of $B$:
$(AB)_{12} = 2 \cdot 6 + 3 \cdot 8 = 12 + 24 = 36$: row 1 of $A$ times column 2 of $B$.
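To see the same arithmetic run, here is a quick numeric check (a Python sketch, not part of the formal development; `mat_mul` is an illustrative helper implementing exactly the entry-wise formula above):

```python
# Illustrative check of the row-times-column formula (not the Lean proof).
A = [[2, 3],
     [1, 4]]
B = [[5, 6],
     [7, 8]]

def mat_mul(A, B):
    """(AB)[i][j] = sum_k A[i][k] * B[k][j], the formula forced by composition."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

AB = mat_mul(A, B)
assert AB[0][1] == 2 * 6 + 3 * 8 == 36   # row 1 of A against column 2 of B
assert AB == [[31, 36], [33, 38]]
```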
3. Why $AB \ne BA$ — concrete witness
Function composition isn't commutative. "First put on socks, then shoes" is not the same as "first put on shoes, then socks." Matrix multiplication inherits exactly that.
Let
$$A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \qquad B = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}.$$
Then by direct computation,
$$AB = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}, \qquad BA = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} = A. $$
So $AB \ne BA$. The proof in Lean is one tactic (`decide`), since both sides reduce to canonical integer matrices.
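The witness checks in a few lines of Python (a sketch; the formal proof is the one-tactic `decide` above):

```python
# The non-commutativity witness, checked by direct computation.
def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

A = [[0, 1], [0, 0]]
B = [[1, 0], [0, 0]]

assert mat_mul(A, B) == [[0, 0], [0, 0]]   # AB = 0
assert mat_mul(B, A) == A                  # BA = A
assert mat_mul(A, B) != mat_mul(B, A)      # so AB != BA
```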
4. Algebra laws as one-line corollaries
Every algebraic law of matrices follows from the corresponding fact about linear maps, plus §6.3. The chain is short enough that you can prove it by pointing at it.
Function level
Composition is associative: $$(S \circ T) \circ R = S \circ (T \circ R).$$ Sums distribute on the right: $$S \circ (T_1 + T_2) = S \circ T_1 + S \circ T_2.$$ On the left: $$(S_1 + S_2) \circ T = S_1 \circ T + S_2 \circ T.$$ The identity composes trivially: $$\mathrm{id} \circ T = T = T \circ \mathrm{id}.$$
Matrix level (apply §6.3)
Matrix multiplication is associative: $$(AB)C = A(BC).$$ Right-distributive: $$A(B_1 + B_2) = AB_1 + AB_2.$$ Left-distributive: $$(A_1 + A_2)B = A_1B + A_2B.$$ Identity: $$I A = A = A I.$$
The Lean proofs are 5–6 line rewrite chains — mathlib has direct one-liners, but the team took the longer route deliberately to keep the prose-level argument visible in the formal proof.
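A random spot-check of the four matrix-level laws (a Python sketch, not a proof: the finite sample only illustrates what the Lean rewrite chains establish for all matrices; `mul`, `add`, `rand_mat` are illustrative helpers):

```python
# Spot-check associativity, distributivity, and identity on random matrices.
import random

def mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def add(A, B):
    return [[A[i][j] + B[i][j] for j in range(len(A[0]))]
            for i in range(len(A))]

def rand_mat(n):
    return [[random.randint(-9, 9) for _ in range(n)] for _ in range(n)]

I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
random.seed(0)
for _ in range(100):
    A, B, C = rand_mat(3), rand_mat(3), rand_mat(3)
    assert mul(mul(A, B), C) == mul(A, mul(B, C))           # (AB)C = A(BC)
    assert mul(A, add(B, C)) == add(mul(A, B), mul(A, C))   # A(B+C) = AB+AC
    assert mul(add(A, B), C) == add(mul(A, C), mul(B, C))   # (A+B)C = AC+BC
    assert mul(I3, A) == A == mul(A, I3)                    # IA = A = AI
```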
5. Change of basis — same map, different coordinates
The same linear map $T : V \to V$ is represented by different matrices in different bases. The relationship between them is conjugation by the change-of-basis matrix $P$:
For any two bases $B, B'$ of $V$ and any linear map $T : V \to V$: $$[T]_{B'} \;=\; P^{-1}\,[T]_B\,P, \qquad \text{where} \quad P = [\mathrm{id}]_{B \leftarrow B'}.$$
Two invariants that survive — and why eigenvalues matter
If $A' = P^{-1} A P$, then $\det A' = \det A$ and $\mathrm{tr}\, A' = \mathrm{tr}\, A$. Same for the characteristic polynomial. So determinant, trace, and the eigenvalues are properties of the underlying linear map, not coordinate artifacts. That's why eigenvalues matter: they're the basis-independent quantities. Anything that doesn't survive similarity is bookkeeping.
A 2×2 example, formalized: the map $T(x, y) = (2x + y,\, x + 2y)$ has matrix $\begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}$ in the standard basis. Change to the eigenbasis $\{(1, 1),\, (1, -1)\}$ and the same map becomes diagonal: $$\begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix} \;\xrightarrow{\;P^{-1} \cdot\, \cdot\, P\;}\; \begin{pmatrix} 3 & 0 \\ 0 & 1 \end{pmatrix}.$$ Same map. Different coordinates. The eigenbasis is the basis where $T$ acts independently on each axis.
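The conjugation can be verified exactly (a Python sketch in rational arithmetic; `mul` is an illustrative helper and the inverse is the explicit 2×2 cofactor formula):

```python
# Exact check that P^{-1} A P is diagonal in the eigenbasis, and that
# det and trace survive the conjugation.
from fractions import Fraction as F

def mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[F(2), F(1)], [F(1), F(2)]]
P = [[F(1), F(1)], [F(1), F(-1)]]            # columns: (1,1) and (1,-1)
detP = P[0][0] * P[1][1] - P[0][1] * P[1][0]
Pinv = [[ P[1][1] / detP, -P[0][1] / detP],
        [-P[1][0] / detP,  P[0][0] / detP]]  # 2x2 cofactor formula

Ap = mul(mul(Pinv, A), P)
assert Ap == [[F(3), F(0)], [F(0), F(1)]]                 # diag(3, 1)
assert Ap[0][0]*Ap[1][1] - Ap[0][1]*Ap[1][0] == F(3)      # det preserved
assert A[0][0]*A[1][1] - A[0][1]*A[1][0] == F(3)
assert Ap[0][0] + Ap[1][1] == A[0][0] + A[1][1] == F(4)   # trace preserved
```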
6. EML — the operator and its identities
Andrzej Odrzywołek (Jagiellonian University, April 2026) defined a single binary operator and claimed every elementary function on $\R$ can be expressed as a finite tree built from this operator and the constant $1$. The operator is: $$\eml(x, y) \;=\; e^x - \ln y.$$
Sanity check at the canonical input: $\eml(0, 1) = e^0 - \ln 1 = 1 - 0 = 1$. So you can produce $1$ from itself. That's enough to bootstrap everything else.
$e^x = \eml(x, 1)$
$\ln x \;=\; \eml\bigl(1,\;\eml(\eml(1,\,x),\,1)\bigr)$
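Both identities spot-check numerically (a Python sketch, assuming the definition $\eml(x, y) = e^x - \ln y$ above, with inputs kept where the true logarithm applies):

```python
# Spot-check of the exp and ln identities, with eml(x, y) = e^x - ln y.
import math

def eml(x, y):
    return math.exp(x) - math.log(y)   # only called with y > 0 here

assert abs(eml(0, 1) - 1) < 1e-12                  # the canonical input
for x in [0.3, 1.0, 2.5]:
    assert abs(eml(x, 1) - math.exp(x)) < 1e-9     # e^x = eml(x, 1)
for x in [0.5, 2.0, 10.0]:
    ln_tree = eml(1, eml(eml(1, x), 1))            # the nested ln x tree
    assert abs(ln_tree - math.log(x)) < 1e-9
```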
7. Building $-,\,+,\,\times$ from EML — and why the domains shrink
Now you can compose. From $\exp$ and $\ln$ — which are themselves $\eml$-trees — you can build subtraction, negation, addition, and multiplication. Each construction is just substitution: replace each named function with its $\eml$-tree, expand. The trees grow.
The depth grows fast, but that's not the interesting part. The interesting part is that each operation requires its inputs to satisfy a tighter constraint than its components do. Where $\ln x$ wants $x > 0$, the construction $x - y$ inherits that. Where the negation $(e - x) - e$ wants the inner $e - x > 0$, it pushes the constraint $x < e$ onto its input. Building $x + y$ on top of negation tightens even further.
| Operation | Domain constraint inherited by the $\eml$-tree |
| --- | --- |
| $\ln x$ | $x > 0$ |
| $x - y$ | $x > 0$ |
| $-x$ | $x < e$ |
| $x + y$ | $x > 0$ and $y < e$ |
| $x \cdot y$ | $x > 1$ and $y < e^e$ |
This is the structural worry: even if every operation works on some region, the obvious tree for any particular function might fail on the rest of $\R$. That worry has a name — the obstruction — and the team formalized one concrete instance.
8. EML's first concrete obstruction
Take negation. The obvious EML construction is $\mathrm{neg}(x) = (e - x) - e$, which agrees with $-x$ when $x < e$. What happens at $x = e + 1$?
The inner $\mathrm{sub}(e, x)$ correctly computes $e - (e+1) = -1$. The outer $\mathrm{sub}(\,\cdot\,,\, e)$ then needs to compute $\mathrm{eml}(\ln(-1),\, e^e)$. But $\ln$ on a negative number is mathlib's junk value $0$ — the convention extends $\ln$ to be $0$ on $(-\infty, 0]$, so the formula evaluates without error but produces the wrong number.
Plugging in: $$\mathrm{neg}(e + 1) \;=\; \eml(0, e^e) \;=\; e^0 - \ln(e^e) \;=\; 1 - e \;\approx\; -1.72.$$ The expected value was $-(e + 1) \approx -3.72$. They differ by $2$.
$$\exists\, x \in \R \;:\quad \mathrm{neg}(x) \ne -x. \qquad \text{Witness: } x = e + 1.$$
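The failure reproduces in a few lines (a Python sketch: `ln_junk` mimics mathlib's convention that $\ln$ is $0$ on nonpositive inputs, and `sub`, `neg` are the constructions from §7):

```python
# Reproducing the obstruction numerically, with mathlib-style junk values.
import math

E = math.e

def ln_junk(y):
    return math.log(y) if y > 0 else 0.0   # mathlib: ln is 0 on (-inf, 0]

def eml(x, y):
    return math.exp(x) - ln_junk(y)

def sub(x, y):                              # x - y, valid only for x > 0
    return eml(ln_junk(x), math.exp(y))

def neg(x):                                 # (e - x) - e, valid only for x < e
    return sub(sub(E, x), E)

assert abs(neg(1.0) - (-1.0)) < 1e-9        # fine inside the domain (1 < e)
bad = neg(E + 1)                            # outside the domain: x = e + 1
assert abs(bad - (1 - E)) < 1e-9            # evaluates to 1 - e ~ -1.72
assert abs(bad - (-(E + 1))) > 1.9          # misses -(e+1) by exactly 2
```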
9. The MinimalBasis framework — Tier 2 in math notation
The team didn't stop at the EML obstruction. They generalized. MinimalBasis.lean defines an abstract framework that any "small set of operators" universality question can instantiate.
An operator basis over a carrier set $\alpha$ is a triple $(\mathcal{O}, \mathrm{ar}, \llbracket - \rrbracket)$ where:
- $\mathcal{O}$ is a set of operator symbols,
- $\mathrm{ar} : \mathcal{O} \to \N$ assigns each operator an arity,
- $\llbracket - \rrbracket : \prod_{o \in \mathcal{O}} \bigl(\alpha^{\mathrm{ar}(o)} \to \alpha\bigr)$ interprets each operator as a function on $\alpha$.
For an operator basis $B$ over $\alpha$ and an arity $n$, the set of terms $\mathrm{Term}_B(n)$ is the smallest set such that:
$\dfrac{c \in \alpha}{\;\mathrm{const}(c) \in \mathrm{Term}_B(n)\;}\quad\text{(constants)}$ $\qquad$ $\dfrac{i \in \{1, \ldots, n\}}{\;\mathrm{var}(i) \in \mathrm{Term}_B(n)\;}\quad\text{(variables)}$
$\dfrac{o \in \mathcal{O}, \;\; t_1, \ldots, t_{\mathrm{ar}(o)} \in \mathrm{Term}_B(n)}{\;\mathrm{app}(o; t_1, \ldots, t_{\mathrm{ar}(o)}) \in \mathrm{Term}_B(n)\;}\quad\text{(application)}$
A term $T \in \mathrm{Term}_B(n)$ globally represents a function $f : \alpha^n \to \alpha$ iff $\llbracket T \rrbracket(\vec x) = f(\vec x)$ for every $\vec x \in \alpha^n$.
An obstruction theory over $B$ is a family of properties $\mathcal{C}_n \subseteq (\alpha^n \to \alpha)$ such that:
(C1) Every constant function lies in $\mathcal{C}_n$.
(C2) Every projection $\pi_i : \vec x \mapsto x_i$ lies in $\mathcal{C}_n$.
(C3) For every $o \in \mathcal{O}$: if $f_1, \ldots, f_{\mathrm{ar}(o)} \in \mathcal{C}_n$, then $\vec x \mapsto \llbracket o \rrbracket(f_1(\vec x), \ldots, f_{\mathrm{ar}(o)}(\vec x)) \in \mathcal{C}_n$.
For any term $T \in \mathrm{Term}_B(n)$ and any obstruction theory $\mathcal{C}$ over $B$: $$\llbracket T \rrbracket \;\in\; \mathcal{C}_n.$$ Direct structural induction over the construction of $T$.
If $f \notin \mathcal{C}_n$, then no term $T \in \mathrm{Term}_B(n)$ globally represents $f$.
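The grammar, the evaluator, and the global-representation check are small enough to sketch concretely (Python, with hypothetical names `Const`, `Var`, `App`, `represents_globally` that mirror, but are not, the Lean definitions; the exhaustive check only makes sense over finite carriers):

```python
# A miniature of OperatorBasis / Term / "globally represents".
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class Const:
    c: object

@dataclass(frozen=True)
class Var:
    i: int

@dataclass(frozen=True)
class App:
    op: str
    args: tuple

def evaluate(ops, term, xs):
    """[[T]](xs) by structural recursion over the three term constructors."""
    if isinstance(term, Const):
        return term.c
    if isinstance(term, Var):
        return xs[term.i]
    return ops[term.op](*(evaluate(ops, a, xs) for a in term.args))

def represents_globally(ops, term, f, carrier, n):
    """Check [[T]](xs) = f(xs) for every xs (finite carriers only)."""
    return all(evaluate(ops, term, xs) == f(*xs)
               for xs in product(carrier, repeat=n))

# tiny usage: NOT as a NAND-tree over Bool
ops = {"nand": lambda a, b: not (a and b)}
not_tree = App("nand", (Var(0), Var(0)))
assert represents_globally(ops, not_tree, lambda x: not x, [False, True], 1)
```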
10. Headline — no EML-tree represents the quintic root
Plug in the EML basis. Choose the obstruction class $\mathcal{C}_n = \{f : \alpha^n \to \alpha \mid f \text{ has solvable monodromy}\}$. The structural induction needs three pieces:
Discharged. Every constant function has solvable monodromy. Every projection has solvable monodromy. Both proved via the concrete hasSolvableMonodromy definition in Sheffer/Foundations/SolvableMonodromy.lean, built on the multivalued-analytic infrastructure in MultivaluedAnalytic.lean and the group-action layer in MonodromyGroup.lean. Both were once axioms; both are now theorems.
Remaining. Khovanskii's lemma — if $f$ and $g$ have solvable monodromy, so does $\eml(f, g)$. The Galois fact — the generic quintic root function has $S_5$ monodromy. Both are theorems with established proofs in the analytic-Galois literature; both are still axiomatized in the Lean source.
The function $\mathrm{quinticRoot} : (a_1, \ldots, a_5) \mapsto$ a real root, where one exists, of $t^5 + a_1 t^4 + a_2 t^3 + a_3 t^2 + a_4 t + a_5$, has monodromy group $S_5$. Since $S_5$ is not solvable (it contains $A_5$, which is simple and non-abelian), $\mathrm{quinticRoot} \notin \mathcal{C}_5$. This is the second of the two remaining Path-B axioms.
Putting the pieces together:
$$\nexists\, T \in \mathrm{Term}_{B_{\eml}}(5) \;:\quad \llbracket T \rrbracket = \mathrm{quinticRoot}.$$
The proof is the bridge lemma applied to $\mathrm{quinticRoot}$ plus the two remaining Path-B axioms. The const-and-var clauses of the structural induction are now theorems, built from the concrete monodromy infrastructure in Sheffer/Foundations/.
This is the formalization of Stylewarning's informal critique. It's the headline math that the team has produced. The Path-B axioms wrap the analytic-monodromy content as a black box; Path A would discharge them once mathlib has the underlying theory. Either way, the framework, the structural induction, the bridge lemma, and the headline theorem are on disk and machine-checked.
11. PolyAct — the same machinery, fully constructive
The framework supports multiple substrates. The EML / quintic-root story makes that point with axioms wrapping the analytic content; polynomial-activation networks, formalized in Sheffer/Examples/PolyAct.lean, make the same kind of point with no axioms at all. The analytic-continuation barrier is replaced by elementary polynomial degree arithmetic, and a single mathlib lemma about polynomial composition does all the work. The result is also about a real architecture — networks whose activation functions are polynomials of bounded degree.
The expressibility frontier has a draggable version on the interactive page: slide $k$ and $d$ and watch $x^N$ enter or leave the representable region.
For an integer $k \geq 0$, let $\mathrm{PolyActOp}_k$ be the set of univariate real polynomials of degree at most $k$: $$\mathrm{PolyActOp}_k \;=\; \{\, p \in \R[X] \;\mid\; \deg p \leq k \,\}.$$ The polynomial-activation basis $B_{\mathrm{Poly},k}$ over $\R$ has one operator type ($\mathrm{PolyActOp}_k$), arity $1$, and evaluator $$\llbracket p \rrbracket(\vec{x}) \;=\; p(x_1).$$ A term over $B_{\mathrm{Poly},k}$ with one input variable is then a finitary composition: $p_1 \circ p_2 \circ \cdots \circ p_d \circ x$, where each $p_i$ has degree at most $k$.
For a term $T \in \mathrm{Term}_{B_{\mathrm{Poly},k}}(1)$, define recursively: $$\mathrm{polyDepth}(T) \;=\; \begin{cases} 0 & T = \mathrm{const}(c) \text{ or } T = \mathrm{var}(0) \\ 1 + \mathrm{polyDepth}(T') & T = \mathrm{app}(p;\, T') \end{cases}$$ and extract the underlying polynomial: $$\mathrm{toPoly}(T) \;=\; \begin{cases} C(c) & T = \mathrm{const}(c) \\ X & T = \mathrm{var}(0) \\ p \circ \mathrm{toPoly}(T') & T = \mathrm{app}(p;\, T'). \end{cases}$$
By structural induction, $\mathrm{toPoly}(T).\mathrm{eval}(x) = \llbracket T \rrbracket(\vec x)$ — the extracted polynomial computes the same thing as the term.
For any term $T$ over the polynomial-activation basis $B_{\mathrm{Poly},k}$: $$\deg(\mathrm{toPoly}(T)) \;\leq\; k^{\mathrm{polyDepth}(T)}.$$ Equivalently: every term of structural depth at most $d$ evaluates to a polynomial of degree at most $k^d$ (assuming $k \geq 1$).
- Constants and variables (depth 0): $C(c)$ has degree $0 \leq k^0$; $X$ has degree $1 \leq k^0$ when $k \geq 1$.
- Application (depth $1 + d'$): mathlib's `Polynomial.natDegree_comp_le` gives $\deg(p \circ q) \leq \deg(p) \cdot \deg(q)$. Apply: $\deg(p \circ \mathrm{toPoly}(T')) \leq k \cdot k^{d'} = k^{1+d'}$.

In short: `Polynomial.natDegree_comp_le` applied at each level. The headline no-go follows: $$\nexists\, T \in \mathrm{Term}_{B_{\mathrm{Poly},2}}(1) \;:\quad \mathrm{polyDepth}(T) \leq 3 \;\wedge\; \forall x \in \R.\; \llbracket T \rrbracket(x) = x^9.$$
- Suppose such a $T$ exists. By the degree-bound theorem, $\mathrm{toPoly}(T)$ has degree at most $2^3 = 8$.
- Since $\mathrm{toPoly}(T).\mathrm{eval}(x) = x^9$ at every $x \in \R$, the polynomial $\mathrm{toPoly}(T) - X^9$ has infinitely many roots. A polynomial in $\R[X]$ with infinitely many roots is the zero polynomial, so $\mathrm{toPoly}(T) = X^9$.
- But $\deg(X^9) = 9 > 8$. Contradiction.
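The degree argument can be exercised concretely (a Python sketch: compose random chains of at most three degree-at-most-2 polynomials and confirm the degree never exceeds $2^3 = 8$; a random sample illustrates the bound, while the Lean proof covers every term):

```python
# Composite degree of depth-3 chains of quadratics never reaches 9.
import random

def poly_mul(a, b):
    """Product of two coefficient lists (low-to-high order)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def compose(p, q):
    """Coefficients of p(q(x)), by Horner's rule on polynomials."""
    result = [0]
    for c in reversed(p):
        result = poly_mul(result, q)
        result[0] += c
    return result

def deg(p):
    d = len(p) - 1
    while d > 0 and p[d] == 0:
        d -= 1
    return d

random.seed(0)
for _ in range(1000):
    t = [0, 1]                                         # the variable x
    for _ in range(3):                                 # polyDepth <= 3
        p = [random.randint(-3, 3) for _ in range(3)]  # deg p <= 2
        t = compose(p, t)
    assert deg(t) <= 8                                 # the k^d bound
    assert deg(t) != 9                                 # so t cannot be x^9
```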
What's notable about this result, relative to the EML / quintic-root headline:
- Zero axioms. The Path-B axioms that wrap monodromy theory in the EML case (two remaining, two already discharged via Sheffer/Foundations/SolvableMonodromy.lean) have no analog here. The proof uses one mathlib lemma (`Polynomial.natDegree_comp_le`) and induction.
- Same framework. The Sheffer machinery — `OperatorBasis`, `Term`, `representsGlobally` — is reused unchanged. PolyAct is an instance, not a parallel framework.
- An AI-architectural function class. Polynomial activations are a real (if not dominant) neural-network architecture. The headline says: a depth-3 network whose activations are at most quadratic cannot compute $x^9$. That's a falsifiable property of the architecture, not an abstract claim.
- Pedagogically simpler. The argument fits in three steps a calculus student can follow. EML's headline needs Galois theory and analytic continuation; PolyAct's needs only the multiplicative-degree law for polynomial composition.
12. NAND — Sheffer's positive completeness
The third instantiation. Where EML and PolyAct produce obstruction results — no term represents this function — NAND produces a completeness result: every Boolean function is representable as a NAND-tree. The framework supports both polarities.
Let $\alpha = \mathrm{Bool}$ and define one binary operator $\bar{\wedge}$ (NAND) by $$\bar{\wedge}(a, b) \;=\; \neg(a \wedge b).$$ The NAND basis $B_{\bar{\wedge}}$ has one operator type with one element, arity $2$, evaluator $\llbracket \bar{\wedge} \rrbracket(a, b) = \neg(a \wedge b)$.
Three Boolean primitives expressed as NAND-trees and proved to represent the standard logical operations:
$$\mathrm{notTree}(x) \;=\; \bar{\wedge}(x, x) \;=\; \neg x.$$ $$\mathrm{andTree}(x, y) \;=\; \bar{\wedge}(\bar{\wedge}(x, y),\, \bar{\wedge}(x, y)) \;=\; x \wedge y.$$ $$\mathrm{orTree}(x, y) \;=\; \bar{\wedge}(\bar{\wedge}(x, x),\, \bar{\wedge}(y, y)) \;=\; x \vee y. \qquad \text{(De Morgan)}$$
Each identity is proved by `cases <;> simp`: Bool has only two values, so two nested case-splits exhaust the space; the simplifier finishes from there. No axioms.

For every $n \geq 0$ and every Boolean function $f : \mathrm{Bool}^n \to \mathrm{Bool}$, there exists a term $T \in \mathrm{Term}_{B_{\bar{\wedge}}}(n)$ such that $T$ globally represents $f$: $$\forall n, f.\;\;\exists T \in \mathrm{Term}_{B_{\bar{\wedge}}}(n).\;\; \forall \vec{x}.\; \llbracket T \rrbracket(\vec{x}) = f(\vec{x}).$$
Proof sketch: for each input row on which $f$ is true, build a minterm (a conjunction of literals — variables or their negations) that fires exactly on that row; OR the minterms together; rewrite the whole expression as a NAND-tree using the three direct constructions above. Helpers `bigAnd`, `bigOr`, `minTerm`, `dnf` plus the supporting lemmas `minTerm_self_eval` and `minTerm_diff_eval` close the case-split. Sheffer 1913, formalized with zero axioms.

The `OperatorBasis` + `Term` + `representsGlobally` machinery is polarity-agnostic — both the obstruction and the completeness questions are stated identically and proved separately.
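For small $n$ the construction can be run exhaustively (a Python sketch in which `min_term` and `dnf` mirror, but are not, the Lean helpers; it checks all 16 functions $\mathrm{Bool}^2 \to \mathrm{Bool}$):

```python
# Run the DNF-to-NAND construction over every 2-input Boolean function.
from itertools import product

def nand(a, b):  return not (a and b)
def t_not(x):    return nand(x, x)
def t_and(x, y): return nand(nand(x, y), nand(x, y))
def t_or(x, y):  return nand(nand(x, x), nand(y, y))

def min_term(row, xs):
    """Fires exactly when the inputs xs match the target row."""
    acc = True                            # constant leaf of the term
    for want, got in zip(row, xs):
        acc = t_and(acc, got if want else t_not(got))
    return acc

def dnf(table, xs):
    """OR of the minterms of the rows on which the function is true."""
    acc = False                           # constant leaf of the term
    for row, value in table.items():
        if value:
            acc = t_or(acc, min_term(row, xs))
    return acc

rows = list(product([False, True], repeat=2))
for values in product([False, True], repeat=4):   # all 16 truth tables
    table = dict(zip(rows, values))
    assert all(dnf(table, xs) == table[xs] for xs in rows)
```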
13. Arithmetic circuits, VP, VNP
A different track from the Sheffer framework's three substrate examples: the foundations of algebraic complexity theory. Arithmetic circuits, polynomial-size families, and the classes VP and VNP that Valiant introduced in 1979 to study the complexity of polynomial computation. Arithmetic-circuit definitions, the VP and VNP classes, and the permanent's VNP membership via Ryser's formula are all on disk, proved without axioms.
For a commutative semiring $R$ and a variable type $\sigma$, an arithmetic circuit $C \in \mathrm{ArithmeticCircuit}_{R,\sigma}$ is freely generated by: $$\dfrac{i \in \sigma}{\mathrm{input}(i) \in \mathrm{ArithmeticCircuit}_{R,\sigma}} \quad \dfrac{c \in R}{\mathrm{const}(c) \in \mathrm{ArithmeticCircuit}_{R,\sigma}}$$ $$\dfrac{a, b \in \mathrm{ArithmeticCircuit}_{R,\sigma}}{\mathrm{add}(a, b) \in \mathrm{ArithmeticCircuit}_{R,\sigma}} \quad \dfrac{a, b \in \mathrm{ArithmeticCircuit}_{R,\sigma}}{\mathrm{mul}(a, b) \in \mathrm{ArithmeticCircuit}_{R,\sigma}}.$$ Three associated functions:
- $\mathrm{size}(C) \in \N$ — the number of `add`/`mul` internal gates.
- $\mathrm{eval}(\mathrm{env}, C) \in R$ — evaluate $C$ at an input assignment $\mathrm{env} : \sigma \to R$.
- $\mathrm{toMvPolynomial}(C) \in R[\sigma]$ — extract the multivariate polynomial $C$ computes.
The variable type $\sigma$ is kept as a general `Type*` rather than fixed to `Fin n`. This lets the development state `permanent ∈ VNP` directly against `MvPolynomial (Fin n × Fin n) R` without an awkward `Fin (n*n)` encoding.

A function $p : \N \to \N$ is polynomially bounded if there exist constants $c, d \in \N$ such that $p(n) \leq c \cdot n^d + c$ for all $n$.
A polynomial family $f = (f_n)_{n \in \N}$ where $f_n \in R[\mathrm{Fin}\, n]$ is in $\mathrm{VP}_R$ iff there exists a sequence of arithmetic circuits $(C_n)_{n \in \N}$ and a polynomially-bounded $p$ such that: $$\forall n.\;\; \mathrm{toMvPolynomial}(C_n) = f_n \;\wedge\; \mathrm{size}(C_n) \leq p(n).$$
$f \in \mathrm{VNP}_R$ iff there exists $g \in \mathrm{VP}_R$ over an extended variable type $\mathrm{Fin}\,(n + m(n))$ — for some polynomially-bounded $m$ — such that $f$ is a Boolean projection of $g$: $$f_n(\vec{x}) \;=\; \sum_{\vec{y} \in \{0, 1\}^{m(n)}} g_{n + m(n)}(\vec{x}, \vec{y}).$$
For all $R$ and all $f$: $$f \in \mathrm{VP}_R \;\;\Longrightarrow\;\; f \in \mathrm{VNP}_R.$$
For an $n \times n$ matrix of formal variables $X : \mathrm{Fin}\,n \times \mathrm{Fin}\,n \to R$, the permanent is the multivariate polynomial $$\mathrm{perm}_n(X) \;=\; \sum_{\sigma \in S_n} \prod_{i = 1}^{n} X_{i,\, \sigma(i)} \;\;\in\;\; R[\mathrm{Fin}\,n \times \mathrm{Fin}\,n].$$
Ryser (1963) gave an alternative formula for the permanent: $$\mathrm{perm}(X) \;=\; (-1)^n \sum_{S \subseteq [n]} (-1)^{|S|} \prod_{i = 1}^{n} \!\left( \sum_{j \in S} X_{i, j} \right).$$ Where the naive permutation sum has $n!$ terms, Ryser's sum has $2^n$ — and after reorganization, it admits a polynomial-size arithmetic circuit over $n + n^2$ variables: $n$ Boolean variables encoding the subset $S$, and the $n^2$ formal entries of $X$.
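The identity confirms on a random matrix with exact integer arithmetic (a Python sketch comparing the $n!$-term permutation sum against Ryser's $2^n$-term sum):

```python
# Compare the n!-term permutation sum with Ryser's 2^n-term formula.
from itertools import permutations
from math import prod
import random

random.seed(1)
n = 4
X = [[random.randint(-5, 5) for _ in range(n)] for _ in range(n)]

# naive sum over all n! permutations
naive = sum(prod(X[i][s[i]] for i in range(n))
            for s in permutations(range(n)))

# Ryser: (-1)^n * sum over subsets S of (-1)^|S| * prod_i sum_{j in S} X[i][j]
ryser = 0
for mask in range(2 ** n):                          # subsets S of {0,...,n-1}
    S = [j for j in range(n) if (mask >> j) & 1]
    ryser += (-1) ** len(S) * prod(sum(X[i][j] for j in S)
                                   for i in range(n))
ryser *= (-1) ** n

assert naive == ryser
```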
The formalization decomposes into two parts: the Ryser identity itself (the auxiliary polynomial summed over the Boolean cube equals the permanent) and the polynomial-size circuit witness for that auxiliary polynomial. Both are proved without axioms; the chain to the headline collapses to six lines once they are in hand. Explicit circuit size: $2n^2 + 5n + 2$, satisfying `IsPolyBound` with constant $k = 9$. See the proofs page for the Lean source.