
  ZHU HAN WEN
  -----------

What Is Deep Learning?

📅 2026-02-10 | ⏱️ 4 min read

📖 Part of Series: intro-to-deep-learning

1. What Is Deep Learning? (Current)
#computer-science

Context

I've been working through Andrej Karpathy's micrograd, a tiny autograd engine in ~100 lines of Python. It's one of the cleanest explanations of backpropagation I've found: no framework abstractions, just the raw math wired up in code.

To make sure I actually understand it (and not just pattern-matching off Karpathy's code), I'm reimplementing everything in Midori. If I can port it without looking at the Python, I probably get it. These posts are my notes from that process.


The Core Insight

Deep learning, at its core, decomposes into two ideas:

  1. Function composition – chain simple ops (add, multiply, tanh) into a computation graph
  2. The chain rule – use calculus to work out how a small change in each input moves the output, so we know which way to tweak the parameters

The engine that automates step 2 is called autograd (automatic differentiation). That's what micrograd implements, and what I'm rebuilding here.

Computation Graph

Any expression can be drawn as a DAG (directed acyclic graph). For example, $e = \tanh(a \cdot b + c)$:

a ----\
       (*) ----\
b ----/        (+) ---> tanh ---> e
              /
c -----------/

Each node stores three things:

  • data β€” the computed value (forward pass)
  • grad β€” βˆ‚eβˆ‚thisΒ node\frac{\partial e}{\partial \text{this node}} (filled during backward pass)
  • op β€” how this node was produced, so we can derive the local gradient

This is the standard tape-based autodiff representation. micrograd uses a Value class with _backward closures; I'll use a Graph array with Op tags instead (no closures needed).
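To make the array-based representation concrete before the Midori version exists, here's a rough Python sketch (names like Op and Node are my own placeholders, not micrograd's API): each node carries data, grad, an op tag, and the indices of its inputs in a flat graph array.

```python
from dataclasses import dataclass
from enum import Enum
import math

class Op(Enum):
    LEAF = 0
    ADD = 1
    MUL = 2
    TANH = 3

@dataclass
class Node:
    data: float          # forward value
    op: Op = Op.LEAF     # how this node was produced
    inputs: tuple = ()   # indices of parent nodes in the graph array
    grad: float = 0.0    # d(output)/d(this node), filled by the backward pass

# Build the graph for e = tanh(a*b + c) as a flat array.
graph = []
def push(node):
    graph.append(node)
    return len(graph) - 1

a  = push(Node(2.0))
b  = push(Node(-1.0))
c  = push(Node(0.5))
ab = push(Node(graph[a].data * graph[b].data, Op.MUL, (a, b)))
s  = push(Node(graph[ab].data + graph[c].data, Op.ADD, (ab, c)))
e  = push(Node(math.tanh(graph[s].data), Op.TANH, (s,)))

print(graph[e].data)  # tanh(-1.5) ≈ -0.905
```

Because children always come after their parents in the array, a plain left-to-right walk is already a valid forward order, and a right-to-left walk will serve as the backward order later.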

Chain Rule Refresher

Given composed functions $y = f(x)$, $z = g(y)$, $L = h(z)$:

$\frac{dL}{dx} = \frac{dL}{dz} \cdot \frac{dz}{dy} \cdot \frac{dy}{dx}$

Concrete example: $f(x) = (2x + 3)^2$. Decompose as $u = 2x + 3$, $f = u^2$:

$\frac{df}{dx} = \frac{df}{du} \cdot \frac{du}{dx} = 2u \cdot 2 = 4u$

At $x = 1$: $u = 5$, so $\frac{df}{dx} = 20$.
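A quick numerical check of that result, using a central finite difference (a standalone sanity-check snippet, not part of the engine):

```python
# Verify the chain-rule result df/dx = 4u at x = 1 numerically.
def f(x):
    return (2*x + 3)**2

x = 1.0
h = 1e-6
numeric  = (f(x + h) - f(x - h)) / (2*h)  # central difference
analytic = 4 * (2*x + 3)                  # 4u from the derivation above
print(numeric, analytic)  # both ≈ 20
```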

Nothing new if you've taken multivariable calculus, but the key realization is that autograd does exactly this, automatically, for arbitrarily complex graphs. It records $u$ and $f$ as nodes, stores how they were produced, and walks the graph backward, applying the chain rule at each step.

Forward Mode vs Reverse Mode

There are two ways to propagate derivatives through a computation graph:

Forward mode starts at an input and pushes derivatives forward through each operation. Given $a \to b \to c \to L$, it computes $\frac{\partial b}{\partial a}$, then $\frac{\partial c}{\partial a}$, then $\frac{\partial L}{\partial a}$. One forward pass gives you the gradient w.r.t. one input. If you have $n$ parameters, you need $n$ passes.

Reverse mode starts at the output and pulls derivatives backward. It computes $\frac{\partial L}{\partial c}$, then $\frac{\partial L}{\partial b}$, then $\frac{\partial L}{\partial a}$. One backward pass gives you the gradient w.r.t. every input.

In training, we want $\nabla_\theta L$ where $\theta \in \mathbb{R}^n$ – one scalar loss, many parameters. Reverse mode gets all $n$ gradients in a single pass; forward mode would need $n$ passes. This is why backpropagation is reverse-mode autodiff.
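To see the single-pass property concretely, here's a hand-written reverse-mode sweep (a sketch with made-up input values, not the engine) over the earlier example $e = \tanh(a \cdot b + c)$. One backward walk from the output produces all three input gradients:

```python
import math

# Forward pass.
a, b, c = 2.0, -1.0, 0.5
s = a*b + c
e = math.tanh(s)

# Reverse pass: start with de/de = 1 and pull derivatives backward.
de_de = 1.0
de_ds = de_de * (1 - e**2)  # d tanh(s)/ds = 1 - tanh(s)^2
de_da = de_ds * b           # d(a*b)/da = b
de_db = de_ds * a           # d(a*b)/db = a
de_dc = de_ds * 1.0         # d(a*b + c)/dc = 1

print(de_da, de_db, de_dc)  # ≈ -0.1807  0.3614  0.1807
```

Forward mode would have had to rerun this whole computation once per input to get the same three numbers.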

Sanity Check

$f = (2x + 3)^2$ and its derivative, computed manually:


Set x = 2.0 → expect u = 7, f = 49, df/dx = 28.

This explicit calculation is trivial, but the goal of our autograd engine is to let us write only the forward pass (f = (2.0*x + 3.0)^2) and have it derive and evaluate df/dx = 28 automatically, without our ever hardcoding the chain rule.
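For reference, here's the manual version the playground mirrors – forward pass plus a hand-applied chain rule at x = 2.0. These are exactly the lines the engine should make unnecessary:

```python
x = 2.0
u = 2.0*x + 3.0      # u = 7
f = u**2             # f = 49
df_du = 2*u          # outer derivative: d(u^2)/du
df_dx = df_du * 2.0  # times du/dx = 2
print(u, f, df_dx)   # 7.0 49.0 28.0
```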

What's Coming

These are the building blocks. In the next parts, I'll define the Value/Op/Graph data structures, implement topological sort + backward pass, and finally train a neuron with SGD (stochastic gradient descent) – all without importing any library beyond basic I/O and math.

