The proof that π = 4

Recreational math: why a troll proof involving circles is less wrong than it seems.

Jun 05, 2025

In an article published here last week, I discussed the perils of thinking about infinity as a number. More specifically, I criticized the structure of some of the elementary proofs that 0.9999… = 1.

As a teaching prop, I wheeled out the following equation:

\(\underbrace{1 - 1}_{= \ 0} + \underbrace{1 - 1}_{= \ 0} + \underbrace{1 - 1}_{= \ 0} \ \ \ldots = 0\)

This is an endless sum of alternating +1 and -1 terms. Pairwise, they all work out to zero, so the equation seems to make sense.

At the same time, there’s no risk of running out of terms in an infinite sum, so it seems harmless to shift the annotations one position to the right:

\(1 \ \ \underbrace{-1 + 1}_{= \ 0} \ \ \underbrace{-1 + 1}_{= \ 0}\ \ \underbrace{-1 + 1}_{= \ 0} \ \ \ldots = 0\)

This seems to be saying that 1 = 0. Oops.

The reason I like this “proof” is that it’s hard to reflexively dismiss. A common reaction is “oh yeah, but this left an unpaired - 1 at infinity”, but what does this even mean? Can we devise any way to probe what’s the last term of an infinite sum? If it can’t be measured, why does it matter? Does it even exist?…

In the earlier article, we concluded that in contexts like these, infinity must be understood as a process metaphor, not a quantity. It’s not so much about an infinite number of steps as it is about an unknowable number of steps.
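To make the process view concrete, here is a minimal Python sketch that walks the sum one term at a time and records the running total. The helper name `partial_sums` is my own invention for illustration; the point is only that the running total never settles, so the answer depends on where you stop.

```python
from itertools import accumulate, cycle, islice

def partial_sums(count):
    """Return the first `count` running totals of 1 - 1 + 1 - 1 + ..."""
    terms = islice(cycle([1, -1]), count)  # 1, -1, 1, -1, ...
    return list(accumulate(terms))

print(partial_sums(8))  # [1, 0, 1, 0, 1, 0, 1, 0]
```

Stopping after an even number of terms gives 0 and stopping after an odd number gives 1, which is exactly the ambiguity that the two sets of annotations exploit.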

Got it, are we done now?

Well, sort of. Thinking of infinity as a process is not enough; you also have to consider what aspects of the process need to be examined in the first place. Consider the following troll proof that π = 4:

We begin by drawing a circle with a diameter of 1; it follows that the circumference of this circle is 1π. We then draw a 1×1 square around the circle. The perimeter of the square is a sum of the lengths of its sides: 1 + 1 + 1 + 1 = 4.

Next, we “fold back” small sections near the corners of the square. We trim and reorient segments of length a such that the inverted corner just touches the circumference of the circle. Critically, this operation doesn’t change the perimeter of the outer shape. Each of the remaining long edges has a length of b = 1 - 2a; the newly-added corner sections are 8a in total. The sum of 4b + 8a works out back to 4.
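Spelled out, the perimeter bookkeeping for this first fold is:

\(4b + 8a = 4 \cdot (1 - 2a) + 8a = 4 - 8a + 8a = 4\)

The value of a cancels out, so the perimeter stays at 4 no matter where the cut is made. If we assume each fold removes an a×a square from a corner, the inverted corner lands on the circle when a = ½ - 1/(2√2) ≈ 0.146, but nothing in the argument depends on that number.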

Yet, the outer shape is now evidently a better approximation of the circle. If we perform another iteration, folding back the eight protruding corners, we get even closer to a circle without changing the perimeter in any way. If we keep doing this forever, the seemingly inescapable conclusion is that we’ll get infinitesimally close to the shape of the circle while keeping the perimeter of 4. In other words, the circle’s circumference must also be equal to 4. Or, to put it even more bluntly: π = 4.

Help me, Mr. Internet!

If you ask why the proof is wrong, the usual response on math forums is something along the lines of:

I get it: mathematics doesn’t concern itself with intuition or reality. It operates in a closed universe of axioms; the main thrust of the discipline is to make these axioms as abstract as possible, and specify them as precisely as possible. So, if you don’t want to learn the lingo of mathematical analysis, what are you doing here?

At the same time, you might be one of these entitled bozos who just want to know why the π = 4 proof is wrong. If so, don’t get fixated on the “infinity” part just yet. The first step is figuring out if the construction method is valid in the first place.

We can start by distilling the troll proof to a simpler but functionally identical case — trying to find the length of the diagonal of a 1×1 square:

We have a diagonal of some unknown length. We make the first approximation with a path consisting of a single horizontal segment and a single vertical segment (arrows, left). The overall length of this path is 1 + 1 = 2.

Next, similarly to the circle scenario, we fold back the corner where the two segments intersect. This gives us four identical sections, each ½ long (middle diagram); the overall length of the stairstep path remains 4 · ½ = 2. We keep going; the shape gets closer and closer to the diagonal, but the walking distance along the jagged path remains constant. As before, the apparent conclusion is that the diagonal has a length of 2 — and not the ~1.41 value you can measure with a ruler or calculate from the Pythagorean theorem.
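If you'd rather not take the arithmetic on faith, the following Python sketch builds the stairstep path explicitly and measures it. The function names and the choice to number the two-segment path as iteration 1 are my own assumptions, not anything fixed by the construction.

```python
import math

def staircase(iteration):
    """Vertices of the stairstep path from (0, 0) to (1, 1).

    Iteration 1 is the two-segment path; each later iteration doubles
    the number of steps and halves their size.
    """
    steps = 2 ** (iteration - 1)
    size = 1 / steps
    points = [(0.0, 0.0)]
    for i in range(steps):
        points.append(((i + 1) * size, i * size))        # horizontal run
        points.append(((i + 1) * size, (i + 1) * size))  # vertical run
    return points

def path_length(points):
    """Sum of the straight-line lengths of consecutive segments."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))

for c in range(1, 8):
    path = staircase(c)
    print(c, len(path) - 1, path_length(path))  # iteration, segments, length

print("diagonal:", math.sqrt(2))
```

Every row reports a length of 2.0 regardless of how many segments the path has, while the diagonal stays at roughly 1.414.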

So, what’s wrong with these proofs? Again, a reasonable suspicion is that this construction process doesn’t properly converge on the target shape. It helps to express this idea more precisely: we can analyze the pointwise distance between the stairstep approximation and the diagonal. The following diagram should help:

On the left, I marked the peak distance between the diagonal and the initial approximation; this is labeled x. If we look at the rotated view in the lower part of the figure, the actual distance changes linearly from 0 to x (and back). So, finding the average deviation is akin to calculating the average water level in a bucket that’s steadily filling up from empty to full. The average deviation is simply 50% of the maximum. I’m going to invent a symbol for the average error and write ε_shape = x/2.

In the center panel, the situation repeats: we have two triangles that are precisely half the size of the previous one. Within the span of each of these triangles, peak deviation is x/2, so the average is x/4; this doesn’t change if we line up both triangles side-by-side. The calculated average deviation is ε_shape = x/4. Finally, after one more iteration (right), we get ε_shape = x/8.

The deviation remaining after iteration c can be generalized as:

\(\varepsilon_{shape} = \frac{x}{2^c}\)

In this equation, x is just some finite constant value; we could calculate it, but we don’t need to. Either way, the expression robustly moves toward zero as we iterate — so from this perspective, our shape approximation algorithm looks fine.
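As a sanity check, here is a rough Python sketch that estimates the average deviation numerically instead of reasoning about triangles. It assumes the unit square from the diagram, for which the peak distance works out to x = 1/√2; the function names and the sampling scheme are mine.

```python
import math

def distance_to_diagonal(s, iteration):
    """Distance from the diagonal y = x to the staircase point that lies
    s units of walking distance from the origin (0 <= s <= 2)."""
    size = 1 / 2 ** (iteration - 1)   # length of each segment
    offset = s % (2 * size)           # position within the current step
    if offset <= size:
        gap = offset                  # horizontal run: moving away
    else:
        gap = 2 * size - offset       # vertical run: coming back
    return gap / math.sqrt(2)

def average_deviation(iteration, samples=2 ** 16):
    """Midpoint-rule estimate of the mean distance along the whole path."""
    total = sum(distance_to_diagonal(2 * (k + 0.5) / samples, iteration)
                for k in range(samples))
    return total / samples

x = 1 / math.sqrt(2)  # peak distance for the two-segment path (unit square)
for c in range(1, 7):
    print(c, average_deviation(c), x / 2 ** c)  # estimate vs. formula
```

The two columns should agree to within floating-point noise, with both halving on every iteration.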

So what’s wrong?

There are two ways to answer this. The first answer is purely geometric: if the construction method is fine, then our assumptions about the outcome must be wrong. Pointwise proximity and walking-path distance are frequently correlated, but they don’t need to be. You can probably take many routes from home to work or school that are geometrically distant, but have similar lengths. Conversely, two nearby walking paths can have vastly different lengths if one is straight as an arrow and the other zig-zags a lot.

To analyze this a bit more precisely, let’s have a look at the following diagram:

On the left, we have a diagonal of some length n and a two-segment path (total length 2). The resulting path error (the walking-distance difference between the two routes) is ε_path = 2 - n.

Next, let’s have a look at the middle diagram. Here, the length of the diagonal is obviously the same as before (n), while the stairstep curve has a length of 4 · ½; this yields ε_path = 4 · ½ - n = 2 - n, which is no change from before. The situation repeats on the right: ε_path = 8 · ¼ - n = 2 - n. That is to say, ε_path appears independent of ε_shape; it remains constant (and pretty big) as we iterate.

We can observe that the diagonal is smooth, while the stairstep approximation we’re building is increasingly jagged. In each iteration, the size of every “detour” we’re taking is halved, but the number of detours is doubled — so the total overhead associated with this route doesn’t change. In other words, it’s not a contradiction that the shapes can get arbitrarily close without converging on a similar length.
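The halving-and-doubling can be written down for an arbitrary iteration. If we number the two-segment path as iteration c = 1 (my convention, chosen to match the three panels), the path has 2^c segments that are each 2^(1-c) long, so:

\(\varepsilon_{path} = 2^c \cdot 2^{1-c} - n = 2 - n\)

The iteration count cancels completely. Plugging in n = √2 for the unit square gives an error of about 0.586 at every step, no matter how fine the staircase gets.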

You mentioned two issues?

Yes; the other problem is more subtle and open to some interpretation.

In a nutshell, in standard analysis — the prevailing flavor of mathematical fiction used to deal with infinity in algebraic contexts — most attempts to formally analyze the scenario would show that our increasingly jagged curve somehow collapses to a smooth diagonal (or a smooth circle) the moment we start talking about the limit “at infinity”.

The simplest way to illustrate the problem is to have another look at the earlier formula for the pointwise error between the stairstep pattern and the diagonal:

\(\varepsilon_{shape} = \frac{x}{2^c}\)

We’d be forgiven for saying that as c (the iteration count) tends to infinity, the value of ε_shape becomes infinitely small. It’s not wrong, but this kind of talk is verboten: as outlined in the earlier article, infinitesimals have no place on the real number line. In standard mathematical discourse, “infinitely close to zero” and “equal to zero” are effectively the same, so the limit is zero. And if ε_shape = 0, then we must conclude that the two figures are exactly the same.

This also implies that “at infinity”, and not a moment sooner, the length of the constructed curve must jump from 2 to √2 (in the case of a diagonal), or from 4 to π (in the case of a circle).
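One compact way to write the jump, using notation I'm making up for the occasion (S_c for the staircase after c iterations, D for the diagonal, and len for walking distance), is:

\(\lim_{c \to \infty} \mathrm{len}(S_c) = 2 \quad \text{but} \quad \mathrm{len}\left(\lim_{c \to \infty} S_c\right) = \mathrm{len}(D) = \sqrt{2}\)

In other words, "take the limit" and "measure the length" are two operations that can't be swapped here: the troll proof quietly computes the left-hand side and reports it as the right-hand side.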

The apparent collapse of our kinda-would-be-fractal doesn’t have any profound meaning; it’s just an outcome of a thought experiment in a framework where numbers must be finite, but processes can continue without end. This asymmetry can produce wacky results elsewhere, too; the earlier case of 0.9999… = 1 is another manifestation of the same phenomenon.

If we’re in a philosophical mood, we could insist that the geometric fine structure of the curve survives but simply becomes too small to ever exert any influence on real numbers. That’s not just grasping at straws: there are nonstandard analysis approaches that allow infinitesimals and would keep the two curves distinguishable, at least for some definitions of infinity.


Ben Hekster

Jun 5

Another interpretation of the “infinity times infinitesimal error” statement is just to realize that the staircase only _appears_ to converge to the diagonal; if you zoom in closely enough you see that the staircase never in fact actually converges. So there is no real mystery; it’s just a matter of scale


Iustin Pop

For some reason, to me, this makes sense intuitively, while on your previous article I mentioned that 0.(9) still bothers me somehow.


lcamtuf’s thing
