Why does linear interpolation always underestimate square roots?

Question

If we estimate a square root using the so-called Babylonian method, the result is always overestimated and the reason is obvious: we are ignoring the quadratic component of the solution. However, if we approximate a square root using a quick linear interpolation, the result is always underestimated. For example, approximating √55 would give us 111/15 = 7.4 when the true value is 7.416... (The Babylonian method would give 7.428...) As one commentator alluded to, this is because the root function is half-parabolic, but that really only explains the what. (The parabola is a visual representation of the function.)
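The three numbers quoted above can be reproduced with a short sketch. The helper names `lerp_sqrt` and `babylonian_step` are hypothetical, and the sketch assumes the interpolation runs between the neighbouring perfect squares 49 and 64 and that the Babylonian method takes a single step from the guess 7.

```python
import math

def lerp_sqrt(x, lo, hi):
    """Linearly interpolate sqrt(x) between the perfect squares lo*lo and hi*hi."""
    a, b = lo * lo, hi * hi
    return lo + (x - a) / (b - a) * (hi - lo)

def babylonian_step(x, guess):
    """One Babylonian (Heron) step: average the guess with x / guess."""
    return 0.5 * (guess + x / guess)

x = 55
interp = lerp_sqrt(x, 7, 8)        # 7 + 6/15
heron = babylonian_step(x, 7)      # (7 + 55/7) / 2
exact = math.sqrt(x)

print(f"interpolation {interp:.4f} < exact {exact:.4f} < Babylonian {heron:.4f}")
```

The interpolated value sits below the true root and the Babylonian value sits above it, which is the asymmetry the question is about.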

I am trying to have a better intuition as to why the function is concave curvilinear rather than linear. I know that it is, but I thought that for a simple function like the square root, there might be a geometric intuition as to why it is. An equivalent question might be why the same operation applied to a square would underestimate the y.

Hint: draw the graph and look at whether it's cup or cap shaped, so above or below its tangent.
– Ethan Bolker, Commented Oct 3 at 21:28
Could you briefly describe what you mean by linear interpolation? Where did the $111$ come from and why did you divide by $15$?
– fleablood, Commented Oct 3 at 21:35
@fleablood I think OP interpolates between $\sqrt{49}=7$ and $\sqrt{64}=8$ via $7 + \frac{55-49}{64-49} = 7 + \frac{6}{15}$.
– angryavian, Commented Oct 3 at 22:08
You're basically asking why $\sqrt{x}$ is a concave function. The calculus answer is that its second derivative is negative. Or maybe you're asking why concave functions are like that.
– J.G., Commented Oct 3 at 22:30

David K · Accepted Answer · 2025-10-04 06:07:48Z

This might be what you're thinking of. I chose a somewhat simpler example than the one in the question so that it would be easier to draw.

You can represent $4 = \sqrt{16}$ and $5 = \sqrt{25}$ by constructing two squares sharing a common vertex as shown below. Square $ABCD$ has edges of length $4$ and area $16$; square $AEFG$ has edges of length $5$ and area $25.$

The L-shaped region in the upper right is the difference between the two squares; it has area $9.$ We can imagine adding half that area to the smaller square to get a square of area $20.5.$ The area $20.5$ then is exactly halfway between the areas of the two squares shown above.

Linear interpolation is the idea that if you take an input halfway between two known input values you'll get a result halfway between the two known results. Under this hypothesis, for an input $20.5$ which is halfway between the given inputs $16$ and $25,$ you should get an output $4.5$ which is halfway between the known outputs $4$ and $5.$

But the result of interpolating, a square of side $4.5,$ is shown below.

By placing $P$ at the midpoint of $BE$ and building a square on side $AP,$ we separated the L-shaped region $BEFGDC$ into two smaller L-shaped regions, one of area $4.25$ and one of area $4.75.$ The problem is that even though the two Ls have the same width across each arm of the L, the L on the upper right is longer than the L on the lower left, so its area is larger.

The square of side $4.5$ doesn't have enough area to represent the square root of $20.5.$ Its side is actually the square root of $20.25.$ You need a slightly larger square to represent the square root of $20.5.$

It turns out that $AP'$ (the edge of the new square) is approximately $4.5277,$ but the exact value is less important for your question than the fact that $4.5$ is too small.
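The arithmetic of this example can be checked with a few lines. This is only a numerical sketch of the figures described above; the variable names are my own.

```python
import math

small, large = 4.0, 5.0                   # sides AB and AE
mid_side = (small + large) / 2            # AP = 4.5, the linearly interpolated side

lower_L = mid_side**2 - small**2          # L-shaped region next to the small square
upper_L = large**2 - mid_side**2          # L-shaped region next to the large square

target_area = (small**2 + large**2) / 2   # 20.5, halfway between the two areas
true_side = math.sqrt(target_area)        # AP', the side that actually gives area 20.5

print(lower_L, upper_L)                   # the two Ls are unequal
print(mid_side, round(true_side, 4))
```

The two L-shaped regions come out as $4.25$ and $4.75,$ and the side needed for area $20.5$ is a little larger than $4.5.$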

Moving beyond the specific example above, let's consider more generally why splitting the distance $BE$ in a certain proportion never partitions the area of the L-shaped region in the same proportion, and in fact always has too little area in the lower left partition and too much in the upper right partition.

In the figure above we have two squares $ABCD$ and $AEFG$ of arbitrary sizes sharing a common vertex. We place point $P$ to divide the segment $BE$ in an arbitrary ratio $BP : PE$ and build a square $APQR$ on side $AP.$

If linear interpolation gave a completely accurate result, we would find that the ratio of areas of the L-shaped regions was $$ \operatorname{Area}(BPQRDC) : \operatorname{Area}(PEFGRQ) \stackrel?= BP : PE. $$

Now let's place $S$ between $E$ and $F$ so that $ES = BC$ and place $T$ between $G$ and $F$ so that $GT = DC.$ Then all four trapezoids shown in the figure have congruent pairs of bases, so their areas are proportional to their heights:

$$ \operatorname{Area}(DRQC) : \operatorname{Area}(RGTQ) = \operatorname{Area}(BPQC) : \operatorname{Area}(PESQ) = BP : PE. $$

The problem with linear interpolation is that $$ \operatorname{Area}(BPQRDC) = \operatorname{Area}(DRQC) + \operatorname{Area}(BPQC) $$ but $$ \operatorname{Area}(PEFGRQ) = \operatorname{Area}(RGTQ) + \operatorname{Area}(PESQ) + \operatorname{Area}(SFTQ), $$ and the area of the kite-shaped quadrilateral $SFTQ$ is always positive. So the second L-shaped region is too large for the ratio of areas to be equal to $BP : PE.$ And because the second (upper right) L-shaped region is always the one that's too large, we always need to replace $P$ by a point $P'$ closer to $E$ (so $AP' > AP$) in order to get the correct ratio of L-shaped regions. That's why $AP,$ the "square root" we get by linear interpolation, is always smaller than $AP',$ the true square root.
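The same bookkeeping can be written in symbols. As a sketch, write $b = AB,$ $p = AP,$ and $e = AE$ with $b < p < e$ (these letters are my own shorthand, not labels from the figure). Each trapezoid has parallel sides of lengths $b$ and $p,$ so

$$ \operatorname{Area}(DRQC) = \operatorname{Area}(BPQC) = \tfrac12 (b+p)(p-b), \qquad \operatorname{Area}(RGTQ) = \operatorname{Area}(PESQ) = \tfrac12 (b+p)(e-p). $$

The kite $SFTQ$ has perpendicular diagonals $FQ = \sqrt2\,(e-p)$ and $ST = \sqrt2\,(e-b),$ so

$$ \operatorname{Area}(SFTQ) = \tfrac12 \cdot FQ \cdot ST = (e-p)(e-b) > 0. $$

Adding the pieces recovers the two L-shaped areas and their ratio:

$$ \operatorname{Area}(BPQRDC) = p^2 - b^2, \qquad \operatorname{Area}(PEFGRQ) = (b+p)(e-p) + (e-p)(e-b) = e^2 - p^2, $$

$$ \frac{\operatorname{Area}(BPQRDC)}{\operatorname{Area}(PEFGRQ)} = \frac{p-b}{e-p} \cdot \frac{p+b}{e+p} < \frac{p-b}{e-p} = \frac{BP}{PE}. $$

The factor $\frac{p+b}{e+p}$ is less than $1$ because $b < e,$ which is the algebraic form of "the upper right L is longer."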

The following diagram might be even more obvious, although less symmetric.

In this figure the side $CD$ of the small square is extended to intersect the large square at $S$ and the middle-sized square at $T,$ so $PT = ES = BC.$ The side $PQ$ of the middle-sized square is extended to intersect the large square at $U.$ In this way the L-shaped region is partitioned into rectangles with $$ \operatorname{Area}(DRQT) : \operatorname{Area}(RGUQ) = \operatorname{Area}(BPTC) : \operatorname{Area}(PEST) = BP : PE. $$ But there is still some leftover area in the upper-right L shape in the rectangle $STUF,$ so the upper-right L shape is too large to be in the proper proportion with the lower-left L shape; this implies that $P$ is too close to $B$ and the linearly interpolated answer is too small.
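The rectangle decomposition is easy to check numerically. The sketch below assumes coordinates with $A$ at the origin and side lengths $b = AB,$ $p = AP,$ $e = AE$ (my own names); the function `rectangle_split` is hypothetical and simply multiplies widths by heights.

```python
def rectangle_split(b, p, e):
    """Areas of the five rectangles for nested squares of side b < p < e."""
    bptc = (p - b) * b          # lower-left L, bottom piece
    drqt = p * (p - b)          # lower-left L, top piece
    pest = (e - p) * b          # upper-right L, bottom piece
    rguq = p * (e - p)          # upper-right L, top piece
    stuf = (e - p) * (e - b)    # leftover corner rectangle
    return bptc, drqt, pest, rguq, stuf

b, p, e = 4.0, 4.5, 5.0
bptc, drqt, pest, rguq, stuf = rectangle_split(b, p, e)

lower_L = bptc + drqt
upper_L = pest + rguq + stuf
print(lower_L, upper_L, stuf)
```

With the $4,$ $4.5,$ $5$ example, the matched rectangles split in the ratio $BP : PE = 1 : 1,$ and the leftover corner $STUF$ accounts for the whole difference between the two L-shaped regions.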

As a bonus, here's a geometric construction of square roots. To take the square roots of $x_1,$ $x_2,$ and $x_3$ where $x_1 < x_2 < x_3,$ put $L,$ $O,$ $X_1,$ $X_2,$ and $X_3$ along a straight line in that sequence so that $LO = 1,$ $OX_1 = x_1,$ $OX_2 = x_2,$ and $OX_3 = x_3.$ Construct another line $\ell$ through $O$ perpendicular to $LO$ and construct semicircles on diameters $LX_1,$ $LX_2,$ and $LX_3$ intersecting the line $\ell$ at $R_1,$ $R_2,$ and $R_3$ respectively. Then $OR_1 = y_1 = \sqrt{x_1},$ $OR_2 = y_2 = \sqrt{x_2},$ and $OR_3 = y_3 = \sqrt{x_3}.$
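For readers who want the reason the semicircle yields a square root, here is a short sketch using two standard facts (Thales' theorem and similar right triangles), with each $X_i$ placed at distance $x_i$ from $O.$ Since $R_i$ lies on the semicircle with diameter $LX_i,$ the angle $\angle L R_i X_i$ is a right angle, and $OR_i$ is the altitude to the hypotenuse. The triangles $LOR_i$ and $R_iOX_i$ are similar, so

$$ \frac{OR_i}{LO} = \frac{OX_i}{OR_i} \quad\Longrightarrow\quad OR_i^2 = LO \cdot OX_i = 1 \cdot x_i \quad\Longrightarrow\quad OR_i = \sqrt{x_i}. $$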

Note that if we wanted to interpolate $y_2$ between $y_1$ and $y_3$ according to the ratios of $x_1,$ $x_2,$ and $x_3,$ we would have made the line $R_1R_3$ parallel to $X_1X_3$ instead of perpendicular and we would have drawn straight lines through the points instead of semicircles. So the square root construction is quite different from the linear interpolation construction.

Dan · Accepted Answer · 2025-10-03 22:59:02Z

Because the graph of $y = \sqrt{x}$ is concave down. In calculus terms, $f''(x) = -\frac{1}{4x\sqrt{x}}$ is negative for every $x > 0.$ Thus, the secant line interpolating any two points on the curve will fall below the curve within that interval.
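A quick numerical check of the secant claim, as a sketch on the interval $[1, 9]$ (chosen to match the plot described in this answer; the function name `secant` and the sampling grid are my own choices):

```python
import math

def secant(x, a=1.0, b=9.0):
    """Value at x of the line through (a, sqrt(a)) and (b, sqrt(b))."""
    slope = (math.sqrt(b) - math.sqrt(a)) / (b - a)
    return math.sqrt(a) + slope * (x - a)

# Sample strictly inside the interval and measure curve minus line.
samples = [1 + 8 * k / 100 for k in range(1, 100)]
gaps = [math.sqrt(x) - secant(x) for x in samples]

print(min(gaps) > 0)      # the curve is above the line at every sample
print(secant(5.0), math.sqrt(5.0))
```

At the midpoint $x = 5$ the line gives $2$ while $\sqrt5 \approx 2.236,$ so the interpolated value is too small there, as it is at every other interior sample.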

Perhaps a visual will help. On the plot below, blue = square root, and orange = linear interpolation for $x \in [1, 9]$.

edited Oct 3 at 22:59

answered Oct 3 at 22:41


Thank you. That explains the what very clearly, but I am really trying to find a mathematical intuition as to why, perhaps a geometric one.
– POD, Commented Oct 3 at 22:52

@POD Re "perhaps a geometric one": Isn't "concave down" a sufficiently geometric explanation?
– njuffa, Commented Oct 3 at 22:56

This is a geometric one. The line segment between $(49,7)$ and $(64,8)$ lies below the function, and linear interpolation is the value of the line. @POD
– Thomas Andrews, Commented Oct 3 at 23:04

@POD: Attempting to answer your question with a question: If you want to double the area of a square, why don't you double its side length?
– Dan, Commented Oct 4 at 0:16

Best comment so far. Thank you, @Dan. That is probably not what you were expecting, but it is the closest thing that I have heard to providing a neat intuition. We all know that the relationship is curvilinear and concave, but I am trying to explain to myself, in a rigorous and likely geometric way, why that is the case. Your question, although not answering mine directly, I think has led me to a neat geometric explanation.
– POD, Commented Oct 4 at 0:28

fleablood · Accepted Answer · 2025-10-04 15:58:12Z

Okay, you talked about "mathematical intuition".

I'm assuming you've seen the image in Dan's answer (although for my brain I think a graph of $y = x^2$ rather than $y = \sqrt x$ would help my particular intuition better).

A "concave" function is one in which, graphically, if you connect two points on the graph with a line segment, the points of the graph between them all lie below (or all lie above) the corresponding points of that segment.

[So this would mean, for $y=x^2,$ that the point $(x,55)$ of the line would lie above the point $(x,x^2)$ of the graph, so $x^2 < 55$ and $x < \sqrt{55}.$]

And, really, concave (up) would mean: for two points $(w, f(w))$ and $(v, f(v))$ with $w < v,$ for all $x$ with $w < x < v$ the points of the line $y_x=f(w) + \frac {f(v) - f(w)}{v-w}(x-w)$ will have $y_x > f(x)$.

[And this literally says that linear interpolation of a concave up function, such as $f(x)=x^2$, will always estimate too large, and of a concave down function, such as $f(x)=\sqrt x$, will always estimate too small. That is literally the definition.]

So now to the point of this post and "mathematical intuition". Why should $f(x) =x^2$ be concave up, and conversely, why should $f(x) = \sqrt x$ be concave down?

Well, the sophisticated answer is "derivatives".

But for me, an intuitive answer is: As we poke $x\to x+d$ where $d$ is just a teeny bit, then $x^2$ will increase to $(x+d)^2 = x^2 + 2xd + d^2$, an increase of $2xd + d^2$. This is not a constant rate. If it were a constant rate, the graph would be a line. But it is not, and as $x$ increases $x^2$ will increase by larger amounts. In other words we "pack more in". This is why the graph of a parabola $f(x) =x^2$ has such a "bulgy" shape.
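One way to make the "not a constant rate" remark precise is the following sketch (the parameter $t$ and the endpoint names $w < v$ are my own notation). For $0 < t < 1,$ compare the interpolated value of $x^2$ with the true value at the point $(1-t)w + tv$:

$$ (1-t)\,w^2 + t\,v^2 - \bigl((1-t)\,w + t\,v\bigr)^2 = t(1-t)(v-w)^2 > 0. $$

So interpolating $x^2$ always overshoots. Reading the same identity backwards for the square root, with $0 \le w < v,$ interpolating between $(w^2, w)$ and $(v^2, v)$ at $x = (1-t)w^2 + t v^2$ gives $y = (1-t)w + tv,$ and

$$ x - y^2 = t(1-t)(v-w)^2 > 0 \quad\Longrightarrow\quad y < \sqrt{x}. $$

For the example in the question, $w = 7,$ $v = 8,$ $t = \tfrac{6}{15}$ gives $y = 7.4$ and $55 - 7.4^2 = 0.24 = \tfrac{6}{15}\cdot\tfrac{9}{15}\cdot 1.$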
