One way to measure the error of our estimating line is to sum all the individual differences, or errors, between the estimated points and the raw data points.
Let Ŷ be the individual values of the estimated points and Y be the raw data values.
Figure 1 shows an example.

Figure 1
| Y | Ŷ | diff (Y − Ŷ) |
|---|---|---|
| 8 | 6 | 2 |
| 1 | 5 | -4 |
| 6 | 4 | 2 |
| total error | | 0 |
Figure 2 shows another example.

Figure 2
| Y | Ŷ | diff (Y − Ŷ) |
|---|---|---|
| 8 | 2 | 6 |
| 1 | 5 | -4 |
| 6 | 8 | -2 |
| total error | | 0 |
The line in Figure 2 also has a zero sum of errors, as the table shows.
A visual comparison between the two figures shows that the regression line in Figure 1 fits the three data points better than the line in Figure 2. However, summing the individual differences in the two tables above indicates that both lines describe the data equally well. Therefore we can conclude that summing individual differences is not a reliable way to judge the goodness of fit of an estimating line.
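This cancellation is easy to reproduce. The short Python sketch below uses the (Y, Ŷ) pairs from the two tables and sums the raw differences:

```python
# (Y, Y_hat) pairs taken from the Figure 1 and Figure 2 tables.
figure_1 = [(8, 6), (1, 5), (6, 4)]
figure_2 = [(8, 2), (1, 5), (6, 8)]

def total_error(points):
    """Sum of the individual differences Y - Y_hat."""
    return sum(y - y_hat for y, y_hat in points)

print(total_error(figure_1))  # 0
print(total_error(figure_2))  # 0
```

Both lines report a total error of zero, even though one clearly fits better than the other.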
The problem with adding the individual errors is the canceling effect of the positive and negative values. From this, we might deduce that the proper criterion for judging the goodness of fit would be to add the absolute values of each error. The following table shows a comparison between the absolute values of Figure 1 and Figure 2.
| Figure 1 | | | Figure 2 | | |
|---|---|---|---|---|---|
| Y | Ŷ | abs. diff | Y | Ŷ | abs. diff |
| 8 | 6 | 2 | 8 | 2 | 6 |
| 1 | 5 | 4 | 1 | 5 | 4 |
| 6 | 4 | 2 | 6 | 8 | 2 |
| total error | | 8 | total error | | 12 |
Since the absolute error for Figure 1 is smaller than that for Figure 2, we have confirmed our intuitive impression that the estimating line in Figure 1 is the better fit.
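The absolute-error comparison can be sketched the same way, again using the (Y, Ŷ) pairs from the table above:

```python
# (Y, Y_hat) pairs from the Figure 1 and Figure 2 tables.
figure_1 = [(8, 6), (1, 5), (6, 4)]
figure_2 = [(8, 2), (1, 5), (6, 8)]

def total_abs_error(points):
    """Sum of the absolute differences |Y - Y_hat|."""
    return sum(abs(y - y_hat) for y, y_hat in points)

print(total_abs_error(figure_1))  # 8
print(total_abs_error(figure_2))  # 12
```

The smaller total for Figure 1 agrees with the visual impression that its line is the better fit.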
Figure 3 and Figure 4 below show another scenario.

Figure 3

Figure 4
The following table shows the calculations of absolute values of the errors.
| Figure 3 | | | Figure 4 | | |
|---|---|---|---|---|---|
| Y | Ŷ | abs. diff | Y | Ŷ | abs. diff |
| 4 | 4 | 0 | 4 | 5 | 1 |
| 7 | 3 | 4 | 7 | 4 | 3 |
| 2 | 2 | 0 | 2 | 3 | 1 |
| total error | | 4 | total error | | 5 |
We have added the absolute values of the errors and found that the estimating line in Figure 3 is a better fit than the line in Figure 4. Intuitively, however, the line in Figure 4 appears to be the better fit, because it has been shifted vertically to take the middle point into account. Figure 3, on the other hand, seems to ignore the middle point completely.
This discrepancy arises because the sum of the absolute values does not stress the magnitude of the individual errors.
In effect, we want to find a way to penalize large absolute errors, so that we can avoid them. We can accomplish this if we square the individual errors before we add them. Squaring each term accomplishes two goals:

1. It magnifies, or penalizes, the larger errors.
2. It makes every term positive, so positive and negative errors can no longer cancel.

The following table shows the squared errors for Figure 3 and Figure 4.
| Figure 3 | | | | Figure 4 | | | |
|---|---|---|---|---|---|---|---|
| Y | Ŷ | abs. diff | squared diff | Y | Ŷ | abs. diff | squared diff |
| 4 | 4 | 0 | 0 | 4 | 5 | 1 | 1 |
| 7 | 3 | 4 | 16 | 7 | 4 | 3 | 9 |
| 2 | 2 | 0 | 0 | 2 | 3 | 1 | 1 |
| sum of squares | | | 16 | sum of squares | | | 11 |
Under this criterion, the line in Figure 4 (sum of squares = 11) is a better fit than the line in Figure 3 (sum of squares = 16), which matches our visual impression. Since we are looking for the estimating line that minimizes the sum of the squared errors, we call this the Least Squares Method.
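The squared-error criterion can be sketched in the same style as before, using the (Y, Ŷ) pairs from the Figure 3 and Figure 4 tables:

```python
# (Y, Y_hat) pairs from the Figure 3 and Figure 4 tables.
figure_3 = [(4, 4), (7, 3), (2, 2)]
figure_4 = [(4, 5), (7, 4), (2, 3)]

def sum_of_squares(points):
    """Sum of the squared differences (Y - Y_hat)**2."""
    return sum((y - y_hat) ** 2 for y, y_hat in points)

print(sum_of_squares(figure_3))  # 16
print(sum_of_squares(figure_4))  # 11
```

Squaring the single large error in Figure 3 (4² = 16) outweighs the three small errors in Figure 4 (1 + 9 + 1 = 11), so the criterion now prefers the line that our intuition favored.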