Linear Regression

Date: 2021-07-11 | Author: Jason Eveleth

Theory

We are trying to find the line of best fit given a collection of coordinates, for example $\{(x_1, y_1), (x_2, y_2), (x_3, y_3), \dots\}$. We would like to find $m, b$ such that $y = mx + b$ minimizes the sum of the squared residuals. By that I mean we want to minimize a function $L(\alpha) = \sum_i (y_i - (m x_i + b))^2$, by finding the best $m$ and $b$.

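To make the loss concrete, here is a minimal sketch in Julia (the data and variable names are made up for illustration; they are not the example used later in this post):

# made-up data, purely for illustration
x = [1.0, 2.0, 3.0]
y = [1.1, 1.9, 3.2]

# sum of squared residuals for a candidate slope m and intercept b
L(m, b) = sum((y .- (m .* x .+ b)).^2)

L(1.0, 0.0)   # loss of the line y = x on this data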
We will use the notation from here. Here are some fun facts we'll need to know:

  1. $\sum_i x_i^2 = \lVert x \rVert^2$

  2. $\lVert x \rVert^2 = x^T x$

  3. $(A + B)^T = A^T + B^T$ and $(AB)^T = B^T A^T$.

  4. $D(u^T x) = u^T$, $D(x^T u) = u^T$, and $D(x^T x) = D(\bar{x}^T x) + D(x^T \bar{x})$.[1]

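These are standard linear-algebra identities; as a quick sanity check (a sketch with random data, not something from the post), facts 1–3 can be verified numerically in Julia:

using LinearAlgebra   # for norm

x = randn(5)
A, B = randn(3, 3), randn(3, 3)

sum(x.^2) ≈ norm(x)^2    # fact 1
norm(x)^2 ≈ x' * x       # fact 2
(A + B)' ≈ A' + B'       # fact 3, first identity
(A * B)' ≈ B' * A'       # fact 3, second identity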
We need to set up some definitions. Let

$$
y = \begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \\ \vdots \end{bmatrix}, \quad
X = \begin{bmatrix} x_1 & 1 \\ x_2 & 1 \\ x_3 & 1 \\ x_4 & 1 \\ \vdots & \vdots \end{bmatrix}, \quad
\alpha = \begin{bmatrix} m \\ b \end{bmatrix}.
$$

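In Julia, $X$ is one `hcat` away from the vector of $x$-values. As a small sketch (made-up data again), this also lets us check the rewriting carried out in the next step, that the summed loss equals $\lVert y - X\alpha \rVert^2$:

using LinearAlgebra   # for norm

xs = [1.0, 2.0, 3.0, 4.0]        # made-up x-values
y  = [1.1, 1.9, 3.2, 3.9]        # made-up y-values

X = hcat(xs, ones(length(xs)))   # a column of x's next to a column of 1's
α = [0.5, 0.5]                   # an arbitrary candidate [m, b]

sum((y .- (α[1] .* xs .+ α[2])).^2) ≈ norm(y - X * α)^2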
So our goal is:

$$
\underset{\alpha}{\text{argmin}}\, L(\alpha) = \underset{\alpha}{\text{argmin}}\, \sum_i (y_i - (m x_i + b))^2
$$

Apply fun fact #1:

$$
= \underset{\alpha}{\text{argmin}}\, \left\lVert \begin{bmatrix} y_1 - (m x_1 + b) \\ y_2 - (m x_2 + b) \\ y_3 - (m x_3 + b) \\ y_4 - (m x_4 + b) \\ \vdots \end{bmatrix} \right\rVert^2
= \underset{\alpha}{\text{argmin}}\, \lVert y - X\alpha \rVert^2
$$

Apply fun fact #2:

$$
= \underset{\alpha}{\text{argmin}}\, (y - X\alpha)^T (y - X\alpha)
$$

Apply fun fact #3:

$$
= \underset{\alpha}{\text{argmin}}\, (y^T - \alpha^T X^T)(y - X\alpha)
= \underset{\alpha}{\text{argmin}}\, y^T y - \alpha^T X^T y - y^T X\alpha + \alpha^T X^T X \alpha
$$

To minimize this, we find the critical points of the function. We do this by finding where the derivative of $L$ with respect to $\alpha$ is $0$.

$$
\begin{align*}
0 &= L'(\alpha) \\
&= D(y^T y - \alpha^T X^T y - y^T X\alpha + \alpha^T X^T X \alpha) \\
&= D(y^T y) - D(\alpha^T X^T y) - D(y^T X\alpha) + D(\alpha^T X^T X \alpha)
\end{align*}
$$

Apply fun fact #4:

$$
\begin{align*}
&= 0 - (X^T y)^T - y^T X + D(\alpha^T X^T X \bar{\alpha}) + D(\bar{\alpha}^T X^T X \alpha) \\
&= -y^T X - y^T X + (X^T X \alpha)^T + \alpha^T X^T X \\
&= -2 y^T X + \alpha^T (X^T X)^T + \alpha^T X^T X \\
&= -2 y^T X + \alpha^T X^T X + \alpha^T X^T X \\
&= -2 y^T X + 2 \alpha^T X^T X
\end{align*}
$$

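As a sanity check of this gradient formula (a sketch with random data and a finite-difference approximation, not part of the derivation itself), the row vector $-2y^T X + 2\alpha^T X^T X$ should match the numerical derivative of $L$:

using LinearAlgebra

X = hcat(randn(6), ones(6))   # random design matrix
y = randn(6)
α = randn(2)

L(α) = (y - X * α)' * (y - X * α)   # the loss from above

grad_formula = vec(-2 * y' * X + 2 * α' * (X' * X))

# central finite differences in each coordinate of α
h = 1e-6
e = [[1.0, 0.0], [0.0, 1.0]]
grad_fd = [(L(α + h * e[j]) - L(α - h * e[j])) / (2h) for j in 1:2]

isapprox(grad_formula, grad_fd; rtol = 1e-5)   # true up to finite-difference error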
And now for the glorious part (who am I kidding, this whole thing has been glorious),

$$
2 y^T X = 2 \alpha^T X^T X
$$

Transposing both sides and cancelling the 2,

$$
X^T y = (X^T X)\alpha
$$

$$
(X^T X)^{-1} X^T y = \alpha = \begin{bmatrix} m \\ b \end{bmatrix}
$$

Examples

Say we are given this set of coordinates: $\{(1.3, 0.8), (3.2, 3.5), (5.6, 6.4), (8.5, 7.7)\}$. Then we can either do the arithmetic by hand (from the last equation) or open julia and give it the vectors (some output omitted):

julia> y = [0.8; 3.5; 6.4; 7.7]
julia> X = [1.3 1; 3.2 1; 5.6 1; 8.5 1]
julia> inv(X' * X) * X' * y
2-element Vector{Float64}:
 0.9628
 0.1228

Which gives us our desired line of best fit: $y = 0.962823x + 0.122874$.
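For the by-hand route mentioned above, the pieces of the normal equations are small enough to write out (these numbers are computed from the four points above, so they are worth double-checking):

$$
X^T X = \begin{bmatrix} 115.54 & 18.6 \\ 18.6 & 4 \end{bmatrix}, \quad
X^T y = \begin{bmatrix} 113.53 \\ 18.4 \end{bmatrix},
$$

$$
\alpha = (X^T X)^{-1} X^T y = \frac{1}{116.2}\begin{bmatrix} 4 & -18.6 \\ -18.6 & 115.54 \end{bmatrix}\begin{bmatrix} 113.53 \\ 18.4 \end{bmatrix} \approx \begin{bmatrix} 0.9628 \\ 0.1229 \end{bmatrix}.
$$

As an aside (a stylistic note, not something the derivation requires): Julia's backslash operator returns the same least-squares solution without explicitly forming `inv(X' * X)`, which is generally the more numerically stable idiom:

julia> X \ y   # same least-squares solution as inv(X' * X) * X' * y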

[1] The bar in $\bar{\alpha}$ or $\bar{x}$ means treat it as a constant, in the same way that you would if you were doing the product rule with single-valued functions: $D(f(x)g(x)) = D(f(x)\bar{g}(x)) + D(\bar{f}(x)g(x)) = (Df(x))g(x) + (Dg(x))f(x)$.