Linear Least Squares Methods with R: An Algebraic Approach - Part I

Algebraic Approach Principles

The least-squares method is one of the best-known linear optimization methods because of its flexibility, and it gives a reasonable approximation of a given function. Its diverse applications include regression in statistical learning, Direct Linear Transformation methods in projective geometry, and more. In this article we demonstrate the principles of the least-squares method and implement examples in R.

Consider a simple linear system in which the input variable $x$ is mapped by a measurement matrix $A$ to the output variable $y$. That is, the model is given by

$$y = Ax, \tag{1}$$

which, writing Eq. (1) in more explicit notation, can be expressed as

$$\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_j \end{bmatrix} =
\begin{bmatrix}
a_{1,1} & a_{1,2} & \cdots & a_{1,i} \\
a_{2,1} & a_{2,2} & \cdots & a_{2,i} \\
\vdots & \vdots & \ddots & \vdots \\
a_{j,1} & a_{j,2} & \cdots & a_{j,i}
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_i \end{bmatrix} \tag{2}$$

We want to find an estimate of the variable $x$ given the measurement matrix $A$ and the output $y$. To do this, we consider the equivalent estimated model given by

$$\hat{y} = A\hat{x}. \tag{3}$$

We then define a minimization criterion. In this case we use a quadratic criterion, since it is convex and therefore guarantees a global minimum; that is,

$$\min_{\hat{x}} \|y - \hat{y}\|^2 = \min_{\hat{x}} \|y - A\hat{x}\|^2. \tag{4}$$

In order for the estimate $\hat{y}$ to match $y$ as closely as possible, we define an error measure using Eq. (3) as follows:

$$\mathrm{err} = y - \hat{y} = y - A\hat{x}. \tag{5}$$

Using the criterion from Eq. (4) in matrix notation we have

$$\|\mathrm{err}\|^2 = (y - A\hat{x})^T (y - A\hat{x}), \tag{6}$$

and expanding Eq. (6) we obtain

$$\begin{aligned}
(y - A\hat{x})^T (y - A\hat{x}) &= y^T y - (A\hat{x})^T y - y^T (A\hat{x}) + (A\hat{x})^T (A\hat{x}) \\
&= y^T y - \hat{x}^T A^T y - \hat{x}^T A^T y + \hat{x}^T A^T A \hat{x} \\
&= y^T y - 2\hat{x}^T A^T y + \hat{x}^T A^T A \hat{x}.
\end{aligned} \tag{7}$$

To minimize the resulting expression in Eq. (7), we solve

$$\min_{\hat{x}} \left\{ y^T y - 2\hat{x}^T A^T y + \hat{x}^T A^T A \hat{x} \right\}, \tag{8}$$

that is, setting the derivative with respect to $\hat{x}$ to zero,

$$\begin{aligned}
\frac{\partial}{\partial \hat{x}} \left\{ y^T y - 2\hat{x}^T A^T y + \hat{x}^T A^T A \hat{x} \right\} &= 0, \\
-2A^T y + 2A^T A \hat{x} &= 0, \\
A^T A \hat{x} &= A^T y,
\end{aligned} \tag{9}$$

to obtain from Eq. (9) the well-known general expression of linear least squares,

$$\hat{x} = (A^T A)^{-1} A^T y = A^{\dagger} y, \tag{10}$$

where $\hat{x}$ is the estimated input variable and $A^{\dagger} = (A^T A)^{-1} A^T$ is the so-called pseudo-inverse of $A$.

Example 1: Worked Example with R

Consider the following model for data generation, which reproduces the data tabulated below:

$$y(t) = 232.5 + 1.75\,t - 2.5\,t^2.$$

We take $t$ in the range 1 to 10 and generate the simulated data with the snippet below.
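A minimal R sketch of this step, consistent with the model above (variable names, axis units, and plotting options are assumptions):

```r
# Simulate the noise-free model y(t) = 232.5 + 1.75 t - 2.5 t^2
t <- 1:10
y <- 232.5 + 1.75 * t - 2.5 * t^2

# Plot the simulated trajectory
plot(t, y, type = "b", pch = 16,
     xlab = "t [s]", ylab = "y [m]",
     main = "Simulated particle position")
```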

The code above produces a plot simulating a particle's position, in meters, as it moves over time.

Given the observed output data $y$, we want to estimate the input $\hat{x}$. The simulated data are:

t     y
1     231.75
2     226
3     215.25
4     199.5
5     178.75
6     153
7     122.25
8     86.5
9     45.75
10    0

We suspect that a quadratic model best fits the data, that is,

$$\hat{y}(t) = \hat{x}_1 + \hat{x}_2\,t + \hat{x}_3\,t^2,$$

where $\hat{x} = [\hat{x}_1 \;\; \hat{x}_2 \;\; \hat{x}_3]^T$ is the parameter vector to be estimated.

We also define a support function to compute the pseudo-inverse.
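A minimal sketch of such a helper, built directly from Eq. (10) (the name `pseudo_inverse` is an assumption; it requires $A^T A$ to be invertible):

```r
# Moore-Penrose pseudo-inverse via the normal equations:
# A_dagger = (A^T A)^{-1} A^T
pseudo_inverse <- function(A) {
  solve(t(A) %*% A) %*% t(A)
}
```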

Now we build the measurement matrix $A$, whose columns correspond to the terms of the quadratic model.
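Each row of $A$ is $[1,\; t_i,\; t_i^2]$:

```r
# Measurement matrix: one row per observation, columns 1, t, t^2
A <- cbind(1, t, t^2)
```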

Applying the pseudo-inverse to the observations, we recover the same values as the original model.
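A sketch of the estimation step; in the noise-free case the recovered coefficients should match the generating model exactly (up to floating-point error):

```r
# Estimate the parameters: x_hat = A_dagger %*% y
x_hat <- pseudo_inverse(A) %*% y
x_hat
# Should recover 232.50, 1.75, -2.50 (the original model coefficients)
```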

Under ideal conditions, observing a phenomenon, modeling it, and estimating its inputs would recover the exact values, but this is not the case in reality. Consider now the original model with additive noise, here assumed zero-mean Gaussian:

$$y_{\text{noisy}}(t) = 232.5 + 1.75\,t - 2.5\,t^2 + \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, \sigma^2).$$
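A sketch of the noisy data generation (the noise level `sd = 5` and the seed are assumed values):

```r
# Add zero-mean Gaussian noise to the simulated observations
set.seed(123)  # for reproducibility
y_noisy <- y + rnorm(length(t), mean = 0, sd = 5)
```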

Then, to estimate the output as in Eq. (3), we perform the following.
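A sketch of the estimation, reusing the helper defined above:

```r
# Estimate parameters from the noisy data, then the fitted output (Eq. (3))
x_hat_noisy <- pseudo_inverse(A) %*% y_noisy
y_hat <- drop(A %*% x_hat_noisy)
```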

We use the snippet below to view the results.
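A plotting sketch consistent with the description that follows (legend placement and point styles are assumptions):

```r
# Noisy observations with estimated (red) and original (blue, dashed) models
plot(t, y_noisy, pch = 16, xlab = "t [s]", ylab = "y [m]")
lines(t, y_hat, col = "red", lwd = 2)
lines(t, y, col = "blue", lty = 2, lwd = 2)
legend("topright", legend = c("estimated", "original"),
       col = c("red", "blue"), lty = c(1, 2), lwd = 2)
```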

In the resulting plot, the red line is the estimated model and the blue dashed line is the original, noise-free model.

Conclusion

Least squares is a widely used method for estimating variables in many fields, such as inverse problems, system identification for control, statistical learning, and computer vision. This article provided a brief overview of the theoretical foundations and a worked example. In our next article, we will examine the situation where it is necessary to estimate β coefficients in linear models for ordinary least squares regression problems.

