#LinearRegression in Matrix Notation for #R

Handcrafted algorithms

It is a very good exercise to write your algorithms from scratch to see what is really happening. In this post I used matrix notation in R to run linear regression.

Simulated data

First, we need some data. I wrote a simple function to simulate our X from Y.

makeX <- function(Y, n){
  X <- matrix(1, ncol=1, nrow = dim(Y)[1])
  for (i in 1:n){
    X1 <- Y + rnorm(100, sd = 0.25) + sample(seq(-1,1,001), 100, replace = T)
    X <- cbind(X, X1)
  }
  return(X)
}

One dependent variable

We start with a very simple model.

Y <- matrix(rnorm(100), ncol=1)
X <- makeX(Y,1)

The cool thing is that we can find the parameters P (i.e. intercept and slope) with one single matrix equation:

P = (X^tX)^-1X^tY

In R this looks like this:

P <- solve((t(X) %*% X)) %*% t(X) %*% Y

And now we can run it and plot the results:

E <- Y - X %*% PYhat <- X %*% Pplot(X[,2], Y)abline(P[1,1], P[2,1], col="red")points(x= X[,2], y = Yhat, col="green")

$Multivariate regression$

The next two examples do the same but with more variables and with non-linear terms.

### same as multivariant regressionX <- makeX(Y,10)P <- solve((t(X) %*% X)) %*% t(X) %*% YE <- Y - X %*% PYhat <- X %*% Pplot(X[,2], Y)abline(P[1,1], P[2,1], col="red")points(x= X[,2], y = Yhat, col="green")

Expanding the linear model

### adding non-linear terms
Y <- matrix(sample(100,100, replace = T), ncol=1) # now Y has only positive values

## I changed the simulation function a little bit to get nice plots.
makeX2 <- function(Y, n){
  X <- matrix(1, ncol=1, nrow = dim(Y)[1])
  for (i in 1:n){
    X1 <- sqrt(8*Y) +
    rnorm(dim(Y)[1], sd = 1)
    X <- cbind(X, X1)
  }
  return(X)
}

X <- makeX2(Y,1)
P <- solve((t(X) %*% X)) %*% t(X) %*% Y
E <- Y - X %*% P
Yhat <- X %*% P

plot(X[,2], Y)
abline(P[1,1], P[2,1], col="red")

X <- cbind(X, X[,2]^2) # squared first variable
P <- solve((t(X) %*% X)) %*% t(X) %*% Y
E <- Y - X %*% P
Yhat <- X %*% P

curve(P[3,1]*x^2 + P[2,1]*x + P[1,1], col="green", add=T)

Political Data Science

Dieses Blog durchsuchen