Handcrafted algorithms
It is a very good exercise to write your algorithms from scratch to see what is really happening. In this post I used matrix notation in R to run linear regression.
Simulated data
First, we need some data. I wrote a simple function to simulate our X from Y.
makeX <- function(Y, n){
X <- matrix(1, ncol=1, nrow = dim(Y)[1])
for (i in 1:n){
X1 <- Y + rnorm(100, sd = 0.25) + sample(seq(-1,1,001), 100, replace = T)
X <- cbind(X, X1)
}
return(X)
}
X <- matrix(1, ncol=1, nrow = dim(Y)[1])
for (i in 1:n){
X1 <- Y + rnorm(100, sd = 0.25) + sample(seq(-1,1,001), 100, replace = T)
X <- cbind(X, X1)
}
return(X)
}
One dependent variable
We start with a very simple model.
Y <- matrix(rnorm(100), ncol=1)
X <- makeX(Y,1)
X <- makeX(Y,1)
The cool thing is that we can find the parameters P (i.e. intercept and slope) with one single matrix equation:
Expanding the linear model
### adding non-linear terms
Y <- matrix(sample(100,100, replace = T), ncol=1) # now Y has only positive values
## I changed the simulation function a little bit to get nice plots.
makeX2 <- function(Y, n){
X <- matrix(1, ncol=1, nrow = dim(Y)[1])
for (i in 1:n){
X1 <- sqrt(8*Y) +
rnorm(dim(Y)[1], sd = 1)
X <- cbind(X, X1)
}
return(X)
}
X <- makeX2(Y,1)
P <- solve((t(X) %*% X)) %*% t(X) %*% Y
E <- Y - X %*% P
Yhat <- X %*% P
plot(X[,2], Y)
abline(P[1,1], P[2,1], col="red")
X <- cbind(X, X[,2]^2) # squared first variable
P <- solve((t(X) %*% X)) %*% t(X) %*% Y
E <- Y - X %*% P
Yhat <- X %*% P
curve(P[3,1]*x^2 + P[2,1]*x + P[1,1], col="green", add=T)
Y <- matrix(sample(100,100, replace = T), ncol=1) # now Y has only positive values
## I changed the simulation function a little bit to get nice plots.
makeX2 <- function(Y, n){
X <- matrix(1, ncol=1, nrow = dim(Y)[1])
for (i in 1:n){
X1 <- sqrt(8*Y) +
rnorm(dim(Y)[1], sd = 1)
X <- cbind(X, X1)
}
return(X)
}
X <- makeX2(Y,1)
P <- solve((t(X) %*% X)) %*% t(X) %*% Y
E <- Y - X %*% P
Yhat <- X %*% P
plot(X[,2], Y)
abline(P[1,1], P[2,1], col="red")
X <- cbind(X, X[,2]^2) # squared first variable
P <- solve((t(X) %*% X)) %*% t(X) %*% Y
E <- Y - X %*% P
Yhat <- X %*% P
curve(P[3,1]*x^2 + P[2,1]*x + P[1,1], col="green", add=T)
Kommentare
Kommentar veröffentlichen