# Matrix Algebra

Rafael A. Irizarry
October 9, 2014

### Motivation

• Our data is usually saved in matrices.

• In the gene expression example, the data matrix $$X$$ was $$8793 \times 208$$

• Say you want to compute the average of each column

### Matrix Notation

• Let $$X$$ be a data matrix with $$N$$ rows and $$p$$ columns

• The vector of column averages:

$\left( \frac{1}{N}\sum_{i=1}^N X_{i,1}, \dots, \frac{1}{N}\sum_{i=1}^N X_{i,p}\right)$

can be written like this $$AX$$ with $$A=(\frac{1}{N},\dots,\frac{1}{N})$$

### Matrix Notation

Here are the matrix operations $AX = \begin{pmatrix} \frac{1}{N} & \frac{1}{N} & \dots & \frac{1}{N} \end{pmatrix} \begin{pmatrix} X_{1,1}&\dots & X_{1,p} \\ X_{2,1}&\dots & X_{2,p} \\ & \vdots & \\ X_{N,1}&\dots & X_{N,p} \\ \end{pmatrix}$

$= \begin{pmatrix} \bar{X}_1 & \bar{X}_2 & \dots &\bar{X}_p \end{pmatrix}$

### Matrix Notation

Here are the row averages:

$XB = \begin{pmatrix} X_{1,1}&\dots & X_{1,p} \\ X_{2,1}&\dots & X_{2,p} \\ & \vdots & \\ X_{N,1}&\dots & X_{N,p} \\ \end{pmatrix} \begin{pmatrix} 1/p\\ 1/p\\ \vdots\\ 1/p\\ \end{pmatrix} = \begin{pmatrix} \bar{X}_1 \\ \bar{X}_2 \\ \vdots\\ \bar{X}_N\\ \end{pmatrix}$
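For reference, a minimal numpy sketch of both operations (simulated data; `A` and `B` are the weight vectors defined above):

```python
import numpy as np

N, p = 8793, 208
X = np.random.randn(N, p)          # stand-in for the expression matrix

A = np.full((1, N), 1 / N)         # row vector of 1/N entries
B = np.full((p, 1), 1 / p)         # column vector of 1/p entries

col_avg = A @ X                    # 1 x p: average of each column
row_avg = X @ B                    # N x 1: average of each row

# agrees with the built-in column/row means
assert np.allclose(col_avg, X.mean(axis=0))
assert np.allclose(row_avg.ravel(), X.mean(axis=1))
```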

### Matrix Notation (more general)

$AX = \begin{pmatrix} a_1 & a_2 & \dots & a_N \end{pmatrix} \begin{pmatrix} X_{1,1}&\dots & X_{1,p} \\ X_{2,1}&\dots & X_{2,p} \\ & \vdots & \\ X_{N,1}&\dots & X_{N,p} \\ \end{pmatrix}$

$= \begin{pmatrix} \sum_{i=1}^N a_i X_{i,1} & \dots & \sum_{i=1}^N a_i X_{i,p} \end{pmatrix}$

This is convenient and fast to code.
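To illustrate the coding convenience, a small sketch (illustrative names, simulated data) comparing an explicit loop over columns with a single vector-matrix product:

```python
import numpy as np

N, p = 8793, 208
X = np.random.randn(N, p)
a = np.random.randn(N)             # arbitrary weights a_1, ..., a_N

# explicit loop: one weighted sum per column
slow = np.array([np.sum(a * X[:, j]) for j in range(p)])

# same result as a single vector-matrix product
fast = a @ X
assert np.allclose(slow, fast)
```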

### Matrix Notation

• Say for each gene you want to compute
1. the average across all samples
2. the difference between males and females

### Matrix Notation

$\begin{pmatrix} X_{1,1}&\dots & X_{1,p} \\ X_{2,1}&\dots & X_{2,p} \\ & \vdots & \\ X_{N,1}&\dots & X_{N,p} \\ \end{pmatrix} \begin{pmatrix} 1/p&a_1\\ 1/p&a_2\\ \vdots&\vdots\\ 1/p&a_p\\ \end{pmatrix}$ with \begin{align*} a_j &= \begin{cases} -1/N_F & \text{if female} \\ 1/N_M & \text{if male} \end{cases} \end{align*}

with $$N_F$$=number of females and $$N_M$$=number of males.
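A sketch of this weight matrix in numpy; the indicator vector `male` and the simulated data are made up for illustration:

```python
import numpy as np

N, p = 8793, 208
X = np.random.randn(N, p)                     # genes x samples
male = np.arange(p) < 100                     # say the first 100 samples are male
N_M, N_F = male.sum(), (~male).sum()

a = np.where(male, 1 / N_M, -1 / N_F)         # contrast weights a_j
B = np.column_stack([np.full(p, 1 / p), a])   # p x 2 weight matrix

out = X @ B   # column 0: per-gene average, column 1: male - female difference
```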

### Matrix Notation (even more general)

$AX = \begin{pmatrix} a_{1,1} & a_{1,2} & \dots & a_{1,N}\\ a_{2,1} & a_{2,2} & \dots & a_{2,N}\\ & & \vdots & \\ a_{M,1} & a_{M,2} & \dots & a_{M,N}\\ \end{pmatrix} \begin{pmatrix} X_{1,1}&\dots & X_{1,p} \\ X_{2,1}&\dots & X_{2,p} \\ & \vdots & \\ X_{N,1}&\dots & X_{N,p} \\ \end{pmatrix}$

$= \begin{pmatrix} \sum_{i=1}^N a_{1,i} X_{i,1} & \dots & \sum_{i=1}^N a_{1,i} X_{i,p}\\ & \vdots & \\ \sum_{i=1}^N a_{M,i} X_{i,1} & \dots & \sum_{i=1}^N a_{M,i} X_{i,p} \end{pmatrix}$

### Regression Models

We used regression with our Baseball data. For example

$Y_i= \beta_0 + X_{i}\beta_1 + \varepsilon_i, i=1,\dots,N$

with $$Y_i$$ = runs for team $$i$$ and $$X_{i}$$ = BB for team $$i$$.

We need to find the $$\beta_0,\beta_1$$ pair that minimizes

$\sum_{i=1}^N (Y_i- \beta_0 - X_i\beta_1)^2$

How do we do this?

### Linear Algebra Representation

We can write it like this:

$\begin{pmatrix} Y_1\\ Y_2\\ \vdots\\ Y_N\\ \end{pmatrix} = \begin{pmatrix} 1\\ 1\\ \vdots \\ 1\\ \end{pmatrix} \beta_0 + \begin{pmatrix} X_{1}\\ X_{2}\\ \vdots \\ X_{N}\\ \end{pmatrix} \beta_1+ \begin{pmatrix} \varepsilon_1\\ \varepsilon_2\\ \vdots\\ \varepsilon_N\\ \end{pmatrix}\\$

The $$Y$$ column is a point in $$\mathbb{R}^N$$

The $$1$$ and $$X$$ vectors define a subspace, each point defined by the pair $$(\beta_0,\beta_1)\in \mathbb{R}^2$$.

### Projections

Recall: The projection of the point $$Y$$ to the smaller space minimizes the distance.

### Simplest example

$$\vec{y} \in \mathbb{R}^N$$ and $$L \subset \mathbb{R}^N$$ is the space spanned by

$\vec{v}=\begin{pmatrix} 1\\ \vdots \\ 1\end{pmatrix}; L = \{ c \vec{v}; c \in \mathbb{R}\}$

What $$c$$ minimizes distance between $$c\vec{v}$$ and $$\vec{y}$$?

Answer: The projection of $$\vec{y}$$ onto $$L$$: $\mbox{Proj}_L(\vec{y}) = \hat{c} \vec{v}$

So how do we find $$\hat{c}$$?

### Simplest example

The projection is orthogonal to the space so:

$(\vec{y}-\hat{c}\vec{v}) \cdot \vec{v} = 0$

$\vec{y}\cdot\vec{v} - \hat{c}\vec{v}\cdot\vec{v} = 0$

$\hat{c} = \frac{\vec{y}\cdot\vec{v}} {\vec{v}\cdot\vec{v}}$

### Simplest example

$\hat{c} = \frac{\vec{y}\cdot\vec{v}} {\vec{v}\cdot\vec{v}}$

Going back to our original notation: since $$\vec{v}$$ is a vector of ones, $$\vec{y}\cdot\vec{v} = \sum_{i=1}^N Y_i$$ and $$\vec{v}\cdot\vec{v} = N$$, so $$\hat{\beta}_0 = \hat{c} = \bar{Y}$$.

In this case calculus would have been just as easy:

$\frac{\partial}{\partial\beta_0}\sum_{i=1}^N (Y_i - \beta_0)^2 = 0 \implies - 2 \sum_{i=1}^N (Y_i - \beta_0) = 0 \implies$

$N \beta_0 = \sum_{i=1}^N Y_i \implies \hat{\beta}_0=\bar{Y}$
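A quick numeric check of this result with simulated data:

```python
import numpy as np

y = np.random.randn(100)
v = np.ones(100)

c_hat = (y @ v) / (v @ v)           # projection coefficient from the formula
assert np.isclose(c_hat, y.mean())  # equals the sample mean, as derived
```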

### Regression Model

Make it slightly more complicated.

The space is now

$L = \{ \beta_0 \vec{v}_0 + \beta_1 \vec{v}_1 ; \vec{\beta}=(\beta_0,\beta_1) \in \mathbb{R}^2 \}$

with

$\vec{v}_0= \begin{pmatrix} 1\\ 1\\ \vdots \\ 1\\ \end{pmatrix} \mbox{ and } \vec{v}_1= \begin{pmatrix} X_{1}\\ X_{2}\\ \vdots \\ X_{N}\\ \end{pmatrix}$

### The Transpose

The columns become the rows

$X = \begin{pmatrix} X_{1,1}&\dots & X_{1,p} \\ X_{2,1}&\dots & X_{2,p} \\ & \vdots & \\ X_{N,1}&\dots & X_{N,p} \\ \end{pmatrix} , \, X^\top = \begin{pmatrix} X_{1,1}&\dots & X_{N,1} \\ X_{1,2}&\dots & X_{N,2} \\ & \vdots & \\ X_{1,p}&\dots & X_{N,p} \\ \end{pmatrix}$

### Variance

The transpose helps us define projections, but it is also useful for computing variances and covariances:

If $$X$$ is an $$N\times p$$ matrix of features whose columns have average 0,

$S = \frac{1}{N} X^\top X$

• is called the sample covariance matrix
• $$S_{i,i}$$ is the variance of column $$i$$
• $$S_{i,j}$$ entry is the covariance between columns $$i$$ and $$j$$.
• $$\frac{S_{i,j}}{\sqrt{S_{i,i}S_{j,j}}}$$ is the correlation between columns $$i$$ and $$j$$
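A minimal numpy sketch of these quantities; note the slide's definition divides by $$N$$, so we compare against `np.cov` with `bias=True`:

```python
import numpy as np

N, p = 500, 4
X = np.random.randn(N, p) @ np.random.randn(p, p)   # correlated columns
Xc = X - X.mean(axis=0)                             # center each column

S = (Xc.T @ Xc) / N                                  # sample covariance matrix
R = S / np.sqrt(np.outer(np.diag(S), np.diag(S)))    # correlation matrix

assert np.allclose(S, np.cov(X, rowvar=False, bias=True))
assert np.allclose(R, np.corrcoef(X, rowvar=False))
```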

### Regression Model

Let $$X = [ \vec{v}_0 \,\, \vec{v}_1]$$. It's an $$N\times 2$$ matrix

Any point in $$L$$ can be written as $$X\vec{\beta}$$.

The orthogonal projection must satisfy

$$X^\top (\vec{y}-X\hat{\beta}) = 0$$

$X^\top X \hat{\beta}= X^\top \vec{y}$

$\hat{\beta}= (X^\top X)^{-1}X^\top \vec{y}$

### Regression Models

In general, we can write this

$Y_i= \beta_0 + X_{i,1}\beta_1 + \dots + X_{i,p} \beta_p + \varepsilon_i, \quad i=1,\dots,N$ as

$\begin{pmatrix} Y_1\\ Y_2\\ \vdots\\ Y_N\\ \end{pmatrix} = \begin{pmatrix} 1\\ 1\\ \vdots \\ 1\\ \end{pmatrix} \beta_0 + \begin{pmatrix} X_{1,1}\\ X_{2,1}\\ \vdots \\ X_{N,1}\\ \end{pmatrix} \beta_1+ \dots + \begin{pmatrix} X_{1,p}\\ X_{2,p}\\ \vdots \\ X_{N,p}\\ \end{pmatrix}\beta_p + \begin{pmatrix} \varepsilon_1\\ \varepsilon_2\\ \vdots\\ \varepsilon_N\\ \end{pmatrix}\\$

### Regression Models

We can simplify to:

$Y = X\beta + \varepsilon$

with $Y = \begin{pmatrix} Y_1\\ \vdots\\ Y_N\\ \end{pmatrix}, \, X=\begin{pmatrix} 1&X_{1,1}&\dots&X_{1,p}\\ 1&X_{2,1}&\dots&X_{2,p}\\ & \vdots & \\ 1&X_{N,1}&\dots&X_{N,p} \end{pmatrix}, \, \beta = \begin{pmatrix} \beta_0\\ \beta_1\\ \vdots\\ \beta_p \end{pmatrix}, \, \varepsilon = \begin{pmatrix} \varepsilon_1\\ \vdots\\ \varepsilon_N \end{pmatrix}\\$

### Regression Model

Solution is still

$\hat{\beta} = (X^\top X)^{-1} X^\top Y$
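A hedged sketch of this solution with simulated data (in practice a least-squares solver is preferable to forming the inverse explicitly):

```python
import numpy as np

N, p = 200, 3
X = np.column_stack([np.ones(N), np.random.randn(N, p)])   # intercept + p covariates
beta_true = np.array([2.0, 1.0, -1.0, 0.5])
y = X @ beta_true + np.random.randn(N)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # (X'X)^{-1} X'y via a linear solve

# same answer from numpy's least-squares routine
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(beta_hat, beta_lstsq)
```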

### Singular Value Decomposition (SVD)

We have a data matrix with $$N$$ observations (rows) and $$p$$ columns.

We can write our data matrix as

$X = U D V^\top , \mbox{ with }$

• $$U$$ is $$N\times p$$ with $$U^\top U=I$$
• $$V$$ is $$p\times p$$ with $$V^\top V=I$$ and
• $$D$$ is a $$p \times p$$ diagonal matrix with non-negative entries

We are changing basis: $$XV = UD$$

And we can change back: $$XVV^\top = X$$
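A minimal numpy check of these properties, using the thin SVD so that $$U$$ is $$N\times p$$:

```python
import numpy as np

N, p = 100, 5
X = np.random.randn(N, p)

U, d, Vh = np.linalg.svd(X, full_matrices=False)   # thin SVD: U is N x p
D, V = np.diag(d), Vh.T

assert np.allclose(U.T @ U, np.eye(p))   # U'U = I
assert np.allclose(V.T @ V, np.eye(p))   # V'V = I
assert np.allclose(X, U @ D @ Vh)        # X = U D V'
assert np.allclose(X @ V, U @ D)         # change of basis: XV = UD
```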

### SVD

• Say the columns of $$X$$ are perfectly correlated

• Then we effectively have just one column; the SVD captures this:

    X = d[0] * np.outer(U[:, 0], Vh[0, :])
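A runnable version of the line above, with a simulated $$X$$ whose second column is an exact multiple of the first; only the first singular value is (numerically) nonzero:

```python
import numpy as np

x = np.random.randn(100)
X = np.column_stack([x, 2 * x])                 # perfectly correlated columns

U, d, Vh = np.linalg.svd(X, full_matrices=False)
print(d)                                        # second singular value is ~0

X_rank1 = d[0] * np.outer(U[:, 0], Vh[0, :])    # first term of the SVD
assert np.allclose(X, X_rank1)
```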

### Motivating SVD

Consider this simple linear combination:

$\Delta_j = (Y_{1,j}, \dots, Y_{N,j}) \cdot (1/N_1,\dots,1/N_1,-1/N_2,\dots,-1/N_2)$

$= \frac{1}{N_1}\sum_{i=1}^{N_1} Y_{i,j} - \frac{1}{N_2}\sum_{i=N_1+1}^{N} Y_{i,j}, \quad \text{with } N_2=N-N_1$

Try to find the $$N_1$$ that maximizes the variance

$\frac{1}{p}\sum_{j=1}^p \Delta_j^2$
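A sketch of this search, assuming the rows of `Y` are ordered so that a single cut point $$N_1$$ separates the two groups (all names illustrative):

```python
import numpy as np

N, p, true_split = 60, 100, 20
Y = np.random.randn(N, p)
Y[:true_split] += 2                 # first group has a shifted mean

def delta_variance(Y, N1):
    """Mean of the squared Delta_j for a split at N1."""
    N = Y.shape[0]
    w = np.r_[np.full(N1, 1 / N1), np.full(N - N1, -1 / (N - N1))]
    delta = w @ Y                   # one Delta_j per column
    return np.mean(delta ** 2)

scores = [delta_variance(Y, N1) for N1 in range(1, N)]
best_N1 = 1 + int(np.argmax(scores))
print(best_N1)                      # should be close to true_split
```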

### Applications of SVD

Plot of standardized petal width and petal length columns
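The figure itself is not reproduced here; a hedged sketch of how such a plot could be made, assuming the classic iris measurements (e.g. via scikit-learn):

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

iris = load_iris()
petal = iris.data[:, 2:4]                               # petal length, petal width
Z = (petal - petal.mean(axis=0)) / petal.std(axis=0)    # standardize the columns

plt.scatter(Z[:, 0], Z[:, 1])
plt.xlabel("standardized petal length")
plt.ylabel("standardized petal width")
plt.show()
```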