태그 보관물: affine layer

Back propgation에서의 전치행렬(transpose matrix) – 2편

1편이 너무 길어 져서 \frac{\partial L}{\partial W} = X^T \cdot \frac{\partial L}{\partial Y}에 대한 유도는 여기로 나누었다.

이제 \frac{\partial Y}{\partial W}를 보면, 얘들도 모두 matrix이니 \frac {\partial Y}{\partial W}는 다음과 같이 생겼다.

\frac{\partial Y}{\partial W} = \begin{bmatrix}\frac{\partial Y}{\partial w_{1,1}} & \frac{\partial Y}{\partial w_{1,2}} & \frac{\partial Y}{\partial w_{1,3}} \\\\ \frac{\partial Y}{\partial w_{2,1}} & \frac{\partial Y}{\partial w_{2,2}} & \frac{\partial Y}{\partial w_{2,3}} \end{bmatrix}

첫번째 원소인 \frac{\partial Y}{\partial w_{1,1}}을 구하기 위해 이전 처럼, W에 대해 Y로 편미분하면 다음과 같이된다.

\frac{\partial Y}{\partial w_{1,1}}=\begin{bmatrix}x_{1,1} & 0 & 0 \\\\ x_{2,1} & 0 & 0\end{bmatrix} \\\\ \frac{\partial Y}{\partial w_{1,2}}=\begin{bmatrix}0 & x_{1,1} & 0 \\\\ 0 & x_{2,1} & 0\end{bmatrix} \\\\ \frac{\partial Y}{\partial w_{1,3}}=\begin{bmatrix}0 & 0 & x_{1,1} \\\\ 0 & 0 & x_{2,1}\end{bmatrix} \\\\ \frac{\partial Y}{\partial w_{2,1}}=\begin{bmatrix}x_{1,2} & 0 & 0 \\\\ x_{2,2} & 0 & 0\end{bmatrix} \\\\ \frac{\partial Y}{\partial w_{2,2}}=\begin{bmatrix}0 & x_{1,2} & 0 \\\\ 0 & x_{2,2} & 0\end{bmatrix} \\\\ \frac{\partial Y}{\partial w_{2,3}}=\begin{bmatrix}0 & 0 & x_{1,2} \\\\ 0 & 0 & x_{2,2}\end{bmatrix}

Matrix W의 각 원소들 역시 scalar이므로 1편에서 X의 경우 처럼, 다음과 같이 나타낼 수 있다.

\frac{\partial L}{\partial w_{1,1}} = \sum_{i=1}{N} \sum_{j=1}{M}\frac{\partial L}{\partial y_{i,j}} \cdot \frac{\partial y_{i,j}}{\partial w_{1,1}}

\frac{\partial L}{\partial w_{1,1}} = (\frac{\partial L}{\partial y_{1,1}} \times x_{1,1}) + (\frac{\partial L}{\partial y_{2,1}} \times x_{2,1}) \\\\ \frac{\partial L}{\partial w_{1,2}} = (\frac{\partial L}{\partial y_{1,2}} \times x_{1,1}) + (\frac{\partial L}{\partial y_{2,2}} \times x_{2,1}) \\\\ \frac{\partial L}{\partial w_{1,3}} = (\frac{\partial L}{\partial y_{1,3}} \times x_{1,1}) + (\frac{\partial L}{\partial y_{2,3}} \times x_{2,1}) \\\\ \frac{\partial L}{\partial w_{2,1}} = (\frac{\partial L}{\partial y_{1,1}} \times x_{1,2}) + (\frac{\partial L}{\partial y_{2,1}} \times x_{2,2}) \\\\ \frac{\partial L}{\partial w_{2,2}} = (\frac{\partial L}{\partial y_{1,2}} \times x_{1,2}) + (\frac{\partial L}{\partial y_{2,2}} \times x_{2,2}) \\\\ \frac{\partial L}{\partial w_{2,3}} = (\frac{\partial L}{\partial y_{1,3}} \times x_{1,2}) + (\frac{\partial L}{\partial y_{2,3}} \times x_{2,2})

이것을 2X3인 matrix로 나타내면

\frac{\partial L}{\partial W} = \begin{bmatrix}(\frac{\partial L}{\partial y_{1,1}} \times x_{1,1}) + (\frac{\partial L}{\partial y_{2,1}} \times x_{2,1}) & (\frac{\partial L}{\partial y_{1,2}} \times x_{1,1}) + (\frac{\partial L}{\partial y_{2,2}} \times x_{2,1}) & (\frac{\partial L}{\partial y_{1,3}} \times x_{1,1}) + (\frac{\partial L}{\partial y_{2,3}} \times x_{2,1}) \\\\ (\frac{\partial L}{\partial y_{1,1}} \times x_{1,2}) + (\frac{\partial L}{\partial y_{2,1}} \times x_{2,2}) & (\frac{\partial L}{\partial y_{1,2}} \times x_{1,2}) + (\frac{\partial L}{\partial y_{2,2}} \times x_{2,2}) & (\frac{\partial L}{\partial y_{1,3}} \times x_{1,2}) + (\frac{\partial L}{\partial y_{2,3}} \times x_{2,2}) \end{bmatrix}

Matrix X와 W원소의 위치를 바꿔서 나타내면

\frac{\partial L}{\partial W} = \begin{bmatrix}(x_{1,1} \times \frac{\partial L}{\partial y_{1,1}}) + (x_{2,1} \times \frac{\partial L}{\partial y_{2,1}}) & (x_{1,1} \times \frac{\partial L}{\partial y_{1,2}}) + (x_{2,1} \times \frac{\partial L}{\partial y_{2,2}}) & (x_{2,1} \times \frac{\partial L}{\partial y_{1,3}}) + (x_{1,1} \times \frac{\partial L}{\partial y_{2,3}})  \\\\ (x_{1,2} \times \frac{\partial L}{\partial y_{1,1}}) + (x_{2,2} \times \frac{\partial L}{\partial y_{2,1}}) & (x_{1,2} \times \frac{\partial L}{\partial y_{1,2}}) + (x_{2,2} \times \frac{\partial L}{\partial y_{2,2}}) & (x_{1,2} \times \frac{\partial L}{\partial y_{1,3}}) + (x_{2,2} \times \frac{\partial L}{\partial y_{2,3}}) \end{bmatrix}

Matrix X와 Y로 구분하면

\frac{\partial L}{\partial W} = \begin{bmatrix}x_{1,1} & x_{2,1} \\\\ x_{1,2} & x_{2,2}\end{bmatrix} \cdot \begin{bmatrix}\frac{\partial L}{\partial y_{1,1}}  & \frac{\partial L}{\partial y_{1,2}} & \frac{\partial L}{\partial y_{1,3}} \\\\ \frac{\partial L}{\partial y_{2,1}} & \frac{\partial L}{\partial y_{2,2}} & \frac{\partial L}{\partial y_{2,3}}\end{bmatrix} = X^T \cdot \frac{\partial L}{\partial Y}이 성립한다.