Transpose Matrices in Backpropagation – Part 2

Part 1 was getting too long, so the derivation of [latex]\frac{\partial L}{\partial W} = X^T \cdot \frac{\partial L}{\partial Y}[/latex] has been split off into this post.

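Now let's look at [latex]\frac{\partial Y}{\partial W}[/latex]. Y and W are both matrices, so [latex]\frac {\partial Y}{\partial W}[/latex] has the same 2×3 layout as W, where each entry is the derivative of Y with respect to one element of W: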

[latex]\frac{\partial Y}{\partial W} = \begin{bmatrix}\frac{\partial Y}{\partial w_{1,1}} & \frac{\partial Y}{\partial w_{1,2}} & \frac{\partial Y}{\partial w_{1,3}} \\\\ \frac{\partial Y}{\partial w_{2,1}} & \frac{\partial Y}{\partial w_{2,2}} & \frac{\partial Y}{\partial w_{2,3}} \end{bmatrix}[/latex]

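To find each entry, starting with the first one, [latex]\frac{\partial Y}{\partial w_{1,1}}[/latex], we take the partial derivative of Y with respect to that single element of W, just as before. Doing this for all six elements gives: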

[latex]\frac{\partial Y}{\partial w_{1,1}}=\begin{bmatrix}x_{1,1} & 0 & 0 \\\\ x_{2,1} & 0 & 0\end{bmatrix} \\\\ \frac{\partial Y}{\partial w_{1,2}}=\begin{bmatrix}0 & x_{1,1} & 0 \\\\ 0 & x_{2,1} & 0\end{bmatrix} \\\\ \frac{\partial Y}{\partial w_{1,3}}=\begin{bmatrix}0 & 0 & x_{1,1} \\\\ 0 & 0 & x_{2,1}\end{bmatrix} \\\\ \frac{\partial Y}{\partial w_{2,1}}=\begin{bmatrix}x_{1,2} & 0 & 0 \\\\ x_{2,2} & 0 & 0\end{bmatrix} \\\\ \frac{\partial Y}{\partial w_{2,2}}=\begin{bmatrix}0 & x_{1,2} & 0 \\\\ 0 & x_{2,2} & 0\end{bmatrix} \\\\ \frac{\partial Y}{\partial w_{2,3}}=\begin{bmatrix}0 & 0 & x_{1,2} \\\\ 0 & 0 & x_{2,2}\end{bmatrix}[/latex]
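The pattern in these six matrices can be checked numerically. The following is a minimal sketch in plain Python, assuming Y = X · W with a 2×2 X and a 2×3 W. The numeric values and the helper names (`matmul`, `dY_dw`, `expected_dY_dw`) are made up for illustration, and indices in the code are 0-based, so `k=0, l=0` corresponds to w_{1,1}.

```python
# Hypothetical example values; any numbers would do.
X = [[1.0, 2.0],
     [3.0, 4.0]]             # 2x2
W = [[0.5, -1.0, 2.0],
     [1.5, 0.0, -0.5]]       # 2x3


def matmul(A, B):
    """Plain-Python matrix product A . B."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


def dY_dw(X, W, k, l, eps=1e-6):
    """Central-difference estimate of dY/dw_{k,l} (0-based k, l)."""
    W_plus = [row[:] for row in W]
    W_minus = [row[:] for row in W]
    W_plus[k][l] += eps
    W_minus[k][l] -= eps
    Y_plus, Y_minus = matmul(X, W_plus), matmul(X, W_minus)
    return [[(Y_plus[i][j] - Y_minus[i][j]) / (2 * eps)
             for j in range(len(Y_plus[0]))]
            for i in range(len(Y_plus))]


def expected_dY_dw(X, W, k, l):
    """Pattern from the text: column l holds column k of X, zeros elsewhere."""
    return [[X[i][k] if j == l else 0.0 for j in range(len(W[0]))]
            for i in range(len(X))]


# Compare the numerical estimate with the pattern for w_{1,1}.
print(dY_dw(X, W, 0, 0))
print(expected_dY_dw(X, W, 0, 0))
```

Because Y is linear in W, the finite-difference estimate should agree with the pattern up to floating-point rounding for every choice of `k` and `l`.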

Each element of the matrix W is also a scalar, so, just as we did for X in Part 1, we can write:

[latex]\frac{\partial L}{\partial w_{1,1}} = \sum_{i=1}^{N} \sum_{j=1}^{M}\frac{\partial L}{\partial y_{i,j}} \cdot \frac{\partial y_{i,j}}{\partial w_{1,1}}[/latex]

[latex]\frac{\partial L}{\partial w_{1,1}} = (\frac{\partial L}{\partial y_{1,1}} \times x_{1,1}) + (\frac{\partial L}{\partial y_{2,1}} \times x_{2,1}) \\\\ \frac{\partial L}{\partial w_{1,2}} = (\frac{\partial L}{\partial y_{1,2}} \times x_{1,1}) + (\frac{\partial L}{\partial y_{2,2}} \times x_{2,1}) \\\\ \frac{\partial L}{\partial w_{1,3}} = (\frac{\partial L}{\partial y_{1,3}} \times x_{1,1}) + (\frac{\partial L}{\partial y_{2,3}} \times x_{2,1}) \\\\ \frac{\partial L}{\partial w_{2,1}} = (\frac{\partial L}{\partial y_{1,1}} \times x_{1,2}) + (\frac{\partial L}{\partial y_{2,1}} \times x_{2,2}) \\\\ \frac{\partial L}{\partial w_{2,2}} = (\frac{\partial L}{\partial y_{1,2}} \times x_{1,2}) + (\frac{\partial L}{\partial y_{2,2}} \times x_{2,2}) \\\\ \frac{\partial L}{\partial w_{2,3}} = (\frac{\partial L}{\partial y_{1,3}} \times x_{1,2}) + (\frac{\partial L}{\partial y_{2,3}} \times x_{2,2}) [/latex]
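All six sums follow a single pattern. As a compact restatement, assuming Y = X · W so that each [latex]y_{i,j}[/latex] is the sum of [latex]x_{i,k} \cdot w_{k,j}[/latex] over k (the index names k and l and the Kronecker delta [latex]\delta_{j,l}[/latex], which is 1 when j = l and 0 otherwise, are notation introduced here for illustration):

[latex]\frac{\partial y_{i,j}}{\partial w_{k,l}} = x_{i,k} \, \delta_{j,l} \quad \Rightarrow \quad \frac{\partial L}{\partial w_{k,l}} = \sum_{i=1}^{N} \sum_{j=1}^{M} \frac{\partial L}{\partial y_{i,j}} \cdot x_{i,k} \, \delta_{j,l} = \sum_{i=1}^{N} x_{i,k} \cdot \frac{\partial L}{\partial y_{i,l}}[/latex]

Only the terms with j = l survive, which is why each of the six expressions has just two terms (N = 2) rather than six.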

Arranging these as a 2×3 matrix gives

[latex]\frac{\partial L}{\partial W} = \begin{bmatrix}(\frac{\partial L}{\partial y_{1,1}} \times x_{1,1}) + (\frac{\partial L}{\partial y_{2,1}} \times x_{2,1}) & (\frac{\partial L}{\partial y_{1,2}} \times x_{1,1}) + (\frac{\partial L}{\partial y_{2,2}} \times x_{2,1}) & (\frac{\partial L}{\partial y_{1,3}} \times x_{1,1}) + (\frac{\partial L}{\partial y_{2,3}} \times x_{2,1}) \\\\ (\frac{\partial L}{\partial y_{1,1}} \times x_{1,2}) + (\frac{\partial L}{\partial y_{2,1}} \times x_{2,2}) & (\frac{\partial L}{\partial y_{1,2}} \times x_{1,2}) + (\frac{\partial L}{\partial y_{2,2}} \times x_{2,2}) & (\frac{\partial L}{\partial y_{1,3}} \times x_{1,2}) + (\frac{\partial L}{\partial y_{2,3}} \times x_{2,2}) \end{bmatrix}[/latex]

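Swapping the order of the two factors in each product, so that the element of X comes first and the element of [latex]\frac{\partial L}{\partial Y}[/latex] second, gives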

[latex]\frac{\partial L}{\partial W} = \begin{bmatrix}(x_{1,1} \times \frac{\partial L}{\partial y_{1,1}}) + (x_{2,1} \times \frac{\partial L}{\partial y_{2,1}}) & (x_{1,1} \times \frac{\partial L}{\partial y_{1,2}}) + (x_{2,1} \times \frac{\partial L}{\partial y_{2,2}}) & (x_{1,1} \times \frac{\partial L}{\partial y_{1,3}}) + (x_{2,1} \times \frac{\partial L}{\partial y_{2,3}}) \\\\ (x_{1,2} \times \frac{\partial L}{\partial y_{1,1}}) + (x_{2,2} \times \frac{\partial L}{\partial y_{2,1}}) & (x_{1,2} \times \frac{\partial L}{\partial y_{1,2}}) + (x_{2,2} \times \frac{\partial L}{\partial y_{2,2}}) & (x_{1,2} \times \frac{\partial L}{\partial y_{1,3}}) + (x_{2,2} \times \frac{\partial L}{\partial y_{2,3}}) \end{bmatrix}[/latex]

Factoring this into a matrix of X elements and a matrix of [latex]\frac{\partial L}{\partial Y}[/latex] elements, we get

[latex]\frac{\partial L}{\partial W} = \begin{bmatrix}x_{1,1} & x_{2,1} \\\\ x_{1,2} & x_{2,2}\end{bmatrix} \cdot \begin{bmatrix}\frac{\partial L}{\partial y_{1,1}} & \frac{\partial L}{\partial y_{1,2}} & \frac{\partial L}{\partial y_{1,3}} \\\\ \frac{\partial L}{\partial y_{2,1}} & \frac{\partial L}{\partial y_{2,2}} & \frac{\partial L}{\partial y_{2,3}}\end{bmatrix} = X^T \cdot \frac{\partial L}{\partial Y}[/latex], which is the result we set out to derive.
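As a sanity check on the final identity, here is a small gradient-check sketch in plain Python. It assumes Y = X · W and, purely for illustration, a loss L equal to the sum of `G[i][j] * Y[i][j]`, chosen so that the upstream gradient of L with respect to Y is exactly the fixed matrix `G`. The values of `X`, `W`, and `G` and all function names are hypothetical.

```python
# Hypothetical example values.
X = [[1.0, 2.0],
     [3.0, 4.0]]              # 2x2 input
W = [[0.5, -1.0, 2.0],
     [1.5, 0.0, -0.5]]        # 2x3 weights
G = [[0.1, -0.2, 0.3],
     [0.4, 0.5, -0.6]]        # 2x3 stand-in for dL/dY


def matmul(A, B):
    """Plain-Python matrix product A . B."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    """Swap rows and columns."""
    return [list(col) for col in zip(*A)]


def loss(W):
    """Illustrative loss L = sum_ij G_ij * Y_ij, so that dL/dY = G."""
    Y = matmul(X, W)
    return sum(G[i][j] * Y[i][j]
               for i in range(len(Y)) for j in range(len(Y[0])))


def grad_W_formula():
    """Gradient from the derived identity: X^T . dL/dY."""
    return matmul(transpose(X), G)


def grad_W_numeric(eps=1e-6):
    """Central-difference estimate of dL/dW, one element at a time."""
    grad = [[0.0] * len(W[0]) for _ in W]
    for k in range(len(W)):
        for l in range(len(W[0])):
            W_plus = [row[:] for row in W]
            W_minus = [row[:] for row in W]
            W_plus[k][l] += eps
            W_minus[k][l] -= eps
            grad[k][l] = (loss(W_plus) - loss(W_minus)) / (2 * eps)
    return grad


analytic = grad_W_formula()
numeric = grad_W_numeric()
max_diff = max(abs(a - n)
               for row_a, row_n in zip(analytic, numeric)
               for a, n in zip(row_a, row_n))
print("max difference:", max_diff)
```

The two gradients should differ only by floating-point rounding. Note that the result has the same 2×3 shape as W, which is what an update step on W needs.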
