Citation Yao Qing-sheng.矩阵求导.FUTURE & CIVILIZATION Natural/Social Philosophy & Infomation Sciences,20240509. https://yaoqs.github.io/20240509/ju-zhen-qiu-dao/

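The Essence of Matrix Differentiation and of Numerator and Denominator Layout (Matrix Differentiation: Essence Part)

Preface

A little over a month ago, in response to a classmate's question about the difference between numerator layout and denominator layout in matrix differentiation, I wrote an answer on the topic.

That answer gave several conclusions, but it was not very rigorous: it explained neither the **essence of matrix differentiation** nor the **essence of numerator and denominator layout**.

So in this article I explain both more rigorously. I hope it helps beginners and readers who want to understand what is really going on.

Note 1: To follow this article you only need to know how to compute **partial derivatives** (undergraduate calculus) and the definition of a **matrix** (undergraduate linear algebra). No other background is required.

Note 2: Unless stated otherwise, all vectors in this article are **column vectors**, $\pmb{x}=[x_1,x_2,\cdots,x_n]^T$.

Note 3: This article considers only real numbers, not complex numbers.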

Functions and Scalars, Vectors, Matrices[1]

Consider a function

$$\text{function}(\text{input})$$

Depending on the type of $\text{function}$ and the type of $\text{input}$, we can classify this $\text{function}$ into different kinds.

1. $\text{function}$ is a scalar

We call $\text{function}$ a **real-valued scalar function** and denote it by a lightface lowercase letter $f$.

1.1 $\text{input}$ is a scalar

We say the **variable** of $\text{function}$ is a **scalar** and denote it by a lightface lowercase letter $x$.

Example 1:

$$f(x)=x+2 \tag{e.g.1}$$

1.2 $\text{input}$ is a vector

We say the **variable** of $\text{function}$ is a **vector** and denote it by a boldface lowercase letter $\pmb{x}$.

**Example 2:** Let $\pmb{x}=[x_1,x_2,x_3]^T$

$$f(\pmb{x})=a_1x_1^2+a_2x_2^2+a_3x_3^2+a_4x_1x_2 \tag{e.g.2}$$

1.3 $\text{input}$ is a matrix

We say the **variable** of $\text{function}$ is a **matrix** and denote it by a boldface uppercase letter $\pmb{X}$.

**Example 3:** Let $\pmb{X}_{3\times 2}=(x_{ij})_{i=1,j=1}^{3,2}$

$$f(\pmb{X})=a_1x_{11}^2+a_2x_{12}^2+a_3x_{21}^2+a_4x_{22}^2+a_5x_{31}^2+a_6x_{32}^2 \tag{e.g.3}$$

2. $\text{function}$ is a vector

We call $\text{function}$ a **real vector function** and denote it by a boldface lowercase letter $\pmb{f}$.

**Meaning:** $\pmb{f}$ is a vector made up of several scalar functions $f$.

As before, the variable can be a scalar, a vector, or a matrix. The notation is the same as above.

2.1 Scalar variable

Example 4:

$$\pmb{f}_{3\times1}(x)= \begin{bmatrix} f_1(x)\\ f_2(x)\\ f_3(x) \end{bmatrix} = \begin{bmatrix} x+1\\ 2x+1\\ 3x^2+1 \end{bmatrix} \tag{e.g.4}$$

2.2 Vector variable

**Example 5:** Let $\pmb{x}=[x_1,x_2,x_3]^T$

$$\pmb{f}_{3\times1}(\pmb{x})= \begin{bmatrix} f_1(\pmb{x})\\ f_2(\pmb{x})\\ f_3(\pmb{x}) \end{bmatrix}= \begin{bmatrix} x_{1}+x_{2}+x_{3}\\ x_{1}^2+2x_{2}+2x_{3}\\ x_{1}x_{2}+x_{2}+x_{3} \end{bmatrix} \tag{e.g.5}$$

2.3 Matrix variable

**Example 6:** Let $\pmb{X}_{3\times 2}=(x_{ij})_{i=1,j=1}^{3,2}$

$$\pmb{f}_{3\times1}(\pmb{X})= \begin{bmatrix} f_1(\pmb{X})\\ f_2(\pmb{X})\\ f_3(\pmb{X}) \end{bmatrix} = \begin{bmatrix} x_{11}+x_{12}+x_{21}+x_{22}+x_{31}+x_{32}\\ x_{11}+x_{12}+x_{21}+x_{22}+x_{31}+x_{32}+x_{11}x_{12}\\ 2x_{11}+x_{12}+x_{21}+x_{22}+x_{31}+x_{32}+x_{11}x_{12} \end{bmatrix} \tag{e.g.6}$$

3. $\text{function}$ is a matrix

We call $\text{function}$ a **real matrix function** and denote it by a boldface uppercase letter $\pmb{F}$.

**Meaning:** $\pmb{F}$ is a matrix made up of several scalar functions $f$.

As before, the variable can be a scalar, a vector, or a matrix. The notation is the same as above.

3.1 Scalar variable

Example 7:

$$\pmb{F}_{3\times2}(x)= \begin{bmatrix} f_{11}(x) & f_{12}(x)\\ f_{21}(x) & f_{22}(x)\\ f_{31}(x) & f_{32}(x) \end{bmatrix} = \begin{bmatrix} x+1 & 2x+2\\ x^2+1 & 2x^2+1\\ x^3+1 & 2x^3+1 \end{bmatrix} \tag{e.g.7}$$

3.2 Vector variable

**Example 8:** Let $\pmb{x}=[x_1,x_2,x_3]^T$

$$\pmb{F}_{3\times2}(\pmb{x})= \begin{bmatrix} f_{11}(\pmb{x}) & f_{12}(\pmb{x})\\ f_{21}(\pmb{x}) & f_{22}(\pmb{x})\\ f_{31}(\pmb{x}) & f_{32}(\pmb{x}) \end{bmatrix} = \begin{bmatrix} 2x_{1}+x_{2}+x_{3} & 2x_{1}+2x_{2}+x_{3} \\ 2x_{1}+2x_{2}+x_{3} & x_{1}+2x_{2}+x_{3} \\ 2x_{1}+x_{2}+2x_{3} & x_{1}+2x_{2}+2x_{3} \end{bmatrix} \tag{e.g.8}$$

3.3 Matrix variable

**Example 9:** Let $\pmb{X}_{3\times 2}=(x_{ij})_{i=1,j=1}^{3,2}$

$$\pmb{F}_{3\times2}(\pmb{X})= \begin{bmatrix} f_{11}(\pmb{X}) & f_{12}(\pmb{X})\\ f_{21}(\pmb{X}) & f_{22}(\pmb{X})\\ f_{31}(\pmb{X}) & f_{32}(\pmb{X}) \end{bmatrix}= \begin{bmatrix} x_{11}+x_{12}+x_{21}+x_{22}+x_{31}+x_{32} & 2x_{11}+x_{12}+x_{21}+x_{22}+x_{31}+x_{32}\\ 3x_{11}+x_{12}+x_{21}+x_{22}+x_{31}+x_{32} & 4x_{11}+x_{12}+x_{21}+x_{22}+x_{31}+x_{32}\\ 5x_{11}+x_{12}+x_{21}+x_{22}+x_{31}+x_{32} & 6x_{11}+x_{12}+x_{21}+x_{22}+x_{31}+x_{32} \end{bmatrix} \tag{e.g.9}$$

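4. Summary

Table: functions versus scalars, vectors, and matrices

| $\text{function}$ \ variable | scalar $x$ | vector $\pmb{x}$ | matrix $\pmb{X}$ |
| --- | --- | --- | --- |
| real-valued scalar function $f$ | $f(x)$ | $f(\pmb{x})$ | $f(\pmb{X})$ |
| real vector function $\pmb{f}$ | $\pmb{f}(x)$ | $\pmb{f}(\pmb{x})$ | $\pmb{f}(\pmb{X})$ |
| real matrix function $\pmb{F}$ | $\pmb{F}(x)$ | $\pmb{F}(\pmb{x})$ | $\pmb{F}(\pmb{X})$ |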

The Essence of Matrix Differentiation

In calculus[2] we learned how to handle a multivariate function such as the following.

Example 10:

$$f(x_1,x_2,x_3)=x_1^2+x_1x_2+x_2x_3 \tag{e.g.10}$$

We can compute the **partial derivatives** of $f$ with respect to $x_1,x_2,x_3$ one by one:

$$\begin{cases}\frac{\partial f}{\partial x_1} & = 2x_1+x_2 \\ \frac{\partial f}{\partial x_2} & = x_1+x_3 \\ \frac{\partial f}{\partial x_3} & = x_2 \end{cases}$$

Matrix differentiation works the same way. In essence, **each $f$** in $\text{function}$ is differentiated with respect to **each element** of the variable, **one at a time**; the results are simply written in vector or matrix form.

For (e.g.10), we can write the three results as a column vector:

$$\frac{\partial f(\pmb{x})}{\partial \pmb{x}_{3\times1}}= \begin{bmatrix} \frac{\partial f}{\partial x_1}\\ \frac{\partial f}{\partial x_2}\\ \frac{\partial f}{\partial x_3} \end{bmatrix} = \begin{bmatrix} 2x_1+x_2\\ x_1+x_3\\ x_2 \end{bmatrix} \tag{1}$$

This is the prototype of a matrix derivative laid out as a **column vector**.

Of course, we can also lay the results out as a **row vector**:

$$\frac{\partial f(\pmb{x})}{\partial \pmb{x}_{3\times1}^T}= \left[\frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \frac{\partial f}{\partial x_3} \right]= \left[ 2x_1+x_2, x_1+x_3, x_2 \right] \tag{2}$$

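So if $\text{function}$ contains $m$ scalar functions $f$ and the variable contains $n$ elements, differentiating each $f$ with respect to each element of the variable produces $m \times n$ results.

**This is the essence of matrix differentiation.**

How these $m \times n$ results are **laid out**, whether as a row vector, a column vector, or a matrix, is what we discuss next.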
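As a quick sanity check of Eq. (1), the element-by-element view can be sketched numerically. This is only an illustration in plain Python: the helper name `partials`, the step size `h`, and the evaluation point are assumptions of this sketch, not part of the original article.

```python
def f(x):
    """Example 10: f(x1, x2, x3) = x1^2 + x1*x2 + x2*x3."""
    x1, x2, x3 = x
    return x1 ** 2 + x1 * x2 + x2 * x3


def partials(func, x, h=1e-6):
    """Central-difference estimate of every partial derivative of func at x."""
    result = []
    for i in range(len(x)):
        up = list(x)
        up[i] += h
        down = list(x)
        down[i] -= h
        result.append((func(up) - func(down)) / (2 * h))
    return result


point = [1.0, 2.0, 3.0]          # arbitrary evaluation point (assumption)
numeric = partials(f, point)     # one result per element of the variable

# Closed form from Eq. (1): [2*x1 + x2, x1 + x3, x2]
x1, x2, x3 = point
analytic = [2 * x1 + x2, x1 + x3, x2]
```

Here one scalar function and three variable elements give 1 × 3 = 3 results; whether `numeric` is read as a row or as a column is purely a layout choice.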

Layout of Matrix Differentiation Results

Loosely speaking, and at an intuitive level:

**Numerator layout** means the numerator is in **column-vector** form and the denominator is in **row-vector** form, as in Eq. (2). If $\text{function}$ here is a **real vector function** $\pmb{f}_{2\times 1}$, the result is a $2 \times 3$ matrix:

$$\frac{\partial \pmb{f}_{2\times1}(\pmb{x})}{\partial \pmb{x}^T_{3\times1}}= \begin{bmatrix} \frac{\partial f_1}{\partial x_1}& \frac{\partial f_1}{\partial x_2}& \frac{\partial f_1}{\partial x_3}\\ \frac{\partial f_2}{\partial x_1}& \frac{\partial f_2}{\partial x_2}& \frac{\partial f_2}{\partial x_3}\end{bmatrix}_{2\times 3} \tag{3}$$

**Denominator layout** means the denominator is in **column-vector** form and the numerator is in **row-vector** form, as in Eq. (1). If $\text{function}$ here is a **real vector function** $\pmb{f}_{2\times 1}$, the result is a $3 \times 2$ matrix:

$$\frac{\partial \pmb{f}^T_{2\times1}(\pmb{x})}{\partial \pmb{x}_{3\times1}}= \begin{bmatrix} \frac{\partial f_1}{\partial x_1}& \frac{\partial f_2}{\partial x_1} \\ \frac{\partial f_1}{\partial x_2}& \frac{\partial f_2}{\partial x_2} \\ \frac{\partial f_1}{\partial x_3}& \frac{\partial f_2}{\partial x_3} \end{bmatrix}_{3\times 2} \tag{4}$$
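The two layouts in Eqs. (3) and (4) contain the same six partial derivatives and differ only in arrangement. A minimal sketch, assuming a hypothetical function `f_vec` from three variables to two outputs (chosen only for illustration) and a central-difference approximation:

```python
def f_vec(x):
    """Hypothetical f: R^3 -> R^2, used only to illustrate the layouts."""
    x1, x2, x3 = x
    return [x1 * x2, x2 + x3 ** 2]


def numerator_layout(func, x, h=1e-6):
    """Rows follow the components of f, columns follow the elements of x (Eq. 3)."""
    m = len(func(x))
    rows = [[0.0] * len(x) for _ in range(m)]
    for j in range(len(x)):
        up = list(x)
        up[j] += h
        down = list(x)
        down[j] -= h
        f_up, f_down = func(up), func(down)
        for i in range(m):
            rows[i][j] = (f_up[i] - f_down[i]) / (2 * h)
    return rows


def transpose(matrix):
    """Swap rows and columns of a list-of-rows matrix."""
    return [list(col) for col in zip(*matrix)]


point = [1.0, 2.0, 3.0]
J = numerator_layout(f_vec, point)   # 2 x 3, numerator layout
G = transpose(J)                     # 3 x 2, denominator layout (Eq. 4)
```

Computing one layout and transposing it gives the other, so no derivative has to be recomputed when switching conventions.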

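With this intuition in place, we now give a rigorous description of the layout for each type of $\text{function}$ and each type of variable. (We skip the real-valued scalar function $f(x)$ of a scalar variable, because its derivative is a single element.)

1. Real-valued scalar function of a vector variable, $f(\pmb{x})$, $\pmb{x}=[x_1,x_2,\cdots,x_n]^T$

1.1 Row-vector partial derivative form (also called the row partial derivative vector form)[3]

$$\text{D}_{\pmb{x}}f(\pmb{x})= \frac{\partial f(\pmb{x})}{\partial \pmb{x}^T}= \left[ \frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \cdots, \frac{\partial f}{\partial x_n}\right] \tag{5}$$

1.2 Gradient vector form (also called the column-vector partial derivative form or the column partial derivative vector form)[4]

$$\nabla_{\pmb{x}}f(\pmb{x})= \frac{\partial f(\pmb{x})}{\partial \pmb{x}}= \left[ \frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \cdots, \frac{\partial f}{\partial x_n}\right]^T \tag{6}$$

These two forms are **transposes** of each other.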

2. Real-valued scalar function of a matrix variable, $f(\pmb{X})$, $\pmb{X}_{m\times n}=(x_{ij})_{i=1,j=1}^{m,n}$

First we introduce the notation $\text{vec}(\pmb{X})$, which vectorizes the matrix $\pmb{X}$ by **stacking its columns**.

In other words, $\text{vec}(\pmb{X})$ takes column 1, column 2, and so on up to column $n$ of $\pmb{X}$ and concatenates them in order into a single column vector:

$$\text{vec}(\pmb{X})= [ x_{11},x_{21},\cdots,x_{m1},x_{12},x_{22},\cdots,x_{m2},\cdots,x_{1n},x_{2n},\cdots,x_{mn}]^T \tag{7}$$
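The column-stacking order in Eq. (7) is easy to get wrong, so here is a small sketch. The function name `vec` mirrors the notation above; representing a matrix as a list of rows, and using strings as entries so the ordering is visible, are assumptions of this example.

```python
def vec(X):
    """Stack the columns of X (given as a list of rows) into one flat list, as in Eq. (7)."""
    m, n = len(X), len(X[0])
    # Outer loop over columns, inner loop over rows: column-major order.
    return [X[i][j] for j in range(n) for i in range(m)]


# A 3 x 2 matrix whose entries are labels, so the resulting order is readable.
X = [["x11", "x12"],
     ["x21", "x22"],
     ["x31", "x32"]]
stacked = vec(X)
```

The first column comes out entirely before the second, which is the opposite of the row-by-row flattening many array libraries use by default.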

2.1 Row-vector partial derivative form (also called the row partial derivative vector form)[3:1]

First vectorize the matrix variable $\pmb{X}$ with $\text{vec}$ to turn it into a vector variable, then apply Eq. (5) to that vector variable:

$$\text{D}_{\text{vec}\pmb{X}}f(\pmb{X})= \frac{\partial f(\pmb{X})}{\partial \text{vec}^T(\pmb{X})} = \left[ \frac{\partial f}{\partial x_{11}},\frac{\partial f}{\partial x_{21}},\cdots,\frac{\partial f}{\partial x_{m1}},\frac{\partial f}{\partial x_{12}},\frac{\partial f}{\partial x_{22}},\cdots,\frac{\partial f}{\partial x_{m2}},\cdots,\frac{\partial f}{\partial x_{1n}},\frac{\partial f}{\partial x_{2n}},\cdots,\frac{\partial f}{\partial x_{mn}} \right] \tag{8}$$

2.2 $\text{Jacobian}$ matrix form[3:2]

First **transpose** the matrix variable $\pmb{X}$, then differentiate with respect to the element at **each position** of the transposed matrix, one at a time. The result has the **same layout as the transposed matrix**.

$$\text{D}_{\pmb{X}}f(\pmb{X})= \frac{\partial f(\pmb{X})}{\partial \pmb{X}^T_{m\times n}}= \begin{bmatrix} \frac{\partial f}{\partial x_{11}}&\frac{\partial f}{\partial x_{21}}&\cdots&\frac{\partial f}{\partial x_{m1}} \\ \frac{\partial f}{\partial x_{12}}&\frac{\partial f}{\partial x_{22}}& \cdots & \frac{\partial f}{\partial x_{m2}}\\ \vdots&\vdots&\ddots&\vdots\\ \frac{\partial f}{\partial x_{1n}}&\frac{\partial f}{\partial x_{2n}}&\cdots&\frac{\partial f}{\partial x_{mn}} \end{bmatrix}_{n\times m} \tag{9}$$

2.3 Gradient vector form (also called the column-vector partial derivative form or the column partial derivative vector form)[4:1]

First vectorize the matrix variable $\pmb{X}$ with $\text{vec}$ to turn it into a vector variable, then apply Eq. (6) to that variable:

$$\nabla_{\text{vec}\pmb{X}}f(\pmb{X})= \frac{\partial f(\pmb{X})}{\partial \text{vec}\pmb{X}} = \left[ \frac{\partial f}{\partial x_{11}},\frac{\partial f}{\partial x_{21}},\cdots,\frac{\partial f}{\partial x_{m1}},\frac{\partial f}{\partial x_{12}},\frac{\partial f}{\partial x_{22}},\cdots,\frac{\partial f}{\partial x_{m2}},\cdots,\frac{\partial f}{\partial x_{1n}},\frac{\partial f}{\partial x_{2n}},\cdots,\frac{\partial f}{\partial x_{mn}}\right]^T \tag{10}$$

2.4 Gradient matrix form[4:2]

Differentiate directly with respect to the element at **each position** of the original matrix variable $\pmb{X}$, one at a time. The result has the **same layout as the original matrix**.

$$\nabla_{\pmb{X}}f(\pmb{X})= \frac{\partial f(\pmb{X})}{\partial \pmb{X}_{m\times n}} = \begin{bmatrix} \frac{\partial f}{\partial x_{11}}&\frac{\partial f}{\partial x_{12}}&\cdots&\frac{\partial f}{\partial x_{1n}} \\ \frac{\partial f}{\partial x_{21}}&\frac{\partial f}{\partial x_{22}}& \cdots & \frac{\partial f}{\partial x_{2n}}\\ \vdots&\vdots&\ddots&\vdots\\ \frac{\partial f}{\partial x_{m1}}&\frac{\partial f}{\partial x_{m2}}&\cdots&\frac{\partial f}{\partial x_{mn}} \end{bmatrix}_{m\times n} \tag{11}$$

2.5 Some observations

2.5.1 Transposes

Eq. (8) and Eq. (10) are **transposes** of each other; Eq. (9) and Eq. (11) are **transposes** of each other.
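The four forms (8) to (11) can be compared on a concrete case. The sketch below takes the shape of Example 3 with assumed coefficients $a_1,\dots,a_6 = 1,\dots,6$ (the original leaves them symbolic) and approximates the derivatives by central differences; the helper names are hypothetical.

```python
# Assumed coefficients a1..a6 of Example 3, stored at the position of the x_ij they multiply.
A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]


def f(X):
    """Example 3 with the assumed coefficients: sum of a_k * x_ij^2."""
    return sum(A[i][j] * X[i][j] ** 2 for i in range(3) for j in range(2))


def gradient_matrix(func, X, h=1e-5):
    """Gradient matrix form (Eq. 11): same layout as X itself."""
    m, n = len(X), len(X[0])
    G = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            up = [row[:] for row in X]
            up[i][j] += h
            down = [row[:] for row in X]
            down[i][j] -= h
            G[i][j] = (func(up) - func(down)) / (2 * h)
    return G


X0 = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
grad = gradient_matrix(f, X0)                                 # 3 x 2, Eq. (11)
jacobian = [list(col) for col in zip(*grad)]                  # 2 x 3, Eq. (9)
grad_vec = [grad[i][j] for j in range(2) for i in range(3)]   # length 6, entries of Eq. (10)
```

The entries of Eq. (8) are the same six numbers as `grad_vec`, read as a row instead of a column.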

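2.5.2 Equalities

When the matrix variable $\pmb{X}$ is itself a column vector $\pmb{x}=[x_1,x_2,\cdots,x_n]^T$, Eqs. (5), (8), and (9) are **equal**, and Eqs. (6), (10), and (11) are **equal**. Naturally, the first three are **transposes** of the last three.

This shows that for a real-valued scalar function of a vector variable, $f(\pmb{x})$, $\pmb{x}=[x_1,x_2,\cdots,x_n]^T$, the result layout has essentially **two** forms: the $\text{Jacobian}$ matrix form (which has degenerated into a row vector) and the gradient matrix form (which has degenerated into a column vector). The two forms are **transposes** of each other.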

3. Real matrix function of a matrix variable, $\pmb{F}(\pmb{X})$, $\pmb{X}_{m\times n}=(x_{ij})_{i=1,j=1}^{m,n}$, $\pmb{F}_{p\times q}=(f_{ij})_{i=1,j=1}^{p,q}$

3.1 $\text{Jacobian}$ matrix form[5]

First vectorize the matrix variable $\pmb{X}$ with $\text{vec}$ to turn it into a vector variable:

$$\text{vec}(\pmb{X})= [ x_{11},x_{21},\cdots,x_{m1},x_{12},x_{22},\cdots,x_{m2},\cdots,x_{1n},x_{2n},\cdots,x_{mn}]^T \tag{7}$$

Then vectorize the real matrix function $\pmb{F}$ with $\text{vec}$ to turn it into a real vector function:

$$\text{vec}(\pmb{F}(\pmb{X}))= [ f_{11}(\pmb{X}),f_{21}(\pmb{X}),\cdots,f_{p1}(\pmb{X}),f_{12}(\pmb{X}),f_{22}(\pmb{X}),\cdots,f_{p2}(\pmb{X}),\cdots,f_{1q}(\pmb{X}),f_{2q}(\pmb{X}),\cdots,f_{pq}(\pmb{X}) ]^T \tag{12}$$

This converts a real matrix function of a matrix variable, $\pmb{F}(\pmb{X})$, into a real vector function of a vector variable, $\pmb{f}(\pmb{x})$. Following Eq. (3), the result is laid out as a $pq\times mn$ matrix whose rows follow the order of $\text{vec}(\pmb{F})$ and whose columns follow the order of $\text{vec}(\pmb{X})$:

$$\text{D}_{\pmb{X}}\pmb{F}(\pmb{X}) =\frac{\partial \text{vec}_{pq\times 1}(\pmb{F}(\pmb{X}))}{\partial \text{vec}^T_{mn\times 1}\pmb{X}} = \begin{bmatrix}
\frac{\partial f_{11}}{\partial x_{11}} & \frac{\partial f_{11}}{\partial x_{21}} & \cdots & \frac{\partial f_{11}}{\partial x_{m1}} & \frac{\partial f_{11}}{\partial x_{12}} & \cdots & \frac{\partial f_{11}}{\partial x_{mn}} \\
\frac{\partial f_{21}}{\partial x_{11}} & \frac{\partial f_{21}}{\partial x_{21}} & \cdots & \frac{\partial f_{21}}{\partial x_{m1}} & \frac{\partial f_{21}}{\partial x_{12}} & \cdots & \frac{\partial f_{21}}{\partial x_{mn}} \\
\vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\
\frac{\partial f_{p1}}{\partial x_{11}} & \frac{\partial f_{p1}}{\partial x_{21}} & \cdots & \frac{\partial f_{p1}}{\partial x_{m1}} & \frac{\partial f_{p1}}{\partial x_{12}} & \cdots & \frac{\partial f_{p1}}{\partial x_{mn}} \\
\frac{\partial f_{12}}{\partial x_{11}} & \frac{\partial f_{12}}{\partial x_{21}} & \cdots & \frac{\partial f_{12}}{\partial x_{m1}} & \frac{\partial f_{12}}{\partial x_{12}} & \cdots & \frac{\partial f_{12}}{\partial x_{mn}} \\
\vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\
\frac{\partial f_{pq}}{\partial x_{11}} & \frac{\partial f_{pq}}{\partial x_{21}} & \cdots & \frac{\partial f_{pq}}{\partial x_{m1}} & \frac{\partial f_{pq}}{\partial x_{12}} & \cdots & \frac{\partial f_{pq}}{\partial x_{mn}}
\end{bmatrix}_{pq\times mn} \tag{13}$$

3.2 Gradient matrix form[6]

First vectorize the matrix variable $\pmb{X}$ with $\text{vec}$ to turn it into a vector variable:

$$\text{vec}(\pmb{X})= [ x_{11},x_{21},\cdots,x_{m1},x_{12},x_{22},\cdots,x_{m2},\cdots,x_{1n},x_{2n},\cdots,x_{mn}]^T \tag{7}$$

Then vectorize the real matrix function $\pmb{F}$ with $\text{vec}$ to turn it into a real vector function:

$$\text{vec}(\pmb{F}(\pmb{X}))= [ f_{11}(\pmb{X}),f_{21}(\pmb{X}),\cdots,f_{p1}(\pmb{X}),f_{12}(\pmb{X}),f_{22}(\pmb{X}),\cdots,f_{p2}(\pmb{X}),\cdots,f_{1q}(\pmb{X}),f_{2q}(\pmb{X}),\cdots,f_{pq}(\pmb{X}) ]^T \tag{12}$$

This converts a real matrix function of a matrix variable, $\pmb{F}(\pmb{X})$, into a real vector function of a vector variable, $\pmb{f}(\pmb{x})$. Following Eq. (4), the result is laid out as an $mn \times pq$ matrix whose rows follow the order of $\text{vec}(\pmb{X})$ and whose columns follow the order of $\text{vec}(\pmb{F})$:

$$\nabla_{\pmb{X}}\pmb{F}(\pmb{X}) =\frac{\partial \text{vec}_{pq\times 1}^T(\pmb{F}(\pmb{X}))}{\partial \text{vec}_{mn\times 1}\pmb{X}} = \begin{bmatrix}
\frac{\partial f_{11}}{\partial x_{11}} & \frac{\partial f_{21}}{\partial x_{11}} & \cdots & \frac{\partial f_{p1}}{\partial x_{11}} & \frac{\partial f_{12}}{\partial x_{11}} & \cdots & \frac{\partial f_{pq}}{\partial x_{11}} \\
\frac{\partial f_{11}}{\partial x_{21}} & \frac{\partial f_{21}}{\partial x_{21}} & \cdots & \frac{\partial f_{p1}}{\partial x_{21}} & \frac{\partial f_{12}}{\partial x_{21}} & \cdots & \frac{\partial f_{pq}}{\partial x_{21}} \\
\vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\
\frac{\partial f_{11}}{\partial x_{m1}} & \frac{\partial f_{21}}{\partial x_{m1}} & \cdots & \frac{\partial f_{p1}}{\partial x_{m1}} & \frac{\partial f_{12}}{\partial x_{m1}} & \cdots & \frac{\partial f_{pq}}{\partial x_{m1}} \\
\frac{\partial f_{11}}{\partial x_{12}} & \frac{\partial f_{21}}{\partial x_{12}} & \cdots & \frac{\partial f_{p1}}{\partial x_{12}} & \frac{\partial f_{12}}{\partial x_{12}} & \cdots & \frac{\partial f_{pq}}{\partial x_{12}} \\
\vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\
\frac{\partial f_{11}}{\partial x_{mn}} & \frac{\partial f_{21}}{\partial x_{mn}} & \cdots & \frac{\partial f_{p1}}{\partial x_{mn}} & \frac{\partial f_{12}}{\partial x_{mn}} & \cdots & \frac{\partial f_{pq}}{\partial x_{mn}}
\end{bmatrix}_{mn \times pq} \tag{14}$$

3.3 Some observations

3.3.1 Transposes

Eq. (13) and Eq. (14) are **transposes** of each other.
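To make the $pq\times mn$ layout of Eq. (13) concrete, here is a numerical sketch. It assumes a hypothetical $\pmb{F}(\pmb{X})=\pmb{X}^T\pmb{X}$ with a $3\times 2$ input (so $m=3$, $n=2$, $p=q=2$); the helper names and the finite-difference step are choices of this example, not of the article.

```python
def vec(M):
    """Column-stacking vectorization of a list-of-rows matrix, as in Eq. (7)."""
    return [M[i][j] for j in range(len(M[0])) for i in range(len(M))]


def unvec(v, m, n):
    """Inverse of vec for an m x n matrix."""
    return [[v[j * m + i] for j in range(n)] for i in range(m)]


def F(X):
    """Hypothetical F(X) = X^T X, mapping a 3 x 2 matrix to a 2 x 2 matrix."""
    m, n = len(X), len(X[0])
    return [[sum(X[k][i] * X[k][j] for k in range(m)) for j in range(n)]
            for i in range(n)]


def jacobian_form(func, X, h=1e-6):
    """Eq. (13): rows follow vec(F), columns follow vec(X)."""
    m, n = len(X), len(X[0])
    x = vec(X)
    rows = len(vec(func(X)))
    J = [[0.0] * len(x) for _ in range(rows)]
    for c in range(len(x)):
        up = x[:]
        up[c] += h
        down = x[:]
        down[c] -= h
        f_up = vec(func(unvec(up, m, n)))
        f_down = vec(func(unvec(down, m, n)))
        for r in range(rows):
            J[r][c] = (f_up[r] - f_down[r]) / (2 * h)
    return J


X0 = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
J = jacobian_form(F, X0)              # 4 x 6, Jacobian matrix form (Eq. 13)
G = [list(col) for col in zip(*J)]    # 6 x 4, gradient matrix form (Eq. 14)
```

For instance, `J[1][3]` holds $\partial f_{21}/\partial x_{12}$, because $f_{21}$ is the second entry of $\text{vec}(\pmb{F})$ and $x_{12}$ is the fourth entry of $\text{vec}(\pmb{X})$.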

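3.3.2 Equalities, case 1

When the real matrix function $\pmb{F}$ is itself a real-valued scalar function $f$, Eqs. (8) and (13) are **equal**, and Eqs. (10) and (14) are **equal**. Naturally, the first two are **transposes** of the last two.

This shows that for a real-valued scalar function of a matrix variable, $f(\pmb{X})$, $\pmb{X}_{m\times n}=(x_{ij})_{i=1,j=1}^{m,n}$, the result layout has essentially **four** forms: first, the $\text{Jacobian}$ matrix form degenerated into a row vector; second, the gradient matrix form degenerated into a column vector; third, the $\text{Jacobian}$ matrix form as a genuine matrix; fourth, the gradient matrix form as a genuine matrix. The first and second forms are **transposes** of each other, and so are the third and fourth.

3.3.3 Equalities, case 2

When the matrix variable $\pmb{X}$ is itself a column vector $\pmb{x}=[x_1,x_2,\cdots,x_n]^T$ and, at the same time, the real matrix function $\pmb{F}$ is itself a real-valued scalar function $f$, Eqs. (5), (8), (9), and (13) are **equal**, and Eqs. (6), (10), (11), and (14) are **equal**. Naturally, the first four are **transposes** of the last four.

This again shows that for a real-valued scalar function of a vector variable, $f(\pmb{x})$, $\pmb{x}=[x_1,x_2,\cdots,x_n]^T$, the result layout has essentially **two** forms: the $\text{Jacobian}$ matrix form (degenerated into a row vector) and the gradient matrix form (degenerated into a column vector). The two forms are **transposes** of each other.

4. Real vector function of a matrix variable $\pmb{f}(\pmb{X})$, real vector function of a vector variable $\pmb{f}(\pmb{x})$, and real matrix function of a vector variable $\pmb{F}(\pmb{x})$

All three can be treated as a real matrix function of a matrix variable, $\pmb{F}(\pmb{X})$, and handled as in **item 3** above, because a vector is a special kind of matrix.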

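The Essence of Numerator Layout and Denominator Layout

By now you should have a fairly complete picture of how matrix differentiation results are laid out: it all comes down to combinations of **transposing or vectorizing the numerator** and **transposing or vectorizing the denominator**.

Putting this together, we can summarize:

1. **The essence of numerator layout**: the numerator is a **scalar**, a **column vector**, or the **column vector obtained by vectorizing a matrix**; the denominator is a **scalar**, the **row vector obtained by transposing a column vector**, the **transpose of a matrix**, or the **row vector obtained by vectorizing a matrix and transposing the result**. This covers Eqs. (5), (8), (9), and (13).

2. **The essence of denominator layout**: the numerator is a **scalar**, the **row vector obtained by transposing a column vector**, or the **row vector obtained by vectorizing a matrix and transposing the result**; the denominator is a **scalar**, a **column vector**, **the matrix itself**, or the **column vector obtained by vectorizing a matrix**. This covers Eqs. (6), (10), (11), and (14).

On reflection, we can put it even more concisely: **whichever side is transposed, the layout is named after the other side.** If the numerator is transposed, it is denominator layout; if the denominator is transposed, it is numerator layout.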

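Finally, we summarize the essence of numerator layout and denominator layout in a table:

Table: the essence of numerator layout and denominator layout

| Layout | Numerator | Denominator | Equations |
| --- | --- | --- | --- |
| Numerator layout | scalar; column vector; vectorized matrix (column vector) | scalar; transposed column vector (row vector); transposed matrix; transposed vectorized matrix (row vector) | (5), (8), (9), (13) |
| Denominator layout | scalar; transposed column vector (row vector); transposed vectorized matrix (row vector) | scalar; column vector; the matrix itself; vectorized matrix (column vector) | (6), (10), (11), (14) |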

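That is the end of this article; I hope it helps. If time permits, I will follow up with another article on the **mathematical derivation of several common matrix differentiation formulas**. Likes, follows, bookmarks, and shares are all welcome.

Other articles in the matrix differentiation series:

Differentiating with respect to symmetric matrices, using maximum likelihood estimation of the multivariate normal distribution as an example (Matrix Differentiation: Supplementary Part) - article by Iterator - 知乎 (Zhihu)

Mathematical derivation of matrix differentiation formulas (Matrix Differentiation: Advanced Part) - article by Iterator - 知乎 (Zhihu)

Mathematical derivation of matrix differentiation formulas (Matrix Differentiation: Basic Part) - article by Iterator - 知乎 (Zhihu)

References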

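Mathematical Derivation of Matrix Differentiation Formulas (Matrix Differentiation: Basic Part)

Preface

1. Before reading this article, please be sure to read the preceding article on the essence of matrix differentiation. Below, "Essence Part" refers to that article.

2. This article presents the mathematical derivations of the **most basic** matrix differentiation formulas for **real-valued scalar functions of a vector variable** and **real-valued scalar functions of a matrix variable**. Only after mastering these basic derivations can you make sense of the many techniques that build on them.

3. Advanced techniques (the **matrix trace** $\mathrm{tr}(\pmb{A})$ and the **first-order real matrix differential** $\mathrm{d}\pmb{X}$) are covered in the next article; this article does **not** use them.

4. This article uses the same notation as the Essence Part.

5. To follow this article you need the material covered in the Essence Part plus **matrix multiplication** and the **vector inner product** from undergraduate linear algebra. No other background is required.

6. There is a matrix differentiation website where you can check whether your own results are correct.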

Real-Valued Scalar Functions of a Vector Variable

$$f(\pmb{x}),\quad \pmb{x}=[x_1,x_2,\cdots,x_n]^T$$

We use the **gradient vector form**, that is, Eq. (6) of the Essence Part:

$$\nabla_{\pmb{x}}f(\pmb{x})= \frac{\partial f(\pmb{x})}{\partial \pmb{x}}= \left[ \frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \cdots, \frac{\partial f}{\partial x_n}\right]^T \tag{Essence 6}$$

1. Four rules

1.1 Derivative of a constant[1:1]

Same as for a single-variable function: the result is the zero vector.

$$\frac{\partial c}{ \partial \pmb{x}}=\pmb{0}_{n \times 1} \tag{1}$$

where $c$ is a constant.

Proof:

$$\frac{\partial{c}}{\partial{\pmb{x}}} = \begin{bmatrix} \frac{\partial{c}}{\partial{x_1}} \\ \frac{\partial{c}}{\partial{x_2}} \\ \vdots \\ \frac{\partial{c}}{\partial{x_n}} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0\end{bmatrix} =\pmb{0}_{n \times 1} \tag{2}$$

Q.E.D.

1.2 Linearity rule[1:2]

Same as the linearity rule for single-variable functions: the derivative of a sum is the sum of the derivatives, and constants can be pulled out.

$$\frac{\partial{[c_1f(\pmb{x})+c_2g(\pmb{x})]}}{\partial{\pmb{x}}} = c_1\frac{\partial f(\pmb{x})}{\partial{\pmb{x}}} + c_2\frac{\partial g(\pmb{x})}{\partial{\pmb{x}}} \tag{3}$$

where $c_1,c_2$ are constants.

Proof:

$$\frac{\partial{[c_1f(\pmb{x})+c_2g(\pmb{x})]}}{\partial{\pmb{x}}} = \begin{bmatrix} \frac{\partial{(c_1f+c_2g)}}{\partial{x_1}} \\ \frac{\partial{(c_1f+c_2g)}}{\partial{x_2}} \\ \vdots \\ \frac{\partial{(c_1f+c_2g)}}{\partial{x_n}} \end{bmatrix} = \begin{bmatrix} c_1\frac{\partial{f}}{\partial{x_1}}+c_2\frac{\partial{g}}{\partial{x_1}} \\ c_1\frac{\partial{f}}{\partial{x_2}}+c_2\frac{\partial{g}}{\partial{x_2}} \\ \vdots \\ c_1\frac{\partial{f}}{\partial{x_n}}+c_2\frac{\partial{g}}{\partial{x_n}} \end{bmatrix} =c_1\begin{bmatrix} \frac{\partial{f}}{\partial{x_1}} \\ \frac{\partial{f}}{\partial{x_2}} \\ \vdots \\ \frac{\partial{f}}{\partial{x_n}} \end{bmatrix} + c_2\begin{bmatrix} \frac{\partial{g}}{\partial{x_1}} \\ \frac{\partial{g}}{\partial{x_2}} \\ \vdots \\ \frac{\partial{g}}{\partial{x_n}} \end{bmatrix} =c_1\frac{\partial f(\pmb{x})}{\partial{\pmb{x}}} + c_2\frac{\partial g(\pmb{x})}{\partial{\pmb{x}}} \tag{4}$$

Q.E.D.

1.3 Product rule[1]

Same as the product rule for single-variable functions: differentiate the first factor and keep the second, plus keep the first factor and differentiate the second.

$$\frac{\partial{[f(\pmb{x})g(\pmb{x})]}}{\partial{\pmb{x}}} = \frac{\partial f(\pmb{x})}{\partial{\pmb{x}}}g(\pmb{x}) +f(\pmb{x})\frac{\partial g(\pmb{x})}{\partial{\pmb{x}}} \tag{5}$$

Proof:

$$\frac{\partial{[f(\pmb{x})g(\pmb{x})]}}{\partial{\pmb{x}}} = \begin{bmatrix} \frac{\partial{(fg)}}{\partial{x_1}} \\ \frac{\partial{(fg)}}{\partial{x_2}} \\ \vdots \\ \frac{\partial{(fg)}}{\partial{x_n}} \end{bmatrix} = \begin{bmatrix} \frac{\partial{f}}{\partial{x_1}}g+f\frac{\partial{g}}{\partial{x_1}} \\ \frac{\partial{f}}{\partial{x_2}}g+f\frac{\partial{g}}{\partial{x_2}}\\ \vdots \\ \frac{\partial{f}}{\partial{x_n}}g+f\frac{\partial{g}}{\partial{x_n}} \end{bmatrix} =\begin{bmatrix} \frac{\partial{f}}{\partial{x_1}} \\ \frac{\partial{f}}{\partial{x_2}} \\ \vdots \\ \frac{\partial{f}}{\partial{x_n}} \end{bmatrix}g + f\begin{bmatrix} \frac{\partial{g}}{\partial{x_1}} \\ \frac{\partial{g}}{\partial{x_2}} \\ \vdots \\ \frac{\partial{g}}{\partial{x_n}} \end{bmatrix} =\frac{\partial f(\pmb{x})}{\partial{\pmb{x}}}g(\pmb{x}) +f(\pmb{x})\frac{\partial g(\pmb{x})}{\partial{\pmb{x}}} \tag{6}$$

Q.E.D.

1.4 Quotient rule[1]

Same as the quotient rule for single-variable functions: (derivative of the top times the bottom, minus the top times the derivative of the bottom) divided by (the bottom squared):

$$\frac{\partial{\left[\frac{f(\pmb{x})}{g(\pmb{x})}\right]}}{\partial{\pmb{x}}} = \frac{1}{g^2(\pmb{x})}\left[ \frac{\partial f(\pmb{x})}{\partial{\pmb{x}}}g(\pmb{x}) -f(\pmb{x})\frac{\partial g(\pmb{x})}{\partial{\pmb{x}}}\right] \tag{7}$$

where $g(\pmb{x})\neq0$.

Proof:

$$\frac{\partial{\left[\frac{f(\pmb{x})}{g(\pmb{x})}\right]}}{\partial{\pmb{x}}} = \begin{bmatrix} \frac{\partial{(\frac{f}{g})}}{\partial{x_1}} \\ \frac{\partial{(\frac{f}{g})}}{\partial{x_2}} \\ \vdots \\ \frac{\partial{(\frac{f}{g})}}{\partial{x_n}} \end{bmatrix} = \begin{bmatrix} \frac{1}{g^2}\left( \frac{\partial f}{\partial{x_1}}g -f\frac{\partial g}{\partial{x_1}} \right) \\ \frac{1}{g^2}\left( \frac{\partial f}{\partial{x_2}}g -f\frac{\partial g}{\partial{x_2}} \right)\\ \vdots \\ \frac{1}{g^2}\left( \frac{\partial f}{\partial{x_n}}g -f\frac{\partial g}{\partial{x_n}} \right) \end{bmatrix} = \frac{1}{g^2}\left( \begin{bmatrix} \frac{\partial{f}}{\partial{x_1}} \\ \frac{\partial{f}}{\partial{x_2}} \\ \vdots \\ \frac{\partial{f}}{\partial{x_n}} \end{bmatrix}g - f\begin{bmatrix} \frac{\partial{g}}{\partial{x_1}} \\ \frac{\partial{g}}{\partial{x_2}} \\ \vdots \\ \frac{\partial{g}}{\partial{x_n}} \end{bmatrix} \right) =\frac{1}{g^2(\pmb{x})}\left[ \frac{\partial f(\pmb{x})}{\partial{\pmb{x}}}g(\pmb{x}) -f(\pmb{x})\frac{\partial g(\pmb{x})}{\partial{\pmb{x}}}\right] \tag{8}$$

Q.E.D.
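The product rule (5) and quotient rule (7) can be spot-checked numerically. The sketch below assumes two hypothetical functions `f` and `g` of a two-element vector and a central-difference gradient helper `grad`; none of these names come from the original article.

```python
def grad(func, x, h=1e-6):
    """Central-difference gradient, laid out as in the gradient vector form."""
    out = []
    for i in range(len(x)):
        up = list(x)
        up[i] += h
        down = list(x)
        down[i] -= h
        out.append((func(up) - func(down)) / (2 * h))
    return out


def f(x):
    """Hypothetical f(x) = x1^2 + x2."""
    return x[0] ** 2 + x[1]


def g(x):
    """Hypothetical g(x) = x1*x2 + 3, nonzero at the evaluation point."""
    return x[0] * x[1] + 3.0


x0 = [1.0, 2.0]
gf, gg = grad(f, x0), grad(g, x0)

# Product rule, Eq. (5): left side differentiates f*g directly.
product_lhs = grad(lambda x: f(x) * g(x), x0)
product_rhs = [gf[i] * g(x0) + f(x0) * gg[i] for i in range(2)]

# Quotient rule, Eq. (7): left side differentiates f/g directly.
quotient_lhs = grad(lambda x: f(x) / g(x), x0)
quotient_rhs = [(gf[i] * g(x0) - f(x0) * gg[i]) / g(x0) ** 2 for i in range(2)]
```

Because $f$ and $g$ are scalars, multiplying a gradient vector by `g(x0)` or `f(x0)` is ordinary scalar multiplication, which is why the rules carry over unchanged from the single-variable case.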

2. Some formulas

2.1

$$\frac{\partial( \pmb{x}^T \pmb{a})}{\partial{\pmb{x}}} = \frac{\partial( \pmb{a}^T\pmb{x})}{\partial{\pmb{x}}} = \pmb{a} \tag{9}$$

where $\pmb{a}$ is a constant vector, $\pmb{a}=(a_1,a_2,\cdots,a_n)^T$.

Proof:

$$\frac{\partial( \pmb{x}^T \pmb{a})}{\partial{\pmb{x}}} = \frac{\partial( \pmb{a}^T\pmb{x})}{\partial{\pmb{x}}} = \frac{\partial( a_1x_1+a_2x_2+\cdots+a_nx_n)}{\partial{\pmb{x}}} = \begin{bmatrix} \frac{\partial( a_1x_1+a_2x_2+\cdots+a_nx_n)}{\partial{x_1}} \\ \frac{\partial( a_1x_1+a_2x_2+\cdots+a_nx_n)}{\partial{x_2}} \\ \vdots \\ \frac{\partial( a_1x_1+a_2x_2+\cdots+a_nx_n)}{\partial{x_n}} \end{bmatrix} = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix} = \pmb{a} \tag{10}$$

Q.E.D.
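A short numerical check of Eq. (9), assuming arbitrary values for the constant vector `a` and the evaluation point `x0` (both chosen only for this sketch), with a central-difference helper named `grad`:

```python
def grad(func, x, h=1e-6):
    """Central-difference gradient with respect to each element of x."""
    out = []
    for i in range(len(x)):
        up = list(x)
        up[i] += h
        down = list(x)
        down[i] -= h
        out.append((func(up) - func(down)) / (2 * h))
    return out


a = [2.0, -1.0, 0.5]   # assumed constant vector


def inner(x):
    """a^T x, which equals x^T a because the inner product is symmetric."""
    return sum(ai * xi for ai, xi in zip(a, x))


x0 = [1.0, 2.0, 3.0]
numeric = grad(inner, x0)   # should reproduce a, independent of x0
```

The result does not depend on `x0`, which matches the fact that the gradient of a linear function is constant.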

2.2

$$\frac{\partial( \pmb{x}^T \pmb{x})}{\partial{\pmb{x}}} = 2\pmb{x} \tag{11}$$

Proof:

$$\frac{\partial( \pmb{x}^T \pmb{x})}{\partial{\pmb{x}}} = \frac{\partial( x_1^2+x_2^2+\cdots+x_n^2)}{\partial{\pmb{x}}} = \begin{bmatrix} \frac{\partial( x_1^2+x_2^2+\cdots+x_n^2)}{\partial{x_1}} \\ \frac{\partial( x_1^2+x_2^2+\cdots+x_n^2)}{\partial{x_2}} \\ \vdots \\ \frac{\partial( x_1^2+x_2^2+\cdots+x_n^2)}{\partial{x_n}} \end{bmatrix} = \begin{bmatrix} 2x_1 \\ 2x_2 \\ \vdots \\ 2x_n \end{bmatrix} = 2\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = 2\pmb{x} \tag{12}$$

Q.E.D.

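2.3

$$\frac{\partial( \pmb{x}^T \pmb{A}\pmb{x})}{\partial{\pmb{x}}} = \pmb{A}\pmb{x}+\pmb{A}^T \pmb{x} \tag{13}$$

where $\pmb{A}_{n \times n}$ is a constant matrix, $\pmb{A}_{n \times n}=(a_{ij})_{i=1,j=1}^{n,n}$.

Proof:

$$\begin{aligned}
\frac{\partial( \pmb{x}^T \pmb{A}\pmb{x})}{\partial{\pmb{x}}}
&= \frac{\partial\left(\begin{aligned} &a_{11}x_1x_1+a_{12}x_1x_2+\cdots+a_{1n}x_1x_n \\ &+a_{21}x_2x_1+a_{22}x_2x_2+\cdots+a_{2n}x_2x_n \\ &+ \cdots \\ &+a_{n1}x_nx_1+a_{n2}x_nx_2+\cdots+a_{nn}x_nx_n \end{aligned}\right)}{\partial{\pmb{x}}} \\
&= \begin{bmatrix} (a_{11}x_1+a_{12}x_2+\cdots+a_{1n}x_n)+(a_{11}x_1+a_{21}x_2+\cdots+a_{n1}x_n) \\ (a_{21}x_1+a_{22}x_2+\cdots+a_{2n}x_n)+(a_{12}x_1+a_{22}x_2+\cdots+a_{n2}x_n) \\ \vdots \\ (a_{n1}x_1+a_{n2}x_2+\cdots+a_{nn}x_n)+(a_{1n}x_1+a_{2n}x_2+\cdots+a_{nn}x_n) \end{bmatrix} \\
&= \begin{bmatrix} a_{11}x_1+a_{12}x_2+\cdots+a_{1n}x_n \\ a_{21}x_1+a_{22}x_2+\cdots+a_{2n}x_n \\ \vdots \\ a_{n1}x_1+a_{n2}x_2+\cdots+a_{nn}x_n \end{bmatrix} + \begin{bmatrix} a_{11}x_1+a_{21}x_2+\cdots+a_{n1}x_n \\ a_{12}x_1+a_{22}x_2+\cdots+a_{n2}x_n \\ \vdots \\ a_{1n}x_1+a_{2n}x_2+\cdots+a_{nn}x_n \end{bmatrix} \\
&= \pmb{A}\pmb{x}+\pmb{A}^T \pmb{x}
\end{aligned} \tag{14}$$

In the second line, the $k$-th entry collects the terms of the double sum that contain $x_k$: those with $x_k$ as the first factor contribute $\sum_{j} a_{kj}x_j$, and those with $x_k$ as the second factor contribute $\sum_{i} a_{ik}x_i$.

Q.E.D.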

2.4

$$\frac{\partial( \pmb{a}^T\pmb{x}\pmb{x}^T\pmb{b})}{\partial{\pmb{x}}} = \pmb{a}\pmb{b}^T\pmb{x}+\pmb{b}\pmb{a}^T\pmb{x} \tag{15}$$

where $\pmb{a},\pmb{b}$ are constant vectors, $\pmb{a}=(a_1,a_2,\cdots,a_n)^T$, $\pmb{b}=(b_1,b_2,\cdots,b_n)^T$.

Proof:

Since $\pmb{a}^T\pmb{x}=\pmb{x}^T\pmb{a}$ and $\pmb{x}^T\pmb{b}=\pmb{b}^T\pmb{x}$, we have

$$\frac{\partial( \pmb{a}^T\pmb{x}\pmb{x}^T\pmb{b})}{\partial{\pmb{x}}} = \frac{\partial( \pmb{x}^T\pmb{a}\pmb{b}^T\pmb{x})}{\partial{\pmb{x}}} \tag{16}$$

Moreover, $\pmb{a}\pmb{b}^T$ is an $n \times n$ constant matrix with transpose $(\pmb{a}\pmb{b}^T)^T=\pmb{b}\pmb{a}^T$, so Eq. (13) gives:

$$\frac{\partial( \pmb{a}^T\pmb{x}\pmb{x}^T\pmb{b})}{\partial{\pmb{x}}} = \frac{\partial( \pmb{x}^T\pmb{a}\pmb{b}^T\pmb{x})}{\partial{\pmb{x}}}=\pmb{a}\pmb{b}^T\pmb{x}+\pmb{b}\pmb{a}^T\pmb{x} \tag{17}$$

Q.E.D.
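Eq. (15) can be checked the same way. Since $\pmb{b}^T\pmb{x}$ and $\pmb{a}^T\pmb{x}$ are scalars, the right-hand side can be evaluated as $\pmb{a}(\pmb{b}^T\pmb{x})+\pmb{b}(\pmb{a}^T\pmb{x})$ without ever forming the $n\times n$ matrix $\pmb{a}\pmb{b}^T$. The vectors and the evaluation point below are assumptions made only for this sketch.

```python
def grad(func, x, h=1e-6):
    """Central-difference gradient with respect to each element of x."""
    out = []
    for i in range(len(x)):
        up = list(x)
        up[i] += h
        down = list(x)
        down[i] -= h
        out.append((func(up) - func(down)) / (2 * h))
    return out


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(ui * vi for ui, vi in zip(u, v))


a = [1.0, 2.0, 3.0]      # assumed constant vector
b = [-1.0, 0.5, 2.0]     # assumed constant vector
x0 = [0.5, -1.0, 2.0]

# Left side: differentiate the scalar (a^T x)(x^T b) numerically.
numeric = grad(lambda x: dot(a, x) * dot(x, b), x0)
# Right side of Eq. (15): a (b^T x) + b (a^T x), entry by entry.
analytic = [a[k] * dot(b, x0) + b[k] * dot(a, x0) for k in range(len(x0))]
```

Evaluating the scalars first costs $O(n)$ operations, whereas building $\pmb{a}\pmb{b}^T$ explicitly would cost $O(n^2)$.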

Real-Valued Scalar Functions of a Matrix Variable

$$f(\pmb{X}),\quad \pmb{X}_{m\times n}=(x_{ij})_{i=1,j=1}^{m,n}$$

We use the **gradient matrix form**, that is, Eq. (11) of the Essence Part:

$$\nabla_{\pmb{X}}f(\pmb{X})= \frac{\partial f(\pmb{X})}{\partial \pmb{X}_{m\times n}} = \begin{bmatrix} \frac{\partial f}{\partial x_{11}}&\frac{\partial f}{\partial x_{12}}&\cdots&\frac{\partial f}{\partial x_{1n}} \\ \frac{\partial f}{\partial x_{21}}&\frac{\partial f}{\partial x_{22}}& \cdots & \frac{\partial f}{\partial x_{2n}}\\ \vdots&\vdots&\ddots&\vdots\\ \frac{\partial f}{\partial x_{m1}}&\frac{\partial f}{\partial x_{m2}}&\cdots&\frac{\partial f}{\partial x_{mn}} \end{bmatrix}_{m\times n} \tag{Essence 11}$$

1. Four rules

1.1 Derivative of a constant[1:3]

Same as for a single-variable function: the result is the zero matrix.

$$\frac{\partial c}{ \partial \pmb{X}}=\pmb{0}_{m \times n} \tag{18}$$

where $c$ is a constant.

Proof:

$$\frac{\partial{c}}{\partial{\pmb{X}}} = \begin{bmatrix} \frac{\partial{c}}{\partial{x_{11}}}&\frac{\partial{c}}{\partial{x_{12}}}&\cdots&\frac{\partial{c}}{\partial{x_{1n}}} \\ \frac{\partial{c}}{\partial{x_{21}}}&\frac{\partial{c}}{\partial{x_{22}}}&\cdots&\frac{\partial{c}}{\partial{x_{2n}}} \\ \vdots &\vdots & \ddots & \vdots\\ \frac{\partial{c}}{\partial{x_{m1}}}&\frac{\partial{c}}{\partial{x_{m2}}}&\cdots&\frac{\partial{c}}{\partial{x_{mn}}} \end{bmatrix}_{m \times n} = \begin{bmatrix} 0&0&\cdots&0 \\ 0&0&\cdots&0 \\ \vdots &\vdots & \ddots & \vdots\\ 0&0&\cdots&0 \end{bmatrix}_{m \times n} =\pmb{0}_{m \times n} \tag{19}$$

Q.E.D.

1.2 Linearity rule[1]

Same as the linearity rule for single-variable derivatives: the derivative of a sum is the sum of the derivatives, and constant factors can be pulled out.

$$\frac{\partial{[c_1f(\pmb{X})+c_2g(\pmb{X})]}}{\partial{\pmb{X}}} = c_1\frac{\partial f(\pmb{X})}{\partial{\pmb{X}}} + c_2\frac{\partial g(\pmb{X})}{\partial{\pmb{X}}} \qquad (20)$$

where $c_1,c_2$ are constants.

Proof (each $m \times n$ matrix is written through its $(i,j)$ entry, in the notation $(\cdot)_{i=1,j=1}^{m,n}$):

$$\begin{aligned} \frac{\partial{[c_1f(\pmb{X})+c_2g(\pmb{X})]}}{\partial{\pmb{X}}} &= \left(\frac{\partial{(c_1f+c_2g)}}{\partial{x_{ij}}}\right)_{i=1,j=1}^{m,n} = \left(c_1\frac{\partial{f}}{\partial{x_{ij}}}+c_2\frac{\partial{g}}{\partial{x_{ij}}}\right)_{i=1,j=1}^{m,n} \\ &= c_1\left(\frac{\partial{f}}{\partial{x_{ij}}}\right)_{i=1,j=1}^{m,n} + c_2\left(\frac{\partial{g}}{\partial{x_{ij}}}\right)_{i=1,j=1}^{m,n} = c_1\frac{\partial f(\pmb{X})}{\partial{\pmb{X}}} + c_2\frac{\partial g(\pmb{X})}{\partial{\pmb{X}}} \end{aligned} \qquad (21)$$

Q.E.D.

1.3 Product rule[1]

Same as the product rule for single-variable derivatives: differentiate the first factor and keep the second, plus keep the first factor and differentiate the second.

$$\frac{\partial{[f(\pmb{X})g(\pmb{X})]}}{\partial{\pmb{X}}} = \frac{\partial f(\pmb{X})}{\partial{\pmb{X}}}g(\pmb{X}) +f(\pmb{X})\frac{\partial g(\pmb{X})}{\partial{\pmb{X}}} \qquad (22)$$

Proof (each $m \times n$ matrix is written through its $(i,j)$ entry):

$$\begin{aligned} \frac{\partial{[f(\pmb{X})g(\pmb{X})]}}{\partial{\pmb{X}}} &= \left(\frac{\partial{(fg)}}{\partial{x_{ij}}}\right)_{i=1,j=1}^{m,n} = \left(\frac{\partial{f}}{\partial{x_{ij}}}g+f\frac{\partial{g}}{\partial{x_{ij}}}\right)_{i=1,j=1}^{m,n} \\ &= \left(\frac{\partial{f}}{\partial{x_{ij}}}\right)_{i=1,j=1}^{m,n}g + f\left(\frac{\partial{g}}{\partial{x_{ij}}}\right)_{i=1,j=1}^{m,n} = \frac{\partial f(\pmb{X})}{\partial{\pmb{X}}}g(\pmb{X}) +f(\pmb{X})\frac{\partial g(\pmb{X})}{\partial{\pmb{X}}} \end{aligned} \qquad (23)$$

Q.E.D.

1.4 Quotient rule[1]

Same as the quotient rule for single-variable derivatives: (derivative of the numerator times the denominator, minus the numerator times the derivative of the denominator) divided by (the denominator squared):

$$\frac{\partial{[\frac{f(\pmb{X})}{g(\pmb{X})}]}}{\partial{\pmb{X}}} = \frac{1}{g^2(\pmb{X})} \begin{bmatrix} \frac{\partial f(\pmb{X})}{\partial{\pmb{X}}}g(\pmb{X}) -f(\pmb{X})\frac{\partial g(\pmb{X})}{\partial{\pmb{X}}}\end{bmatrix} \qquad (24)$$

where $g(\pmb{X})\neq0$.

Proof (each $m \times n$ matrix is written through its $(i,j)$ entry):

$$\begin{aligned} \frac{\partial{[\frac{f(\pmb{X})}{g(\pmb{X})}]}}{\partial{\pmb{X}}} &= \left(\frac{\partial{(\frac{f}{g})}}{\partial{x_{ij}}}\right)_{i=1,j=1}^{m,n} = \left(\frac{1}{g^2}\left( \frac{\partial f}{\partial{x_{ij}}}g -f\frac{\partial g}{\partial{x_{ij}}} \right)\right)_{i=1,j=1}^{m,n} \\ &= \frac{1}{g^2}\left( \left(\frac{\partial{f}}{\partial{x_{ij}}}\right)_{i=1,j=1}^{m,n}g - f \left(\frac{\partial{g}}{\partial{x_{ij}}}\right)_{i=1,j=1}^{m,n} \right) = \frac{1}{g^2(\pmb{X})}\begin{bmatrix} \frac{\partial f(\pmb{X})}{\partial{\pmb{X}}}g(\pmb{X}) -f(\pmb{X})\frac{\partial g(\pmb{X})}{\partial{\pmb{X}}}\end{bmatrix} \end{aligned} \qquad (25)$$

Q.E.D.
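The product and quotient rules can be checked on a concrete pair of functions. The sketch below is an illustration under stated assumptions rather than anything from the original article: the functions `f` and `g`, the constant weight matrix `W`, and the helper `numerical_gradient` are hypothetical choices. Their gradient matrices ($\pmb{W}$ and $2\pmb{X}$) follow from taking the partial derivative entry by entry, and $g \geq 1$ guarantees $g(\pmb{X}) \neq 0$.

```python
import numpy as np

def numerical_gradient(f, X, eps=1e-6):
    """Central-difference estimate of the m x n gradient matrix of f at X."""
    grad = np.zeros_like(X, dtype=float)
    for idx in np.ndindex(X.shape):
        step = np.zeros_like(X, dtype=float)
        step[idx] = eps
        grad[idx] = (f(X + step) - f(X - step)) / (2 * eps)
    return grad

rng = np.random.default_rng(3)
m, n = 3, 2
X = rng.normal(size=(m, n))
W = rng.normal(size=(m, n))            # hypothetical constant weights

f = lambda X: np.sum(W * X)            # df/dx_ij = w_ij, so df/dX = W
g = lambda X: 1.0 + np.sum(X ** 2)     # dg/dx_ij = 2 x_ij, so dg/dX = 2X

grad_f, grad_g = W, 2 * X
product_rule = grad_f * g(X) + f(X) * grad_g                    # formula (22)
quotient_rule = (grad_f * g(X) - f(X) * grad_g) / g(X) ** 2     # formula (24)

product_numeric = numerical_gradient(lambda X: f(X) * g(X), X)
quotient_numeric = numerical_gradient(lambda X: f(X) / g(X), X)
print(np.allclose(product_rule, product_numeric, atol=1e-6),
      np.allclose(quotient_rule, quotient_numeric, atol=1e-6))
```

Because $f$ and $g$ are scalars, multiplying a gradient matrix by them is ordinary scalar-times-matrix multiplication, which is what `*` does here.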

2. Some formulas

2.1

$$\frac{\partial( \pmb{a}^T\pmb{X}\pmb{b})}{\partial{\pmb{X}}} = \pmb{a}\pmb{b}^T \qquad (26)$$

where $\pmb{a}_{m \times 1},\pmb{b}_{n \times 1}$ are constant vectors, $\pmb{a}=(a_1,a_2,\cdots,a_m)^T,\pmb{b}=(b_1,b_2,\cdots,b_n)^T$.

Proof:

$$\begin{aligned} \frac{\partial( \pmb{a}^T\pmb{X}\pmb{b})}{\partial{\pmb{X}}} &= \frac{\partial(a_1b_1x_{11}+a_1b_2x_{12}+\cdots+a_1b_nx_{1n} +a_2b_1x_{21}+a_2b_2x_{22}+\cdots+a_2b_nx_{2n} +\cdots +a_mb_1x_{m1}+a_mb_2x_{m2}+\cdots+a_mb_nx_{mn})}{\partial{\pmb{X}}} \\ &= \left(\frac{\partial( \pmb{a}^T\pmb{X}\pmb{b})}{\partial x_{ij}}\right)_{i=1,j=1}^{m,n} = \begin{bmatrix} a_1b_1 & a_1b_2 & \cdots & a_1b_n \\ a_2b_1 & a_2b_2 & \cdots & a_2b_n \\ \vdots & \vdots & \vdots & \vdots \\ a_mb_1 & a_mb_2 & \cdots & a_mb_n \end{bmatrix}_{m \times n} \\ &= \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_m \end{bmatrix}[b_1,b_2,\cdots,b_n] = \pmb{a}\pmb{b}^T \end{aligned} \qquad (27)$$

Q.E.D.

2.2

$$\frac{\partial( \pmb{a}^T\pmb{X}^T\pmb{b})}{\partial{\pmb{X}}} = \pmb{b}\pmb{a}^T \qquad (28)$$

where $\pmb{a}_{n \times 1},\pmb{b}_{m \times 1}$ are constant vectors, $\pmb{a}=(a_1,a_2,\cdots,a_n)^T,\pmb{b}=(b_1,b_2,\cdots,b_m)^T$.

Proof:

Since the transpose of a scalar is the scalar itself, we have

$$\frac{\partial(\pmb{a}^T\pmb{X}^T\pmb{b})}{\partial\pmb{X}}=\frac{\partial(\pmb{a}^T\pmb{X}^T\pmb{b})^T}{\partial\pmb{X}}=\frac{\partial(\pmb{b}^T\pmb{X}\pmb{a})}{\partial\pmb{X}} \qquad (29)$$

By (26):

$$\frac{\partial(\pmb{a}^T\pmb{X}^T\pmb{b})}{\partial\pmb{X}}=\frac{\partial(\pmb{b}^T\pmb{X}\pmb{a})}{\partial\pmb{X}} = \pmb{b}\pmb{a}^T \qquad (30)$$

Q.E.D.
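Formulas (26) and (28) can be confirmed with a finite-difference check. This is a small NumPy sketch under assumed test data (random `X` and vectors; the names `a_m`, `b_n`, `a_n`, `b_m` and the helper `numerical_gradient` are hypothetical and only encode the required lengths).

```python
import numpy as np

def numerical_gradient(f, X, eps=1e-6):
    """Central-difference estimate of the m x n gradient matrix of f at X."""
    grad = np.zeros_like(X, dtype=float)
    for idx in np.ndindex(X.shape):
        step = np.zeros_like(X, dtype=float)
        step[idx] = eps
        grad[idx] = (f(X + step) - f(X - step)) / (2 * eps)
    return grad

rng = np.random.default_rng(4)
m, n = 3, 4
X = rng.normal(size=(m, n))
a_m, b_n = rng.normal(size=m), rng.normal(size=n)   # lengths required by (26)
a_n, b_m = rng.normal(size=n), rng.normal(size=m)   # lengths required by (28)

grad_26 = numerical_gradient(lambda X: a_m @ X @ b_n, X)      # d(a^T X b)/dX
grad_28 = numerical_gradient(lambda X: a_n @ X.T @ b_m, X)    # d(a^T X^T b)/dX
print(np.allclose(grad_26, np.outer(a_m, b_n), atol=1e-6),
      np.allclose(grad_28, np.outer(b_m, a_n), atol=1e-6))
```

`np.outer(u, v)` builds $\pmb{u}\pmb{v}^T$, so both results come out with the $m \times n$ shape of $\pmb{X}$.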

2.3

$$\frac{\partial( \pmb{a}^T\pmb{X}\pmb{X}^T\pmb{b})}{\partial{\pmb{X}}} = \pmb{a}\pmb{b}^T\pmb{X}+\pmb{b}\pmb{a}^T\pmb{X} \qquad (31)$$

where $\pmb{a}_{m \times 1},\pmb{b}_{m \times 1}$ are constant vectors, $\pmb{a}=(a_1,a_2,\cdots,a_m)^T,\pmb{b}=(b_1,b_2,\cdots,b_m)^T$.

Proof:

$$\begin{aligned} \frac{\partial( \pmb{a}^T\pmb{X}\pmb{X}^T\pmb{b})}{\partial{\pmb{X}}} &= \frac{\partial\left( \sum_{i=1}^m\sum_{k=1}^m (a_ib_k)(x_{i1}x_{k1}+x_{i2}x_{k2}+\cdots+x_{in}x_{kn}) \right)}{\partial{\pmb{X}}} \\ &= \left( a_i\sum_{k=1}^m b_kx_{kj} + b_i\sum_{k=1}^m a_kx_{kj} \right)_{i=1,j=1}^{m,n} = \left( a_i\sum_{k=1}^m b_kx_{kj} \right)_{i=1,j=1}^{m,n} + \left( b_i\sum_{k=1}^m a_kx_{kj} \right)_{i=1,j=1}^{m,n} \\ &= \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_m \end{bmatrix}[b_1, b_2, \cdots, b_m]\pmb{X}+\begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix}[a_1, a_2, \cdots, a_m]\pmb{X} = \pmb{a}\pmb{b}^T\pmb{X}+\pmb{b}\pmb{a}^T\pmb{X} \end{aligned} \qquad (32)$$

In the second line, $x_{ij}$ appears in the terms whose first factor comes from row $i$ (contributing $a_i\sum_k b_kx_{kj}$) and in the terms whose second factor comes from row $i$ (contributing $b_i\sum_k a_kx_{kj}$).

Q.E.D.

2.4

$$\frac{\partial( \pmb{a}^T\pmb{X}^T\pmb{X}\pmb{b})}{\partial{\pmb{X}}} = \pmb{X}\pmb{b}\pmb{a}^T+\pmb{X}\pmb{a}\pmb{b}^T \qquad (33)$$

where $\pmb{a}_{n \times 1},\pmb{b}_{n \times 1}$ are constant vectors, $\pmb{a}=(a_1,a_2,\cdots,a_n)^T,\pmb{b}=(b_1,b_2,\cdots,b_n)^T$.

Proof:

Recall equation (Essence Part_9):

$$\text{D}_{\pmb{X}}f(\pmb{X})= \frac{\partial f(\pmb{X})}{\partial \pmb{X}^T_{m\times n}} = \begin{bmatrix} \frac{\partial f}{\partial x_{11}}&\frac{\partial f}{\partial x_{21}}&\cdots&\frac{\partial f}{\partial x_{m1}} \\ \frac{\partial f}{\partial x_{12}}&\frac{\partial f}{\partial x_{22}}& \cdots & \frac{\partial f}{\partial x_{m2}}\\ \vdots&\vdots&\vdots&\vdots\\ \frac{\partial f}{\partial x_{1n}}&\frac{\partial f}{\partial x_{2n}}&\cdots&\frac{\partial f}{\partial x_{mn}} \end{bmatrix}_{n\times m} \qquad (\text{Essence Part\_9})$$

and equation (Essence Part_11):

$$\nabla_{\pmb{X}}f(\pmb{X})= \frac{\partial f(\pmb{X})}{\partial \pmb{X}_{m\times n}} = \begin{bmatrix} \frac{\partial f}{\partial x_{11}}&\frac{\partial f}{\partial x_{12}}&\cdots&\frac{\partial f}{\partial x_{1n}} \\ \frac{\partial f}{\partial x_{21}}&\frac{\partial f}{\partial x_{22}}& \cdots & \frac{\partial f}{\partial x_{2n}}\\ \vdots&\vdots&\vdots&\vdots\\ \frac{\partial f}{\partial x_{m1}}&\frac{\partial f}{\partial x_{m2}}&\cdots&\frac{\partial f}{\partial x_{mn}} \end{bmatrix}_{m\times n} \qquad (\text{Essence Part\_11})$$

As summarized in Essence Part, Section III.2.5.1, these two results are transposes of each other:

$$\frac{\partial f(\pmb{X})}{\partial{\pmb{X}}^T_{m\times n}} = \left(\frac{\partial f(\pmb{X})}{\partial{\pmb{X}_{m\times n}}}\right)^T \qquad (34)$$

So if we transpose the matrix variable in the denominator of (31), we get:

$$\frac{\partial( \pmb{a}^T\pmb{X}\pmb{X}^T\pmb{b})}{\partial{\pmb{X}}^T} = \left(\frac{\partial( \pmb{a}^T\pmb{X}\pmb{X}^T\pmb{b})}{\partial{\pmb{X}}}\right)^T = (\pmb{a}\pmb{b}^T\pmb{X}+\pmb{b}\pmb{a}^T\pmb{X})^T = \pmb{X}^T\pmb{b}\pmb{a}^T+\pmb{X}^T\pmb{a}\pmb{b}^T \qquad (35)$$

We rewrite the left-hand side of (33) as follows:

$$\frac{\partial( \pmb{a}^T\pmb{X}^T\pmb{X}\pmb{b})}{\partial{\pmb{X}}} =\frac{\partial( \pmb{a}^T(\pmb{X}^T)(\pmb{X}^T)^T\pmb{b})}{\partial{(\pmb{X}^T)^T}} \qquad (36)$$

Applying (35) to (36), with $\pmb{X}^T$ in the role of $\pmb{X}$, gives:

$$\frac{\partial( \pmb{a}^T\pmb{X}^T\pmb{X}\pmb{b})}{\partial{\pmb{X}}} =\frac{\partial( \pmb{a}^T(\pmb{X}^T)(\pmb{X}^T)^T\pmb{b})}{\partial{(\pmb{X}^T)^T}} = (\pmb{X}^T)^T\pmb{b}\pmb{a}^T+(\pmb{X}^T)^T\pmb{a}\pmb{b}^T = \pmb{X}\pmb{b}\pmb{a}^T+\pmb{X}\pmb{a}\pmb{b}^T \qquad (37)$$

Q.E.D.
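Because (31) and (33) are easy to mix up (the constant matrices sit on the left of $\pmb{X}$ in one and on the right in the other), a numerical check is a useful safeguard. The following NumPy sketch uses assumed random test data; `numerical_gradient` is a hypothetical helper, and `c`, `d` stand in for the vectors $\pmb{a},\pmb{b}$ of (33), which have length $n$ rather than $m$.

```python
import numpy as np

def numerical_gradient(f, X, eps=1e-6):
    """Central-difference estimate of the m x n gradient matrix of f at X."""
    grad = np.zeros_like(X, dtype=float)
    for idx in np.ndindex(X.shape):
        step = np.zeros_like(X, dtype=float)
        step[idx] = eps
        grad[idx] = (f(X + step) - f(X - step)) / (2 * eps)
    return grad

rng = np.random.default_rng(5)
m, n = 3, 4
X = rng.normal(size=(m, n))

# Formula (31): a and b have length m.
a, b = rng.normal(size=m), rng.normal(size=m)
numeric_31 = numerical_gradient(lambda X: a @ X @ X.T @ b, X)
analytic_31 = np.outer(a, b) @ X + np.outer(b, a) @ X

# Formula (33): c and d play the roles of a and b, with length n.
c, d = rng.normal(size=n), rng.normal(size=n)
numeric_33 = numerical_gradient(lambda X: c @ X.T @ X @ d, X)
analytic_33 = X @ np.outer(d, c) + X @ np.outer(c, d)

print(np.allclose(numeric_31, analytic_31, atol=1e-6),
      np.allclose(numeric_33, analytic_33, atol=1e-6))
```

With $m \neq n$, as here, swapping the two formulas would not even produce matrices of compatible shape, which makes the distinction easy to see.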

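That concludes this article. Like me, you probably feel that deriving the last few formulas directly from the definition is tedious and error-prone.

So, in the next article we introduce more advanced techniques for differentiating real-valued scalar functions of a vector variable and of a matrix variable: the trace of a matrix $\mathrm{tr}(\pmb{A})$ and the first-order real matrix differential $\mathrm{d}\pmb{X}$, which greatly simplify the derivations.

Other articles in the matrix derivative series:

Derivatives with respect to symmetric matrices, using maximum likelihood estimation of the multivariate normal distribution as an example (Matrix Derivatives: Supplement) - article by Iterator - Zhihu

Mathematical Derivation of Matrix Derivative Formulas (Matrix Derivatives: Advanced Part) - article by Iterator - Zhihu

The Essence of Matrix Derivatives and of Numerator Layout and Denominator Layout (Matrix Derivatives: Essence Part) - article by Iterator - Zhihu

References

  1. Zhang Xianda, 《矩阵分析与应用(第二版)》 (Matrix Analysis and Applications, 2nd ed.), p. 147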

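Mathematical Derivation of Matrix Derivative Formulas (Matrix Derivatives: Advanced Part)

Preface

1. Before reading this article, please be sure to read these two articles in order:

Below, "Essence Part" and "Basics Part" refer to these two articles.

2. This article introduces more advanced differentiation techniques for real-valued scalar functions of a vector variable and of a matrix variable: the trace of a matrix $\mathbb{tr}(\pmb{A})$ and the first-order real matrix differential $\mathrm{d}\pmb{X}$. (The derivations make use of real matrix functions of a matrix variable, but differentiating such functions is not covered here.)

3. How does this article differ from the previous two, and when should each be used?

Answer: In calculus, the derivative is originally defined as a limit, but in practice we do not differentiate from the definition. We use known derivative formulas (for power functions, exponential functions and so on) together with the product rule, the chain rule and the like. Matrix differentiation is similar: in practice we do not apply the definition-based method of the Essence Part and Basics Part, but the rules given in this article.

4. This article uses the same notation as the Essence Part and Basics Part.

5. To follow this article you need the material covered in the Essence Part and Basics Part, undergraduate linear algebra (determinants, adjugate matrices, inverse matrices), and undergraduate calculus (differentials and total differentials). Nothing else is required.

6. The first two sections, I. The Trace of a Matrix and II. Differentials and Total Differentials, are prerequisites for matrix differentiation. If you already know them well, you can skip to III. The Matrix Differential (though reading them once is still recommended as a refresher).

7. There is a matrix calculus website where you can check whether your own results are correct.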

The Trace of a Matrix[1]

1. Definition

The sum of the main-diagonal entries of an $n \times n$ square matrix $\pmb{A}_{n \times n}$ is called the trace of $\pmb{A}$, written $\mathbb{tr}(\pmb{A})$. That is, for

$$\pmb{A}_{n \times n}= \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \\ \end{bmatrix}_{n \times n}$$

the trace of $\pmb{A}$ is:

$$\mathbb{tr}(\pmb{A})=a_{11} + a_{22} + \cdots + a_{nn} = \sum_{i=1}^n{a_{ii}} \qquad (1)$$

Note: only square matrices have a trace.

2. Some properties (important: they are used below and are worth memorizing)

2.1 Trace of a scalar

A scalar $x$ can be regarded as a $1 \times 1$ matrix, and its trace is the scalar itself.

$$x=\mathbb{tr}(x) \qquad (2)$$

2.2 Linearity rule

The trace of a sum is the sum of the traces, and scalar factors can be pulled out.

$$\mathbb{tr}(c_1\pmb{A}+c_2\pmb{B}) = c_1\mathbb{tr}(\pmb{A})+c_2\mathbb{tr}(\pmb{B}) \qquad (3)$$

where $c_1,c_2$ are scalars.

Proof:

$$\begin{aligned} \mathbb{tr}(c_1\pmb{A}+c_2\pmb{B}) &= \mathbb{tr} \begin{bmatrix} c_1a_{11}+c_2b_{11} & c_1a_{12}+c_2b_{12} & \cdots & c_1a_{1n}+c_2b_{1n} \\ c_1a_{21}+c_2b_{21} & c_1a_{22}+c_2b_{22} & \cdots & c_1a_{2n}+c_2b_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ c_1a_{n1}+c_2b_{n1} & c_1a_{n2}+c_2b_{n2} & \cdots & c_1a_{nn}+c_2b_{nn} \\ \end{bmatrix} \\ &= (c_1a_{11}+c_2b_{11})+(c_1a_{22}+c_2b_{22})+\cdots + (c_1a_{nn}+c_2b_{nn}) \\ &= c_1(a_{11}+a_{22}+\cdots+a_{nn}) + c_2(b_{11}+b_{22}+\cdots+b_{nn}) = c_1\mathbb{tr}(\pmb{A})+c_2\mathbb{tr}(\pmb{B}) \end{aligned} \qquad (4)$$

Q.E.D.

2.3 Transpose

The trace of the transpose equals the trace of the original matrix.

$$\mathbb{tr}(\pmb{A})=\mathbb{tr}(\pmb{A}^T) \qquad (5)$$

Proof:

Transposition does not change the main-diagonal entries, so the identity holds.

Q.E.D.

2.4 What the trace of a product really is

For two matrices of the same size $m \times n$, $\pmb{A}_{m\times n}$ and $\pmb{B}_{m\times n}$, the trace of one matrix multiplied (on the left or on the right) by the transpose of the other is simply the sum of the products of corresponding entries of $\pmb{A}_{m\times n}$ and $\pmb{B}_{m\times n}$. It can be viewed as the generalization of the vector dot product to matrices:

$$\mathbb{tr}(\pmb{A}\pmb{B}^T) = a_{11}b_{11}+a_{12}b_{12}+\cdots+a_{1n}b_{1n} + a_{21}b_{21}+a_{22}b_{22}+\cdots+a_{2n}b_{2n} + \cdots + a_{m1}b_{m1}+a_{m2}b_{m2}+\cdots+a_{mn}b_{mn} \qquad (6)$$

Proof (off-diagonal entries, marked $\ast$, do not affect the trace):

$$\begin{aligned} \mathbb{tr}(\pmb{A}\pmb{B}^T) &=\mathbb{tr}\begin{pmatrix} \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \vdots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \\ \end{bmatrix} \begin{bmatrix} b_{11} & b_{21} & \cdots & b_{m1} \\ b_{12} & b_{22} & \cdots & b_{m2} \\ \vdots & \vdots & \vdots & \vdots \\ b_{1n} & b_{2n} & \cdots & b_{mn} \\ \end{bmatrix} \end{pmatrix} \\ &= \mathbb{tr} \begin{bmatrix} a_{11}b_{11}+a_{12}b_{12}+\cdots+a_{1n}b_{1n} & \ast & \cdots & \ast \\ \ast & a_{21}b_{21}+a_{22}b_{22}+\cdots+a_{2n}b_{2n} & \cdots & \ast \\ \vdots & \vdots & \ddots & \vdots \\ \ast & \ast & \cdots & a_{m1}b_{m1}+a_{m2}b_{m2}+\cdots+a_{mn}b_{mn} \\ \end{bmatrix}_{m \times m} \\ &= a_{11}b_{11}+a_{12}b_{12}+\cdots+a_{1n}b_{1n} + a_{21}b_{21}+a_{22}b_{22}+\cdots+a_{2n}b_{2n} + \cdots + a_{m1}b_{m1}+a_{m2}b_{m2}+\cdots+a_{mn}b_{mn} \end{aligned} \qquad (7)$$

Q.E.D.

2.5 Commutativity

Swapping the order of a matrix product leaves the trace unchanged.

$$\mathbb{tr}(\pmb{A}\pmb{B})= \mathbb{tr}(\pmb{B}\pmb{A}) \qquad (8)$$

where $\pmb{A}_{m \times n},\pmb{B}_{n \times m}$.

Proof:

Regard $\pmb{B}_{n \times m}$ as the transpose of $(\pmb{B}^T)_{m \times n}$. By the meaning of the trace of a product, i.e. (6), whichever order the product is taken in, the trace is the sum of the products of corresponding entries of $\pmb{A}_{m \times n}$ and $(\pmb{B}^T)_{m \times n}$, which does not change. Explicitly, both traces equal $\sum_{i=1}^m\sum_{j=1}^n a_{ij}b_{ji}$.

Q.E.D.

2.6 Commutativity for more matrices

$$\mathbb{tr}(\pmb{A}\pmb{B}\pmb{C})=\mathbb{tr}(\pmb{C}\pmb{A}\pmb{B})=\mathbb{tr}(\pmb{B}\pmb{C}\pmb{A}) \qquad (9)$$

where $\pmb{A}_{m \times n},\pmb{B}_{n \times p},\pmb{C}_{p \times m}$.

Proof:

Treat the product of two of the matrices as a single matrix and apply commutativity, (8), to it and the remaining matrix.

Q.E.D.

2.7 Using these fluently

$$\mathbb{tr}(\pmb{A}\pmb{B}^T) = \mathbb{tr}(\pmb{B}^T\pmb{A}) = \mathbb{tr}(\pmb{A}^T\pmb{B}) = \mathbb{tr}(\pmb{B}\pmb{A}^T) \qquad (10)$$

where $\pmb{A}_{m \times n},\pmb{B}_{m \times n}$.

Proof:

The first equality is commutativity, the second is the transpose rule (5), and the third is commutativity again.

Q.E.D.
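The trace properties above can be spot-checked numerically. The sketch below is illustrative only: the matrices are random test data and the names `C`, `D` are hypothetical stand-ins for the second and third factors in (9). It checks that all four expressions in (10) equal the entrywise sum of products from (6), and that the three cyclic orderings in (9) share one trace.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, p = 3, 4, 2
A = rng.normal(size=(m, n))
B = rng.normal(size=(m, n))
C = rng.normal(size=(n, p))
D = rng.normal(size=(p, m))

# (6): tr(A B^T) is the sum of products of corresponding entries.
elementwise = np.sum(A * B)

# (10): four equivalent ways of writing the same number.
forms = [np.trace(A @ B.T), np.trace(B.T @ A), np.trace(A.T @ B), np.trace(B @ A.T)]

# (9): cyclic reordering, here with sizes m x n, n x p and p x m.
cyclic = [np.trace(A @ C @ D), np.trace(D @ A @ C), np.trace(C @ D @ A)]

print(np.allclose(forms, elementwise), np.allclose(cyclic, cyclic[0]))
```

Note that the three products in `cyclic` are square matrices of different sizes ($m \times m$, $p \times p$ and $n \times n$), yet their traces coincide. In practice `np.sum(A * B)` is also the cheaper way to evaluate $\mathbb{tr}(\pmb{A}\pmb{B}^T)$, since it never forms the product matrix.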

Differentials and Total Differentials

Let us first review differentials and total differentials from undergraduate calculus.

1. Differential of a function of one variable

1.1 Differential of an ordinary function[2]

Let $y=f(x)$ be differentiable. Then its differential is:

$$\mathbb{d}y=\mathbb{d}f(x)=f'(x)\mathbb{d}x \qquad (11)$$

1.2 Differential of a composite function[3]

Let $y=f(u),u=g(x)$, both differentiable. Then the differential of $y$ is:

$$\mathbb{d}y=\mathbb{d}f(u)=f'(u)\mathbb{d}u=f'(u)\mathbb{d}g(x)=f'(u)g'(x)\mathbb{d}x \qquad (12)$$

This looks complicated at first glance, but an example makes it simple:

Let $y=\sin(2x+1),u=2x+1$. Then the differential of $y$ is:

$$\mathbb{d}y=\mathbb{d}(\sin{u})=\cos{u}\,\mathbb{d}u=\cos(2x+1)\mathbb{d}(2x+1) =\cos(2x+1) \cdot 2 \mathbb{d}x=2\cos(2x+1) \mathbb{d}x \qquad (13)$$

2. Total differential of a function of several variables

2.1 Total differential of an ordinary function[4]

Let $z=f(x,y)$ be differentiable. Then its total differential is:

$$\mathbb{d}z=\frac{\partial z}{\partial x}\mathbb{d}x+\frac{\partial z}{\partial y}\mathbb{d}y \qquad (14)$$

2.2 Total differential of a composite function

Let $z=f(u),u=\varphi(x,y)$, where $f$ has a derivative with respect to $u$ and $u$ is differentiable in $(x,y)$. Then the total differential is:

$$\mathbb{d}z =\mathbb{d}f(u)=f'(u)\mathbb{d}u=f'(u)\left(\frac{\partial u}{\partial x}\mathbb{d}x+\frac{\partial u}{\partial y}\mathbb{d}y\right) = f'(u)\frac{\partial u}{\partial x}\mathbb{d}x+f'(u)\frac{\partial u}{\partial y}\mathbb{d}y \qquad (15)$$

For example:

Let $z=\sin(2x+y^2),u=2x+y^2$. Then the total differential of $z$ is:

$$\mathbb{d}z=\mathbb{d}(\sin u)=\cos u\,\mathbb{d}u=\cos(2x+y^2)\mathbb{d}(2x+y^2) =\cos(2x+y^2)(2\mathbb{d}x+2y\mathbb{d}y) = 2\cos(2x+y^2)\mathbb{d}x+2y\cos(2x+y^2)\mathbb{d}y$$

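3. Rules for differentials and total differentials[5]

3.1 Differential of a constant

$$\mathbb{d}c=0 \qquad (16\_1)$$

where $c$ is a constant.

3.2 Linearity rule

The differential of a sum is the sum of the differentials, and constant factors can be pulled out.

$$\mathbb{d}(c_1u+c_2v) = c_1\mathbb{d}u+c_2\mathbb{d}v \qquad (16\_2)$$

where $u=u(x),v=v(x)$ are functions of one variable, or $u=u(x,y),v=v(x,y)$ are functions of several variables, and $c_1,c_2$ are constants.

3.3 Product rule

Differentiate the first factor and keep the second, plus keep the first factor and differentiate the second.

$$\mathbb{d}(uv)=\mathbb{d}(u)v+u\mathbb{d}(v) \qquad (16\_3)$$

where $u=u(x),v=v(x)$ are functions of one variable, or $u=u(x,y),v=v(x,y)$ are functions of several variables.

3.4 Quotient rule

(Differential of the numerator times the denominator, minus the numerator times the differential of the denominator) divided by (the denominator squared).

$$\mathbb{d}\left(\frac{u}{v}\right)=\frac{1}{v^2}(\mathbb{d}(u)v-u\mathbb{d}(v) ) \qquad (16\_4)$$

where $v=v(x) \neq0,u=u(x)$ are functions of one variable, or $v=v(x,y) \neq 0, u=u(x,y)$ are functions of several variables.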

The Matrix Differential

1. Real-valued scalar functions of a vector variable[6]

$f(\pmb{x}),\pmb{x}=[x_1,x_2,\cdots,x_n]^T$

This is just a function of several variables. Assuming it is differentiable, its total differential is given by (14), extended to $n$ variables:

$$\mathbb{d}f(\pmb{x}) =\frac{\partial f}{\partial x_1}\mathbb{d}x_1+\frac{\partial f}{\partial x_2}\mathbb{d}x_2 + \cdots+\frac{\partial f}{\partial x_n}\mathbb{d}x_n= \left(\frac{\partial f}{\partial x_1},\frac{\partial f}{\partial x_2},\cdots,\frac{\partial f}{\partial x_n}\right) \begin{bmatrix} \mathbb{d}x_1 \\ \mathbb{d}x_2\\ \vdots \\ \mathbb{d}x_n \end{bmatrix} \qquad (17)$$

The result is a scalar, so by (2), (17) can be written as a trace:

$$\mathbb{d}f(\pmb{x}) = \left(\frac{\partial f}{\partial x_1},\frac{\partial f}{\partial x_2},\cdots,\frac{\partial f}{\partial x_n}\right) \begin{bmatrix} \mathbb{d}x_1 \\ \mathbb{d}x_2\\ \vdots \\ \mathbb{d}x_n \end{bmatrix} =\mathbb{tr}\left(\left(\frac{\partial f}{\partial x_1},\frac{\partial f}{\partial x_2},\cdots,\frac{\partial f}{\partial x_n}\right) \begin{bmatrix} \mathbb{d}x_1 \\ \mathbb{d}x_2\\ \vdots \\ \mathbb{d}x_n \end{bmatrix}\right) \qquad (18)$$

2. Real-valued scalar functions of a matrix variable[7]

$f(\pmb{X}),\pmb{X}_{m\times n}=(x_{ij})_{i=1,j=1}^{m,n}$

This is also a function of several variables. Assuming it is differentiable, its total differential is again given by (14):

$$\mathbb{d}f(\pmb{X}) =\frac{\partial f}{\partial x_{11}}\mathbb{d}x_{11}+\frac{\partial f}{\partial x_{12}}\mathbb{d}x_{12} + \cdots+\frac{\partial f}{\partial x_{1n}}\mathbb{d}x_{1n} +\frac{\partial f}{\partial x_{21}}\mathbb{d}x_{21}+\frac{\partial f}{\partial x_{22}}\mathbb{d}x_{22} + \cdots+\frac{\partial f}{\partial x_{2n}}\mathbb{d}x_{2n} +\cdots+\frac{\partial f}{\partial x_{m1}}\mathbb{d}x_{m1}+\frac{\partial f}{\partial x_{m2}}\mathbb{d}x_{m2} + \cdots+\frac{\partial f}{\partial x_{mn}}\mathbb{d}x_{mn} \qquad (19)$$

Looking at this result, it is exactly the sum of the products of corresponding entries of the matrix $(\frac{\partial f}{\partial x_{ij}})_{i=1,j=1}^{m,n}$ and the matrix $(\mathbb{d}x_{ij})_{i=1,j=1}^{m,n}$. By (6), (19) can therefore also be written as a trace:

$$\begin{aligned} \mathbb{d}f(\pmb{X}) &=\frac{\partial f}{\partial x_{11}}\mathbb{d}x_{11}+\frac{\partial f}{\partial x_{12}}\mathbb{d}x_{12} + \cdots+\frac{\partial f}{\partial x_{1n}}\mathbb{d}x_{1n}+\frac{\partial f}{\partial x_{21}}\mathbb{d}x_{21}+\frac{\partial f}{\partial x_{22}}\mathbb{d}x_{22} + \cdots+\frac{\partial f}{\partial x_{2n}}\mathbb{d}x_{2n}+\cdots+\frac{\partial f}{\partial x_{m1}}\mathbb{d}x_{m1}+\frac{\partial f}{\partial x_{m2}}\mathbb{d}x_{m2} + \cdots+\frac{\partial f}{\partial x_{mn}}\mathbb{d}x_{mn} \\ &=\mathbb{tr}\begin{pmatrix} \begin{bmatrix} \frac{\partial f}{\partial x_{11}}&\frac{\partial f}{\partial x_{21}}&\cdots&\frac{\partial f}{\partial x_{m1}} \\ \frac{\partial f}{\partial x_{12}}&\frac{\partial f}{\partial x_{22}}& \cdots & \frac{\partial f}{\partial x_{m2}}\\ \vdots&\vdots&\vdots&\vdots\\ \frac{\partial f}{\partial x_{1n}}&\frac{\partial f}{\partial x_{2n}}&\cdots&\frac{\partial f}{\partial x_{mn}} \end{bmatrix}_{n\times m} \begin{bmatrix} \mathbb{d}x_{11} & \mathbb{d}x_{12} & \cdots & \mathbb{d}x_{1n} \\ \mathbb{d}x_{21} & \mathbb{d}x_{22} & \cdots & \mathbb{d}x_{2n} \\ \vdots&\vdots&\vdots&\vdots\\ \mathbb{d}x_{m1} & \mathbb{d}x_{m2} & \cdots & \mathbb{d}x_{mn} \end{bmatrix}_{m \times n} \end{pmatrix} \end{aligned} \qquad (20)$$

3. Real matrix functions of a matrix variable[8]

$\pmb{F}(\pmb{X}),\pmb{F}_{p\times q}=(f_{ij})_{i=1,j=1}^{p,q},\pmb{X}_{m \times n}=(x_{ij})_{i=1,j=1}^{m,n}$

From Essence Part, Section I.3.3, each entry of a real matrix function of a matrix variable is itself a real-valued scalar function of a matrix variable, $f_{ij}(\pmb{X})$.

We define: if every $f_{ij}(\pmb{X})$ is differentiable, the matrix differential of a real matrix function of a matrix variable is obtained by taking the total differential of the entry $f_{ij}(\pmb{X})$ in each position, keeping the layout unchanged:

$$\mathbb{d}\pmb{F}_{p \times q}(\pmb{X}) = \begin{bmatrix} \mathbb{d}f_{11}(\pmb{X})& \mathbb{d}f_{12}(\pmb{X}) & \cdots & \mathbb{d}f_{1q}(\pmb{X}) \\ \mathbb{d}f_{21}(\pmb{X})& \mathbb{d}f_{22}(\pmb{X}) & \cdots & \mathbb{d}f_{2q}(\pmb{X}) \\ \vdots&\vdots&\vdots&\vdots \\ \mathbb{d}f_{p1}(\pmb{X})& \mathbb{d}f_{p2}(\pmb{X}) & \cdots & \mathbb{d}f_{pq}(\pmb{X}) \end{bmatrix}_{p \times q} \qquad (21)$$

3.1 Four rules (important: they are used below and are worth memorizing)

a. Matrix differential of a constant matrix

$$\mathbb{d}\pmb{A}_{m \times n} = \pmb{0}_{m \times n} \qquad (22\_1)$$

where $\pmb{A}_{m \times n}$ is a constant matrix.

Proof:

Every entry of $\pmb{A}$ is a constant, so by (16_1) the differential of every entry is $0$.

Q.E.D.

b. Linearity rule

The differential of a sum is the sum of the differentials, and constant factors can be pulled out.

$$\mathbb{d}(c_1\pmb{F}(\pmb{X})+c_2\pmb{G}(\pmb{X})) = c_1\mathbb{d}\pmb{F}(\pmb{X})+c_2\mathbb{d}\pmb{G}(\pmb{X}) \qquad (22\_2)$$

where $c_1,c_2$ are constants.

Proof:

Every entry of $c_1\pmb{F}(\pmb{X})+c_2\pmb{G}(\pmb{X})$ has the form $c_1f_{ij}(\pmb{X})+c_2g_{ij}(\pmb{X})$, so by (16_2) the total differential of every entry is $c_1\mathbb{d}f_{ij}(\pmb{X})+c_2\mathbb{d}g_{ij}(\pmb{X})$.

Q.E.D.

c. Product rule

Differentiate the first factor and keep the second, plus keep the first factor and differentiate the second.

$$\mathbb{d}(\pmb{F}(\pmb{X})\pmb{G}(\pmb{X}))=\mathbb{d}(\pmb{F}(\pmb{X}))\pmb{G}(\pmb{X}) + \pmb{F}(\pmb{X})\mathbb{d}\pmb{G}(\pmb{X}) \qquad (22\_3\_1)$$

where $\pmb{F}_{p \times q}(\pmb{X}),\pmb{G}_{q \times s}(\pmb{X})$ are matrices and $p,q,s$ are arbitrary positive integers.

Note: the differentials here are matrices, so the left-right order of the factors in each product must not be swapped.

Proof:

Every entry of $\pmb{F}(\pmb{X})\pmb{G}(\pmb{X})$ has the form $\sum_{k=1}^q[f_{ik}(\pmb{X})g_{kj}(\pmb{X})]$, so by (16_2) and (16_3) the total differential of every entry is

$$\mathbb{d}\left( \sum_{k=1}^q[f_{ik}(\pmb{X})g_{kj}(\pmb{X})] \right) =\sum_{k=1}^q \mathbb{d}(f_{ik}(\pmb{X})g_{kj}(\pmb{X})) = \sum_{k=1}^q[\mathbb{d}(f_{ik}(\pmb{X}))g_{kj}(\pmb{X})+f_{ik}(\pmb{X})\mathbb{d}g_{kj}(\pmb{X})] = \sum_{k=1}^q[\mathbb{d}(f_{ik}(\pmb{X}))g_{kj}(\pmb{X})]+ \sum_{k=1}^q[f_{ik}(\pmb{X})\mathbb{d}g_{kj}(\pmb{X})] \qquad (22\_3\_1\_a)$$

The first sum in the result is the $(i,j)$ entry of $\mathbb{d}(\pmb{F}(\pmb{X}))\pmb{G}(\pmb{X})$, and the second sum is the $(i,j)$ entry of $\pmb{F}(\pmb{X})\mathbb{d}\pmb{G}(\pmb{X})$.

Q.E.D.

From this, the rule for a product of more factors follows easily:

$$\mathbb{d}(\pmb{F}(\pmb{X})\pmb{G}(\pmb{X})\pmb{H}(\pmb{X}))=\mathbb{d}(\pmb{F}(\pmb{X}))\pmb{G}(\pmb{X})\pmb{H}(\pmb{X}) + \pmb{F}(\pmb{X})\mathbb{d}(\pmb{G}(\pmb{X}))\pmb{H}(\pmb{X})+ \pmb{F}(\pmb{X})\pmb{G}(\pmb{X})\mathbb{d}\pmb{H}(\pmb{X}) \qquad (22\_3\_2)$$

Proof:

$$\begin{aligned} \mathbb{d}(\pmb{F}(\pmb{X})\pmb{G}(\pmb{X})\pmb{H}(\pmb{X})) &= \mathbb{d}(\pmb{F}(\pmb{X}))\pmb{G}(\pmb{X})\pmb{H}(\pmb{X})+\pmb{F}(\pmb{X})\mathbb{d}(\pmb{G}(\pmb{X})\pmb{H}(\pmb{X})) \\ &= \mathbb{d}(\pmb{F}(\pmb{X}))\pmb{G}(\pmb{X})\pmb{H}(\pmb{X}) +\pmb{F}(\pmb{X})[\mathbb{d}(\pmb{G}(\pmb{X}))\pmb{H}(\pmb{X}) + \pmb{G}(\pmb{X})\mathbb{d}\pmb{H}(\pmb{X})] \\ &=\mathbb{d}(\pmb{F}(\pmb{X}))\pmb{G}(\pmb{X})\pmb{H}(\pmb{X}) + \pmb{F}(\pmb{X})\mathbb{d}(\pmb{G}(\pmb{X}))\pmb{H}(\pmb{X})+ \pmb{F}(\pmb{X})\pmb{G}(\pmb{X})\mathbb{d}\pmb{H}(\pmb{X}) \end{aligned} \qquad (22\_3\_2\_a)$$

Q.E.D.

d. Transpose rule

The matrix differential of a transpose equals the transpose of the matrix differential.

$$\mathbb{d}\pmb{F}^T_{p \times q}(\pmb{X})= (\mathbb{d}\pmb{F}_{p \times q}(\pmb{X}))^T \qquad (22\_4\_1)$$

where $\pmb{F}_{p \times q}(\pmb{X})$ is a matrix.

Proof:

$$\begin{aligned} \mathbb{d}\pmb{F}^T_{p \times q}(\pmb{X}) &= \mathbb{d} \begin{bmatrix} f_{11}(\pmb{X})& f_{21}(\pmb{X}) & \cdots & f_{p1}(\pmb{X}) \\ f_{12}(\pmb{X})& f_{22}(\pmb{X}) & \cdots & f_{p2}(\pmb{X}) \\ \vdots&\vdots&\vdots&\vdots \\ f_{1q}(\pmb{X})&f_{2q}(\pmb{X}) & \cdots & f_{pq}(\pmb{X}) \end{bmatrix}_{q \times p} = \begin{bmatrix} \mathbb{d}f_{11}(\pmb{X})& \mathbb{d}f_{21}(\pmb{X}) & \cdots & \mathbb{d}f_{p1}(\pmb{X}) \\ \mathbb{d}f_{12}(\pmb{X})& \mathbb{d}f_{22}(\pmb{X}) & \cdots & \mathbb{d}f_{p2}(\pmb{X}) \\ \vdots&\vdots&\vdots&\vdots \\ \mathbb{d}f_{1q}(\pmb{X})&\mathbb{d}f_{2q}(\pmb{X}) & \cdots & \mathbb{d}f_{pq}(\pmb{X}) \end{bmatrix}_{q \times p} \\ &= \begin{bmatrix} \mathbb{d}f_{11}(\pmb{X})& \mathbb{d}f_{12}(\pmb{X}) & \cdots & \mathbb{d}f_{1q}(\pmb{X}) \\ \mathbb{d}f_{21}(\pmb{X})& \mathbb{d}f_{22}(\pmb{X}) & \cdots & \mathbb{d}f_{2q}(\pmb{X}) \\ \vdots&\vdots&\vdots&\vdots \\ \mathbb{d}f_{p1}(\pmb{X})& \mathbb{d}f_{p2}(\pmb{X}) & \cdots & \mathbb{d}f_{pq}(\pmb{X}) \end{bmatrix}_{p \times q}^T = (\mathbb{d}\pmb{F}_{p \times q}(\pmb{X}))^T \end{aligned} \qquad (22\_4\_2)$$

Q.E.D.
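To get a feel for what the product and transpose rules say, one can replace the differential by a small finite perturbation. The NumPy sketch below is an illustration under stated assumptions: the function $\pmb{F}(\pmb{X})=\pmb{X}\pmb{X}^T$, the perturbation size `1e-4` and the random data are hypothetical choices. Rules (22_3_1) and (22_4_1) predict $\mathbb{d}(\pmb{X}\pmb{X}^T)=\mathbb{d}(\pmb{X})\pmb{X}^T+\pmb{X}(\mathbb{d}\pmb{X})^T$, and the exact change should differ from this only by a term of second order in the perturbation.

```python
import numpy as np

rng = np.random.default_rng(6)
m, n = 3, 4
X = rng.normal(size=(m, n))
dX = 1e-4 * rng.normal(size=(m, n))    # small perturbation standing in for dX

F = lambda X: X @ X.T                  # F(X) = X X^T, an m x m matrix function

exact_change = F(X + dX) - F(X)
# Product rule plus transpose rule: d(X X^T) = dX X^T + X (dX)^T
first_order = dX @ X.T + X @ dX.T
# What is left over is exactly dX dX^T, which is second order in dX.
remainder = exact_change - first_order

print(np.abs(first_order).max(), np.abs(remainder).max())
```

The order of the factors matters: `dX @ X.T` and `X.T @ dX` do not even have the same shape here, which is the practical meaning of the note that the left-right order must not be swapped.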

3.2 Why use the matrix differential to find derivatives

$\pmb{X}_{m \times n}$ is itself a real matrix function of the matrix variable $\pmb{X}_{m \times n}$. Each of its entries is $x_{ij}$, and the total differential of each entry is $\mathbb{d}x_{ij}$.

Therefore the matrix differential of $\pmb{X}_{m \times n}$ is:

$$\mathbb{d}\pmb{X}_{m \times n} = \begin{bmatrix} \mathbb{d}x_{11}& \mathbb{d}x_{12} & \cdots & \mathbb{d}x_{1n} \\ \mathbb{d}x_{21}& \mathbb{d}x_{22} & \cdots & \mathbb{d}x_{2n} \\ \vdots&\vdots&\vdots&\vdots \\ \mathbb{d}x_{m1}& \mathbb{d}x_{m2} & \cdots & \mathbb{d}x_{mn} \\ \end{bmatrix}_{m \times n} \qquad (23\_1)$$

The matrix differential of the vector $\pmb{x}=[x_1,x_2,\cdots,x_n]^T$ is:

$$\mathbb{d}\pmb{x} = \begin{bmatrix} \mathbb{d}x_{1}\\ \mathbb{d}x_{2}\\ \vdots \\ \mathbb{d}x_{n} \\ \end{bmatrix}_{n \times 1} \qquad (23\_2)$$

Hence the four matrix differential rules just described also apply to $\mathbb{d}\pmb{X}_{m \times n}$ and $\mathbb{d}\pmb{x}$.

我们现在回到矩阵变元的实值标量函数的全微分,即 (20) 式:

df(X)=fx11dx11+fx12dx12++fx1ndx1n+fx21dx21+fx22dx22++fx2ndx2n++fxm1dxm1+fxm2dxm2++fxmndxmn=tr([fx11fx21fxm1fx12fx22fxm2fx1nfx2nfxmn]n×m[dx11dx12dx1ndx21dx22dx2ndxm1dxm2dxmn]m×n)(20)\mathbb{d}f(\pmb{X}) =\frac{\partial f}{\partial x_{11}}\mathbb{d}x_{11}+\frac{\partial f}{\partial x_{12}}\mathbb{d}x_{12} + \cdots+\frac{\partial f}{\partial x_{1n}}\mathbb{d}x_{1n}+\frac{\partial f}{\partial x_{21}}\mathbb{d}x_{21}+\frac{\partial f}{\partial x_{22}}\mathbb{d}x_{22} + \cdots+\frac{\partial f}{\partial x_{2n}}\mathbb{d}x_{2n}+\cdots+\frac{\partial f}{\partial x_{m1}}\mathbb{d}x_{m1}+\frac{\partial f}{\partial x_{m2}}\mathbb{d}x_{m2} + \cdots+\frac{\partial f}{\partial x_{mn}}\mathbb{d}x_{mn} =\mathbb{tr}\begin{pmatrix} \begin{bmatrix} \frac{\partial f}{\partial x_{11}}&\frac{\partial f}{\partial x_{21}}&\cdots&\frac{\partial f}{\partial x_{m1}} \\ \frac{\partial f}{\partial x_{12}}&\frac{\partial f}{\partial x_{22}}& \cdots & \frac{\partial f}{\partial x_{m2}}\\ \vdots&\vdots&\vdots&\vdots\\ \frac{\partial f} {\partial x_{1n}}&\frac{\partial f}{\partial x_{2n}}&\cdots&\frac{\partial f}{\partial x_{mn}} \end{bmatrix}_{n\times m} \begin{bmatrix} \mathbb{d}x_{11} & \mathbb{d}x_{12} & \cdots & \mathbb{d}x_{1n} \\ \mathbb{d}x_{21} & \mathbb{d}x_{22} & \cdots & \mathbb{d}x_{2n} \\ \vdots&\vdots&\vdots&\vdots\\ \mathbb{d}x_{m1} & \mathbb{d}x_{m2} & \cdots & \mathbb{d}x_{mn} \end{bmatrix}_{m \times n} \end{pmatrix} (20)

Looking at the result in Eq. (20), the left-hand matrix inside $\mathbb{tr}$ is exactly Eq. (Essence_9):

$$
\text{D}_{\pmb{X}}f(\pmb{X}) = \frac{\partial f(\pmb{X})}{\partial \pmb{X}^T_{m\times n}} = \begin{bmatrix} \frac{\partial f}{\partial x_{11}} & \frac{\partial f}{\partial x_{21}} & \cdots & \frac{\partial f}{\partial x_{m1}} \\ \frac{\partial f}{\partial x_{12}} & \frac{\partial f}{\partial x_{22}} & \cdots & \frac{\partial f}{\partial x_{m2}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial f}{\partial x_{1n}} & \frac{\partial f}{\partial x_{2n}} & \cdots & \frac{\partial f}{\partial x_{mn}} \end{bmatrix}_{n\times m} \tag{Essence\_9}
$$

and the right-hand matrix is exactly Eq. (23_1):

$$
\mathbb{d}\pmb{X}_{m \times n} = \begin{bmatrix} \mathbb{d}x_{11} & \mathbb{d}x_{12} & \cdots & \mathbb{d}x_{1n} \\ \mathbb{d}x_{21} & \mathbb{d}x_{22} & \cdots & \mathbb{d}x_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \mathbb{d}x_{m1} & \mathbb{d}x_{m2} & \cdots & \mathbb{d}x_{mn} \end{bmatrix}_{m \times n} \tag{23\_1}
$$

Therefore, the **total differential** of a real-valued scalar function of a matrix variable, Eq. (20), can be written as:

$$
\mathbb{d}f(\pmb{X}) = \mathbb{tr}\left(\frac{\partial f(\pmb{X})}{\partial\pmb{X}^T}\mathbb{d}\pmb{X}\right) \tag{24}
$$

Keep in mind what our goal is: to find $\frac{\partial f(\pmb{X})}{\partial \pmb{X}^T}$. So as soon as we can write the total differential of a real-valued scalar function of a matrix variable in the form of Eq. (24), the matrix multiplying $\mathbb{d}\pmb{X}$ inside the trace is the derivative we are looking for. (It has been proven [9] that this result is unique. That is, if

$$
\mathbb{d}f(\pmb{X}) = \mathbb{tr}(\pmb{A}_1\mathbb{d}\pmb{X}) = \mathbb{tr}(\pmb{A}_2\mathbb{d}\pmb{X}),
$$

then $\pmb{A}_1=\pmb{A}_2$.)

The total differential of a real-valued scalar function of a vector variable, Eq. (18), can likewise be written as:

$$
\mathbb{d}f(\pmb{x}) = \mathbb{tr}\left(\frac{\partial f(\pmb{x})}{\partial\pmb{x}^T}\mathbb{d}\pmb{x}\right) \tag{25}
$$

As pointed out in Section 3, 2.5.2 of the Essence part, when the matrix variable $\pmb{X}$ is itself a column vector $\pmb{x}$,

$$
\frac{\partial f(\pmb{X})}{\partial \pmb{X}^T} = \frac{\partial f(\pmb{x})}{\partial \pmb{x}^T} \tag{26}
$$

Likewise, by Eqs. (23_1) and (23_2), when the matrix $\pmb{X}$ is itself a column vector $\pmb{x}$, we also have

$$
\mathbb{d}\pmb{X} = \mathbb{d}\pmb{x} \tag{27}
$$

Therefore, the derivative of a real-valued scalar function of either a matrix variable or a vector variable can be obtained from Eq. (24):

$$
\mathbb{d}f(\pmb{X}) = \mathbb{tr}\left(\frac{\partial f(\pmb{X})}{\partial\pmb{X}^T}\mathbb{d}\pmb{X}\right) \tag{24}
$$
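
As a quick numerical sanity check of Eq. (24), here is a minimal NumPy sketch. The function `f`, the constant matrix `C`, and the sizes are illustrative assumptions, not taken from the text. It compares the actual change of $f$ under a small perturbation with the trace expression.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 2
X = rng.standard_normal((m, n))
C = rng.standard_normal((m, n))  # hypothetical constant coefficients

def f(X):
    # Illustrative scalar function: f(X) = sum_ij c_ij * x_ij^2
    return np.sum(C * X**2)

# Jacobian form ∂f/∂X^T: an n-by-m matrix whose (j, i) entry is ∂f/∂x_ij
jacobian = (2 * C * X).T

# A small perturbation standing in for dX
dX = 1e-6 * rng.standard_normal((m, n))

actual_change = f(X + dX) - f(X)
trace_form = np.trace(jacobian @ dX)  # right-hand side of Eq. (24)

print(actual_change, trace_form)
```

The two printed numbers should agree up to second-order terms in `dX`, which is what "total differential" means here.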

So how do we bring an expression into the form of Eq. (24)? Before answering, here are $3\times 2=6$ formulas worth memorizing, since they are used directly from now on.

3.2.1[8] The sandwich rule

$$
\mathbb{d}(\pmb{A}\pmb{X}\pmb{B}) = \pmb{A}\,\mathbb{d}(\pmb{X})\,\pmb{B} \tag{25\_1\_1}
$$

where $\pmb{A}_{p \times m}$ and $\pmb{B}_{n \times q}$ are constant matrices.

Proof:

By the product rule, Eq. (22_3_2):

$$
\mathbb{d}(\pmb{A}\pmb{X}\pmb{B}) = \mathbb{d}(\pmb{A})\pmb{X}\pmb{B} + \pmb{A}\,\mathbb{d}(\pmb{X})\,\pmb{B} + \pmb{A}\pmb{X}\,\mathbb{d}\pmb{B} \tag{25\_1\_a}
$$

By the differential of a constant matrix, Eq. (22_1):

$$
\mathbb{d}\pmb{A} = \pmb{0}_{p \times m},\quad \mathbb{d}\pmb{B} = \pmb{0}_{n \times q} \tag{25\_1\_b}
$$

Substituting (25_1_b) into (25_1_a), the first and third terms vanish and only $\pmb{A}\,\mathbb{d}(\pmb{X})\,\pmb{B}$ remains. Q.E.D.

$\pmb{X}_{m \times n}$ can be replaced by any matrix function $\pmb{F}(\pmb{X})$ of the same size:

$$
\mathbb{d}(\pmb{A}\pmb{F}(\pmb{X})\pmb{B}) = \pmb{A}\,\mathbb{d}(\pmb{F}(\pmb{X}))\,\pmb{B} \tag{25\_1\_2}
$$
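
Because $\pmb{A}\pmb{X}\pmb{B}$ is linear in $\pmb{X}$, the sandwich rule holds exactly for finite perturbations, not only to first order. The following sketch illustrates this with randomly chosen matrices (the sizes and random values are assumptions for illustration only).

```python
import numpy as np

rng = np.random.default_rng(1)
p, m, n, q = 2, 3, 4, 5
A = rng.standard_normal((p, m))   # constant matrix A (p x m)
B = rng.standard_normal((n, q))   # constant matrix B (n x q)
X = rng.standard_normal((m, n))
dX = 1e-3 * rng.standard_normal((m, n))

lhs = A @ (X + dX) @ B - A @ X @ B  # change in AXB caused by dX
rhs = A @ dX @ B                    # A d(X) B, as in Eq. (25_1_1)

print(np.max(np.abs(lhs - rhs)))    # only floating-point rounding remains
```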

3.2.2[10] Determinant

$$
\mathbb{d}|\pmb{X}| = |\pmb{X}|\,\mathbb{tr}(\pmb{X}^{-1}\mathbb{d}\pmb{X}) = \mathbb{tr}(|\pmb{X}|\pmb{X}^{-1}\mathbb{d}\pmb{X}) \tag{25\_2\_1}
$$

where $\pmb{X}_{n \times n}$ is any invertible matrix.

Proof:

First, note that the determinant is a real-valued scalar function, so Eq. (24) applies.

Recall that a determinant can be expanded along a row: multiply each entry of the row by its cofactor and sum the products [11].

We expand along row $i$, the row containing the entry $x_{ij}$:

$$
|\pmb{X}| = x_{i1}A_{i1}+x_{i2}A_{i2}+\cdots+x_{in}A_{in} \tag{25\_2\_a}
$$

None of the cofactors $A_{i1},\cdots,A_{in}$ involves an entry of row $i$, so the partial derivative of the determinant with respect to $x_{ij}$ is simply the cofactor of that entry:

$$
\frac{\partial |\pmb{X}|}{\partial x_{ij}} = A_{ij} \tag{25\_2\_b}
$$

Hence the derivative of the determinant with respect to the matrix is:

$$
\frac{\partial |\pmb{X}|}{\partial \pmb{X}^T} = \begin{bmatrix} A_{11} & A_{21} & \cdots & A_{n1} \\ A_{12} & A_{22} & \cdots & A_{n2} \\ \vdots & \vdots & \ddots & \vdots \\ A_{1n} & A_{2n} & \cdots & A_{nn} \end{bmatrix} \tag{25\_2\_c}
$$

This is exactly the adjugate matrix [12] $\pmb{X}^*$.

By the relation between the adjugate matrix and the inverse matrix [13]:

$$
\pmb{X}^{-1} = \frac{\pmb{X}^*}{|\pmb{X}|} \tag{25\_2\_d}
$$

Substituting into Eq. (24) gives:

$$
\mathbb{d}|\pmb{X}| = \mathbb{tr}\left(\frac{\partial |\pmb{X}|}{\partial\pmb{X}^T}\mathbb{d}\pmb{X}\right) = \mathbb{tr}(|\pmb{X}|\pmb{X}^{-1}\mathbb{d}\pmb{X})
$$

Since the determinant is a scalar, by Eq. (3) it can be moved outside the trace, which gives:

$$
\mathbb{d}|\pmb{X}| = |\pmb{X}|\,\mathbb{tr}(\pmb{X}^{-1}\mathbb{d}\pmb{X}) = \mathbb{tr}(|\pmb{X}|\pmb{X}^{-1}\mathbb{d}\pmb{X}) \tag{25\_2\_1}
$$

Q.E.D.

$\pmb{X}_{n \times n}$ can be replaced by any square, invertible matrix function [10]:

$$
\mathbb{d}|\pmb{F}(\pmb{X})| = |\pmb{F}(\pmb{X})|\,\mathbb{tr}(\pmb{F}(\pmb{X})^{-1}\mathbb{d}\pmb{F}(\pmb{X})) = \mathbb{tr}(|\pmb{F}(\pmb{X})|\pmb{F}(\pmb{X})^{-1}\mathbb{d}\pmb{F}(\pmb{X})) \tag{25\_2\_2}
$$
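
Eq. (25_2_1) can be checked numerically with a small perturbation. The sketch below is only an illustration: the test matrix is an assumption, built to be strictly diagonally dominant so that it is certainly invertible.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
# Strictly diagonally dominant, hence invertible (illustrative choice)
X = rng.uniform(-0.5, 0.5, (n, n)) + n * np.eye(n)
dX = 1e-6 * rng.standard_normal((n, n))

actual_change = np.linalg.det(X + dX) - np.linalg.det(X)
# Right-hand side of Eq. (25_2_1): |X| tr(X^{-1} dX)
predicted = np.linalg.det(X) * np.trace(np.linalg.inv(X) @ dX)

print(actual_change, predicted)
```

The two values differ only by terms of second order in `dX`.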

3.2.3[10] Inverse matrix

$$
\mathbb{d}(\pmb{X}^{-1}) = -\pmb{X}^{-1}\,\mathbb{d}(\pmb{X})\,\pmb{X}^{-1} \tag{25\_3\_1}
$$

where $\pmb{X}_{n \times n}$ is invertible.

Proof:

We have $\pmb{X}\pmb{X}^{-1}=\pmb{E}$, where $\pmb{E}$ is the identity matrix.

Since the differential of a constant matrix is $\pmb{0}$, taking the matrix differential of both sides gives:

$$
\mathbb{d}(\pmb{X})\,\pmb{X}^{-1}+\pmb{X}\,\mathbb{d}(\pmb{X}^{-1}) = \pmb{0} \tag{25\_3\_a}
$$

Moving the first term to the right-hand side and left-multiplying both sides by $\pmb{X}^{-1}$ gives the result.

Q.E.D.

$\pmb{X}_{n \times n}$ can be replaced by any square, invertible matrix function [10]:

$$
\mathbb{d}(\pmb{F}(\pmb{X})^{-1}) = -\pmb{F}(\pmb{X})^{-1}\,\mathbb{d}(\pmb{F}(\pmb{X}))\,\pmb{F}(\pmb{X})^{-1} \tag{25\_3\_2}
$$
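
The inverse rule, Eq. (25_3_1), can be checked in the same way. This is a sketch under the assumption of a strictly diagonally dominant (hence invertible) test matrix; the specific numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
# Strictly diagonally dominant, hence invertible (illustrative choice)
X = rng.uniform(-0.5, 0.5, (n, n)) + n * np.eye(n)
dX = 1e-6 * rng.standard_normal((n, n))

X_inv = np.linalg.inv(X)
actual_change = np.linalg.inv(X + dX) - X_inv
predicted = -X_inv @ dX @ X_inv  # right-hand side of Eq. (25_3_1)

print(np.max(np.abs(actual_change - predicted)))
```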

3.3 How to differentiate using matrix differentials

For a real-valued scalar function $f(\pmb{X})$, we have $\mathbb{tr}(f(\pmb{X})) = f(\pmb{X})$ and $\mathbb{d}f(\pmb{X})=\mathbb{tr}(\mathbb{d}f(\pmb{X}))$.

Hence

$$
\mathbb{d}f(\pmb{X}) = \mathbb{d}(\mathbb{tr}f(\pmb{X})) = \mathbb{tr}(\mathbb{d}f(\pmb{X})) \tag{26}
$$

If the real-valued scalar function is itself the trace of some matrix function $\pmb{F}_{p \times p}(\pmb{X})$, such as $\mathbb{tr}\,\pmb{F}(\pmb{X})$, then by the linearity rule of total differentials, Eq. (16_2):

$$
\mathbb{d}(\mathbb{tr}\,\pmb{F}_{p\times p}(\pmb{X})) = \mathbb{d}\left(\sum_{i=1}^p f_{ii}(\pmb{X})\right) = \sum_{i=1}^p\mathbb{d}(f_{ii}(\pmb{X})) = \mathbb{tr}(\mathbb{d}\pmb{F}_{p \times p}(\pmb{X})) \tag{27}
$$

We now work through 6 examples in great detail to show how to differentiate using matrix differentials. The conclusions of the examples need not be memorized. What matters most is being able to carry out the derivation, which you can simply redo whenever you need a result.

3.3.1 Example 1: Eq. (Basics_31)

$$
\frac{\partial(\pmb{a}^T\pmb{X}\pmb{X}^T\pmb{b})}{\partial\pmb{X}} = \pmb{a}\pmb{b}^T\pmb{X}+\pmb{b}\pmb{a}^T\pmb{X} \tag{28}
$$

Proof:

**Step 1:** write it in the form of Eq. (26)

$$
\mathbb{d}(\pmb{a}^T\pmb{X}\pmb{X}^T\pmb{b}) = \mathbb{tr}(\mathbb{d}(\pmb{a}^T\pmb{X}\pmb{X}^T\pmb{b})) \tag{29}
$$

**Step 2:** use the matrix differential rules, Eqs. (22_1) to (22_4_1), the trace properties, Eqs. (2) to (10), and the six basic formulas, Eqs. (25_1_1) to (25_3_2), to simplify Eq. (29) into the form of Eq. (24)

By Eq. (25_1_2):

$$
\mathbb{d}(\pmb{a}^T\pmb{X}\pmb{X}^T\pmb{b}) = \mathbb{tr}(\mathbb{d}(\pmb{a}^T\pmb{X}\pmb{X}^T\pmb{b})) = \mathbb{tr}(\pmb{a}^T\mathbb{d}(\pmb{X}\pmb{X}^T)\pmb{b}) \tag{30}
$$

By Eq. (22_3_1):

$$
\mathbb{d}(\pmb{a}^T\pmb{X}\pmb{X}^T\pmb{b}) = \mathbb{tr}(\pmb{a}^T\mathbb{d}(\pmb{X}\pmb{X}^T)\pmb{b}) = \mathbb{tr}[\pmb{a}^T(\mathbb{d}(\pmb{X})\pmb{X}^T+\pmb{X}\mathbb{d}\pmb{X}^T)\pmb{b}] \tag{31}
$$

By Eq. (3):

$$
\begin{aligned}
\mathbb{d}(\pmb{a}^T\pmb{X}\pmb{X}^T\pmb{b}) &= \mathbb{tr}[\pmb{a}^T(\mathbb{d}(\pmb{X})\pmb{X}^T+\pmb{X}\mathbb{d}\pmb{X}^T)\pmb{b}] \\
&= \mathbb{tr}(\pmb{a}^T\mathbb{d}(\pmb{X})\pmb{X}^T\pmb{b})+\mathbb{tr}(\pmb{a}^T\pmb{X}\mathbb{d}(\pmb{X}^T)\pmb{b})
\end{aligned} \tag{32}
$$

By Eq. (22_4_1):

$$
\begin{aligned}
\mathbb{d}(\pmb{a}^T\pmb{X}\pmb{X}^T\pmb{b}) &= \mathbb{tr}(\pmb{a}^T\mathbb{d}(\pmb{X})\pmb{X}^T\pmb{b})+\mathbb{tr}(\pmb{a}^T\pmb{X}\mathbb{d}(\pmb{X}^T)\pmb{b}) \\
&= \mathbb{tr}(\pmb{a}^T\mathbb{d}(\pmb{X})\pmb{X}^T\pmb{b})+\mathbb{tr}(\pmb{a}^T\pmb{X}(\mathbb{d}\pmb{X})^T\pmb{b})
\end{aligned} \tag{33}
$$

By Eqs. (9) and (10):

$$
\begin{aligned}
\mathbb{d}(\pmb{a}^T\pmb{X}\pmb{X}^T\pmb{b}) &= \mathbb{tr}(\pmb{a}^T\mathbb{d}(\pmb{X})\pmb{X}^T\pmb{b})+\mathbb{tr}(\pmb{a}^T\pmb{X}(\mathbb{d}\pmb{X})^T\pmb{b}) \\
&= \mathbb{tr}(\pmb{X}^T\pmb{b}\pmb{a}^T\mathbb{d}\pmb{X}) + \mathbb{tr}(\pmb{b}\pmb{a}^T\pmb{X}(\mathbb{d}\pmb{X})^T) \\
&= \mathbb{tr}(\pmb{X}^T\pmb{b}\pmb{a}^T\mathbb{d}\pmb{X}) + \mathbb{tr}((\pmb{b}\pmb{a}^T\pmb{X})^T\mathbb{d}\pmb{X}) \\
&= \mathbb{tr}(\pmb{X}^T\pmb{b}\pmb{a}^T\mathbb{d}\pmb{X}) + \mathbb{tr}(\pmb{X}^T\pmb{a}\pmb{b}^T\mathbb{d}\pmb{X})
\end{aligned} \tag{34}
$$

By Eq. (3):

$$
\begin{aligned}
\mathbb{d}(\pmb{a}^T\pmb{X}\pmb{X}^T\pmb{b}) &= \mathbb{tr}(\pmb{X}^T\pmb{b}\pmb{a}^T\mathbb{d}\pmb{X}) + \mathbb{tr}(\pmb{X}^T\pmb{a}\pmb{b}^T\mathbb{d}\pmb{X}) \\
&= \mathbb{tr}((\pmb{X}^T\pmb{b}\pmb{a}^T+\pmb{X}^T\pmb{a}\pmb{b}^T)\mathbb{d}\pmb{X})
\end{aligned} \tag{35}
$$

**Step 3:** read off the result

$$
\frac{\partial(\pmb{a}^T\pmb{X}\pmb{X}^T\pmb{b})}{\partial\pmb{X}^T} = \pmb{X}^T\pmb{b}\pmb{a}^T+\pmb{X}^T\pmb{a}\pmb{b}^T \tag{36}
$$

Transposing both sides gives:

$$
\frac{\partial(\pmb{a}^T\pmb{X}\pmb{X}^T\pmb{b})}{\partial\pmb{X}} = \pmb{a}\pmb{b}^T\pmb{X}+\pmb{b}\pmb{a}^T\pmb{X} \tag{28}
$$

Q.E.D.
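
A result like Eq. (28) is easy to double-check by comparing it with entry-wise finite differences. The helper `numerical_gradient` below is a hypothetical utility written for this illustration, and the sizes ($\pmb{a},\pmb{b}$ as $m \times 1$ vectors, $\pmb{X}$ as $m \times n$) are assumptions chosen so that the product is defined.

```python
import numpy as np

def numerical_gradient(f, X, h=1e-6):
    """Central-difference estimate of ∂f/∂X, with the same shape as X."""
    G = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            E = np.zeros_like(X)
            E[i, j] = h  # perturb a single entry x_ij
            G[i, j] = (f(X + E) - f(X - E)) / (2 * h)
    return G

rng = np.random.default_rng(4)
m, n = 4, 3
a = rng.standard_normal((m, 1))
b = rng.standard_normal((m, 1))
X = rng.standard_normal((m, n))

f = lambda X: (a.T @ X @ X.T @ b).item()   # scalar a^T X X^T b
analytic = a @ b.T @ X + b @ a.T @ X       # Eq. (28)
numeric = numerical_gradient(f, X)

print(np.max(np.abs(numeric - analytic)))
```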

3.3.2 Example 2[9]

$$
\frac{\partial\,\mathbb{tr}(\pmb{X}^T\pmb{X})}{\partial \pmb{X}} = 2\pmb{X} \tag{37}
$$

**Step 1:** write it in the form of Eq. (27)

$$
\mathbb{d}(\mathbb{tr}(\pmb{X}^T\pmb{X})) = \mathbb{tr}(\mathbb{d}(\pmb{X}^T\pmb{X})) \tag{38}
$$

**Step 2:** use the matrix differential rules, Eqs. (22_1) to (22_4_1), the trace properties, Eqs. (2) to (10), and the six basic formulas, Eqs. (25_1_1) to (25_3_2), to simplify Eq. (38) into the form of Eq. (24)

By Eq. (22_3_1):

$$
\mathbb{d}(\mathbb{tr}(\pmb{X}^T\pmb{X})) = \mathbb{tr}(\mathbb{d}(\pmb{X}^T\pmb{X})) = \mathbb{tr}(\mathbb{d}(\pmb{X}^T)\pmb{X}+\pmb{X}^T\mathbb{d}\pmb{X}) \tag{39}
$$

By Eq. (3):

$$
\mathbb{d}(\mathbb{tr}(\pmb{X}^T\pmb{X})) = \mathbb{tr}(\mathbb{d}(\pmb{X}^T)\pmb{X}+\pmb{X}^T\mathbb{d}\pmb{X}) = \mathbb{tr}(\mathbb{d}(\pmb{X}^T)\pmb{X})+\mathbb{tr}(\pmb{X}^T\mathbb{d}\pmb{X})
$$

By Eq. (22_4_1):

$$
\mathbb{d}(\mathbb{tr}(\pmb{X}^T\pmb{X})) = \mathbb{tr}(\mathbb{d}(\pmb{X}^T)\pmb{X})+\mathbb{tr}(\pmb{X}^T\mathbb{d}\pmb{X}) = \mathbb{tr}((\mathbb{d}\pmb{X})^T\pmb{X})+\mathbb{tr}(\pmb{X}^T\mathbb{d}\pmb{X})
$$

By Eqs. (8) and (10):

$$
\begin{aligned}
\mathbb{d}(\mathbb{tr}(\pmb{X}^T\pmb{X})) &= \mathbb{tr}((\mathbb{d}\pmb{X})^T\pmb{X})+\mathbb{tr}(\pmb{X}^T\mathbb{d}\pmb{X}) \\
&= \mathbb{tr}(\pmb{X}(\mathbb{d}\pmb{X})^T)+\mathbb{tr}(\pmb{X}^T\mathbb{d}\pmb{X}) \\
&= \mathbb{tr}(\pmb{X}^T\mathbb{d}\pmb{X})+\mathbb{tr}(\pmb{X}^T\mathbb{d}\pmb{X}) \\
&= 2\,\mathbb{tr}(\pmb{X}^T\mathbb{d}\pmb{X})
\end{aligned}
$$

By Eq. (3):

$$
\mathbb{d}(\mathbb{tr}(\pmb{X}^T\pmb{X})) = 2\,\mathbb{tr}(\pmb{X}^T\mathbb{d}\pmb{X}) = \mathbb{tr}(2\pmb{X}^T\mathbb{d}\pmb{X})
$$

**Step 3:** read off the result

$$
\begin{aligned}
\frac{\partial\,\mathbb{tr}(\pmb{X}^T\pmb{X})}{\partial \pmb{X}^T} &= 2\pmb{X}^T \\
\frac{\partial\,\mathbb{tr}(\pmb{X}^T\pmb{X})}{\partial \pmb{X}} &= 2\pmb{X}
\end{aligned} \tag{40}
$$
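
Since $\mathbb{tr}(\pmb{X}^T\pmb{X})$ is quadratic in $\pmb{X}$, the first-order term $\mathbb{tr}(2\pmb{X}^T\mathbb{d}\pmb{X})$ misses the true change by exactly $\mathbb{tr}((\mathbb{d}\pmb{X})^T\mathbb{d}\pmb{X})$. The following sketch (with arbitrary illustrative sizes and values) makes that remainder visible.

```python
import numpy as np

rng = np.random.default_rng(5)
m, n = 3, 2
X = rng.standard_normal((m, n))
dX = 1e-4 * rng.standard_normal((m, n))

f = lambda X: np.trace(X.T @ X)

gradient = 2 * X                           # Eq. (40): ∂f/∂X
first_order = np.trace(gradient.T @ dX)    # tr(2 X^T dX), the form of Eq. (24)

# What the first-order term misses is the purely second-order part
remainder = f(X + dX) - f(X) - first_order
second_order = np.trace(dX.T @ dX)

print(remainder, second_order)
```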

3.3.3 Example 3[14]

$$
\frac{\partial \log|\pmb{X}|}{\partial \pmb{X}} = (\pmb{X}^{-1})^T \tag{41}
$$

where $\pmb{X}_{n \times n}$ is invertible with $|\pmb{X}|>0$, so that $\log|\pmb{X}|$ is defined.

**Step 1:** write it in the form of Eq. (26)

$$
\mathbb{d}(\log|\pmb{X}|) = \mathbb{tr}(\mathbb{d}(\log|\pmb{X}|)) \tag{42}
$$

**Step 2:** use the matrix differential rules, Eqs. (22_1) to (22_4_1), the trace properties, Eqs. (2) to (10), and the six basic formulas, Eqs. (25_1_1) to (25_3_2), to simplify Eq. (42) into the form of Eq. (24)

This is the total differential of a composite function: $|\pmb{X}|$ is a multivariate function and $\log u$ is a univariate function. So by the first two equalities of Eq. (15), letting $z=\log|\pmb{X}|$ and $u=|\pmb{X}|$, we get

$$
\begin{aligned}
\mathbb{d}(\log|\pmb{X}|) &= \mathbb{tr}(\mathbb{d}(\log|\pmb{X}|)) = \mathbb{tr}(\mathbb{d}z) = \mathbb{tr}(\mathbb{d}(\log u)) \\
&= \mathbb{tr}\left(\frac{1}{u}\,\mathbb{d}u\right) = \mathbb{tr}\left(\frac{1}{|\pmb{X}|}\,\mathbb{d}|\pmb{X}|\right)
\end{aligned} \tag{43}
$$

By Eq. (25_2_1):

$$
\mathbb{d}(\log|\pmb{X}|) = \mathbb{tr}\left(\frac{1}{|\pmb{X}|}\,\mathbb{d}|\pmb{X}|\right) = \mathbb{tr}\left(\frac{1}{|\pmb{X}|}|\pmb{X}|\,\mathbb{tr}(\pmb{X}^{-1}\mathbb{d}\pmb{X})\right) \tag{44}
$$

The trace of a scalar is the scalar itself, so by Eq. (2):

$$
\begin{aligned}
\mathbb{d}(\log|\pmb{X}|) &= \mathbb{tr}\left(\frac{1}{|\pmb{X}|}|\pmb{X}|\,\mathbb{tr}(\pmb{X}^{-1}\mathbb{d}\pmb{X})\right) \\
&= \frac{1}{|\pmb{X}|}|\pmb{X}|\,\mathbb{tr}(\pmb{X}^{-1}\mathbb{d}\pmb{X}) = \mathbb{tr}(\pmb{X}^{-1}\mathbb{d}\pmb{X})
\end{aligned} \tag{45}
$$

**Step 3:** read off the result

$$
\begin{aligned}
\frac{\partial \log|\pmb{X}|}{\partial \pmb{X}^T} &= \pmb{X}^{-1} \\
\frac{\partial \log|\pmb{X}|}{\partial \pmb{X}} &= (\pmb{X}^{-1})^T
\end{aligned} \tag{46}
$$
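
The differential $\mathbb{d}(\log|\pmb{X}|)=\mathbb{tr}(\pmb{X}^{-1}\mathbb{d}\pmb{X})$ can be checked along a random direction. In this sketch the test matrix is an assumption: it is a small perturbation of the identity, strictly diagonally dominant with positive diagonal, so its determinant is positive and the logarithm is defined.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 3
# Near-identity, diagonally dominant matrix: guarantees det(X) > 0
X = np.eye(n) + 0.2 * rng.uniform(-0.5, 0.5, (n, n))
dX = 1e-5 * rng.standard_normal((n, n))

f = lambda X: np.log(np.linalg.det(X))

# Central difference along the direction dX
central = (f(X + dX) - f(X - dX)) / 2

jacobian_form = np.linalg.inv(X)       # ∂f/∂X^T in Eq. (46)
gradient_form = np.linalg.inv(X).T     # ∂f/∂X in Eq. (46)

predicted = np.trace(jacobian_form @ dX)   # tr(X^{-1} dX), Eq. (45)
same_value = np.sum(gradient_form * dX)    # entry-wise pairing with ∂f/∂X

print(central, predicted, same_value)
```

The last two quantities are the same number written in the two layouts: $\mathbb{tr}(\pmb{G}^T\mathbb{d}\pmb{X})$ equals the sum of the entry-wise products of $\pmb{G}$ and $\mathbb{d}\pmb{X}$.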

3.3.4 Example 4[14]

$$
\frac{\partial |\pmb{X}^{-1}|}{\partial \pmb{X}} = -|\pmb{X}^{-1}|(\pmb{X}^{-1})^T \tag{47}
$$

where $\pmb{X}_{n \times n}$ is invertible.

**Step 1:** write it in the form of Eq. (26). By Eq. (25_2_2):

$$
\mathbb{d}|\pmb{X}^{-1}| = |\pmb{X}^{-1}|\,\mathbb{tr}((\pmb{X}^{-1})^{-1}\mathbb{d}(\pmb{X}^{-1})) = |\pmb{X}^{-1}|\,\mathbb{tr}(\pmb{X}\,\mathbb{d}(\pmb{X}^{-1})) \tag{48}
$$

**Step 2:** use the matrix differential rules, Eqs. (22_1) to (22_4_1), the trace properties, Eqs. (2) to (10), and the six basic formulas, Eqs. (25_1_1) to (25_3_2), to simplify Eq. (48) into the form of Eq. (24)

By Eq. (25_3_1):

$$
\begin{aligned}
\mathbb{d}|\pmb{X}^{-1}| &= |\pmb{X}^{-1}|\,\mathbb{tr}(\pmb{X}\,\mathbb{d}(\pmb{X}^{-1})) \\
&= |\pmb{X}^{-1}|\,\mathbb{tr}(-\pmb{X}\pmb{X}^{-1}\mathbb{d}(\pmb{X})\pmb{X}^{-1}) \\
&= |\pmb{X}^{-1}|\,\mathbb{tr}(-\mathbb{d}(\pmb{X})\pmb{X}^{-1})
\end{aligned} \tag{49}
$$

By Eq. (3):

$$
\mathbb{d}|\pmb{X}^{-1}| = |\pmb{X}^{-1}|\,\mathbb{tr}(-\mathbb{d}(\pmb{X})\pmb{X}^{-1}) = -|\pmb{X}^{-1}|\,\mathbb{tr}(\mathbb{d}(\pmb{X})\pmb{X}^{-1}) \tag{50}
$$

By Eq. (8):

$$
\mathbb{d}|\pmb{X}^{-1}| = -|\pmb{X}^{-1}|\,\mathbb{tr}(\mathbb{d}(\pmb{X})\pmb{X}^{-1}) = -|\pmb{X}^{-1}|\,\mathbb{tr}(\pmb{X}^{-1}\mathbb{d}\pmb{X})
$$

By Eq. (3):

$$
\mathbb{d}|\pmb{X}^{-1}| = -|\pmb{X}^{-1}|\,\mathbb{tr}(\pmb{X}^{-1}\mathbb{d}\pmb{X}) = \mathbb{tr}(-|\pmb{X}^{-1}|\pmb{X}^{-1}\mathbb{d}\pmb{X}) \tag{51}
$$

**Step 3:** read off the result

$$
\begin{aligned}
\frac{\partial |\pmb{X}^{-1}|}{\partial \pmb{X}^T} &= -|\pmb{X}^{-1}|\pmb{X}^{-1} \\
\frac{\partial |\pmb{X}^{-1}|}{\partial \pmb{X}} &= -|\pmb{X}^{-1}|(\pmb{X}^{-1})^T
\end{aligned} \tag{52}
$$
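
Eq. (52) can be compared with entry-wise finite differences. As before, `numerical_gradient` is a hypothetical helper written only for this check, and the near-identity test matrix is an assumption that guarantees invertibility.

```python
import numpy as np

def numerical_gradient(f, X, h=1e-6):
    """Central-difference estimate of ∂f/∂X, with the same shape as X."""
    G = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            E = np.zeros_like(X)
            E[i, j] = h  # perturb a single entry x_ij
            G[i, j] = (f(X + E) - f(X - E)) / (2 * h)
    return G

rng = np.random.default_rng(7)
n = 3
# Near-identity, diagonally dominant matrix: certainly invertible
X = np.eye(n) + 0.2 * rng.uniform(-0.5, 0.5, (n, n))

f = lambda X: np.linalg.det(np.linalg.inv(X))   # |X^{-1}|

X_inv = np.linalg.inv(X)
analytic = -np.linalg.det(X_inv) * X_inv.T      # Eq. (52), ∂f/∂X
numeric = numerical_gradient(f, X)

print(np.max(np.abs(numeric - analytic)))
```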

3.3.5 Example 5[15]

$$
\frac{\partial\,\mathbb{tr}\left((\pmb{X}+\pmb{A})^{-1}\right)}{\partial \pmb{X}} = -((\pmb{X}+\pmb{A})^{-2})^T \tag{53}
$$

where $\pmb{A}_{n \times n}$ is a constant matrix, $\pmb{X}_{n \times n}$ is such that $\pmb{X}+\pmb{A}$ is invertible, and $(\pmb{X}+\pmb{A})^{-2}=(\pmb{X}+\pmb{A})^{-1}(\pmb{X}+\pmb{A})^{-1}$.

**Step 1:** write it in the form of Eq. (27)

$$
\mathbb{d}\left(\mathbb{tr}\left((\pmb{X}+\pmb{A})^{-1}\right)\right) = \mathbb{tr}\left(\mathbb{d}\left((\pmb{X}+\pmb{A})^{-1}\right)\right) \tag{54}
$$

**Step 2:** use the matrix differential rules, Eqs. (22_1) to (22_4_1), the trace properties, Eqs. (2) to (10), and the six basic formulas, Eqs. (25_1_1) to (25_3_2), to simplify Eq. (54) into the form of Eq. (24)

By Eq. (25_3_2):

$$
\begin{aligned}
\mathbb{d}\left(\mathbb{tr}\left((\pmb{X}+\pmb{A})^{-1}\right)\right) &= \mathbb{tr}\left(\mathbb{d}\left((\pmb{X}+\pmb{A})^{-1}\right)\right) \\
&= \mathbb{tr}\left(-(\pmb{X}+\pmb{A})^{-1}\,\mathbb{d}(\pmb{X}+\pmb{A})\,(\pmb{X}+\pmb{A})^{-1}\right)
\end{aligned} \tag{55}
$$

By Eq. (9):

$$
\begin{aligned}
\mathbb{d}\left(\mathbb{tr}\left((\pmb{X}+\pmb{A})^{-1}\right)\right) &= \mathbb{tr}\left(-(\pmb{X}+\pmb{A})^{-1}\,\mathbb{d}(\pmb{X}+\pmb{A})\,(\pmb{X}+\pmb{A})^{-1}\right) \\
&= \mathbb{tr}\left(-(\pmb{X}+\pmb{A})^{-1}(\pmb{X}+\pmb{A})^{-1}\,\mathbb{d}(\pmb{X}+\pmb{A})\right) \\
&= \mathbb{tr}\left(-(\pmb{X}+\pmb{A})^{-2}\,\mathbb{d}(\pmb{X}+\pmb{A})\right)
\end{aligned} \tag{56}
$$

By Eq. (22_2):

$$
\mathbb{d}\left(\mathbb{tr}\left((\pmb{X}+\pmb{A})^{-1}\right)\right) = \mathbb{tr}\left(-(\pmb{X}+\pmb{A})^{-2}\,\mathbb{d}(\pmb{X}+\pmb{A})\right) = \mathbb{tr}\left(-(\pmb{X}+\pmb{A})^{-2}(\mathbb{d}\pmb{X}+\mathbb{d}\pmb{A})\right) \tag{57}
$$

By Eq. (22_1):

$$
\mathbb{d}\left(\mathbb{tr}\left((\pmb{X}+\pmb{A})^{-1}\right)\right) = \mathbb{tr}\left(-(\pmb{X}+\pmb{A})^{-2}(\mathbb{d}\pmb{X}+\mathbb{d}\pmb{A})\right) = \mathbb{tr}\left(-(\pmb{X}+\pmb{A})^{-2}\,\mathbb{d}\pmb{X}\right) \tag{58}
$$

**Step 3:** read off the result

$$
\begin{aligned}
\frac{\partial\,\mathbb{tr}\left((\pmb{X}+\pmb{A})^{-1}\right)}{\partial \pmb{X}^T} &= -(\pmb{X}+\pmb{A})^{-2} \\
\frac{\partial\,\mathbb{tr}\left((\pmb{X}+\pmb{A})^{-1}\right)}{\partial \pmb{X}} &= -((\pmb{X}+\pmb{A})^{-2})^T
\end{aligned} \tag{59}
$$
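
Eq. (58) can be checked along a random direction, in the same spirit as Eq. (24). The matrices below are illustrative assumptions, chosen so that $\pmb{X}+\pmb{A}$ is strictly diagonally dominant and therefore invertible.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 3
X = 0.2 * rng.uniform(-0.5, 0.5, (n, n))
A = np.eye(n) + 0.2 * rng.uniform(-0.5, 0.5, (n, n))  # constant matrix
dX = 1e-5 * rng.standard_normal((n, n))

f = lambda X: np.trace(np.linalg.inv(X + A))

# Central difference along the direction dX
central = (f(X + dX) - f(X - dX)) / 2

M = np.linalg.inv(X + A)
jacobian_form = -(M @ M)                  # ∂f/∂X^T in Eq. (59)
predicted = np.trace(jacobian_form @ dX)  # right-hand side of Eq. (58)

print(central, predicted)
```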

3.3.6 Example 6[15]

$$
\frac{\partial|\pmb{X}^3|}{\partial \pmb{X}} = \frac{\partial|\pmb{X}|^3}{\partial \pmb{X}} = 3|\pmb{X}|^3(\pmb{X}^{-1})^T = 3|\pmb{X}^3|(\pmb{X}^{-1})^T \tag{60}
$$

**Step 1:** write it in the form of Eq. (26)

Recall that for $n \times n$ matrices $\pmb{A},\pmb{B}$ we have $|\pmb{A}\pmb{B}|=|\pmb{A}||\pmb{B}|$.

Therefore

$$
|\pmb{X}^3| = |\pmb{X}\pmb{X}\pmb{X}| = |\pmb{X}||\pmb{X}||\pmb{X}| = |\pmb{X}|^3 \tag{61}
$$

and so

$$
\mathbb{d}|\pmb{X}^3| = \mathbb{d}(|\pmb{X}|^3) = \mathbb{tr}(\mathbb{d}(|\pmb{X}|^3)) \tag{62}
$$

**Step 2:** use the matrix differential rules, Eqs. (22_1) to (22_4_1), the trace properties, Eqs. (2) to (10), and the six basic formulas, Eqs. (25_1_1) to (25_3_2), to simplify Eq. (62) into the form of Eq. (24)

This is the total differential of a composite function: $|\pmb{X}|$ is a multivariate function and $u^3$ is a univariate function. So by the first two equalities of Eq. (15), letting $z=|\pmb{X}|^3$ and $u=|\pmb{X}|$, we get

$$
\begin{aligned}
\mathbb{d}(\mathbb{tr}(|\pmb{X}|^3)) &= \mathbb{tr}(\mathbb{d}(|\pmb{X}|^3)) = \mathbb{tr}(\mathbb{d}z) = \mathbb{tr}(\mathbb{d}(u^3)) \\
&= \mathbb{tr}(3u^2\,\mathbb{d}u) = \mathbb{tr}(3|\pmb{X}|^2\,\mathbb{d}|\pmb{X}|)
\end{aligned} \tag{63}
$$

By Eq. (25_2_1):

$$
\begin{aligned}
\mathbb{d}(\mathbb{tr}(|\pmb{X}|^3)) &= \mathbb{tr}(3|\pmb{X}|^2\,\mathbb{d}|\pmb{X}|) \\
&= \mathbb{tr}(3|\pmb{X}|^2|\pmb{X}|\,\mathbb{tr}(\pmb{X}^{-1}\mathbb{d}\pmb{X})) \\
&= \mathbb{tr}(3|\pmb{X}|^3\,\mathbb{tr}(\pmb{X}^{-1}\mathbb{d}\pmb{X}))
\end{aligned} \tag{64}
$$

The trace of a scalar is the scalar itself, so by Eq. (2):

$$
\mathbb{d}(\mathbb{tr}(|\pmb{X}|^3)) = \mathbb{tr}(3|\pmb{X}|^3\,\mathbb{tr}(\pmb{X}^{-1}\mathbb{d}\pmb{X})) = 3|\pmb{X}|^3\,\mathbb{tr}(\pmb{X}^{-1}\mathbb{d}\pmb{X}) \tag{65}
$$

By Eq. (3):

$$
\mathbb{d}(\mathbb{tr}(|\pmb{X}|^3)) = 3|\pmb{X}|^3\,\mathbb{tr}(\pmb{X}^{-1}\mathbb{d}\pmb{X}) = \mathbb{tr}(3|\pmb{X}|^3\pmb{X}^{-1}\mathbb{d}\pmb{X}) = \mathbb{tr}(3|\pmb{X}^3|\pmb{X}^{-1}\mathbb{d}\pmb{X})
$$

**Step 3:** read off the result

$$
\begin{aligned}
\frac{\partial|\pmb{X}^3|}{\partial \pmb{X}^T} &= \frac{\partial|\pmb{X}|^3}{\partial \pmb{X}^T} = 3|\pmb{X}|^3\pmb{X}^{-1} = 3|\pmb{X}^3|\pmb{X}^{-1} \\
\frac{\partial|\pmb{X}^3|}{\partial \pmb{X}} &= \frac{\partial|\pmb{X}|^3}{\partial \pmb{X}} = 3|\pmb{X}|^3(\pmb{X}^{-1})^T = 3|\pmb{X}^3|(\pmb{X}^{-1})^T
\end{aligned} \tag{66}
$$
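
Both the identity $|\pmb{X}^3|=|\pmb{X}|^3$ from Eq. (61) and the gradient in Eq. (66) can be checked numerically. The sketch below reuses the hypothetical `numerical_gradient` helper and an assumed near-identity test matrix that is certainly invertible.

```python
import numpy as np

def numerical_gradient(f, X, h=1e-6):
    """Central-difference estimate of ∂f/∂X, with the same shape as X."""
    G = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            E = np.zeros_like(X)
            E[i, j] = h  # perturb a single entry x_ij
            G[i, j] = (f(X + E) - f(X - E)) / (2 * h)
    return G

rng = np.random.default_rng(9)
n = 3
# Near-identity, diagonally dominant matrix: certainly invertible
X = np.eye(n) + 0.2 * rng.uniform(-0.5, 0.5, (n, n))

f = lambda X: np.linalg.det(X @ X @ X)   # |X^3|

det_of_cube = f(X)
cube_of_det = np.linalg.det(X) ** 3      # |X|^3, Eq. (61)

analytic = 3 * cube_of_det * np.linalg.inv(X).T   # Eq. (66), ∂f/∂X
numeric = numerical_gradient(f, X)

print(det_of_cube, cube_of_det)
print(np.max(np.abs(numeric - analytic)))
```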

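This is the end of the series. With the method in this article, every first-order derivative of a real-valued scalar function of a matrix or vector variable that we have encountered can now be computed. As for higher-order derivatives and derivatives of real matrix functions of a matrix variable, I have not run into them yet. If I do in the future, I will consider writing a few more articles.

Other articles in the matrix differentiation series:

Derivatives with respect to symmetric matrices, using maximum likelihood estimation of the multivariate normal distribution as an example (Matrix Differentiation, Supplementary Part) - article by Iterator - Zhihu

Mathematical derivation of matrix differentiation formulas (Matrix Differentiation, Basics Part) - article by Iterator - Zhihu

The essence of matrix differentiation and of numerator layout and denominator layout (Matrix Differentiation, Essence Part) - article by Iterator - Zhihu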

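References

  1. Zhang Xianda, 《矩阵分析与应用(第二版)》 (Matrix Analysis and Applications, 2nd ed.), p. 50
  2. 《高等数学》 (Advanced Mathematics), Tongji University, 7th ed., Vol. 1, p. 111
  3. 《高等数学》 (Advanced Mathematics), Tongji University, 7th ed., Vol. 1, p. 115
  4. 《高等数学》 (Advanced Mathematics), Tongji University, 7th ed., Vol. 2, p. 72
  5. 《高等数学》 (Advanced Mathematics), Tongji University, 7th ed., Vol. 2, p. 114
  6. Zhang Xianda, 《矩阵分析与应用(第二版)》 (Matrix Analysis and Applications, 2nd ed.), p. 154
  7. Zhang Xianda, 《矩阵分析与应用(第二版)》 (Matrix Analysis and Applications, 2nd ed.), p. 155
  8. Zhang Xianda, 《矩阵分析与应用(第二版)》 (Matrix Analysis and Applications, 2nd ed.), p. 152
  9. Zhang Xianda, 《矩阵分析与应用(第二版)》 (Matrix Analysis and Applications, 2nd ed.), p. 156
  10. Zhang Xianda, 《矩阵分析与应用(第二版)》 (Matrix Analysis and Applications, 2nd ed.), p. 153
  11. 《工程数学 线性代数》 (Engineering Mathematics: Linear Algebra), Tongji University, 6th ed., p. 17
  12. 《工程数学 线性代数》 (Engineering Mathematics: Linear Algebra), Tongji University, 6th ed., p. 38
  13. 《工程数学 线性代数》 (Engineering Mathematics: Linear Algebra), Tongji University, 6th ed., p. 40
  14. Zhang Xianda, 《矩阵分析与应用(第二版)》 (Matrix Analysis and Applications, 2nd ed.), p. 160
  15. Zhang Xianda, 《矩阵分析与应用(第二版)》 (Matrix Analysis and Applications, 2nd ed.), p. 158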

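  1. Zhang Xianda, 《矩阵分析与应用(第二版)》 (Matrix Analysis and Applications, 2nd ed.), p. 143
  2. 《高等数学》 (Advanced Mathematics), Tongji University, 7th ed., Vol. 2, p. 66
  3. Zhang Xianda, 《矩阵分析与应用(第二版)》 (Matrix Analysis and Applications, 2nd ed.), p. 144
  4. Zhang Xianda, 《矩阵分析与应用(第二版)》 (Matrix Analysis and Applications, 2nd ed.), p. 146
  5. Zhang Xianda, 《矩阵分析与应用(第二版)》 (Matrix Analysis and Applications, 2nd ed.), p. 145
  6. Zhang Xianda, 《矩阵分析与应用(第二版)》 (Matrix Analysis and Applications, 2nd ed.), p. 147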


References