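A Summary of Linear Algebra Fundamentals for Deep Learning

Reference: Dive into Deep Learning - Linear Algebra

This article is aimed at deep learning beginners and covers the linear algebra concepts you need.

📋 Contents

1. Basic Mathematical Objects
2. Tensor Operations
3. Reduction Operations
4. Linear Algebra Operations
5. Norms
6. Practical Applications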

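🔢 1. Basic Mathematical Objects

1.1 Scalars

A scalar is represented by a tensor with exactly one element and is the simplest mathematical object.

Notation: usually a lowercase letter, such as $x$, $y$, or $z$

Properties:

Zero-dimensional tensor (no axes)
Has magnitude but no direction
The basic building block of vectors and matrices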

import torch

# Create scalars (0-dimensional tensors)
x = torch.tensor(3.0)
y = torch.tensor(2.0)

print(f"x = {x}")
print(f"y = {y}")
print(f"x + y = {x + y}")
print(f"x * y = {x * y}")
print(f"x / y = {x / y}")
print(f"x ** y = {x ** y}")
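Because a scalar is a zero-dimensional tensor, its shape is empty. The short sketch below checks this and uses item() to extract the plain Python number, which is handy when logging a single value such as a loss.

```python
import torch

x = torch.tensor(3.0)

# A scalar tensor has no axes, so its shape is empty
print(x.ndim)   # 0
print(x.shape)  # torch.Size([])

# item() converts a one-element tensor into a plain Python number
value = x.item()
print(type(value), value)  # <class 'float'> 3.0
```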

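1.2 Vectors

A vector is an array of scalars, usually written as a bold lowercase letter.

Notation: $\mathbf{x} = [x_1, x_2, \ldots, x_n]^{\top}$

Properties:

One-dimensional tensor
A vector with $n$ real elements belongs to $\mathbb{R}^n$
Can represent positions, velocities, directions, and so on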

import torch

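# Create a vector holding the integers 0, 1, 2, 3
x = torch.arange(4)
print(f"Vector x: {x}")

# Access an element (indexing starts at 0, so x[2] is the third element)
print(f"Third element: {x[2]}")

# Vector length (number of elements)
print(f"Vector length: {len(x)}")

# Vector shape
print(f"Vector shape: {x.shape}")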
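The notation above writes $\mathbf{x}$ as a column vector (note the $^{\top}$), but a 1-D PyTorch tensor has no row or column orientation. When orientation matters, one option is to reshape explicitly; this sketch shows the resulting shapes.

```python
import torch

x = torch.arange(4)      # 1-D tensor, shape (4,)
col = x.reshape(-1, 1)   # explicit column vector, shape (4, 1)
row = x.reshape(1, -1)   # explicit row vector, shape (1, 4)

print(x.shape, col.shape, row.shape)
# Transposing the column vector gives the row vector
print(torch.equal(col.T, row))  # True
```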

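1.3 Matrices

A matrix is a two-dimensional array, usually written as a bold uppercase letter.

Notation:
$$\mathbf{A} = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{pmatrix}$$

Properties:

Two-dimensional tensor
An $m \times n$ matrix has $m$ rows and $n$ columns
Widely used to represent linear transformations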

import torch

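# Create a 5 x 4 matrix holding 0, 1, ..., 19 in row-major order
A = torch.arange(20).reshape(5, 4)
print(f"Matrix A:\n{A}")

# Matrix shape
print(f"Matrix shape: {A.shape}")

# Matrix transpose (swaps rows and columns)
print(f"Transpose A^T:\n{A.T}")

# Access an element (row 1, column 2, both zero-based)
print(f"A[1,2] = {A[1,2]}")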
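Transposition swaps the row and column indices, so entry $(j, i)$ of $\mathbf{A}^\top$ equals entry $(i, j)$ of $\mathbf{A}$. A quick numerical check of this, and of the fact that transposing twice returns the original matrix, might look like this:

```python
import torch

A = torch.arange(20).reshape(5, 4)
AT = A.T  # shape (4, 5)

# Transpose swaps indices: A^T[j, i] == A[i, j]
i, j = 1, 2
print(AT.shape, AT[j, i].item(), A[i, j].item())  # torch.Size([4, 5]) 6 6

# Transposing twice restores the original matrix
print(torch.equal(AT.T, A))  # True
```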

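1.4 Tensors

A tensor is an $n$-dimensional array with an arbitrary number of axes; it generalizes scalars (0 axes), vectors (1 axis), and matrices (2 axes).

Properties:

Can have any number of axes
Used throughout deep learning
Can represent high-dimensional data such as images and videos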

import torch

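# Create a three-dimensional tensor
X = torch.arange(24).reshape(2, 3, 4)
print(f"3-D tensor X:\n{X}")
print(f"Tensor shape: {X.shape}")

# Create a four-dimensional tensor (batches of images are commonly stored this way)
Y = torch.arange(48).reshape(2, 3, 4, 2)
print(f"4-D tensor shape: {Y.shape}")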
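To make the image remark concrete: PyTorch convolution layers such as nn.Conv2d take input laid out as (batch, channels, height, width). The sketch below uses a made-up batch of two 32 x 32 RGB images filled with zeros to show how indexing removes one axis at a time.

```python
import torch

# Hypothetical batch: 2 RGB images (3 channels) of 32 x 32 pixels,
# laid out as (batch, channels, height, width)
images = torch.zeros(2, 3, 32, 32)
print(images.ndim)  # 4

first_image = images[0]       # one image: shape (3, 32, 32)
first_channel = images[0, 0]  # one channel of that image: shape (32, 32)
print(first_image.shape, first_channel.shape)
```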

⚙️ 2. Tensor Operations

2.1 Basic Arithmetic Operations

import torch

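A = torch.arange(20, dtype=torch.float32).reshape(5, 4)
B = A.clone()  # Clone A so that B gets its own memory

print("Elementwise addition:")
print(A + B)

print("\nElementwise multiplication (Hadamard product):")
print(A * B)

print("\nElementwise division (A[0, 0] = 0, so 0 / 0 gives nan there):")
print(A / B)

print("\nElementwise power (squares each element):")
print(A ** 2)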

import torch

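# Broadcasting: a has shape (3, 1) and b has shape (1, 2); both are
# expanded to shape (3, 2) before the elementwise addition
a = torch.arange(3).reshape(3, 1)
b = torch.arange(2).reshape(1, 2)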

print(f"a: {a}")
print(f"b: {b}")
print(f"a + b:\n{a + b}")
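Broadcasting behaves as if each size-1 axis were copied until the shapes match. The sketch below makes those copies explicit with expand (which returns a view rather than copying memory) and confirms the result matches a + b.

```python
import torch

a = torch.arange(3).reshape(3, 1)
b = torch.arange(2).reshape(1, 2)

# Stretch the size-1 axes explicitly so both operands have shape (3, 2)
a_big = a.expand(3, 2)  # each row repeats a's single value
b_big = b.expand(3, 2)  # each row is a copy of b
print(a_big)
print(b_big)
print(torch.equal(a + b, a_big + b_big))  # True
```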

import torch

X = torch.arange(12, dtype=torch.float32).reshape(3, 4)
Y = torch.tensor([[2.0, 1, 4, 3], [1, 2, 3, 4], [4, 3, 2, 1]])

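# Concatenate along rows (dim=0): result has shape (6, 4)
print("Concatenated along rows:")
print(torch.cat((X, Y), dim=0))

# Concatenate along columns (dim=1): result has shape (3, 8)
print("\nConcatenated along columns:")
print(torch.cat((X, Y), dim=1))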

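2.2 Basic Properties of Tensor Arithmetic

The following properties are essential for understanding computation in deep learning. Here $\odot$ denotes elementwise (Hadamard) multiplication, which is the * operator in PyTorch.

Commutativity of addition: $\mathbf{A} + \mathbf{B} = \mathbf{B} + \mathbf{A}$

Commutativity of elementwise multiplication: $\mathbf{A} \odot \mathbf{B} = \mathbf{B} \odot \mathbf{A}$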

import torch

A = torch.arange(6).reshape(2, 3).float()
B = torch.ones(2, 3)

print("A + B =", A + B)
print("B + A =", B + A)
print("Addition equal:", torch.equal(A + B, B + A))
print("Hadamard product equal:", torch.equal(A * B, B * A))

Associativity of addition: $(\mathbf{A} + \mathbf{B}) + \mathbf{C} = \mathbf{A} + (\mathbf{B} + \mathbf{C})$

Associativity of elementwise multiplication: $(\mathbf{A} \odot \mathbf{B}) \odot \mathbf{C} = \mathbf{A} \odot (\mathbf{B} \odot \mathbf{C})$

import torch

A = torch.arange(6).reshape(2, 3).float()
B = torch.ones(2, 3)
C = torch.full((2, 3), 2.0)

left = (A + B) + C
right = A + (B + C)
print("(A + B) + C =", left)
print("A + (B + C) =", right)
print("Addition equal:", torch.equal(left, right))
print("Hadamard product equal:", torch.equal((A * B) * C, A * (B * C)))

Distributivity of elementwise multiplication over addition: $\mathbf{A} \odot (\mathbf{B} + \mathbf{C}) = \mathbf{A} \odot \mathbf{B} + \mathbf{A} \odot \mathbf{C}$

import torch

A = torch.arange(6).reshape(2, 3).float()
B = torch.ones(2, 3)
C = torch.full((2, 3), 2.0)

left = A * (B + C)
right = A * B + A * C
print("A * (B + C) =", left)
print("A * B + A * C =", right)
print("Equal:", torch.equal(left, right))
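The checks above use torch.equal, which demands exact equality. That works here because every value is a small integer that floating point stores exactly. With general floating-point values, rounding can make mathematically equal expressions differ in the last bits, so a tolerance-based comparison such as torch.allclose is safer. A small sketch with float64 scalars:

```python
import torch

a = torch.tensor(0.1, dtype=torch.float64)
b = torch.tensor(0.2, dtype=torch.float64)
c = torch.tensor(0.3, dtype=torch.float64)

# Mathematically equal, but rounded differently in floating point
left = (a + b) + c
right = a + (b + c)
print(left.item(), right.item())                               # 0.6000000000000001 0.6
print("Exactly equal:", torch.equal(left, right))             # False
print("Equal within tolerance:", torch.allclose(left, right)) # True
```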

📉 3. Reduction Operations

3.1 Sum and Mean

import torch

A = torch.arange(20, dtype=torch.float32).reshape(5, 4)
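print(f"Original matrix A:\n{A}")

# Sum of all elements
print(f"Sum of all elements: {A.sum()}")

# Sum along a single axis
print(f"Sum along axis 0 (per-column sums): {A.sum(axis=0)}")
print(f"Sum along axis 1 (per-row sums): {A.sum(axis=1)}")

# Sum along several axes at once (here the same as summing everything)
print(f"Sum along axes [0, 1]: {A.sum(axis=[0, 1])}")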
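The difference between reducing along axis 0 and axis 1 shows up in the result shapes: the summed axis is removed. This sketch (written with dim, which is the name of the axis argument in PyTorch's documentation) checks the shapes and confirms that reducing the remaining axis recovers the total.

```python
import torch

A = torch.arange(20, dtype=torch.float32).reshape(5, 4)

col_sums = A.sum(dim=0)  # reduces the 5 rows away: shape (4,)
row_sums = A.sum(dim=1)  # reduces the 4 columns away: shape (5,)
print(col_sums.shape, row_sums.shape)

# Reducing the remaining axis gives the same total either way
print(col_sums.sum().item(), row_sums.sum().item(), A.sum().item())  # 190.0 190.0 190.0
```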

import torch

A = torch.arange(20, dtype=torch.float32).reshape(5, 4)

# Mean
print(f"Mean of all elements: {A.mean()}")
print(f"Mean along axis 0: {A.mean(axis=0)}")
print(f"Mean along axis 1: {A.mean(axis=1)}")

# Equivalent computation: sum divided by the number of elements
print(f"sum/numel = {A.sum() / A.numel()}")

import torch

A = torch.arange(20, dtype=torch.float32).reshape(5, 4)

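# Sum while keeping the number of axes (the summed axis stays, with size 1)
sum_A = A.sum(axis=1, keepdims=True)
print(f"Shape of non-reduced sum: {sum_A.shape}")
print(f"Non-reduced sum:\n{sum_A}")

# sum_A has shape (5, 1), so it broadcasts against A: each row is divided by its own sum
print(f"A / sum_A:\n{A / sum_A}")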

3.2 Cumulative Sum

import torch

A = torch.arange(20, dtype=torch.float32).reshape(5, 4)

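# Cumulative sum along axis 0 (down the rows)
print("Cumulative sum along axis 0:")
print(A.cumsum(axis=0))

# Cumulative sum along axis 1 (across the columns)
print("\nCumulative sum along axis 1:")
print(A.cumsum(axis=1))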
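Row $i$ of the cumulative sum along axis 0 holds the sum of rows $0$ through $i$, so its last row should equal the per-column sums from 3.1. A small check:

```python
import torch

A = torch.arange(20, dtype=torch.float32).reshape(5, 4)
running = A.cumsum(dim=0)

# The last row has accumulated every row, so it matches the column sums
print(running[-1])                             # tensor([40., 45., 50., 55.])
print(torch.equal(running[-1], A.sum(dim=0)))  # True
```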

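🔢 4. Linear Algebra Operations

4.1 Dot Product

Definition: given two vectors $\mathbf{x}, \mathbf{y} \in \mathbb{R}^d$, their dot product is:

$$\mathbf{x}^{\top} \mathbf{y} = \sum_{i=1}^{d} x_i y_i$$

Geometric meaning and uses:

Measures how similar two vectors are; dividing by both vector lengths gives the cosine of the angle between them
Used to compute projections
Widely used in machine learning algorithms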

import torch

x = torch.arange(4, dtype=torch.float32)
y = torch.ones(4, dtype=torch.float32)

print(f"x = {x}")
print(f"y = {y}")
print(f"Dot product: {torch.dot(x, y)}")

# Equivalent computation: elementwise product followed by a sum
print(f"Manual computation: {torch.sum(x * y)}")
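The similarity interpretation becomes precise once the dot product is divided by both vector lengths, which yields the cosine of the angle between the vectors. A sketch comparing a manual computation with torch.nn.functional.cosine_similarity:

```python
import torch
import torch.nn.functional as F

x = torch.arange(4, dtype=torch.float32)
y = torch.ones(4, dtype=torch.float32)

# cos(theta) = x^T y / (||x||_2 * ||y||_2)
cos_manual = torch.dot(x, y) / (torch.norm(x) * torch.norm(y))
# Built-in version; dim=0 because x and y are 1-D
cos_builtin = F.cosine_similarity(x, y, dim=0)
print(cos_manual.item(), cos_builtin.item())  # both about 0.8018
```

Here $\mathbf{x}^\top\mathbf{y} = 6$, $\|\mathbf{x}\|_2 = \sqrt{14}$ and $\|\mathbf{y}\|_2 = 2$, so the cosine is $3/\sqrt{14}$.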

4.2 Matrix-Vector Product

Definition: write $\mathbf{a}^{\top}_{i}$ for the $i$-th row of a matrix $\mathbf{A} \in \mathbb{R}^{m \times n}$. The product of $\mathbf{A}$ and a vector $\mathbf{x} \in \mathbb{R}^n$ is the vector in $\mathbb{R}^m$ of row-by-row dot products:

$$\mathbf{A}\mathbf{x} = \begin{pmatrix}
\mathbf{a}^{\top}_{1} \\
\mathbf{a}^{\top}_{2} \\
\vdots \\
\mathbf{a}^{\top}_{m}
\end{pmatrix}\mathbf{x} = \begin{pmatrix}
\mathbf{a}^{\top}_{1} \mathbf{x} \\
\mathbf{a}^{\top}_{2} \mathbf{x} \\
\vdots \\
\mathbf{a}^{\top}_{m} \mathbf{x}
\end{pmatrix}$$

Applications:

Linear transformations
Forward propagation in neural networks
Solving systems of linear equations

import torch

A = torch.arange(20, dtype=torch.float32).reshape(5, 4)
x = torch.arange(4, dtype=torch.float32)

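print(f"Matrix A shape: {A.shape}")
print(f"Vector x shape: {x.shape}")

# torch.mv computes the matrix-vector product (A @ x gives the same result)
result = torch.mv(A, x)
print(f"Matrix-vector product: {result}")
print(f"Result shape: {result.shape}")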
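The formula says each entry of $\mathbf{A}\mathbf{x}$ is the dot product of one row of $\mathbf{A}$ with $\mathbf{x}$. This sketch builds the product exactly that way and compares it with torch.mv and the @ operator.

```python
import torch

A = torch.arange(20, dtype=torch.float32).reshape(5, 4)
x = torch.arange(4, dtype=torch.float32)

# Entry i of Ax is the dot product of row i of A with x
by_rows = torch.stack([torch.dot(A[i], x) for i in range(A.shape[0])])
print(by_rows)  # values 14, 38, 62, 86, 110
print(torch.allclose(by_rows, torch.mv(A, x)), torch.allclose(A @ x, torch.mv(A, x)))
```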

4.3 Matrix-Matrix Multiplication

Definition: the product of matrices $\mathbf{A} \in \mathbb{R}^{n \times k}$ and $\mathbf{B} \in \mathbb{R}^{k \times m}$ is the matrix $\mathbf{A}\mathbf{B} \in \mathbb{R}^{n \times m}$ with entries:

$$[\mathbf{A}\mathbf{B}]_{i,j} = \sum_{l=1}^{k} a_{i,l} b_{l,j}$$

Important properties:

Not commutative: in general $\mathbf{A}\mathbf{B} \neq \mathbf{B}\mathbf{A}$
Associative: $(\mathbf{A}\mathbf{B})\mathbf{C} = \mathbf{A}(\mathbf{B}\mathbf{C})$
Distributive: $\mathbf{A}(\mathbf{B} + \mathbf{C}) = \mathbf{A}\mathbf{B} + \mathbf{A}\mathbf{C}$

import torch

A = torch.arange(20, dtype=torch.float32).reshape(5, 4)
B = torch.ones(4, 3, dtype=torch.float32)

print(f"Matrix A shape: {A.shape}")
print(f"Matrix B shape: {B.shape}")

# torch.mm multiplies two 2-D matrices: (5 x 4) @ (4 x 3) -> (5 x 3)
C = torch.mm(A, B)
print(f"Product C shape: {C.shape}")
print(f"Product C:\n{C}")
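Non-commutativity is easy to see with two small square matrices. In this sketch B is a permutation matrix chosen for illustration: multiplying on the right swaps the columns of A, while multiplying on the left swaps its rows. The loop also checks the entrywise definition $[\mathbf{A}\mathbf{B}]_{i,j} = \sum_{l} a_{i,l} b_{l,j}$.

```python
import torch

A = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
B = torch.tensor([[0.0, 1.0], [1.0, 0.0]])  # permutation matrix

AB = A @ B  # swaps the columns of A: [[2, 1], [4, 3]]
BA = B @ A  # swaps the rows of A:    [[3, 4], [1, 2]]
print(AB)
print(BA)
print("AB == BA:", torch.equal(AB, BA))  # False

# Check [AB]_{ij} = sum_l a_{il} * b_{lj} entry by entry
manual = torch.zeros(2, 2)
for i in range(2):
    for j in range(2):
        manual[i, j] = sum(A[i, l] * B[l, j] for l in range(2))
print(torch.equal(manual, AB))  # True
```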

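📏 5. Norms

5.1 Vector Norms

Definition: the $L_2$ norm is the square root of the sum of the squared elements of a vector:

$$\|\mathbf{x}\|_2 = \sqrt{\sum_{i=1}^{n} x_i^2}$$

Properties:

Also called the Euclidean norm
Measures the length of the vector
Used throughout deep learning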

import torch

x = torch.tensor([3.0, -4.0])
print(f"Vector x: {x}")
print(f"L2 norm: {torch.norm(x)}")
print(f"Manual computation: {torch.sqrt(torch.sum(x * x))}")

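Definition: the $L_1$ norm is the sum of the absolute values of the elements of a vector:

$$\|\mathbf{x}\|_1 = \sum_{i=1}^{n} |x_i|$$

Properties:

Also called the Manhattan norm
Less affected by outliers than the $L_2$ norm, because it does not square the elements
Commonly used in sparse models (for example, $L_1$ regularization)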

import torch

x = torch.tensor([3.0, -4.0])
print(f"Vector x: {x}")
print(f"L1 norm: {torch.sum(torch.abs(x))}")
print(f"Built-in: {torch.norm(x, p=1)}")

Definition: the $L_\infty$ norm is the largest absolute value among the elements of a vector:

$$\|\mathbf{x}\|_\infty = \max_i |x_i|$$

import torch

x = torch.tensor([3.0, -4.0, 2.0])
print(f"Vector x: {x}")
print(f"L∞ norm: {torch.max(torch.abs(x))}")
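PyTorch also offers torch.linalg.vector_norm, where the ord argument selects the norm. The sketch below computes all three norms from this section for one vector; for any vector they satisfy $\|\mathbf{x}\|_\infty \le \|\mathbf{x}\|_2 \le \|\mathbf{x}\|_1$.

```python
import torch

x = torch.tensor([3.0, -4.0, 2.0])

l2 = torch.linalg.vector_norm(x, ord=2)               # sqrt(9 + 16 + 4)
l1 = torch.linalg.vector_norm(x, ord=1)               # 3 + 4 + 2
linf = torch.linalg.vector_norm(x, ord=float("inf"))  # max(3, 4, 2)
print(l2.item(), l1.item(), linf.item())  # about 5.385, 9.0, 4.0

# The three norms are ordered for every vector
print(linf.item() <= l2.item() <= l1.item())  # True
```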

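5.2 Matrix Norms

Definition: the Frobenius norm is the square root of the sum of the squared elements of a matrix:

$$\|\mathbf{X}\|_F = \sqrt{\sum_{i=1}^m \sum_{j=1}^n x_{ij}^2}$$

Properties:

Equivalent to the $L_2$ norm of the matrix flattened into a vector
Satisfies all the properties of a vector norm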

import torch

X = torch.ones((4, 9))
print(f"Matrix shape: {X.shape}")
print(f"Frobenius norm: {torch.norm(X)}")
print(f"Manual computation: {torch.sqrt(torch.sum(X * X))}")
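Because the Frobenius norm is the $L_2$ norm of the flattened matrix, torch.linalg.matrix_norm with ord="fro" and torch.linalg.vector_norm on the flattened tensor should agree. For the 4 x 9 all-ones matrix both equal $\sqrt{36} = 6$:

```python
import torch

X = torch.ones((4, 9))

fro = torch.linalg.matrix_norm(X, ord="fro")         # Frobenius norm
flat = torch.linalg.vector_norm(X.flatten(), ord=2)  # L2 norm of the 36 entries
print(fro.item(), flat.item())  # 6.0 6.0
```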

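5.3 Applications of Norms

Uses in deep learning:

Regularization: penalizing the norm of the weights helps prevent overfitting
Loss functions: measuring prediction error, for example the squared $L_2$ distance between prediction and target
Optimization: gradient clipping rescales gradients whose norm exceeds a threshold
Similarity: comparing vectors, for example through the cosine similarity of normalized vectors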
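Gradient clipping by norm is available as torch.nn.utils.clip_grad_norm_, which treats all gradients as one long vector and rescales them when its $L_2$ norm exceeds max_norm. The sketch below is only illustrative: the small nn.Linear model, the random data, the deliberately inflated loss, and max_norm=1.0 are all assumptions chosen so that the gradients are large.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(4, 3)
x = torch.randn(8, 4)

# Illustrative loss, scaled up on purpose to produce large gradients
loss = (model(x) ** 2).sum() * 100
loss.backward()

max_norm = 1.0
# Returns the total L2 norm of all gradients, measured before clipping
norm_before = nn.utils.clip_grad_norm_(model.parameters(), max_norm=max_norm)

# Recompute the total norm after clipping: the norm of the per-parameter norms
norm_after = torch.linalg.vector_norm(
    torch.stack([torch.linalg.vector_norm(p.grad) for p in model.parameters()])
)
print(norm_before.item(), norm_after.item())
```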

🎯 6. Practical Applications

6.1 Linear Transformations in Neural Networks

import torch
import torch.nn as nn

# A simple fully connected (linear) layer
class LinearLayer(nn.Module):
    def __init__(self, input_size, output_size):
        super().__init__()
        self.linear = nn.Linear(input_size, output_size)
    
    def forward(self, x):
        return self.linear(x)

# Create the layer and some input data
layer = LinearLayer(4, 3)
x = torch.randn(2, 4)  # batch_size=2, input_size=4

print(f"Input shape: {x.shape}")
output = layer(x)
print(f"Output shape: {output.shape}")
# nn.Linear stores its weight as (output_size, input_size)
print(f"Weight matrix shape: {layer.linear.weight.shape}")
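nn.Linear computes $\mathbf{y} = \mathbf{x}\mathbf{W}^\top + \mathbf{b}$, which is why its weight has shape (output_size, input_size). This sketch checks that formula against the layer's output using the layer's own parameters:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
linear = nn.Linear(4, 3)  # weight shape (3, 4), bias shape (3,)
x = torch.randn(2, 4)     # batch of 2 inputs

# The same computation written out: matrix product plus a broadcast bias
manual = x @ linear.weight.T + linear.bias
print(torch.allclose(linear(x), manual, atol=1e-6))  # True
```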

import torch

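# Simulate processing a batch of data
batch_size = 32
input_dim = 784  # a flattened 28x28 image
hidden_dim = 128

# Weight matrix and bias
W = torch.randn(input_dim, hidden_dim)
b = torch.randn(hidden_dim)

# Batched input
X = torch.randn(batch_size, input_dim)

# Linear transformation: (32 x 784) @ (784 x 128) -> (32 x 128); b is broadcast over the batch
output = torch.mm(X, W) + b
print(f"Batch output shape: {output.shape}")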

6.2 Data Preprocessing

import torch

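# Example: standardizing data
data = torch.randn(100, 5)

# Per-feature (per-column) mean and standard deviation
mean = data.mean(dim=0, keepdim=True)
std = data.std(dim=0, keepdim=True)

# Standardize: subtract the mean, then divide by the standard deviation
normalized_data = (data - mean) / std

print(f"Original mean: {data.mean(dim=0)}")
print(f"Mean after standardization (close to 0): {normalized_data.mean(dim=0)}")
print(f"Std after standardization (close to 1): {normalized_data.std(dim=0)}")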

import torch

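# Example: L2 normalization
vectors = torch.randn(5, 10)

# Divide each row by its L2 norm (keepdim=True keeps shape (5, 1) for broadcasting)
normalized_vectors = vectors / torch.norm(vectors, dim=1, keepdim=True)

print(f"Norms before normalization: {torch.norm(vectors, dim=1)}")
print(f"Norms after normalization (all close to 1): {torch.norm(normalized_vectors, dim=1)}")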

📚 Exercises and Questions


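Basic Exercises

Prove that the transpose of a matrix's transpose is the original matrix: $(\mathbf{A}^\top)^\top = \mathbf{A}$

Prove that transposition distributes over addition: $\mathbf{A}^\top + \mathbf{B}^\top = (\mathbf{A} + \mathbf{B})^\top$

For a square matrix $\mathbf{A}$, is $\mathbf{A} + \mathbf{A}^\top$ always symmetric?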

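Programming Exercises

Implement a function that computes the cosine similarity of two vectors

Write code that verifies the associativity of matrix multiplication

Implement batched matrix multiplication and measure its performance

Discussion Questions

In deep learning, why use matrix operations instead of loops?

What different roles do L1 and L2 regularization play in machine learning?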

📖 Further Reading

Dive into Deep Learning

https://zh-v2.d2l.ai/

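📝 Summary

Key takeaways:

Scalars, vectors, matrices, and tensors are the basic mathematical objects of deep learning
Mastering tensor operations is essential for understanding neural networks
Norms play an important role in regularization and loss functions
Linear algebra operations are at the core of deep learning computation

Study tip: combine theory with practice. Verifying mathematical concepts by writing code yourself is the best way to understand the math behind deep learning.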