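This article is based on the d2l (Dive into Deep Learning) project. It introduces GoogLeNet, the network that stood out in the ImageNet challenge, including its core design, the Inception block, and the idea of parallel connections.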

How CNN architectures evolved:

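LeNet was the first network to bring convolution to computer vision. AlexNet, VGG and NiN followed, and these networks explored two questions: how to become deeper and more complex, and which window (convolution kernel) sizes best suit ImageNet data. GoogLeNet appeared in 2014, partly inspired by Inception, the 2010 science-fiction action thriller released by Warner Bros. It reached comparable accuracy at lower computational cost and became one of the most effective models in the ImageNet image recognition challenge.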

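Two design ideas

We Need To Go Deeper: improve performance with a deeper network structure, borrowing from NiN

全面启动 (the film's Taiwanese title): combine several kernel sizes in parallel (1×1, 3×3 and 5×5)

Dream within a dream: capture features at multiple levels, going deeper layer by layer

Inspired by Inception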

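On one hand, GoogLeNet borrows the film's line "We Need To Go Deeper" to emphasize that the model improves performance through a deeper network structure. Concretely, its implementation draws on the NiN network.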

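On the other hand, the film's Taiwanese title, 全面启动 (roughly "full-scale launch"), arguably describes GoogLeNet even better. GoogLeNet combines several kernel sizes in parallel (1×1, 3×3 and 5×5) to capture features at multiple scales. This echoes the film's "dream within a dream" idea of going deeper level by level.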

[Figure: Inception movie poster]

A note on simplification:

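Practices and frameworks have matured since the original GoogLeNet. This implementation therefore drops the features that were added only to stabilize training (such as the auxiliary classifiers), which simplifies the code.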


1. GoogLeNet architecture design

1.1 The core idea of the Inception block

Inception block design:

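The Inception block is what makes the parallel connections work. To capture image features at different scales, it runs four branches with different convolution-window combinations in parallel. Every branch keeps the input's height and width, so their outputs can be concatenated along the channel dimension.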
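Because the branch outputs are stacked along the channel dimension, a block's output channel count is simply the sum of what its four branches produce. The pure-Python sketch below checks this for the nine block configurations used in the GoogLeNet code of section 2.2.

- The helper incep_out_channels is hypothetical and not part of the original code.
- As in IncepBlock, it assumes that c2_out and c3_out are (reduced, output) pairs.

```python
def incep_out_channels(c1_out, c2_out, c3_out, c4_out):
    """Output channels of an Inception block: the sum over its four branches.

    c2_out and c3_out are (reduced, output) pairs; only the second entry
    reaches the concatenation.
    """
    return c1_out + c2_out[1] + c3_out[1] + c4_out


# (in_channels, c1_out, c2_out, c3_out, c4_out) for the nine blocks in section 2.2
configs = [
    (192, 64, (96, 128), (16, 32), 32),
    (256, 128, (128, 192), (32, 96), 64),
    (480, 192, (96, 208), (16, 48), 64),
    (512, 160, (112, 224), (24, 64), 64),
    (512, 128, (128, 256), (24, 64), 64),
    (512, 112, (144, 288), (32, 64), 64),
    (528, 256, (160, 320), (32, 128), 128),
    (832, 256, (160, 320), (32, 128), 128),
    (832, 384, (192, 384), (48, 128), 128),
]

outs = [incep_out_channels(*cfg[1:]) for cfg in configs]

# Each block's output channel count must equal the next block's in_channels
for out, nxt in zip(outs, configs[1:]):
    assert out == nxt[0], (out, nxt[0])

print(outs)  # [256, 480, 512, 512, 512, 528, 832, 832, 1024]
```

The chain stays consistent across the max pooling layers between groups, because pooling changes only height and width, never the channel count.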

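1.2 The four parallel branches

1×1 convolution

  • Fuses information across channels and can reduce the channel count
  • The simplest branch: a single 1×1 convolution, with the lowest cost per output channel

1×1 convolution → 3×3 convolution

  • The 1×1 convolution first reduces the channel count to cut computation
  • The 3×3 convolution then extracts features over a larger spatial window

1×1 convolution → 5×5 convolution

  • The 1×1 convolution first reduces the channel count to cut computation
  • The 5×5 convolution then extracts features over an even larger spatial window

3×3 max pooling → 1×1 convolution

  • Max pooling with stride 1 and padding 1 keeps the strongest local responses without downsampling, so height and width stay unchanged
  • The 1×1 convolution then sets this branch's output channel count. For concatenation, only height and width must match the other branches; channel counts may differ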
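How much does the 1×1 reduction save? The comparison below uses the 5×5 branch of the first Inception block, which takes 192 input channels, reduces them to 16 and then outputs 32 channels. It is set against a hypothetical branch that applies a 5×5 convolution directly to all 192 channels.

- This is a pure-Python sketch that counts weights plus biases, as nn.Conv2d does with its default bias=True.
- conv_params is a hypothetical helper.

```python
def conv_params(in_ch, out_ch, k):
    """Weights (in_ch * k * k * out_ch) plus one bias per output channel."""
    return in_ch * k * k * out_ch + out_ch


# Hypothetical direct 5×5 convolution: 192 -> 32 channels (not used in GoogLeNet)
direct = conv_params(192, 32, 5)

# Bottleneck branch of the first Inception block: 1×1 (192 -> 16), then 5×5 (16 -> 32)
bottleneck = conv_params(192, 16, 1) + conv_params(16, 32, 5)

print(direct, bottleneck, round(direct / bottleneck, 1))  # 153632 15920 9.7
```

The 15,920 figure matches the parameter count torchinfo reports for this branch (row 3-3) in section 2.3. Both variants produce the same 32 output channels at every position, so the multiply-adds shrink by a similar factor.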
[Figure: Inception block structure]

1.3 PyTorch implementation of the Inception block

The PyTorch implementation of the Inception block, IncepBlock, is as follows:

from typing import Tuple

import torch
from torch import nn


class IncepBlock(nn.Module):
    def __init__(self, in_channels: int, c1_out: int, c2_out: Tuple[int, int], c3_out: Tuple[int, int], c4_out: int):
        super().__init__()

        self.channel1 = nn.Sequential(  # Branch 1: 1×1 convolution
            nn.Conv2d(in_channels, c1_out, kernel_size=1), nn.ReLU()
        )

        self.channel2 = nn.Sequential(  # Branch 2: 1×1 convolution -> 3×3 convolution
            nn.Conv2d(in_channels, c2_out[0], kernel_size=1), nn.ReLU(),
            nn.Conv2d(c2_out[0], c2_out[1], kernel_size=3, padding=1), nn.ReLU()
        )

        self.channel3 = nn.Sequential(  # Branch 3: 1×1 convolution -> 5×5 convolution
            nn.Conv2d(in_channels, c3_out[0], kernel_size=1), nn.ReLU(),
            nn.Conv2d(c3_out[0], c3_out[1], kernel_size=5, padding=2), nn.ReLU()
        )

        self.channel4 = nn.Sequential(  # Branch 4: 3×3 max pooling -> 1×1 convolution
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_channels, c4_out, kernel_size=1), nn.ReLU()
        )

    def forward(self, x):
        output1 = self.channel1(x)
        output2 = self.channel2(x)
        output3 = self.channel3(x)
        output4 = self.channel4(x)
        # Concatenate the four branches along the channel dimension (dim=1 in NCHW layout)
        output = torch.cat([output1, output2, output3, output4], dim=1)
        return output
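torch.cat along dim=1 only works if all four branch outputs share the same batch size, height and width. The padding values in IncepBlock are chosen for exactly that: with stride 1, a padding of (k - 1) / 2 keeps the spatial size.

The pure-Python check below applies the standard output-size formula floor((n + 2p - k) / s) + 1 to each kind of layer used in the branches. The inputs are the feature-map sizes at which the blocks run for a 96×96 input (12, 6 and 3, per the torchinfo output in section 2.3). The helper out_size is hypothetical.

```python
def out_size(n, k, s=1, p=0):
    """Output height/width of a conv or pooling layer: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * p - k) // s + 1


# Spatial layers used in the IncepBlock branches: (kernel, stride, padding)
branch_layers = {
    "1x1 conv": (1, 1, 0),
    "3x3 conv": (3, 1, 1),
    "5x5 conv": (5, 1, 2),
    "3x3 max pool": (3, 1, 1),
}

for n in (12, 6, 3):
    sizes = {name: out_size(n, k, s, p) for name, (k, s, p) in branch_layers.items()}
    # Every branch must preserve the input size for the concatenation to be valid
    assert all(v == n for v in sizes.values()), sizes
    print(n, sizes)
```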

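2. The complete GoogLeNet architecture

2.1 Overall design

GoogLeNet architecture features:

Before the Inception blocks, the input passes through a series of layers that progressively extract features and shrink the spatial dimensions:

A 7×7 convolution captures features with a large receptive field, followed by max pooling for downsampling

A 1×1 convolution (64 to 64 channels) fuses features across channels

A 3×3 convolution (64 to 192 channels) extracts finer-grained features, followed by max pooling for downsampling

Nine Inception blocks follow in three groups of 2, 5 and 2 blocks, with a stride-2 max pooling layer between groups that halves the height and width

Finally, a global average pooling layer and a fully connected layer produce the output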
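The stride-2 layers in this layout halve the spatial size five times. Applying the output-size formula to each downsampling layer of the 96×96 variant in section 2.2 should reproduce the feature-map sizes that torchinfo reports in section 2.3 (48, 24, 12, 6, 3).

- This is a pure-Python sketch; the layer list is transcribed by hand from the GoogLeNet code.
- Layers that keep the size are omitted: the "same"-padded convolutions and the Inception blocks.

```python
def out_size(n, k, s=1, p=0):
    """Output height/width: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * p - k) // s + 1


# Size-changing layers of GoogLeNet: (name, kernel, stride, padding)
downsampling = [
    ("7x7 conv, stride 2", 7, 2, 3),
    ("max pool after the 7x7 conv", 3, 2, 1),
    ("max pool after the 3x3 conv", 3, 2, 1),
    ("max pool after Inception group 1", 3, 2, 1),
    ("max pool after Inception group 2", 3, 2, 1),
]


def trace(n):
    """Spatial size of the feature map after each downsampling layer."""
    sizes = [n]
    for _, k, s, p in downsampling:
        n = out_size(n, k, s, p)
        sizes.append(n)
    return sizes


print(trace(96))  # [96, 48, 24, 12, 6, 3]
```

Global average pooling then reduces the final 3×3 map to 1×1, which is why the classifier needs only 1024 input features.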

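2.2 Adapting the network to Fashion-MNIST

Dataset adaptation:

To keep testing GoogLeNet on the Fashion-MNIST dataset, the images are resized to 96×96. Fashion-MNIST images are 28×28, so this enlarges them. The new size is still well below the 224×224 input commonly used for ImageNet, which keeps computation manageable. Reusing the Inception block, the PyTorch implementation of GoogLeNet is as follows: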
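Here is a rough way to see the savings. A convolution's multiply-adds are H_out × W_out × C_out × C_in × k × k, so they scale with the output area. For this network, every feature map from a 224×224 input would be exactly 7/3 times larger per side than from a 96×96 input: 112 vs 48 after the first layer, down to 7 vs 3 before global pooling. Each convolution would therefore cost 49/9 ≈ 5.4 times as much.

The pure-Python sketch below checks this for the first 7×7 convolution, which has 1 input channel for Fashion-MNIST.

- conv_mult_adds is a hypothetical helper.
- The count excludes bias additions, so it is not meant to match torchinfo's figure exactly.

```python
def out_size(n, k, s=1, p=0):
    """Output height/width: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * p - k) // s + 1


def conv_mult_adds(n_in, c_in, c_out, k, s=1, p=0):
    """Multiply-adds of a square conv layer on an n_in × n_in input (bias excluded)."""
    n_out = out_size(n_in, k, s, p)
    return n_out * n_out * c_out * c_in * k * k


# First GoogLeNet layer: 7×7 conv, 1 -> 64 channels, stride 2, padding 3
small = conv_mult_adds(96, 1, 64, 7, s=2, p=3)
large = conv_mult_adds(224, 1, 64, 7, s=2, p=3)

print(small, large, round(large / small, 2))  # 7225344 39337984 5.44
```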

[Figure: Complete GoogLeNet architecture]
from typing import Tuple

import torch
from torch import nn, Tensor


class IncepBlock(nn.Module):
    def __init__(self, in_channels: int, c1_out: int, c2_out: Tuple[int, int], c3_out: Tuple[int, int], c4_out: int):
        super().__init__()

        self.channel1 = nn.Sequential(  # Branch 1: 1×1 convolution
            nn.Conv2d(in_channels, c1_out, kernel_size=1), nn.ReLU()
        )

        self.channel2 = nn.Sequential(  # Branch 2: 1×1 convolution -> 3×3 convolution
            nn.Conv2d(in_channels, c2_out[0], kernel_size=1), nn.ReLU(),
            nn.Conv2d(c2_out[0], c2_out[1], kernel_size=3, padding=1), nn.ReLU()
        )

        self.channel3 = nn.Sequential(  # Branch 3: 1×1 convolution -> 5×5 convolution
            nn.Conv2d(in_channels, c3_out[0], kernel_size=1), nn.ReLU(),
            nn.Conv2d(c3_out[0], c3_out[1], kernel_size=5, padding=2), nn.ReLU()
        )

        self.channel4 = nn.Sequential(  # Branch 4: 3×3 max pooling -> 1×1 convolution
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_channels, c4_out, kernel_size=1), nn.ReLU()
        )

    def forward(self, x) -> Tensor:
        output1 = self.channel1(x)
        output2 = self.channel2(x)
        output3 = self.channel3(x)
        output4 = self.channel4(x)
        # Concatenate the four branches along the channel dimension
        output = torch.cat([output1, output2, output3, output4], dim=1)
        return output


class GoogLeNet(nn.Module):
    def __init__(self, num_classes: int):
        super().__init__()
        self.model = nn.Sequential(
            # Stem, part 1: 7×7 convolution + max pooling (96×96 input: 96 -> 48 -> 24)
            nn.Conv2d(in_channels=1, out_channels=64, kernel_size=7, stride=2, padding=3), nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1),

            # Stem, part 2: 1×1 and 3×3 convolutions + max pooling (24 -> 12)
            nn.Conv2d(in_channels=64, out_channels=64, kernel_size=1), nn.ReLU(),
            nn.Conv2d(in_channels=64, out_channels=192, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1),

            # Inception group 1: 2 blocks, then max pooling (12 -> 6)
            IncepBlock(in_channels=192, c1_out=64, c2_out=(96, 128), c3_out=(16, 32), c4_out=32),
            IncepBlock(in_channels=256, c1_out=128, c2_out=(128, 192), c3_out=(32, 96), c4_out=64),
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1),

            # Inception group 2: 5 blocks, then max pooling (6 -> 3)
            IncepBlock(in_channels=480, c1_out=192, c2_out=(96, 208), c3_out=(16, 48), c4_out=64),
            IncepBlock(in_channels=512, c1_out=160, c2_out=(112, 224), c3_out=(24, 64), c4_out=64),
            IncepBlock(in_channels=512, c1_out=128, c2_out=(128, 256), c3_out=(24, 64), c4_out=64),
            IncepBlock(in_channels=512, c1_out=112, c2_out=(144, 288), c3_out=(32, 64), c4_out=64),
            IncepBlock(in_channels=528, c1_out=256, c2_out=(160, 320), c3_out=(32, 128), c4_out=128),
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1),

            # Inception group 3: 2 blocks, then global average pooling and the classifier
            IncepBlock(in_channels=832, c1_out=256, c2_out=(160, 320), c3_out=(32, 128), c4_out=128),
            IncepBlock(in_channels=832, c1_out=384, c2_out=(192, 384), c3_out=(48, 128), c4_out=128),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(in_features=1024, out_features=num_classes)
        )

        self._initialize_weights()

    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                # Kaiming (He) initialization, suited to ReLU activations
                nn.init.kaiming_normal_(m.weight, mode='fan_in', nonlinearity='relu')
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.Linear):
                nn.init.xavier_uniform_(m.weight)
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)

    def forward(self, x) -> Tensor:
        return self.model(x)

2.3 Network structure analysis

Use the summary function from the torchinfo library to check the output shape of each layer:

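# Requires torchinfo (pip install torchinfo) and the GoogLeNet class defined above
from torchinfo import summary

model = GoogLeNet(num_classes=10)
summary(model, input_size=(1, 1, 96, 96))  # (batch, channels, height, width)

Output: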
==========================================================================================
Layer (type:depth-idx) Output Shape Param #
==========================================================================================
GoogLeNet [1, 10] --
├─Sequential: 1-1 [1, 10] --
│ └─Conv2d: 2-1 [1, 64, 48, 48] 3,200
│ └─ReLU: 2-2 [1, 64, 48, 48] --
│ └─MaxPool2d: 2-3 [1, 64, 24, 24] --
│ └─Conv2d: 2-4 [1, 64, 24, 24] 4,160
│ └─ReLU: 2-5 [1, 64, 24, 24] --
│ └─Conv2d: 2-6 [1, 192, 24, 24] 110,784
│ └─ReLU: 2-7 [1, 192, 24, 24] --
│ └─MaxPool2d: 2-8 [1, 192, 12, 12] --
│ └─IncepBlock: 2-9 [1, 256, 12, 12] --
│ │ └─Sequential: 3-1 [1, 64, 12, 12] 12,352
│ │ └─Sequential: 3-2 [1, 128, 12, 12] 129,248
│ │ └─Sequential: 3-3 [1, 32, 12, 12] 15,920
│ │ └─Sequential: 3-4 [1, 32, 12, 12] 6,176
│ └─IncepBlock: 2-10 [1, 480, 12, 12] --
│ │ └─Sequential: 3-5 [1, 128, 12, 12] 32,896
│ │ └─Sequential: 3-6 [1, 192, 12, 12] 254,272
│ │ └─Sequential: 3-7 [1, 96, 12, 12] 85,120
│ │ └─Sequential: 3-8 [1, 64, 12, 12] 16,448
│ └─MaxPool2d: 2-11 [1, 480, 6, 6] --
│ └─IncepBlock: 2-12 [1, 512, 6, 6] --
│ │ └─Sequential: 3-9 [1, 192, 6, 6] 92,352
│ │ └─Sequential: 3-10 [1, 208, 6, 6] 226,096
│ │ └─Sequential: 3-11 [1, 48, 6, 6] 26,944
│ │ └─Sequential: 3-12 [1, 64, 6, 6] 30,784
│ └─IncepBlock: 2-13 [1, 512, 6, 6] --
│ │ └─Sequential: 3-13 [1, 160, 6, 6] 82,080
│ │ └─Sequential: 3-14 [1, 224, 6, 6] 283,472
│ │ └─Sequential: 3-15 [1, 64, 6, 6] 50,776
│ │ └─Sequential: 3-16 [1, 64, 6, 6] 32,832
│ └─IncepBlock: 2-14 [1, 512, 6, 6] --
│ │ └─Sequential: 3-17 [1, 128, 6, 6] 65,664
│ │ └─Sequential: 3-18 [1, 256, 6, 6] 360,832
│ │ └─Sequential: 3-19 [1, 64, 6, 6] 50,776
│ │ └─Sequential: 3-20 [1, 64, 6, 6] 32,832
│ └─IncepBlock: 2-15 [1, 528, 6, 6] --
│ │ └─Sequential: 3-21 [1, 112, 6, 6] 57,456
│ │ └─Sequential: 3-22 [1, 288, 6, 6] 447,408
│ │ └─Sequential: 3-23 [1, 64, 6, 6] 67,680
│ │ └─Sequential: 3-24 [1, 64, 6, 6] 32,832
│ └─IncepBlock: 2-16 [1, 832, 6, 6] --
│ │ └─Sequential: 3-25 [1, 256, 6, 6] 135,424
│ │ └─Sequential: 3-26 [1, 320, 6, 6] 545,760
│ │ └─Sequential: 3-27 [1, 128, 6, 6] 119,456
│ │ └─Sequential: 3-28 [1, 128, 6, 6] 67,712
│ └─MaxPool2d: 2-17 [1, 832, 3, 3] --
│ └─IncepBlock: 2-18 [1, 832, 3, 3] --
│ │ └─Sequential: 3-29 [1, 256, 3, 3] 213,248
│ │ └─Sequential: 3-30 [1, 320, 3, 3] 594,400
│ │ └─Sequential: 3-31 [1, 128, 3, 3] 129,184
│ │ └─Sequential: 3-32 [1, 128, 3, 3] 106,624
│ └─IncepBlock: 2-19 [1, 1024, 3, 3] --
│ │ └─Sequential: 3-33 [1, 384, 3, 3] 319,872
│ │ └─Sequential: 3-34 [1, 384, 3, 3] 823,872
│ │ └─Sequential: 3-35 [1, 128, 3, 3] 193,712
│ │ └─Sequential: 3-36 [1, 128, 3, 3] 106,624
│ └─AdaptiveAvgPool2d: 2-20 [1, 1024, 1, 1] --
│ └─Flatten: 2-21 [1, 1024] --
│ └─Linear: 2-22 [1, 10] 10,250
==========================================================================================
Total params: 5,977,530
Trainable params: 5,977,530
Non-trainable params: 0
Total mult-adds (Units.MEGABYTES): 276.66
==========================================================================================
Input size (MB): 0.04
Forward/backward pass size (MB): 4.74
Params size (MB): 23.91
Estimated Total Size (MB): 28.69
==========================================================================================

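Network statistics:

  • Total parameters: 5,977,530 (about 5.98 million)
  • Compute: about 276.66 million multiply-add operations per 96×96 image. torchinfo prints the unit label as "Units.MEGABYTES", but the value is a count in millions, not a size in megabytes
  • Memory: about 28.69 MB in total, of which 23.91 MB is parameters
  • Takeaway: the 1×1 convolutions that shrink the channel count before the 3×3 and 5×5 branches keep the parameter count at about 6 million despite the network's depth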
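The total can be reproduced by hand. Each nn.Conv2d contributes in × k × k × out weights plus out biases, and the final nn.Linear contributes 1024 × 10 + 10.

The pure-Python sketch below sums these terms over the stem, the nine Inception blocks and the classifier. It then converts the total to megabytes, which should match the "Params size (MB)" line.

- The helper names are hypothetical.
- The conversion assumes 4-byte float32 parameters and 1 MB = 10^6 bytes.

```python
def conv_params(in_ch, out_ch, k):
    """Weights plus biases of a square nn.Conv2d."""
    return in_ch * k * k * out_ch + out_ch


def incep_params(in_ch, c1, c2, c3, c4):
    """Parameters of one IncepBlock; max pooling and ReLU have none."""
    return (
        conv_params(in_ch, c1, 1)                                        # branch 1: 1×1
        + conv_params(in_ch, c2[0], 1) + conv_params(c2[0], c2[1], 3)    # branch 2: 1×1 -> 3×3
        + conv_params(in_ch, c3[0], 1) + conv_params(c3[0], c3[1], 5)    # branch 3: 1×1 -> 5×5
        + conv_params(in_ch, c4, 1)                                      # branch 4: pool -> 1×1
    )


# Stem: 7×7 (1 -> 64), 1×1 (64 -> 64), 3×3 (64 -> 192)
stem = conv_params(1, 64, 7) + conv_params(64, 64, 1) + conv_params(64, 192, 3)

blocks = [
    (192, 64, (96, 128), (16, 32), 32),
    (256, 128, (128, 192), (32, 96), 64),
    (480, 192, (96, 208), (16, 48), 64),
    (512, 160, (112, 224), (24, 64), 64),
    (512, 128, (128, 256), (24, 64), 64),
    (512, 112, (144, 288), (32, 64), 64),
    (528, 256, (160, 320), (32, 128), 128),
    (832, 256, (160, 320), (32, 128), 128),
    (832, 384, (192, 384), (48, 128), 128),
]
body = sum(incep_params(*cfg) for cfg in blocks)
head = 1024 * 10 + 10  # nn.Linear(1024, 10)

total = stem + body + head
print(total)                       # 5977530
print(round(total * 4 / 1e6, 2))   # 23.91
```

The largest single contributor is the 3×3 branch of the last block (823,872 parameters, row 3-34 in the summary). Each 3×3 and 5×5 branch depends directly on its reduction width.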

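3. Model training and evaluation

3.1 Training configuration and implementation

As before, the model is trained and evaluated with the utilities in training_tools.py: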

from typing import Tuple

import torch
from torch import nn, Tensor, optim

from training_tools import fashionMNIST_loader, Trainer


class IncepBlock(nn.Module):
    def __init__(self, in_channels: int, c1_out: int, c2_out: Tuple[int, int], c3_out: Tuple[int, int], c4_out: int):
        super().__init__()

        self.channel1 = nn.Sequential(  # Branch 1: 1×1 convolution
            nn.Conv2d(in_channels, c1_out, kernel_size=1), nn.ReLU()
        )

        self.channel2 = nn.Sequential(  # Branch 2: 1×1 convolution -> 3×3 convolution
            nn.Conv2d(in_channels, c2_out[0], kernel_size=1), nn.ReLU(),
            nn.Conv2d(c2_out[0], c2_out[1], kernel_size=3, padding=1), nn.ReLU()
        )

        self.channel3 = nn.Sequential(  # Branch 3: 1×1 convolution -> 5×5 convolution
            nn.Conv2d(in_channels, c3_out[0], kernel_size=1), nn.ReLU(),
            nn.Conv2d(c3_out[0], c3_out[1], kernel_size=5, padding=2), nn.ReLU()
        )

        self.channel4 = nn.Sequential(  # Branch 4: 3×3 max pooling -> 1×1 convolution
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_channels, c4_out, kernel_size=1), nn.ReLU()
        )

    def forward(self, x) -> Tensor:
        output1 = self.channel1(x)
        output2 = self.channel2(x)
        output3 = self.channel3(x)
        output4 = self.channel4(x)
        # Concatenate the four branches along the channel dimension
        output = torch.cat([output1, output2, output3, output4], dim=1)
        return output


class GoogLeNet(nn.Module):
    def __init__(self, num_classes: int):
        super().__init__()
        self.model = nn.Sequential(
            # Stem: 7×7 conv + pooling, then 1×1 and 3×3 convs + pooling
            nn.Conv2d(in_channels=1, out_channels=64, kernel_size=7, stride=2, padding=3), nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1),

            nn.Conv2d(in_channels=64, out_channels=64, kernel_size=1), nn.ReLU(),
            nn.Conv2d(in_channels=64, out_channels=192, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1),

            # Inception group 1 (2 blocks)
            IncepBlock(in_channels=192, c1_out=64, c2_out=(96, 128), c3_out=(16, 32), c4_out=32),
            IncepBlock(in_channels=256, c1_out=128, c2_out=(128, 192), c3_out=(32, 96), c4_out=64),
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1),

            # Inception group 2 (5 blocks)
            IncepBlock(in_channels=480, c1_out=192, c2_out=(96, 208), c3_out=(16, 48), c4_out=64),
            IncepBlock(in_channels=512, c1_out=160, c2_out=(112, 224), c3_out=(24, 64), c4_out=64),
            IncepBlock(in_channels=512, c1_out=128, c2_out=(128, 256), c3_out=(24, 64), c4_out=64),
            IncepBlock(in_channels=512, c1_out=112, c2_out=(144, 288), c3_out=(32, 64), c4_out=64),
            IncepBlock(in_channels=528, c1_out=256, c2_out=(160, 320), c3_out=(32, 128), c4_out=128),
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1),

            # Inception group 3 (2 blocks), global average pooling, classifier
            IncepBlock(in_channels=832, c1_out=256, c2_out=(160, 320), c3_out=(32, 128), c4_out=128),
            IncepBlock(in_channels=832, c1_out=384, c2_out=(192, 384), c3_out=(48, 128), c4_out=128),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(in_features=1024, out_features=num_classes)
        )

        self._initialize_weights()

    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                # Kaiming (He) initialization, suited to ReLU activations
                nn.init.kaiming_normal_(m.weight, mode='fan_in', nonlinearity='relu')
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.Linear):
                nn.init.xavier_uniform_(m.weight)
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)

    def forward(self, x) -> Tensor:
        return self.model(x)


if __name__ == '__main__':
    BATCH_SIZE = 128
    EPOCHS_NUM = 30
    LEARNING_RATE = 0.005

    model = GoogLeNet(num_classes=10)
    # resize=96: the 28×28 Fashion-MNIST images are enlarged to 96×96 (see section 2.2)
    train_loader, test_loader = fashionMNIST_loader(BATCH_SIZE, resize=96)
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(model.parameters(), LEARNING_RATE)  # plain SGD, no momentum
    platform = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

    with Trainer(model, train_loader, test_loader, criterion, optimizer, platform) as trainer:
        trainer.train(EPOCHS_NUM)

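3.2 Training results and analysis

Full training log. training_tools prints it in Chinese: 轮 = epoch, 训练损失 = training loss, 训练精度 = training accuracy, 测试损失 = test loss, 测试精度 = test accuracy.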
001/30 轮,训练损失:1.0233,训练精度:64.77%,测试损失:0.5896,测试精度:78.30%
002/30 轮,训练损失:0.5145,训练精度:81.14%,测试损失:0.4663,测试精度:83.26%
003/30 轮,训练损失:0.4330,训练精度:84.10%,测试损失:0.4222,测试精度:84.27%
004/30 轮,训练损失:0.3915,训练精度:85.55%,测试损失:0.3902,测试精度:85.33%
005/30 轮,训练损失:0.3597,训练精度:86.72%,测试损失:0.6181,测试精度:78.45%
006/30 轮,训练损失:0.3360,训练精度:87.59%,测试损失:0.4024,测试精度:85.58%
007/30 轮,训练损失:0.3194,训练精度:88.19%,测试损失:0.3629,测试精度:86.59%
008/30 轮,训练损失:0.3041,训练精度:88.78%,测试损失:0.3193,测试精度:88.33%
009/30 轮,训练损失:0.2902,训练精度:89.14%,测试损失:0.3558,测试精度:86.62%
010/30 轮,训练损失:0.2797,训练精度:89.57%,测试损失:0.3258,测试精度:88.02%
011/30 轮,训练损失:0.2684,训练精度:90.09%,测试损失:0.2906,测试精度:89.48%
012/30 轮,训练损失:0.2612,训练精度:90.34%,测试损失:0.3176,测试精度:88.67%
013/30 轮,训练损失:0.2493,训练精度:90.71%,测试损失:0.2911,测试精度:89.44%
014/30 轮,训练损失:0.2429,训练精度:90.96%,测试损失:0.3492,测试精度:87.41%
015/30 轮,训练损失:0.2351,训练精度:91.34%,测试损失:0.3176,测试精度:88.10%
016/30 轮,训练损失:0.2292,训练精度:91.44%,测试损失:0.2931,测试精度:88.95%
017/30 轮,训练损失:0.2221,训练精度:91.71%,测试损失:0.3761,测试精度:86.24%
018/30 轮,训练损失:0.2123,训练精度:92.17%,测试损失:0.2816,测试精度:89.70%
019/30 轮,训练损失:0.2087,训练精度:92.14%,测试损失:0.3294,测试精度:88.39%
020/30 轮,训练损失:0.2000,训练精度:92.52%,测试损失:0.2823,测试精度:89.97%
021/30 轮,训练损失:0.1973,训练精度:92.78%,测试损失:0.2764,测试精度:90.12%
022/30 轮,训练损失:0.1918,训练精度:92.79%,测试损失:0.2800,测试精度:89.67%
023/30 轮,训练损失:0.1846,训练精度:93.20%,测试损失:0.2640,测试精度:90.43%
024/30 轮,训练损失:0.1796,训练精度:93.38%,测试损失:0.2875,测试精度:89.47%
025/30 轮,训练损失:0.1744,训练精度:93.61%,测试损失:0.2566,测试精度:90.57%
026/30 轮,训练损失:0.1676,训练精度:93.80%,测试损失:0.2848,测试精度:89.85%
027/30 轮,训练损失:0.1627,训练精度:94.03%,测试损失:0.2633,测试精度:90.86%
028/30 轮,训练损失:0.1585,训练精度:94.17%,测试损失:0.2793,测试精度:90.02%
029/30 轮,训练损失:0.1545,训练精度:94.38%,测试损失:0.2631,测试精度:90.83%
030/30 轮,训练损失:0.1463,训练精度:94.52%,测试损失:0.2876,测试精度:90.21%
[Figure: GoogLeNet training process visualization]

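Analysis of the training results:

  • Final performance: after 30 epochs, GoogLeNet reaches 94.52% training accuracy and 90.21% test accuracy on Fashion-MNIST. The best test accuracy, 90.86%, occurs at epoch 27
  • Convergence: test accuracy exceeds 85% by epoch 4 and first exceeds 90% at epoch 21. No other CNN was trained under the same settings here, so this run alone does not show that the Inception design learns faster than conventional CNNs
  • Generalization: the train/test accuracy gap grows to about 4.3 percentage points by the final epoch. Test loss stops improving after epoch 25 (lowest value 0.2566), which suggests mild overfitting. The test metrics also fluctuate noticeably, for example 78.45% at epoch 5 and 86.24% at epoch 17
  • Parameter efficiency: about 5.98 million parameters are enough for roughly 90% test accuracy on this task
  • Training stability: Kaiming initialization for the convolutional layers and Xavier initialization for the fully connected layer help keep activations well scaled in this deep network, which has no batch normalization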
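The figures above can be recomputed from the log. The minimal pure-Python sketch below copies the per-epoch test accuracy and test loss by hand from the training output; it does not call training_tools.

```python
# Per-epoch values copied from the training log above (epochs 1..30)
test_acc = [78.30, 83.26, 84.27, 85.33, 78.45, 85.58, 86.59, 88.33, 86.62, 88.02,
            89.48, 88.67, 89.44, 87.41, 88.10, 88.95, 86.24, 89.70, 88.39, 89.97,
            90.12, 89.67, 90.43, 89.47, 90.57, 89.85, 90.86, 90.02, 90.83, 90.21]
test_loss = [0.5896, 0.4663, 0.4222, 0.3902, 0.6181, 0.4024, 0.3629, 0.3193, 0.3558, 0.3258,
             0.2906, 0.3176, 0.2911, 0.3492, 0.3176, 0.2931, 0.3761, 0.2816, 0.3294, 0.2823,
             0.2764, 0.2800, 0.2640, 0.2875, 0.2566, 0.2848, 0.2633, 0.2793, 0.2631, 0.2876]
final_train_acc = 94.52

# Epoch numbers are 1-based, list indices are 0-based
best_epoch = max(range(len(test_acc)), key=test_acc.__getitem__) + 1
first_90 = next(i + 1 for i, a in enumerate(test_acc) if a >= 90.0)
min_loss_epoch = min(range(len(test_loss)), key=test_loss.__getitem__) + 1
gap = round(final_train_acc - test_acc[-1], 2)

print(best_epoch, test_acc[best_epoch - 1])            # 27 90.86
print(first_90)                                         # 21
print(min_loss_epoch, test_loss[min_loss_epoch - 1])   # 25 0.2566
print(gap)                                              # 4.31
```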

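Summary

This article covered the key ideas in GoogLeNet's architecture:

  1. Inception block: four parallel branches capture features at different scales at the same time, increasing the network's representational power
  2. Parallel connection: running several kernel sizes side by side avoids committing to a single kernel size by hand, and training learns how to use each branch
  3. Parameter efficiency: 1×1 convolutions reduce the channel count before the larger kernels, cutting both parameters and computation
  4. Design inspiration: the "dream within a dream" idea from Inception, realized as a stack of repeated Inception blocks (not recursion in the programming sense)
  5. Practical impact: it achieved excellent results in the ImageNet challenge with relatively little computation, helping to advance deep learning

GoogLeNet's Inception idea strongly influenced later deep learning architectures, especially in multi-scale feature fusion and computational efficiency.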