Summary:
Collection: AI Cases - CV - Computer Services Industry
Dataset: the 100-class computer vision dataset released by the Canadian Institute For Advanced Research (CIFAR)
Dataset value: used for image classification and machine learning tasks.
Solution: PyTorch framework, ResNet101 neural network model
I. Problem Description
CIFAR stands for the Canadian Institute For Advanced Research. The CIFAR-100 dataset is an extension of CIFAR-10, with more classes and a more challenging classification task, and it is used for image classification and machine learning research. CIFAR-100 was created to push image classification forward, particularly in deep learning and computer vision. It was collected by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton, and provides labeled images for training deep learning models to recognize the objects they contain.
II. Dataset Contents
The CIFAR-100 dataset contains 60,000 color images of 32×32 pixels, divided into 100 classes with 600 images per class. The 100 classes are grouped into 20 superclasses. Each image carries a "fine" label (the class it belongs to) and a "coarse" label (the superclass it belongs to). There are 50,000 training images and 10,000 test images.
Basic Information
Basic information about the CIFAR-100 dataset:
Number of images: 60,000
Image size: 32x32 pixels
Color channels: 3 (RGB)
Number of classes: 100
Images per class: 600
Training set size: 50,000 images
Test set size: 10,000 images
Data Structure
The metadata file contains the label names for each class and each superclass; a sketch for reading it follows the superclass list below.
Classes:
1-5) beaver, dolphin, otter, seal, whale
6-10) aquarium fish, flatfish, ray, shark, trout
11-15) orchids, poppies, roses, sunflowers, tulips
16-20) bottles, bowls, cans, cups, plates
21-25) apples, mushrooms, oranges, pears, sweet peppers
26-30) clock, computer keyboard, lamp, telephone, television
31-35) bed, chair, couch, table, wardrobe
36-40) bee, beetle, butterfly, caterpillar, cockroach
41-45) bear, leopard, lion, tiger, wolf
46-50) bridge, castle, house, road, skyscraper
51-55) cloud, forest, mountain, plain, sea
56-60) camel, cattle, chimpanzee, elephant, kangaroo
61-65) fox, porcupine, possum, raccoon, skunk
66-70) crab, lobster, snail, spider, worm
71-75) baby, boy, girl, man, woman
76-80) crocodile, dinosaur, lizard, snake, turtle
81-85) hamster, mouse, rabbit, shrew, squirrel
86-90) maple, oak, palm, pine, willow
91-95) bicycle, bus, motorcycle, pickup truck, train
96-100) lawn-mower, rocket, streetcar, tank, tractor
The 20 superclasses:
1) aquatic mammals (classes 1-5)
2) fish (classes 6-10)
3) flowers (classes 11-15)
4) food containers (classes 16-20)
5) fruit and vegetables (classes 21-25)
6) household electrical devices (classes 26-30)
7) household furniture (classes 31-35)
8) insects (classes 36-40)
9) large carnivores (classes 41-45)
10) large man-made outdoor things (classes 46-50)
11) large natural outdoor scenes (classes 51-55)
12) large omnivores and herbivores (classes 56-60)
13) medium-sized mammals (classes 61-65)
14) non-insect invertebrates (classes 66-70)
15) people (classes 71-75)
16) reptiles (classes 76-80)
17) small mammals (classes 81-85)
18) trees (classes 86-90)
19) vehicles 1 (classes 91-95)
20) vehicles 2 (classes 96-100)
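For programmatic access to these names, the Python version of the dataset ships a pickled metadata file. A minimal sketch for reading it, assuming the archive has been extracted to ./cifar-100-python/ (the default layout produced by the torchvision download used later in this article):
import pickle
with open("./cifar-100-python/meta", "rb") as f:
    meta = pickle.load(f, encoding = "latin1")
print(meta["fine_label_names"][:5]) # first five of the 100 class names
print(meta["coarse_label_names"][:5]) # first five of the 20 superclass names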
Acknowledgements
Learning Multiple Layers of Features from Tiny Images, Alex Krizhevsky, 2009.
Dataset License
Creative Commons Attribution 4.0 License
III. Recognition Example
Source code: CIFAR-100-ResNet101.ipynb
Install PyTorch 2.4.1
Choose the appropriate CUDA version when installing, for example pytorch==2.4.1:
conda create -n pytorch241-gpu python=3.10
conda activate pytorch241-gpu
conda install pytorch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 pytorch-cuda=12.1 -c pytorch -c nvidia
Import libraries
import os
os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"
# Environment setup
import numpy as np # NumPy array library
import matplotlib.pyplot as plt # plotting library
import time as time
import torch # core torch library
import torch.nn as nn # torch neural network library
import torch.nn.functional as F
import torchvision.datasets as dataset # download and manage public datasets
import torchvision.transforms as transforms # preprocessing and format conversion for public datasets
import torchvision.utils as utils
import torch.utils.data as data_utils # utilities for batched data loading
from PIL import Image # image display
from collections import OrderedDict
import torchvision.models as models
print(torch.__version__)
print(torch.cuda.is_available())
print(torch.version.cuda)
print(torch.backends.cudnn.version())
Output:
2.4.1
True
12.1
90100
The workflow is as follows:
1. Load the dataset
# Dataset transforms
transform_train = transforms.Compose([transforms.Resize(256), # transforms.Scale(256)
transforms.CenterCrop(224),
transforms.ToTensor(),
transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])])
transform_test = transforms.Compose([transforms.Resize(256), # transforms.Scale(256)
transforms.CenterCrop(224),
transforms.ToTensor(),
transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])])
# Training set
train_data = dataset.CIFAR100(root = "./", train = True, transform = transform_train, download = True)
# Test set
test_data = dataset.CIFAR100(root = "./", train = False, transform = transform_test, download = True)
print(train_data)
print("train_data size = ", len(train_data))
print("")
print(test_data)
print("test_data size = ", len(test_data))
Output:
Files already downloaded and verified
Files already downloaded and verified
Dataset CIFAR100
Number of datapoints: 50000
Root location: ./
Split: Train
StandardTransform
Transform: Compose(
Resize(size=256, interpolation=bilinear, max_size=None, antialias=True)
CenterCrop(size=(224, 224))
ToTensor()
Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
)
train_data size = 50000
Dataset CIFAR100
Number of datapoints: 10000
Root location: ./
Split: Test
StandardTransform
Transform: Compose(
Resize(size=256, interpolation=bilinear, max_size=None, antialias=True)
CenterCrop(size=(224, 224))
ToTensor()
Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
)
test_data size = 10000
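Before training, it helps to sanity-check one sample. A minimal sketch that undoes the Normalize step and displays the image with its fine label, using the matplotlib import above and the .classes attribute exposed by torchvision's CIFAR100:
img, label = train_data[0] # a 3x224x224 tensor after the transforms above
mean = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
std = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)
img = (img * std + mean).clamp(0, 1) # undo the normalization for display
plt.imshow(img.permute(1, 2, 0).numpy()) # CHW -> HWC
plt.title(train_data.classes[label]) # fine label name
plt.axis("off")
plt.show()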
Read the data in batches:
# Batched data loading
batch_size = 32
# batch_size is the number of images read per batch; shuffle controls whether the order is randomized
train_loader = data_utils.DataLoader(dataset = train_data, batch_size = batch_size, shuffle = True)
test_loader = data_utils.DataLoader(dataset = test_data, batch_size = batch_size, shuffle = True)
print(len(train_data), len(train_data) / batch_size)
print(len(test_data), len(test_data) / batch_size)
Output:
50000 1562.5
10000 312.5
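The 1562.5 printed above is simply 50000/32; the actual number of batches is what len() of the loader reports, since the final smaller batch is kept when drop_last is left at its default of False:
print(len(train_loader)) # 1563 batches of up to 32 images
print(len(test_loader)) # 313 batches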
2. Define the model
# Use the predefined model provided by torchvision and set the number of output classes to 100 (the default is 1000)
net = models.resnet101(num_classes = 100)
#print(net)
# Load the model's pretrained parameters
# The pretrained weights were downloaded beforehand from the official site
# model = models.resnet101(pretrained=True)
# model
net_params_path = "./models/resnet101-63fe2227.pth"
# Load the pretrained parameters
net_params = torch.load(net_params_path)
print(net_params)
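Note that the cell above only loads and prints the weight file; it never copies the weights into net, and a direct load would fail anyway because the fc layer here has 100 outputs instead of ImageNet's 1000. A hedged sketch of one common way to reuse the backbone weights (not part of the original notebook):
# Copy every pretrained tensor except the final classifier, whose shape differs.
backbone_params = {k: v for k, v in net_params.items() if not k.startswith("fc.")}
missing, unexpected = net.load_state_dict(backbone_params, strict = False)
print("missing keys:", missing) # expected: only fc.weight and fc.bias
print("unexpected keys:", unexpected) # expected: empty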
Test the network's forward output:
# Check that the network works
print("Create the test input")
input = torch.randn(1, 3, 224, 224)
print(input.shape)
print("\nIn training mode, BatchNorm normalizes with per-batch statistics (note: ResNet-101 contains no dropout layers)")
net.train()
print("Net output, method 1:")
out = net(input)
print(out.shape)
print(out)
print("\nNet的输出方法2:")
out = net.forward(input)
print(out)
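For comparison, a short sketch of the same forward pass in evaluation mode, where BatchNorm switches to its running statistics and gradient tracking can be turned off:
net.eval() # use running statistics instead of batch statistics
with torch.no_grad(): # no gradients needed for a pure forward pass
    out_eval = net(input)
print(out_eval.shape) # torch.Size([1, 100])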
Define the loss function and the optimizer
loss_fn = nn.CrossEntropyLoss()
print(loss_fn)
Learning_rate = 0.01 # learning rate
# optimizer = SGD, basic stochastic gradient descent
# parameters specifies the list of parameters to optimize
# lr specifies the learning rate
# optimizer = torch.optim.Adam(net.parameters(), lr = Learning_rate)
optimizer = torch.optim.SGD(net.parameters(), lr = Learning_rate, momentum = 0.9)
print(optimizer)
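Not used in the original notebook, but SGD with momentum is commonly paired with a learning-rate schedule over the planned number of epochs; a minimal sketch with a cosine schedule (scheduler.step() would then be called once at the end of each training epoch):
# Decay the learning rate from 0.01 towards 0 over 30 epochs.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max = 30)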
3. Training and results
Train the predefined ResNet101 model:
# Assume that we are on a CUDA machine, then this should print a CUDA device:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(device)
# Move the network to the GPU
# net.to(device) # device-adaptive way
net.cuda() # explicit GPU way
# Move the loss computation to the GPU
# loss_fn = loss_fn.to(device) # device-adaptive way
loss_fn.cuda() # explicit GPU way
# Number of training epochs
epochs = 1 # 30
loss_history = [] # loss values recorded during training
accuracy_history = [] # intermediate accuracy values
accuracy_batch = 0.0
# Put the network in training mode
model = net.train()
train_start = time.time()
print('Train start at - {}'.format(time.strftime("%X", time.localtime())))
for i in range(0, epochs):
    epoch_start = time.time()
    for j, (x_train, y_train) in enumerate(train_loader):
        # Set the model to train mode
        # net.train()
        # Move the data to the chosen device
        x_train = x_train.to(device)
        # x_train = x_train.cuda()
        y_train = y_train.to(device)
        # y_train = y_train.cuda()
        # (0) Reset the optimizer's gradients
        optimizer.zero_grad()
        # (1) Forward pass
        y_pred = net(x_train)
        # (2) Compute the loss
        loss = loss_fn(y_pred, y_train)
        # (3) Backpropagate
        loss.backward()
        # (4) Update the parameters
        optimizer.step()
        # Record the loss during training
        loss_history.append(loss.item()) # loss for a batch
        # Record the accuracy during training
        number_batch = y_train.size()[0] # number of images in the batch
        _, predicted = torch.max(y_pred.data, dim = 1)
        correct_batch = (predicted == y_train).sum().item() # number of correct predictions
        accuracy_batch = 100 * correct_batch / number_batch
        accuracy_history.append(accuracy_batch)
        if (j % 10 == 0):
            print('Epoch {} batch {} in {}, loss = {:.4f} accuracy = {:.4f}%, {}'.format(i, j, len(train_data)/batch_size, loss.item(), accuracy_batch, time.strftime("%X", time.localtime())))
    epoch_end = time.time()
    epoch_cost = epoch_end - epoch_start
    print('Epoch {} cost {}s '.format(i, epoch_cost))
train_end = time.time()
train_cost = train_end - train_start
print('\nTrain finished at - {}'.format(time.strftime("%X", time.localtime())))
print('Train cost {}s '.format(train_cost))
print("Final loss = ", loss.item())
print("Final accuracy = ", accuracy_batch)
Output:
Train start at - 10:09:13
Epoch 0 batch 0 in 1562.5, loss = 5.0165 accuracy = 0.0000%, 10:09:58
Epoch 0 batch 10 in 1562.5, loss = 5.7989 accuracy = 0.0000%, 10:17:07
...
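loss_history and accuracy_history hold one value per batch, so the training curves can be plotted directly with the matplotlib import from earlier; a minimal sketch:
fig, (ax1, ax2) = plt.subplots(1, 2, figsize = (10, 4))
ax1.plot(loss_history)
ax1.set_xlabel("batch")
ax1.set_ylabel("loss")
ax2.plot(accuracy_history)
ax2.set_xlabel("batch")
ax2.set_ylabel("accuracy (%)")
plt.tight_layout()
plt.show()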
4. Prediction and results
index = 0
print("Take one batch of test samples")
images, labels = next(iter(test_loader))
print(images.shape)
print(labels.shape)
print(labels)
print("\nPredict every sample in the batch")
# The network lives on the GPU (net.cuda() above), so move the images there too,
# then bring the outputs back to the CPU for comparison with the CPU labels.
net.eval()
with torch.no_grad():
    outputs = net(images.to(device)).cpu()
print(outputs.data.shape)
print("\nFor each sample in the batch, pick the most likely class")
_, predicted = torch.max(outputs, 1)
print(predicted.data.shape)
print(predicted)
print("\nCompare all predictions in the batch with the labels")
bool_results = (predicted == labels)
print(bool_results.shape)
print(bool_results)
print("\nCount the correctly predicted samples and compute the accuracy")
corrects = bool_results.sum().item()
accuracy = corrects / len(bool_results)
print("corrects = ", corrects)
print("accuracy = ", accuracy)
print("\nSample index = ", index)
print("label:", labels[index].item())
print("class scores:", outputs.data[index].numpy())
print("most likely class:", predicted.data[index].item())
print("correct:", bool_results.data[index].item())
Source code license
MIT License