Summary:
Collection: AI Case Studies - CV - Human Physiology and Psychology
Dataset: Montreal facial keypoint detection dataset (Montreal Face Landmark Dataset)
Dataset value: used to locate and describe the key parts of a human face
Solution: PyTorch framework, CNN model
I. Problem Description
Facial keypoint detection is an important task in computer vision. It is mainly used to locate and describe the key parts of a human face, such as the eyes, nose, mouth, and chin, and it is essential for many applications, including face recognition, expression analysis, and pose estimation.
The goal of facial keypoint detection is to predict the keypoint positions on a face image. This can serve as a building block in several applications, for example:
- Tracking faces in images and video
- Analysing facial expressions
- Detecting dysmorphic facial signs for medical diagnosis
- Biometrics / face recognition
Detecting facial keypoints is a very challenging problem. Facial features vary from one individual to another, and even for a single individual there is substantial variation due to factors such as 3D pose, size, position, viewing angle, and illumination. Computer vision research has made great progress on these difficulties, but there is still plenty of room for improvement.
II. Dataset Contents
This dataset was generously provided by Dr. Yoshua Bengio of the Université de Montréal. It is commonly associated with the paper "Facial Keypoint Detection with Neural Networks" and related research, and it provides a benchmark to get you started analysing face images.
Data structure
training.csv:
A list of 7,049 training images. Each row contains the (x, y) coordinates of 15 keypoints followed by the image data as a row-ordered list of pixels.
Each keypoint to be predicted is specified by a real-valued (x, y) pair in pixel index space. The 15 keypoints represent the following facial elements:
- Left eye center: left_eye_center_x, left_eye_center_y
- Right eye center: right_eye_center_x, right_eye_center_y
- Left eye inner corner: left_eye_inner_corner_x, left_eye_inner_corner_y
- Left eye outer corner: left_eye_outer_corner_x, left_eye_outer_corner_y
- Right eye inner corner: right_eye_inner_corner_x, right_eye_inner_corner_y
- Right eye outer corner: right_eye_outer_corner_x, right_eye_outer_corner_y
- Left eyebrow inner end: left_eyebrow_inner_end_x, left_eyebrow_inner_end_y
- Left eyebrow outer end: left_eyebrow_outer_end_x, left_eyebrow_outer_end_y
- Right eyebrow inner end: right_eyebrow_inner_end_x, right_eyebrow_inner_end_y
- Right eyebrow outer end: right_eyebrow_outer_end_x, right_eyebrow_outer_end_y
- Nose tip: nose_tip_x, nose_tip_y
- Mouth left corner: mouth_left_corner_x, mouth_left_corner_y
- Mouth right corner: mouth_right_corner_x, mouth_right_corner_y
- Mouth center top lip: mouth_center_top_lip_x, mouth_center_top_lip_y
- Mouth center bottom lip: mouth_center_bottom_lip_x, mouth_center_bottom_lip_y
Left and right here refer to the point of view of the subject.
Image: the image is the last field of each row, given as a row-ordered list of pixels with integer values in [0, 255]. Each image is 96×96 pixels.
In some examples, some of the target keypoint positions are missing (encoded as missing entries in the CSV, i.e. empty field values).
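To make the row format concrete, here is a minimal sketch (assuming the CSV files are unpacked under a local ./data directory, as in the solution code below) that parses one row of training.csv into a 96×96 image and a 15×2 keypoint array; missing keypoints show up as NaN.
import numpy as np
import pandas as pd

# read a single row of training.csv (the ./data path is an assumption)
row = pd.read_csv('./data/training.csv', nrows=1).iloc[0]

# the last field is a space-separated string of 96*96 pixel values
image = np.array(row['Image'].split(), dtype=np.float32).reshape(96, 96)

# the other 30 fields are the 15 (x, y) keypoint coordinates; missing ones become NaN
keypoints = row.drop('Image').astype('float32').to_numpy().reshape(-1, 2)

print(image.shape)      # (96, 96)
print(keypoints.shape)  # (15, 2)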
test.csv
A list of 1,783 test images. Each row contains an ImageId and the image data as a row-ordered list of pixels.
Data sample:
ImageId | Image |
---|---|
1 | 182 183 182 182 180 180 176 169 156 … |
2 | 76 87 81 72 65 59 64 76 69 … |
submissionFileFormat.csv
A list of the 27,124 keypoints to predict. Each row contains a RowId, ImageId, FeatureName, and Location. FeatureName is a coordinate name such as "left_eye_center_x" or "right_eyebrow_outer_end_y", and Location is the value you need to predict.
IdLookupTable.csv
RowId | ImageId | FeatureName | Location |
---|---|---|---|
1 | 1 | left_eye_center_x | |
2 | 1 | left_eye_center_y | |
3 | 1 | right_eye_center_x | |
4 | 1 | right_eye_center_y | |
5 | 1 | left_eye_inner_corner_x | |
6 | 1 | left_eye_inner_corner_y | |
7 | 1 | left_eye_outer_corner_x | |
8 | 1 | left_eye_outer_corner_y | |
9 | 1 | right_eye_inner_corner_x | |
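As a rough illustration of how this table is used (file paths are an assumption), each RowId asks for one (ImageId, FeatureName) value, and the submission in Section III is produced by filling the Location column for exactly these rows:
import pandas as pd

# inspect which coordinates are requested for which test images (path is an assumption)
lookup = pd.read_csv('./data/IdLookupTable.csv')
print(len(lookup))                              # 27124 rows to predict
print(lookup['FeatureName'].nunique())          # 30 coordinate names (15 keypoints x 2)
print(lookup.groupby('ImageId').size().head())  # coordinates requested per test image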
Dataset citation requirements
@misc{facial-keypoints-detection,
author = {James Petterson and Will Cukierski},
title = {Facial Keypoints Detection},
year = {2013},
howpublished = {\url{https://kaggle.com/competitions/facial-keypoints-detection}},
note = {Kaggle}
}
III. Example Solution
Source code: facial-keypoints-detection-cnn-with-pytorch.ipynb
Load the development packages
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import seaborn as sns
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset
from sklearn.model_selection import train_test_split
1. Data Preparation
- Data loading: load the training set (training.csv) and test set (test.csv) from the CSV files
- Data preprocessing:
  - Convert the image pixel values from strings to float arrays
  - Normalise the pixel values (divide by the maximum value)
  - Reshape to (n_samples, 1, 96, 96) to suit CNN input
  - Handle missing values (fill with column means)
training = pd.read_csv('./data/training.csv')
test = pd.read_csv('./data/test.csv')
id_lookup_table = pd.read_csv('./data/IdLookupTable.csv')
SampleSubmission = pd.read_csv('./data/SampleSubmission.csv')

# parse the space-separated pixel strings into a float array
data = training['Image'].apply(lambda x : x.split(' ')).to_list()
data1 = np.array(data, dtype='float32')

# normalise to [0, 1] and reshape to (n_samples, 1, 96, 96) for the CNN
datanorm = data1/np.max(data1)
datanorm = datanorm.reshape(datanorm.shape[0], 1, 96, 96)

# the 30 keypoint coordinate columns are the labels; fill missing values with column means
label = np.array(list(training[training.columns[0:-1]].values), dtype='float32')
coord_means = np.nanmean(label, axis=0)
for n in range(30):
    label[:,n] = np.nan_to_num(label[:,n], nan=coord_means[n])

data_t = torch.tensor(datanorm)
label_t = torch.tensor(label)

# 90/10 train/validation split, then wrap in DataLoaders
train_data, val_data, train_label, val_label = train_test_split(data_t, label_t, test_size=0.1, random_state=32)
train_data = TensorDataset(train_data, train_label)
val_data = TensorDataset(val_data, val_label)

batchsize = 16
train_loader = DataLoader(train_data, batch_size=batchsize, shuffle=True, drop_last=True)
val_loader = DataLoader(val_data, batch_size=val_data.tensors[0].shape[0])
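As a quick visual sanity check (an optional addition, not in the original notebook), the preprocessed arrays from the block above can be used to plot one training image with its filled-in keypoints overlaid:
# overlay the 15 keypoints (x at even indices, y at odd indices) on one training image
idx = 0
plt.imshow(datanorm[idx, 0], cmap='gray')
plt.scatter(label[idx, 0::2], label[idx, 1::2], c='red', marker='x')
plt.title(f'Training sample {idx} with keypoints')
plt.show()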
2. Model Architecture
A CNN model cnn_net is built, consisting of:
- 3 convolutional layers (Conv2d):
  - Layer 1: 1 input channel → 4 output channels, 5×5 kernel
  - Layer 2: 4 → 64 channels, 3×3 kernel
  - Layer 3: 64 → 128 channels, 3×3 kernel
- 3 fully connected layers (Linear):
  - Layer 1: computed flattened feature count → 250
  - Layer 2: 250 → 128
  - Output layer: 128 → 30 (the x and y coordinates of the 15 keypoints)
Each convolutional layer is followed by a ReLU activation and 2×2 max pooling.
class cnn_net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=4, kernel_size=5)    # (b,1,96,96) -> (b,4,92,92)
        self.conv2 = nn.Conv2d(in_channels=4, out_channels=64, kernel_size=3)   # (b,4,46,46) -> (b,64,44,44)
        self.conv3 = nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3) # (b,64,22,22) -> (b,128,20,20)
        # after the third pooling step the feature map is 10x10, so the flattened size is 128*10*10
        expected_size = np.floor(((10+2*0)-1)/1 + 1)
        expected_size = 128*int(expected_size**2)
        self.fc1 = nn.Linear(expected_size, 250)
        self.fc2 = nn.Linear(250, 128)
        self.out = nn.Linear(128, 30)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2(x), 2))
        x = F.relu(F.max_pool2d(self.conv3(x), 2))
        # flatten for the fully connected layers
        n_units = x.shape.numel()/x.shape[0]
        x = x.view(-1, int(n_units))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.out(x)
        return x
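A quick way to confirm that the flattened feature size matches fc1 (an optional check, not part of the original notebook) is to push a dummy batch through the network and verify the output shape:
# optional shape check: a dummy batch of two 96x96 images should come out as (2, 30)
with torch.no_grad():
    out = cnn_net()(torch.randn(2, 1, 96, 96))
print(out.shape)  # torch.Size([2, 30])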
3. Training Process
- Loss function: mean squared error (MSE)
- Optimizer: Adam
- Training loop:
  - 50 epochs
  - Compute the loss and backpropagate on each batch
  - Record the training and validation loss
# use a GPU if one is available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

net = cnn_net()
loss_func = nn.MSELoss()
optimizer = torch.optim.Adam(net.parameters(), lr=0.001)

# sanity check: run a single batch through the untrained network
x, y = next(iter(train_loader))
x = x.to(device)
y = y.to(device)
net.to(device)
yhat = net(x)
loss = loss_func(yhat, y)
print(loss.item())
epoch = 50
train_loss = []
val_loss = []
net.to(device)

for i in range(epoch):
    print(i)

    # training pass
    net.train()
    batchloss = []
    for x, y in train_loader:
        x = x.to(device)
        y = y.to(device)
        yhat = net(x)
        yhat = yhat.cpu()
        y = y.cpu()
        loss = loss_func(yhat, y)
        batchloss.append(loss.item())

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    train_loss.append(np.mean(batchloss))

    # validation pass (the validation loader holds a single full-size batch)
    net.eval()
    x, y = next(iter(val_loader))
    x = x.to(device)
    y = y.to(device)
    yhat = net(x)
    yhat = yhat.cpu()
    y = y.cpu()
    loss = loss_func(yhat, y)
    val_loss.append(loss.item())
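The loop above keeps the trained weights only in memory; a common addition (sketched here, with the checkpoint filename as an assumption) is to save the state dict so the model can be reloaded later without retraining:
# optional: persist the trained weights (the filename is an assumption)
torch.save(net.state_dict(), 'cnn_net_keypoints.pt')

# reload into a fresh instance, e.g. in a later session
net_reloaded = cnn_net()
net_reloaded.load_state_dict(torch.load('cnn_net_keypoints.pt', map_location='cpu'))
net_reloaded.to(device)
net_reloaded.eval()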
Plot the train loss and val loss curves:
plt.plot(train_loss, label = "train loss")
plt.plot(val_loss, label = "val loss")
plt.legend()
plt.show()
4. Prediction and Submission
- Preprocess the test set (same steps as for the training set)
- Use the trained model to predict the keypoint positions
- Clip the predicted values to the valid range (0-96)
- Generate the submission file format
test_data = test['Image'].apply(lambda x : x.split(' ')).to_list()
test_data1 = np.array(test_data, dtype='float32')
test_datanorm = test_data1/np.max(test_data1)
test_datanorm = test_datanorm.reshape(test_datanorm.shape[0], 1, 96, 96)

test_data_t = torch.tensor(test_datanorm)
test_data_t = test_data_t.to(device)

# predict the 30 coordinates for every test image
net.eval()
preds = net(test_data_t)
preds = preds.cpu()
predictions = preds.detach().numpy().reshape(-1)

# one row per (image, keypoint coordinate) pair, in the same order as the training columns
raw_feature_names = training.columns[0:-1]
FeatureName = np.tile(raw_feature_names, (len(preds), 1)).reshape(-1)
ImageId = np.arange(1, len(preds)+1).repeat(30)
pred_test_final_df = pd.DataFrame({'ImageId': ImageId,
                                   'FeatureName': FeatureName,
                                   'Location': predictions})

# clip the predictions to the valid 0-96 pixel range
pred_test_final_df['Location'] = pred_test_final_df['Location'].clip(0, 96)

# build the submission by looking up the requested (ImageId, FeatureName) rows
submission = id_lookup_table.drop(columns=['Location']).merge(
    pred_test_final_df, on=['ImageId', 'FeatureName'], how='left')
submission[['RowId', 'Location']].to_csv('submission_format.csv', index=False)
print("Your submission was successfully saved!")