Summary:
Collection: AI Case Studies - CV - Human Physiology and Psychology
Dataset: Montreal facial keypoint detection dataset (Montreal Face Landmark Dataset)
Dataset value: used to locate and describe the key parts of a human face
Solution: PyTorch framework, CNN model
I. Problem Description
Facial keypoint detection is an important task in computer vision. It is mainly used to locate and describe the key parts of a human face, such as the eyes, nose, mouth, and chin, and it is essential for many applications, including face recognition, expression analysis, and pose estimation.
The goal of facial keypoint detection is to predict the keypoint positions on a face image. This can serve as a building block in several applications, for example:
- Tracking faces in images and video
- Analysing facial expressions
- Detecting dysmorphic facial signs for medical diagnosis
- Biometrics / face recognition
Detecting facial keypoints is a very challenging problem. Facial features vary from one individual to another, and even for a single individual there is substantial variation due to factors such as 3D pose, size, position, viewing angle, and illumination. Computer vision research has made great progress on these difficulties, but there is still plenty of room for improvement.
II. Dataset Contents
This dataset was generously provided by Dr. Yoshua Bengio of the Université de Montréal. It is commonly associated with the paper "Facial Keypoint Detection with Neural Networks" and related research, and it provides a benchmark to get you started analysing face images.
Data structure
training.csv:
A list of 7,049 training images. Each row contains the (x, y) coordinates of 15 keypoints followed by the image data as a row-ordered list of pixels.
Each keypoint to be predicted is specified by a real-valued (x, y) pair in pixel index space. The 15 keypoints represent the following facial elements:
- Left eye center: left_eye_center_x, left_eye_center_y
- Right eye center: right_eye_center_x, right_eye_center_y
- Left eye inner corner: left_eye_inner_corner_x, left_eye_inner_corner_y
- Left eye outer corner: left_eye_outer_corner_x, left_eye_outer_corner_y
- Right eye inner corner: right_eye_inner_corner_x, right_eye_inner_corner_y
- Right eye outer corner: right_eye_outer_corner_x, right_eye_outer_corner_y
- Left eyebrow inner end: left_eyebrow_inner_end_x, left_eyebrow_inner_end_y
- Left eyebrow outer end: left_eyebrow_outer_end_x, left_eyebrow_outer_end_y
- Right eyebrow inner end: right_eyebrow_inner_end_x, right_eyebrow_inner_end_y
- Right eyebrow outer end: right_eyebrow_outer_end_x, right_eyebrow_outer_end_y
- Nose tip: nose_tip_x, nose_tip_y
- Mouth left corner: mouth_left_corner_x, mouth_left_corner_y
- Mouth right corner: mouth_right_corner_x, mouth_right_corner_y
- Mouth center top lip: mouth_center_top_lip_x, mouth_center_top_lip_y
- Mouth center bottom lip: mouth_center_bottom_lip_x, mouth_center_bottom_lip_y
Left and right here refer to the point of view of the subject.
Image: the image is the last field of each row, given as a row-ordered list of pixels with integer values in [0, 255]. Each image is 96×96 pixels.
In some examples, some of the target keypoint positions are missing (encoded as missing entries in the CSV, i.e. empty field values).
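To make the row format concrete, here is a minimal sketch (assuming the CSV files are unpacked under a local ./data directory, as in the solution code below) that parses one row of training.csv into a 96×96 image and a 15×2 keypoint array; missing keypoints show up as NaN.
import numpy as np
import pandas as pd

# read a single row of training.csv (the ./data path is an assumption)
row = pd.read_csv('./data/training.csv', nrows=1).iloc[0]

# the last field is a space-separated string of 96*96 pixel values
image = np.array(row['Image'].split(), dtype=np.float32).reshape(96, 96)

# the other 30 fields are the 15 (x, y) keypoint coordinates; missing ones become NaN
keypoints = row.drop('Image').astype('float32').to_numpy().reshape(-1, 2)

print(image.shape)      # (96, 96)
print(keypoints.shape)  # (15, 2)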
test.csv
A list of 1,783 test images. Each row contains an ImageId and the image data as a row-ordered list of pixels.
Data sample:
ImageId | Image |
---|---|
1 | 182 183 182 182 180 180 176 169 156 … |
2 | 76 87 81 72 65 59 64 76 69 … |
submissionFileFormat.csv
A list of the 27,124 keypoints to predict. Each row contains a RowId, ImageId, FeatureName, and Location. FeatureName is a coordinate name such as "left_eye_center_x" or "right_eyebrow_outer_end_y", and Location is the value you need to predict.
IdLookupTable.csv
RowId | ImageId | FeatureName | Location |
---|---|---|---|
1 | 1 | left_eye_center_x | |
2 | 1 | left_eye_center_y | |
3 | 1 | right_eye_center_x | |
4 | 1 | right_eye_center_y | |
5 | 1 | left_eye_inner_corner_x | |
6 | 1 | left_eye_inner_corner_y | |
7 | 1 | left_eye_outer_corner_x | |
8 | 1 | left_eye_outer_corner_y | |
9 | 1 | right_eye_inner_corner_x | |
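As a rough illustration of how this table is used (file paths are an assumption), each RowId asks for one (ImageId, FeatureName) value, and the submission in Section III is produced by filling the Location column for exactly these rows:
import pandas as pd

# inspect which coordinates are requested for which test images (path is an assumption)
lookup = pd.read_csv('./data/IdLookupTable.csv')
print(len(lookup))                              # 27124 rows to predict
print(lookup['FeatureName'].nunique())          # 30 coordinate names (15 keypoints x 2)
print(lookup.groupby('ImageId').size().head())  # coordinates requested per test image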
Dataset citation requirements
@misc{facial-keypoints-detection,
author = {James Petterson and Will Cukierski},
title = {Facial Keypoints Detection},
year = {2013},
howpublished = {\url{https://kaggle.com/competitions/facial-keypoints-detection}},
note = {Kaggle}
}
III. Example Solution
Source code: facial-keypoints-detection-cnn-with-pytorch.ipynb
Load the development packages
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import seaborn as sns
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset
from sklearn.model_selection import train_test_split
1. Data Preparation
- Data loading: load the training set (training.csv) and test set (test.csv) from the CSV files
- Data preprocessing:
  - Convert the image pixel values from strings to float arrays
  - Normalise the pixel values (divide by the maximum value)
  - Reshape to (n_samples, 1, 96, 96) to suit CNN input
  - Handle missing values (fill with column means)
training = pd.read_csv('./data/training.csv')
test = pd.read_csv('./data/test.csv')
id_lookup_table = pd.read_csv('./data/IdLookupTable.csv')
SampleSubmission = pd.read_csv('./data/SampleSubmission.csv')

# parse the space-separated pixel strings into a float array
data = training['Image'].apply(lambda x : x.split(' ')).to_list()
data1 = np.array(data, dtype='float32')

# normalise to [0, 1] and reshape to (n_samples, 1, 96, 96) for the CNN
datanorm = data1/np.max(data1)
datanorm = datanorm.reshape(datanorm.shape[0], 1, 96, 96)

# the 30 keypoint coordinate columns are the labels; fill missing values with column means
label = np.array(list(training[training.columns[0:-1]].values), dtype='float32')
coord_means = np.nanmean(label, axis=0)
for n in range(30):
    label[:,n] = np.nan_to_num(label[:,n], nan=coord_means[n])

data_t = torch.tensor(datanorm)
label_t = torch.tensor(label)

# 90/10 train/validation split, then wrap in DataLoaders
train_data, val_data, train_label, val_label = train_test_split(data_t, label_t, test_size=0.1, random_state=32)
train_data = TensorDataset(train_data, train_label)
val_data = TensorDataset(val_data, val_label)

batchsize = 16
train_loader = DataLoader(train_data, batch_size=batchsize, shuffle=True, drop_last=True)
val_loader = DataLoader(val_data, batch_size=val_data.tensors[0].shape[0])
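As a quick visual sanity check (an optional addition, not in the original notebook), the preprocessed arrays from the block above can be used to plot one training image with its filled-in keypoints overlaid:
# overlay the 15 keypoints (x at even indices, y at odd indices) on one training image
idx = 0
plt.imshow(datanorm[idx, 0], cmap='gray')
plt.scatter(label[idx, 0::2], label[idx, 1::2], c='red', marker='x')
plt.title(f'Training sample {idx} with keypoints')
plt.show()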
2. Model Architecture
A CNN model cnn_net is built, consisting of:
- 3 convolutional layers (Conv2d):
  - Layer 1: 1 input channel → 4 output channels, 5×5 kernel
  - Layer 2: 4 → 64 channels, 3×3 kernel
  - Layer 3: 64 → 128 channels, 3×3 kernel
- 3 fully connected layers (Linear):
  - Layer 1: computed flattened feature count → 250
  - Layer 2: 250 → 128
  - Output layer: 128 → 30 (the x and y coordinates of the 15 keypoints)
Each convolutional layer is followed by a ReLU activation and 2×2 max pooling.
class cnn_net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=4, kernel_size=5)    # (b,1,96,96) -> (b,4,92,92)
        self.conv2 = nn.Conv2d(in_channels=4, out_channels=64, kernel_size=3)   # (b,4,46,46) -> (b,64,44,44)
        self.conv3 = nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3) # (b,64,22,22) -> (b,128,20,20)
        # after the third pooling step the feature map is 10x10, so the flattened size is 128*10*10
        expected_size = np.floor(((10+2*0)-1)/1 + 1)
        expected_size = 128*int(expected_size**2)
        self.fc1 = nn.Linear(expected_size, 250)
        self.fc2 = nn.Linear(250, 128)
        self.out = nn.Linear(128, 30)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2(x), 2))
        x = F.relu(F.max_pool2d(self.conv3(x), 2))
        # flatten for the fully connected layers
        n_units = x.shape.numel()/x.shape[0]
        x = x.view(-1, int(n_units))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.out(x)
        return x
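A quick way to confirm that the flattened feature size matches fc1 (an optional check, not part of the original notebook) is to push a dummy batch through the network and verify the output shape:
# optional shape check: a dummy batch of two 96x96 images should come out as (2, 30)
with torch.no_grad():
    out = cnn_net()(torch.randn(2, 1, 96, 96))
print(out.shape)  # torch.Size([2, 30])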
3. Training Process
- Loss function: mean squared error (MSE)
- Optimizer: Adam
- Training loop:
  - 50 epochs
  - Compute the loss and backpropagate on each batch
  - Record the training and validation loss
# use a GPU if one is available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

net = cnn_net()
loss_func = nn.MSELoss()
optimizer = torch.optim.Adam(net.parameters(), lr=0.001)

# sanity check: run a single batch through the untrained network
x, y = next(iter(train_loader))
x = x.to(device)
y = y.to(device)
net.to(device)
yhat = net(x)
loss = loss_func(yhat, y)
print(loss.item())
epoch = 50
train_loss = []
val_loss = []
net.to(device)

for i in range(epoch):
    print(i)

    # training pass
    net.train()
    batchloss = []
    for x, y in train_loader:
        x = x.to(device)
        y = y.to(device)
        yhat = net(x)
        yhat = yhat.cpu()
        y = y.cpu()
        loss = loss_func(yhat, y)
        batchloss.append(loss.item())

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    train_loss.append(np.mean(batchloss))

    # validation pass (the validation loader holds a single full-size batch)
    net.eval()
    x, y = next(iter(val_loader))
    x = x.to(device)
    y = y.to(device)
    yhat = net(x)
    yhat = yhat.cpu()
    y = y.cpu()
    loss = loss_func(yhat, y)
    val_loss.append(loss.item())
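The loop above keeps the trained weights only in memory; a common addition (sketched here, with the checkpoint filename as an assumption) is to save the state dict so the model can be reloaded later without retraining:
# optional: persist the trained weights (the filename is an assumption)
torch.save(net.state_dict(), 'cnn_net_keypoints.pt')

# reload into a fresh instance, e.g. in a later session
net_reloaded = cnn_net()
net_reloaded.load_state_dict(torch.load('cnn_net_keypoints.pt', map_location='cpu'))
net_reloaded.to(device)
net_reloaded.eval()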
Plot the train loss and val loss curves:
plt.plot(train_loss, label = "train loss")
plt.plot(val_loss, label = "val loss")
plt.legend()
plt.show()
4. Prediction and Submission
- Preprocess the test set (same steps as for the training set)
- Use the trained model to predict the keypoint positions
- Clip the predicted values to the valid range (0-96)
- Generate the submission file format
test_data = test['Image'].apply(lambda x : x.split(' ')).to_list()
test_data1 = np.array(test_data, dtype='float32')
test_datanorm = test_data1/np.max(test_data1)
test_datanorm = test_datanorm.reshape(test_datanorm.shape[0], 1, 96, 96)

test_data_t = torch.tensor(test_datanorm)
test_data_t = test_data_t.to(device)

# predict the 30 coordinates for every test image
net.eval()
preds = net(test_data_t)
preds = preds.cpu()
predictions = preds.detach().numpy().reshape(-1)

# one row per (image, keypoint coordinate) pair, in the same order as the training columns
raw_feature_names = training.columns[0:-1]
FeatureName = np.tile(raw_feature_names, (len(preds), 1)).reshape(-1)
ImageId = np.arange(1, len(preds)+1).repeat(30)
pred_test_final_df = pd.DataFrame({'ImageId': ImageId,
                                   'FeatureName': FeatureName,
                                   'Location': predictions})

# clip the predictions to the valid 0-96 pixel range
pred_test_final_df['Location'] = pred_test_final_df['Location'].clip(0, 96)

# build the submission by looking up the requested (ImageId, FeatureName) rows
submission = id_lookup_table.drop(columns=['Location']).merge(
    pred_test_final_df, on=['ImageId', 'FeatureName'], how='left')
submission[['RowId', 'Location']].to_csv('submission_format.csv', index=False)
print("Your submission was successfully saved!")