Summary:
Dataset: Person Fall Detection Dataset (2021)
Dataset value: hospitals, nursing homes, shopping malls, and similar venues can deploy cameras with an AI system to quickly recognize fall events, helping security or medical staff respond in time.
Solution: the TensorFlow framework, using several transfer-learning models (VGG16, InceptionV3, MobileNetV2, and others) improved with an attention mechanism and Focal Loss.
I. Problem Description
A person-fall detection dataset is used to train and evaluate fall-detection models. Such datasets typically contain images or video clips annotated with the position and state of fallen persons. They are widely used in smart homes, nursing-home monitoring, industrial safety, and similar settings, helping to detect and respond to potential safety risks in time.
II. Dataset Contents
Uttejkumar Kandagatla is a data science and machine learning professional who focuses on data analysis, pattern recognition, and AI applications, and has some influence in publishing and sharing datasets. The person-fall detection dataset was published in August 2021.
Data Structure
Kandagatla collected images from various sources and built a custom fall-detection dataset with two directories: images and labels. The images directory has two subdirectories: train (374 images) for training and val (111 images) for validation. The labels directory likewise has train and val subdirectories, which store the label files for the corresponding images.
| Labeled image files | Start | End |
| --- | --- | --- |
| Training set | fall001.jpg | fall208.jpg |
| Training set | not fallen001.jpg | not fallen166.jpg |
| Validation set | fall001.jpg | fall70.jpg |
| Validation set | not fallen001.jpg | not fallen041.jpg |
The classification labels are: fall detected, walking, and sitting. Bounding boxes were drawn around the people in the images, and each box was assigned the corresponding label.
0 – Fall detected
1 – Walking
2 – Sitting
After the bounding boxes are created for each image, we end up with a label file whose name matches the image file name; each line of the file holds one class label and the four bounding-box values for that image.
Sample of the label file fall001.txt:
0 0.506757 0.548230 0.445946 0.241148
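The four values after the class id are normalized to the image size: box center x, center y, width, and height as fractions of the image width and height, which is how the loading code below interprets them. A minimal sketch of converting one such line to pixel coordinates, assuming a hypothetical 640×480 image:
# Minimal sketch: convert one YOLO-style label line to pixel coordinates.
# The 640x480 size is a made-up example; the real code reads the size from the image itself.
line = "0 0.506757 0.548230 0.445946 0.241148"
parts = line.split()
class_id = int(parts[0])
x_center, y_center, box_w, box_h = [float(v) for v in parts[1:]]
img_w, img_h = 640, 480  # hypothetical image dimensions
print(class_id, x_center * img_w, y_center * img_h, box_w * img_w, box_h * img_h)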
Sample image from the dataset:

Dataset License
Open Data Commons Open Database License (ODbL) v1.0
III. Sample Solution
Source code: fall-detect-transfer-learning.ipynb
Approach
This project builds a deep-learning fall-detection model that takes an image as input and determines whether the person is fallen, walking, or sitting. It uses several transfer-learning models and adds an attention mechanism and Focal Loss as improvements to boost detection performance.
Installation
Install tensorflow_gpu-2.6.0 by following "Installing the Deep Learning Framework TensorFlow".
conda create -n tensorflow-gpu-2-6-p3-9 python=3.9
conda activate tensorflow-gpu-2-6-p3-9
conda install tensorflow-gpu==2.6
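After activating the environment, a quick sanity check (a minimal sketch; the exact GPU listing depends on your drivers) confirms the installed version and whether the GPU is visible:
import tensorflow as tf
print(tf.__version__)  # expect something like 2.6.0
print(tf.config.list_physical_devices('GPU'))  # a non-empty list means TensorFlow can see the GPU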
Loading the Packages
import os
# Load necessary packages
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
sns.set()
import platform
if platform.system() == "Windows":
plt.rcParams['font.family'] = ['SimHei'] # Set font for Windows
elif platform.system() == "Darwin":
plt.rcParams['font.sans-serif'] = ['Arial Unicode MS'] # Set font for MacOS
plt.rcParams['axes.unicode_minus']=False
import tensorflow as tf
from tensorflow.keras.applications import VGG16, ResNet50, MobileNetV2
from tensorflow.keras import layers, models
import cv2
from sklearn.metrics import confusion_matrix, roc_curve, auc, accuracy_score, precision_score, recall_score, f1_score
from sklearn.preprocessing import label_binarize
import random
import os
IV. Source Code Workflow
1. Data Loading and Preprocessing
Data Loading
During data loading, the lists of image and label files are first retrieved and sorted. The project defines a load_data function that loads images and their corresponding label information from the given directories. Image files and label files live in separate directories; the function reads both and organizes the data into the required format.
When loading images, the file names are listed and sorted so that the order is consistent on every run. Each image file is then read into an array with plt.imread. Each label file is opened and its contents parsed into bounding boxes; every box holds a class label plus the box's position and size within the image. Each line of the label file is parsed to extract the box values, which are converted into pixel coordinates.
After a bounding box is extracted, the corresponding region is cropped from the original image based on the box position and appended to a list, and the matching class label is appended to a second list. In the end these two lists contain all image regions and their class labels.
- The script loads the training and validation data from the specified path (./fall_dataset)
- The load_data() function processes the images and their YOLO-format label files:
  - reads each image file (plt.imread)
  - parses the label file (each line contains a class and bounding-box coordinates)
  - crops the target region from the image using the bounding-box coordinates
  - returns the list of image regions and the list of corresponding class labels
data_source = r'./fall_dataset'
# Data paths
train_img_path = os.path.join(data_source,'images/train/')
train_label_path = os.path.join(data_source,'labels/train/')
val_img_path = os.path.join(data_source,'images/val/')
val_label_path = os.path.join(data_source,'labels/val/')
print(train_img_path)
def load_data(img_path, label_path):
"""
Load images and corresponding labels from the specified directories.
Parameters:
img_path (str): Path to the directory containing image files.
label_path (str): Path to the directory containing label files.
Returns:
images (list): List of image arrays.
labels (list): List of corresponding labels.
"""
# Get list of image and label files
img_files = os.listdir(img_path)
img_files.sort() # Sort image files to ensure consistent order
label_files = os.listdir(label_path)
label_files.sort() # Sort label files to ensure consistent order
images = []
labels = []
# Iterate through each image file
for i in range(len(img_files)):
# Read the image
img = plt.imread(img_path + img_files[i])
# Read the corresponding label file
with open(label_path + label_files[i], 'r') as file:
r = file.readlines()
bounding_boxes = []
# Process each line in the label file
for j in r:
j = j.split()
bounding_boxes.append([int(j[0]), float(j[1]), float(j[2]), float(j[3]), float(j[4])])
# Extract each bounding box from the label
for box in bounding_boxes:
image_height, image_width, _ = img.shape
xmin, ymin, width, height = box[1:]
# Convert the normalized box values (these are box centers and sizes, not corners) to pixel coordinates
xmin = int(xmin * image_width)
ymin = int(ymin * image_height)
width = int(width * image_width)
height = int(height * image_height)
# Append the class label and corresponding image region
labels.append(box[0])
images.append(img[ymin-height//2:ymin+height//2, xmin-width//2:xmin+width//2])
return images, labels
Data Preprocessing
To bring the images to the input size expected by the models, the project defines a preprocess_images function, which resizes all images to a uniform size and normalizes the pixel values so that the data fed to the model has a consistent format, improving training.
preprocess_images first iterates over every image and resizes it to the target size (128×128 pixels) with cv2.resize; note that this is a direct resize, so the aspect ratio is not preserved. The resized images are kept for the next step.
The function then normalizes the resized images: each pixel value is divided by 255, scaling it from [0, 255] to [0, 1], which helps the model train and converge faster. After preprocessing, the data are stored in X_train, y_train, X_val, and y_val, holding the training and validation images and their labels.
# usage of the load_data function
train_images, train_labels = load_data(train_img_path, train_label_path)
val_images, val_labels = load_data(val_img_path, val_label_path)
# Preprocess images to the preferred size
pref_size = (128, 128)
def preprocess_images(images):
for i in range(len(images)):
images[i] = cv2.resize(images[i], pref_size)
return np.array(images) / 255.0
# Preprocess train and validation images
X_train = preprocess_images(train_images)
y_train = np.array(train_labels)
X_val = preprocess_images(val_images)
y_val = np.array(val_labels)
if len(np.unique(y_train))==2:
label_names = ['Fall Detected', 'NoFall Detected']
else:
# Mapping of labels to class names
label_names = ['Fall Detected', 'Walking', 'Sitting']
global num_classes
num_classes = len(label_names)
import matplotlib.pyplot as plt
# Function to display random images with labels
def display_random_images(images, labels, num_images=9):
"""
Display a grid of random images with their corresponding labels.
Parameters:
images (array): Array of image data.
labels (array): Array of labels corresponding to the images.
num_images (int): Number of images to display. Default is 9.
"""
plt.figure(figsize=(10, 10))
np.random.seed(3)
# Generate random indices
indices = np.random.choice(len(images), num_images, replace=False)
for i, index in enumerate(indices):
plt.subplot(3, 3, i + 1)
plt.imshow(images[index])
plt.title(label_names[labels[index]])
plt.axis('off')
plt.tight_layout()
plt.show()
# Display random images from the training set
display_random_images(X_train, y_train, num_images=9)
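Before moving on to the models, it can help to look at how the three classes are distributed; the class weights and Focal Loss used later exist precisely because the classes are not balanced. A minimal sketch using the arrays defined above:
# Quick class-balance check on the cropped training and validation labels
print("train distribution:", dict(zip(label_names, np.bincount(y_train, minlength=num_classes))))
print("val distribution:", dict(zip(label_names, np.bincount(y_val, minlength=num_classes))))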
2. Model Architecture
Transfer learning applies knowledge learned on one task to a different but related task. In computer vision, it usually means taking a deep model pre-trained on a large dataset such as ImageNet and reusing its weights and learned features for a new task. This can greatly reduce training time and the amount of labeled data required, while improving generalization and accuracy. This project uses several transfer-learning backbones, including VGG16, InceptionV3, MobileNetV2, DenseNet121, Xception, NASNetMobile, VGG19, and InceptionResNetV2.
The project also introduces a multi-head self-attention mechanism into the models. The core idea is to project the input into several subspaces ("heads"), compute attention independently in each subspace, and then concatenate the per-head results, strengthening the model's feature representation. Multi-head self-attention first maps the input features (in matrix form) through linear projections into query (Query), key (Key), and value (Value) spaces. Within each head, the dot product of queries and keys produces attention weights, which are applied to the values to obtain the attention output.
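In matrix form, the attention computed in each head is Attention(Q, K, V) = softmax(QKᵀ / √d_k) · V, where d_k is the per-head projection size (embed_dim / num_heads, i.e. 256 / 8 = 32 in the implementation below).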
Core Component – MultiHeadSelfAttention
- Implements multi-head self-attention (the core building block of the Transformer)
- Main responsibilities:
  - project the input into query (Query), key (Key), and value (Value) spaces
  - compute attention weights (scaled dot-product attention)
  - run the heads in parallel and merge their results
  - apply layer normalization and Dropout regularization
from tensorflow.keras.applications import VGG16, InceptionV3, MobileNetV2, ResNet50, DenseNet121, EfficientNetB0
from tensorflow.keras.layers import Layer, Dropout, LayerNormalization, Dense
class MultiHeadSelfAttention(Layer):
"""
Multi-Head Self Attention Layer.
This layer implements the multi-head self-attention mechanism used in transformers.
It projects the input into multiple heads, performs scaled dot-product attention
on each head, and then concatenates and projects the results.
Attributes:
embed_dim: Dimensionality of the embedding.
num_heads: Number of attention heads.
dropout_rate: Dropout rate for regularization.
"""
def __init__(self, embed_dim=256, num_heads=8, dropout_rate=0.1):
"""
Initialize the layer.
Args:
embed_dim: Dimensionality of the embedding.
num_heads: Number of attention heads.
dropout_rate: Dropout rate for regularization.
"""
super(MultiHeadSelfAttention, self).__init__()
self.num_heads = num_heads
self.embed_dim = embed_dim
self.dropout_rate = dropout_rate
if embed_dim % num_heads != 0:
raise ValueError(f"embedding dimension = {embed_dim} should be divisible by number of heads = {num_heads}")
self.projection_dim = embed_dim // num_heads
# Define dense layers for query, key, and value projections
self.query_dense = Dense(embed_dim)
self.key_dense = Dense(embed_dim)
self.value_dense = Dense(embed_dim)
# Define dense layer to combine the heads
self.combine_heads = Dense(embed_dim)
# Define dropout and layer normalization layers
self.dropout = Dropout(dropout_rate)
self.layernorm = LayerNormalization(epsilon=1e-6)
def attention(self, query, key, value):
"""
Compute scaled dot-product attention.
Args:
query: Query tensor.
key: Key tensor.
value: Value tensor.
Returns:
attention: Result of the attention mechanism.
"""
score = tf.matmul(query, key, transpose_b=True) # Calculate dot product
dim_key = tf.cast(tf.shape(key)[-1], tf.float32) # Get dimension of key
scaled_score = score / tf.math.sqrt(dim_key) # Scale the scores
weights = tf.nn.softmax(scaled_score, axis=-1) # Apply softmax to get attention weights
attention = tf.matmul(weights, value) # Multiply weights with values
return attention
def separate_heads(self, x, batch_size):
"""
Separate the heads for multi-head attention.
Args:
x: Input tensor.
batch_size: Batch size of the input.
Returns:
x: Tensor with separated heads.
"""
x = tf.reshape(x, (batch_size, -1, self.num_heads, self.projection_dim))
return tf.transpose(x, perm=[0, 2, 1, 3])
def call(self, inputs):
"""
Forward pass for the layer.
Args:
inputs: Input tensor.
Returns:
output: Output tensor after applying multi-head self-attention.
"""
batch_size = tf.shape(inputs)[0]
# batch_size = 8  # hard-coding a fixed batch size only works if every batch (including the last) has exactly that many samples
# Project inputs to query, key, and value tensors
query = self.query_dense(inputs)
key = self.key_dense(inputs)
value = self.value_dense(inputs)
# Separate the heads for multi-head attention
query = self.separate_heads(query, batch_size)
key = self.separate_heads(key, batch_size)
value = self.separate_heads(value, batch_size)
# Compute attention
attention = self.attention(query, key, value)
# Concatenate the heads and reshape the tensor
attention = tf.transpose(attention, perm=[0, 2, 1, 3])
concat_attention = tf.reshape(attention, (batch_size, -1, self.embed_dim))
# Combine heads and apply dropout and layer normalization
output = self.combine_heads(concat_attention)
output = self.dropout(output)
output = self.layernorm(inputs + output)
# Reduce mean across the time dimension to get fixed-size output
output = tf.reduce_mean(output, axis=1)
return output
def compute_output_shape(self, input_shape):
"""
Compute the output shape of the layer.
Args:
input_shape: Shape of the input tensor.
Returns:
Output shape.
"""
return input_shape[0], self.embed_dim
Model Construction
The script provides 11 model architectures, in two categories:
- Feature extractors based on pretrained models:
  - VGG16 / VGG19
  - ResNet50
  - MobileNetV2
  - DenseNet121 / DenseNet201
  - InceptionV3 / InceptionResNetV2
  - Xception
  - NASNetMobile
  - Shared structure: pretrained CNN (frozen weights) + GlobalAveragePooling + Dense layer + attention layer + classification layer
- Custom CNN model:
  - 4 convolutional blocks (Conv2D + BatchNorm + MaxPooling + Dropout)
  - fully connected layers
  - attention layer
  - classification layer
def create_vgg16_model():
base_model = VGG16(weights=None, # '/kaggle/input/transfer-learning-weights/Transfer-learning-weights/vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5',
include_top=False, input_shape=(128, 128, 3))
for layer in base_model.layers:
layer.trainable = False
model = models.Sequential()
model.add(base_model)
model.add(layers.GlobalAveragePooling2D())
model.add(layers.Dense(256, activation='relu'))
model.add(MultiHeadSelfAttention(embed_dim=256, num_heads=8))
model.add(layers.Dense(256, activation='relu'))
model.add(layers.Dropout(0.5))
model.add(layers.Dense(num_classes, activation='softmax'))
return model
def create_inceptionv3_model():
base_model = tf.keras.applications.InceptionV3(weights= None, # '/kaggle/input/transfer-learning-weights/Transfer-learning-weights/inception_v3_weights_tf_dim_ordering_tf_kernels_notop.h5',
include_top=False, input_shape=(128, 128, 3))
for layer in base_model.layers:
layer.trainable = False
model = models.Sequential()
model.add(base_model)
model.add(layers.GlobalAveragePooling2D())
model.add(layers.Dense(256, activation='relu'))
model.add(MultiHeadSelfAttention(embed_dim=256, num_heads=8))
model.add(layers.Dense(256, activation='relu'))
model.add(layers.Dropout(0.5))
model.add(layers.Dense(num_classes, activation='softmax'))
return model
def create_mobilenet_model():
base_model = MobileNetV2(weights=None, # '/kaggle/input/transfer-learning-weights/Transfer-learning-weights/mobilenet_v2_weights_tf_dim_ordering_tf_kernels_1.0_128_no_top.h5'
include_top=False, input_shape=(128, 128, 3))
for layer in base_model.layers:
layer.trainable = False
model = models.Sequential()
model.add(base_model)
model.add(layers.GlobalAveragePooling2D())
model.add(layers.Dense(256, activation='relu'))
model.add(MultiHeadSelfAttention(embed_dim=256, num_heads=8))
model.add(layers.Dense(256, activation='relu'))
model.add(layers.Dropout(0.5))
model.add(layers.Dense(num_classes, activation='softmax'))
return model
def create_cnn_model():
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(128, 128, 3)))
model.add(layers.BatchNormalization())
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Dropout(0.3))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.BatchNormalization())
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Dropout(0.3))
model.add(layers.Conv2D(128, (3, 3), activation='relu'))
model.add(layers.BatchNormalization())
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Dropout(0.3))
model.add(layers.Conv2D(256, (3, 3), activation='relu'))
model.add(layers.BatchNormalization())
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Dropout(0.3))
model.add(layers.Flatten())
model.add(layers.Dense(512, activation='relu', kernel_regularizer=tf.keras.regularizers.l2(0.01)))
model.add(layers.Dropout(0.5))
model.add(MultiHeadSelfAttention(embed_dim=512, num_heads=8))
model.add(layers.Dense(256, activation='relu', kernel_regularizer=tf.keras.regularizers.l2(0.01)))
model.add(layers.Dropout(0.5))
model.add(layers.Dense(num_classes, activation='softmax'))
return model
def create_densenet121_model():
base_model = DenseNet121(weights=None, # '/kaggle/input/transfer-learning-weights/Transfer-learning-weights/densenet121_weights_tf_dim_ordering_tf_kernels_notop.h5',
include_top=False, input_shape=(128, 128, 3))
for layer in base_model.layers:
layer.trainable = False
model = models.Sequential()
model.add(base_model)
model.add(layers.GlobalAveragePooling2D())
model.add(layers.Dense(256, activation='relu'))
model.add(MultiHeadSelfAttention(embed_dim=256, num_heads=8))
model.add(layers.Dense(256, activation='relu'))
model.add(layers.Dropout(0.5))
model.add(layers.Dense(num_classes, activation='softmax'))
return model
from tensorflow.keras.applications import Xception, NASNetMobile, ResNet101, VGG19, InceptionResNetV2,ResNet50
def create_xception_model():
base_model = Xception(weights= None, # '/kaggle/input/transfer-learning-weights/Transfer-learning-weights/xception_weights_tf_dim_ordering_tf_kernels_notop.h5',
include_top=False, input_shape=(128, 128, 3))
for layer in base_model.layers:
layer.trainable = False
model = models.Sequential()
model.add(base_model)
model.add(layers.GlobalAveragePooling2D())
model.add(layers.Dense(256, activation='relu'))
model.add(MultiHeadSelfAttention(embed_dim=256, num_heads=8))
model.add(layers.Dense(256, activation='relu'))
model.add(layers.Dropout(0.5))
model.add(layers.Dense(num_classes, activation='softmax'))
return model
def create_nasnet_mobile_model():
base_model = NASNetMobile(weights=None, # '/kaggle/input/transfer-learning-weights/Transfer-learning-weights/NASNet-mobile-no-top.h5',
include_top=False, input_shape=(128, 128, 3))
for layer in base_model.layers:
layer.trainable = False
model = models.Sequential()
model.add(base_model)
model.add(layers.GlobalAveragePooling2D())
model.add(layers.Dense(256, activation='relu'))
model.add(MultiHeadSelfAttention(embed_dim=256, num_heads=8))
model.add(layers.Dense(256, activation='relu'))
model.add(layers.Dropout(0.5))
model.add(layers.Dense(num_classes, activation='softmax'))
return model
def create_vgg19_model():
base_model = VGG19(weights= None, # '/kaggle/input/transfer-learning-weights/Transfer-learning-weights/vgg19_weights_tf_dim_ordering_tf_kernels_notop.h5',
include_top=False, input_shape=(128, 128, 3))
for layer in base_model.layers:
layer.trainable = False
model = models.Sequential()
model.add(base_model)
model.add(layers.GlobalAveragePooling2D())
model.add(layers.Dense(256, activation='relu'))
model.add(MultiHeadSelfAttention(embed_dim=256, num_heads=8))
model.add(layers.Dense(256, activation='relu'))
model.add(layers.Dropout(0.5))
model.add(layers.Dense(num_classes, activation='softmax'))
return model
def create_inception_resnet_v2_model():
base_model = InceptionResNetV2(weights= None, # '/kaggle/input/transfer-learning-weights/Transfer-learning-weights/inception_resnet_v2_weights_tf_dim_ordering_tf_kernels_notop.h5',
include_top=False, input_shape=(128, 128, 3))
for layer in base_model.layers:
layer.trainable = False
model = models.Sequential()
model.add(base_model)
model.add(layers.GlobalAveragePooling2D())
model.add(layers.Dense(256, activation='relu'))
model.add(MultiHeadSelfAttention(embed_dim=256, num_heads=8))
model.add(layers.Dense(256, activation='relu'))
model.add(layers.Dropout(0.5))
model.add(layers.Dense(num_classes, activation='softmax'))
return model
# Additional backbone variants
def create_densenet201_model():
base_model = tf.keras.applications.DenseNet201(weights= None, # '/kaggle/input/tf-keras-pretrained-model-weights/No Top/densenet201_weights_tf_dim_ordering_tf_kernels_notop.h5',
include_top=False, input_shape=(128, 128, 3))
for layer in base_model.layers:
layer.trainable = False
model = models.Sequential()
model.add(base_model)
model.add(layers.GlobalAveragePooling2D())
model.add(layers.Dense(256, activation='relu'))
model.add(MultiHeadSelfAttention(embed_dim=256, num_heads=8))
model.add(layers.Dense(256, activation='relu'))
model.add(layers.Dropout(0.5))
model.add(layers.Dense(num_classes, activation='softmax'))
return model
def create_resnet50_model():
base_model = ResNet50(weights= None, # '/kaggle/input/tf-keras-pretrained-model-weights/No Top/resnet50_weights_tf_dim_ordering_tf_kernels_notop.h5',
include_top=False, input_shape=(128, 128, 3))
for layer in base_model.layers:
layer.trainable = False
model = models.Sequential()
model.add(base_model)
model.add(layers.GlobalAveragePooling2D())
model.add(layers.Dense(256, activation='relu'))
model.add(MultiHeadSelfAttention(embed_dim=256, num_heads=8))
model.add(layers.Dense(256, activation='relu'))
model.add(layers.Dropout(0.5))
model.add(layers.Dense(num_classes, activation='softmax'))
return model
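Note that every factory function above passes weights=None (the Kaggle weight paths are commented out), so the frozen backbones start from random weights rather than ImageNet features. To get the benefit of the transfer learning described earlier, the pretrained weights need to be loaded, either from a local *_notop.h5 file or by letting Keras download them. A minimal sketch, assuming internet access for the download:
# Sketch: use ImageNet weights so the frozen backbone actually provides pretrained features.
# weights='imagenet' downloads them on first use; a local *_notop.h5 path works as well.
base_model = VGG16(weights='imagenet', include_top=False, input_shape=(128, 128, 3))
base_model.trainable = False  # freeze the whole backbone in one call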
3. Training and Evaluation
Training Configuration
- Focal Loss is used (to address class imbalance; see the formula note after this list)
- Class weights are computed (to further handle the imbalanced data)
- Optimizer: Adam (learning rate 1e-3)
- Callbacks:
  - EarlyStopping (monitors validation accuracy, patience 8)
  - ReduceLROnPlateau (monitors validation loss, decays the learning rate)
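For reference, the Focal Loss implemented below follows FL(p_t) = -α_t · (1 - p_t)^γ · log(p_t), with γ = 2 and α = 0.25 by default; the (1 - p_t)^γ factor down-weights well-classified examples so training focuses on hard and minority-class samples.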
Evaluation Metrics
For each model, the following are computed:
- Accuracy
- Precision
- Recall
- F1-score
- Confusion matrix (visualized)
- ROC curves and AUC (visualized)
# Define dictionary of models
models_dict = {
"DenseNet201": create_densenet201_model(),
"ResNet50": create_resnet50_model(),
"VGG16": create_vgg16_model(),
"InceptionV3": create_inceptionv3_model(),
"MobileNetV2": create_mobilenet_model(),
"DenseNet121": create_densenet121_model(),
"Xception": create_xception_model(),
"NASNetMobile": create_nasnet_mobile_model(),
"VGG19": create_vgg19_model(),
"InceptionResNetV2": create_inception_resnet_v2_model(),
}
from tensorflow.keras import backend as K
# Define focal loss function
def focal_loss(gamma=2., alpha=0.25):
"""
Compute focal loss for multi-class classification.
Parameters:
gamma (float): Focusing parameter.
alpha (float): Balancing parameter.
Returns:
function: Loss function.
"""
def focal_loss_fixed(y_true, y_pred):
epsilon = K.epsilon()
y_pred = K.clip(y_pred, epsilon, 1. - epsilon)
y_true = tf.one_hot(tf.cast(y_true, tf.int32), depth=y_pred.shape[-1])
alpha_t = y_true * alpha + (K.ones_like(y_true) - y_true) * (1 - alpha)
p_t = y_true * y_pred + (K.ones_like(y_true) - y_true) * (1 - y_pred)
fl = - alpha_t * K.pow((K.ones_like(y_true) - p_t), gamma) * K.log(p_t)
return K.mean(K.sum(fl, axis=-1))
return focal_loss_fixed
from sklearn.utils.class_weight import compute_class_weight
# Compute class weights
class_weights = compute_class_weight('balanced', classes=np.unique(y_train), y=y_train)
class_weights_dict = {i: class_weights[i] for i in range(len(class_weights))}
from tensorflow.keras.optimizers import SGD,Adam
# Train and evaluate models
results = {}
for model_name, model in models_dict.items():
print(f"Training {model_name} model...")
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau
# Define callbacks for early stopping and learning rate reduction
early_stopping = EarlyStopping(monitor='val_accuracy', mode='max', patience=8, restore_best_weights=True)
reduce_lr = ReduceLROnPlateau(monitor='val_loss', mode='min', factor=0.5, patience=3, min_lr=1e-8)
# Compile the model
model.compile(optimizer=Adam(learning_rate=1e-3), loss=focal_loss(gamma=2., alpha=0.25), metrics=['accuracy'])
# Train the model
model.fit(X_train, y_train, epochs=20, validation_data=(X_val, y_val), callbacks=[early_stopping, reduce_lr],verbose=0)
# Restore best weights
if early_stopping.best_weights is not None:
model.set_weights(early_stopping.best_weights)
print("Restored best model weights")
print("Evaluation...")
# Predict validation set
y_pred = model.predict(X_val, verbose=0)
if isinstance(y_pred, tf.RaggedTensor):
y_pred = y_pred.to_tensor()
y_pred_classes = np.argmax(y_pred, axis=1)
# Calculate metrics
accuracy = accuracy_score(y_val, y_pred_classes)
precision = precision_score(y_val, y_pred_classes, average='weighted')
recall = recall_score(y_val, y_pred_classes, average='weighted')
f1 = f1_score(y_val, y_pred_classes, average='weighted')
# Store results
results[model_name] = {
"Accuracy": accuracy,
"Precision": precision,
"Recall": recall,
"F1-score": f1
}
# Plot confusion matrix
conf_mat = confusion_matrix(y_val, y_pred_classes)
plt.figure(figsize=(7, 6))
sns.heatmap(conf_mat, annot=True, fmt='d', cmap='Blues', xticklabels=label_names, yticklabels=label_names)
plt.title(f'Confusion Matrix - {model_name}')
plt.xlabel('Predicted')
plt.ylabel('True')
plt.savefig(f'confusion_matrix_{model_name}.png')
plt.show()
if num_classes==2:
# Plot ROC curves for binary classification
from sklearn.metrics import roc_curve, auc
import matplotlib.pyplot as plt
# y_val holds the true labels; y_pred holds the predicted probabilities
# Compute ROC curve and AUC for binary classification
fpr, tpr, _ = roc_curve(y_val, y_pred[:, 1])  # use only the positive-class probability
roc_auc = auc(fpr, tpr)
# Plotting ROC curve
plt.figure()
plt.plot(fpr, tpr, color='darkorange', lw=2,
label='ROC curve (area = {0:0.2f})'.format(roc_auc))
plt.plot([0, 1], [0, 1], color='gray', lw=2, linestyle='--')
plt.xlim([-0.01, 1.0])
plt.ylim([-0.01, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title(f'Receiver Operating Characteristic (ROC) - {model_name}')
plt.legend(loc="lower right")
plt.savefig(f'roc_curve_{model_name}.png')
plt.show()
else:
# Plot ROC curves
y_val_bin = label_binarize(y_val, classes=list(range(num_classes)))  # binarize the integer class labels
n_classes = y_val_bin.shape[1]
fpr = dict()
tpr = dict()
roc_auc = dict()
for i in range(n_classes):
fpr[i], tpr[i], _ = roc_curve(y_val_bin[:, i], y_pred[:, i])
roc_auc[i] = auc(fpr[i], tpr[i])
plt.figure()
colors = ['aqua', 'darkorange', 'cornflowerblue']
for i, color in zip(range(n_classes), colors):
plt.plot(fpr[i], tpr[i], color=color, lw=2,
label='ROC curve of {0} (area = {1:0.2f})'
''.format(label_names[i], roc_auc[i]))
plt.plot([0, 1], [0, 1], 'k--', lw=2)
plt.xlim([-0.01, 1.0])
plt.ylim([-0.01, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title(f'Receiver Operating Characteristic (ROC) - {model_name}')
plt.legend(loc="lower right")
plt.savefig(f'roc_curve_{model_name}.png')
plt.show()
# Print results
for model_name, metrics in results.items():
print(f"Results for {model_name}:")
for metric, value in metrics.items():
print(f"{metric}: {value}")
print("\n")
4. Results Analysis and Visualization
- The plot_evaluation_comparison() function generates a bar chart comparing model performance
- All evaluation results and visualization figures are saved
- Detailed evaluation metrics are printed for each model
def plot_evaluation_comparison(eval_dic, metric_x='Metric', metric_hue='Model', dataset_name=None):
"""
Plot a comparison of evaluation metrics for different models.
Parameters:
eval_dic (dict): Dictionary containing evaluation metrics for each model.
Format: {'model_name': {'metric_name': value, ...}, ...}
metric_x (str): Column name to be used for x-axis in the plot. Default is 'Metric'.
metric_hue (str): Column name to be used for hue (color) in the plot. Default is 'Model'.
dataset_name (str, optional): Name of the dataset. Used for the plot title and file name.
Returns:
pd.DataFrame: DataFrame containing the evaluation metrics.
"""
# Convert the dictionary to a pandas DataFrame
eval_df = pd.DataFrame([[md, mt.title(), v] for md, dic in eval_dic.items() for mt, v in dic.items()],
columns=['Model', 'Metric', 'Value'])
eval_df.sort_values(by=['Metric', 'Value'], inplace=True)
eval_df.reset_index(drop=True, inplace=True)
print(eval_df)
# Plot the evaluation metrics
plt.figure(figsize=(10, 7))
sns.barplot(data=eval_df, x=metric_x, y='Value', hue=metric_hue)
plt.title(f"Model Comparison", fontsize=15)
plt.xticks(rotation=0)
plt.ylim(eval_df['Value'].min() * 0.8, eval_df['Value'].max() * 1.05)
plt.legend(loc=0, prop={'size': 8})
plt.tight_layout()
plt.savefig((f"{dataset_name} - " if dataset_name else '') + f"Comparison.jpg", dpi=300)
plt.show()
return eval_df
# Call the function to plot the evaluation comparison
eval_df = plot_evaluation_comparison(results,dataset_name='Fall Detection')
Source Code License
Author: Jacob, Data Analyst at PyStudio
Execution Results
Model Metric Value
0 ResNet50 Accuracy 0.728972
1 InceptionV3 Accuracy 0.785047
2 NASNetMobile Accuracy 0.813084
3 Xception Accuracy 0.822430
4 InceptionResNetV2 Accuracy 0.859813
5 VGG16 Accuracy 0.869159
6 MobileNetV2 Accuracy 0.878505
7 DenseNet121 Accuracy 0.878505
8 VGG19 Accuracy 0.897196
9 DenseNet201 Accuracy 0.906542
10 ResNet50 F1-Score 0.722568
11 InceptionV3 F1-Score 0.785969
12 NASNetMobile F1-Score 0.813084
13 Xception F1-Score 0.820470
14 InceptionResNetV2 F1-Score 0.860855
15 VGG16 F1-Score 0.868660
16 MobileNetV2 F1-Score 0.878701
17 DenseNet121 F1-Score 0.878701
18 VGG19 F1-Score 0.897010
19 DenseNet201 F1-Score 0.906542
20 ResNet50 Precision 0.727136
21 InceptionV3 Precision 0.788129
22 NASNetMobile Precision 0.813084
23 Xception Precision 0.822289
...
36 MobileNetV2 Recall 0.878505
37 DenseNet121 Recall 0.878505
38 VGG19 Recall 0.897196
39 DenseNet201 Recall 0.906542