Summary:
Collection: AI Case Studies - NLP, Speech, and Other Multimodal Tasks
Competition: Baby Crying Sound Recognition Challenge
Organizer: University of Science and Technology of China
Homepage: http://challenge.xfyun.cn/topic/info?type=baby-crying
AI problem: speech/audio (multimodal) recognition
Dataset: baby crying sound dataset
Dataset value: identifying the information conveyed by a baby's cry
Solution: music information retrieval (MIR) and speech processing with librosa, on the TensorFlow 2.x framework
I. Competition Description
For a baby, crying is a means of communication, a very limited one, yet similar to the way adults talk to each other. It is also a biological alarm, signaling the infant's physiological and psychological needs to the outside world. The information carried by the sound waves of a cry can help determine the baby's physical condition and even detect illness. Effectively recognizing cries and successfully "translating" them into "adult language" that we can understand therefore has real practical significance. The Baby Crying Sound Recognition Challenge aims to identify the information conveyed by a baby's cry.
II. Dataset Description
Dataset contents
1. The training set contains six classes of crying sounds, with noise artificially added. The task involves recognizing baby cries in complex noisy environments, telling apart easily confused cries, and analyzing the distinctive features of each class along with simple, direct ways to discriminate them.
- A: awake
- B: diaper (needs a diaper change)
- C: hug (wants to be held)
- D: hungry
- E: sleepy
- F: uncomfortable
2. The noise data come from the NOISEX-92 standard database.
3. The test set contains 228 audio clips.
Training set files
The directory ./data/train contains one subdirectory per crying type; the subdirectory names and their file counts are:
awake 160
diaper 134
hug 160
hungry 160
sleepy 144
uncomfortable 160
Total: 918 files.
Test set files
The directory ./data/test holds the test set files: 228 audio clips.
Dataset license
CC BY-NC-SA 4.0
https://creativecommons.org/licenses/by-nc-sa/4.0/deed.zh-hans
III. Sample Solution
How it works
The librosa audio analysis library
librosa is an open-source Python library for audio analysis, focused on music information retrieval (MIR) and speech processing. It provides a rich set of signal-processing tools for efficiently extracting features from raw audio files (such as Mel-frequency cepstral coefficients, MFCC) and analyzing rhythm, pitch, and timbre, and it is widely used in machine learning tasks such as speech recognition and music classification.
Key techniques:
- Short-time Fourier transform (STFT)
- Mel-frequency cepstral coefficients (MFCC)
- Beat tracking
MFCC is a set of coefficients obtained by mapping the short-time power spectrum of an audio signal onto the nonlinear Mel scale and then applying cepstral analysis. It models how the human auditory system perceives frequency (the Mel scale maps a frequency f in Hz to roughly 2595 * log10(1 + f / 700) mel).
Why use MFCC (a short sketch follows this list):
- Matches human hearing: the Mel scale models the ear's varying sensitivity to different frequencies
- Dimensionality reduction: compresses a high-dimensional audio signal into a small number of meaningful coefficients
- Discriminative: effectively captures the individual characteristics of speech and other audio
- Robustness: offers some resistance to background noise
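A minimal sketch of extracting this feature with librosa (the file name here is only a placeholder; the competition code later wraps the same call in wav2mfcc()):
import librosa

# Load one clip at its native sample rate; "cry.wav" is a placeholder path.
wave, sr = librosa.load("cry.wav", mono=True, sr=None)
# 20 coefficients per frame (librosa's default n_mfcc is 20)
mfcc = librosa.feature.mfcc(y=wave, sr=sr, n_mfcc=20)
print(mfcc.shape)  # (20, n_frames); n_frames depends on the clip length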
The Sequential model
The Sequential model is Keras's linear stack of layers: layers are connected in strict order and each layer's output feeds directly into the next, forming a single straight-through path.
Main characteristics (a minimal example follows this list):
- Simple and intuitive: good for beginners to build models quickly
- One input and one output per layer: each layer has exactly one input tensor and one output tensor
- Linear structure: no branches, skip connections, or multiple inputs/outputs
- Fast prototyping: good for quickly validating ideas
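A minimal Sequential sketch (toy layer sizes chosen only to show the linear stack; this is not the competition model):
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# One straight path: each layer's output is the next layer's input.
model = Sequential([
    Dense(16, activation='relu', input_shape=(8,)),
    Dense(3, activation='softmax'),
])
model.summary()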
Runtime environment
See the article "Installing the TensorFlow Deep Learning Framework". Taking tensorflow_gpu-2.6.0 as an example, the compatibility table shows that this version requires Python 3.6-3.9.
conda create -n tensorflow260 python=3.9
conda activate tensorflow260
conda install tensorflow-gpu==2.6.0
Check the installed TensorFlow packages:
conda list tensor
# packages in environment at D:\App-Data\conda3\envs\tensorflow260:
#
# Name Version Build Channel
tensorboard 2.6.0 py_1 anaconda
tensorboard-data-server 0.6.1 py39haa95532_0 anaconda
tensorboard-plugin-wit 1.8.1 py39haa95532_0 anaconda
tensorflow 2.6.0 gpu_py39he88c5ba_0 anaconda
tensorflow-base 2.6.0 gpu_py39hb3da07e_0 anaconda
tensorflow-estimator 2.6.0 pyh7b7c402_0 anaconda
tensorflow-gpu 2.6.0 h17022bd_0 anaconda
Importing the tensorflow package then produces a FutureWarning about np.object followed by an import error: the installed NumPy version is too new.
(tensorflow260) C:\>python -c "import tensorflow as tf; print(tf.__version__)"
C:\AppData\Conda-Data\envs\tensorflow260\lib\site-packages\tensorflow\python\framework\dtypes.py:585: FutureWarning: In the future `np.object` will be defined as the corresponding NumPy scalar.
np.object,
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\AppData\Conda-Data\envs\tensorflow260\lib\site-packages\tensorflow\__init__.py", line 41, in <module>
from tensorflow.python.tools import module_util as _module_util
...
(tensorflow260) C:\>conda list numpy
# packages in environment at C:\AppData\Conda-Data\envs\tensorflow260:
#
# Name Version Build Channel
numpy 1.26.4 py39h055cbcc_0
numpy-base 1.26.4 py39h65a83cf_0
For TensorFlow 2.6, the NumPy version should generally be 1.19.x or 1.20.x. The version installed in my environment is 1.26, so NumPy has to be downgraded.
conda install numpy==1.20.3
# if you also use pandas
conda install pandas==1.3.5
Versions of the common development libraries numpy, pandas, scikit-learn, and tqdm:
conda list | findstr "numpy pandas scikit-learn tqdm"
numpy 1.20.3 py39h749eb61_1 anaconda
numpy-base 1.20.3 py39h5bfbeaa_1 anaconda
pandas 1.3.5 py39h6214cd6_0 anaconda
scikit-learn 1.6.1 py39hdd013cc_0 conda-forge
tqdm 4.67.1 py39h9909e9c_0 anaconda
Run a TensorFlow call to check the version:
python -c "import tensorflow as tf; print(tf.__version__)"
# Output: 2.6.0
Check whether TensorFlow can use the GPU:
test-tensorflow-gpu.py
import tensorflow as tf
# Check whether TensorFlow can see a GPU
# print("GPU available: ", tf.test.is_gpu_available())  # deprecated API
print("GPU available: ", tf.config.list_physical_devices('GPU'))
# Get the name of the GPU device available to TensorFlow
gpus = tf.test.gpu_device_name()
if gpus:
    print("Available GPU device: ", gpus)
else:
    print("No GPU device detected")
Run: python test-tensorflow-gpu.py
GPU available:  [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
2024-11-28 12:36:21.185202: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-11-28 12:36:22.293439: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /device:GPU:0 with 1328 MB memory: -> device: 0, name: NVIDIA GeForce MX150, pci bus id: 0000:02:00.0, compute capability: 6.1
Available GPU device:  /device:GPU:0
Note: starting with TensorFlow 2.0, Keras is officially integrated as TensorFlow's high-level API (tensorflow.keras) and is one of its core components. tensorflow.keras includes all core Keras functionality (model building, layers, optimizers, loss functions, and so on) and is deeply integrated with the TensorFlow ecosystem (e.g. tf.data, SavedModel).
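A quick way to confirm which bundled Keras is in use (the tf.keras.__version__ attribute is available in TF 2.x builds such as 2.6):
import tensorflow as tf

print(tf.__version__)        # TensorFlow version, e.g. 2.6.0
print(tf.keras.__version__)  # version of the Keras bundled with this TensorFlow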
Finally, install librosa and its dependencies.
conda install -c conda-forge libsndfile
conda install -c conda-forge librosa
Verify:
python -c "import soundfile as sf; print(sf.__version__)"
Output: 0.13.1
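librosa can be verified the same way (the exact version depends on what conda resolved):
python -c "import librosa; print(librosa.__version__)"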
Source code structure and results
Source: NLP_Baby_Crying_Recognition.ipynb
Workflow:
- Load and preprocess the audio data, extracting MFCC features
- Build a CNN model
- Train with 5-fold cross-validation
- Ensemble the predictions of the fold models
- Predict on the test set and write out the results
1. Import the required libraries
import librosa
import os
import numpy as np
from tqdm import tqdm
# In TensorFlow 2.x, Keras lives under the tensorflow.keras namespace
# import keras
# from keras.models import Sequential
# from keras.layers import Input, Dense, Dropout, Flatten, Conv2D, MaxPooling2D
# from keras.utils import to_categorical
# Use the Keras bundled with TensorFlow
from tensorflow import keras  # used later for keras.losses / keras.optimizers / keras.callbacks
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Dense, Dropout, Flatten, Conv2D, MaxPooling2D
from tensorflow.keras.utils import to_categorical
from sklearn.model_selection import StratifiedKFold
import pandas as pd

# Data locations, following the dataset layout described above (adjust to your setup)
DATA_PATH = './data/'
DATA_TRAIN_PATH = './data/train/'
DATA_TEST_PATH = './data/test/'
2. Define the audio preprocessing functions
The function wav2mfcc():
- Uses the librosa library to load an audio file and extract MFCC (Mel-frequency cepstral coefficient) features
- MFCC is a feature representation commonly used in speech recognition that effectively captures the spectral characteristics of a sound
def get_labels(path=DATA_TRAIN_PATH):
    labels = os.listdir(path)
    print(labels)
    label_indices = np.arange(0, len(labels))
    return labels, label_indices, to_categorical(label_indices)

def wav2mfcc(file_path):
    # sr=None keeps the file's native sample rate
    wave, sr = librosa.load(file_path, mono=True, sr=None)
    # pass y= and sr= explicitly; note the MFCC is computed with sr=8000 regardless of the native rate
    mfcc = librosa.feature.mfcc(y=wave, sr=8000)
    return mfcc
3. Data preparation
Extract MFCC features for each label's audio files and save them as .npy files.
def save_data_to_array_train(path=DATA_TRAIN_PATH):
    labels, _, _ = get_labels(path)
    for label in labels:
        mfcc_vectors = []
        wavfiles = [path + label + '/' + wavfile for wavfile in os.listdir(path + label)]
        for wavfile in tqdm(wavfiles, "Saving vectors of label - '{}'".format(label)):
            # pad/truncate every clip to a fixed 20 x 400 MFCC matrix
            mfcc = np.zeros((20, 400))
            mfcc_feat = wav2mfcc(wavfile)[:, :400]
            mfcc[:, :mfcc_feat.shape[1]] = mfcc_feat
            mfcc_vectors.append(mfcc)
        mfcc_vectors = np.stack(mfcc_vectors)
        np.save(DATA_PATH + label + '.npy', mfcc_vectors)
The function get_train_test loads the saved per-label feature arrays and stacks them into X and y (the actual split is done later with StratifiedKFold, so the split_ratio and random_state arguments are unused here).
def get_train_test(split_ratio=0.8, random_state=42):
    # split_ratio/random_state are unused; StratifiedKFold handles the split later
    labels, indices, _ = get_labels(DATA_TRAIN_PATH)
    X = np.load(DATA_PATH + labels[0] + '.npy')
    y = np.zeros(X.shape[0])
    for i, label in enumerate(labels[1:]):
        x = np.load(DATA_PATH + label + '.npy')
        X = np.vstack((X, x))
        y = np.append(y, np.full(x.shape[0], fill_value=(i + 1)))
    return X, y
Save the test data:
def save_data_to_array_test(path=DATA_TEST_PATH):
    mfcc_vectors = []
    wavfiles = [DATA_TEST_PATH + wavfile for wavfile in os.listdir(DATA_TEST_PATH)]
    for wavfile in tqdm(wavfiles, "Saving vectors of label - '{}'".format('test')):
        mfcc = np.zeros((20, 400))
        mfcc_feat = wav2mfcc(wavfile)[:, :400]
        mfcc[:, :mfcc_feat.shape[1]] = mfcc_feat
        mfcc_vectors.append(mfcc)
    mfcc_vectors = np.stack(mfcc_vectors)
    np.save(DATA_PATH + 'test.npy', mfcc_vectors)
- Normalize the features (divide by 255.0)
- Use stratified K-fold cross-validation (StratifiedKFold) so every fold keeps the same class proportions (a toy illustration follows)
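A toy illustration of what stratification guarantees (made-up labels, not the competition data):
from sklearn.model_selection import StratifiedKFold
import numpy as np

# 6 samples of class 0 and 3 of class 1; every fold keeps the 2:1 class ratio.
y = np.array([0] * 6 + [1] * 3)
skf = StratifiedKFold(n_splits=3)
for tr_idx, val_idx in skf.split(np.zeros((len(y), 1)), y):
    print(np.bincount(y[val_idx]))  # -> [2 1] for each of the 3 folds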
Run: save_data_to_array_train()
['awake', 'diaper', 'hug', 'hungry', 'sleepy', 'uncomfortable']
Saving vectors of label - 'awake': 100%|██████████| 160/160 [00:18<00:00, 8.46it/s]
Saving vectors of label - 'diaper': 100%|██████████| 134/134 [00:08<00:00, 15.84it/s]
Saving vectors of label - 'hug': 100%|██████████| 160/160 [00:09<00:00, 17.52it/s]
Saving vectors of label - 'hungry': 100%|██████████| 160/160 [00:09<00:00, 17.58it/s]
Saving vectors of label - 'sleepy': 100%|██████████| 144/144 [00:09<00:00, 14.45it/s]
Saving vectors of label - 'uncomfortable': 100%|██████████| 160/160 [00:16<00:00, 9.73it/s]
Run:
save_data_to_array_test()
X, Y = get_train_test()
skf = StratifiedKFold(n_splits=5)
test_pred = np.zeros((228, 6))
Output:
Saving vectors of label - 'test': 100%|██████████| 228/228 [00:06<00:00, 35.95it/s]
['awake', 'diaper', 'hug', 'hungry', 'sleepy', 'uncomfortable']
4. Model architecture
A Keras Sequential model:
- 2 convolutional layers, a max-pooling layer, a Flatten layer, and 3 fully connected layers
- Dropout to reduce overfitting
- A softmax output layer for multi-class classification
What each layer does:
- Conv2D: 2-D convolution, extracts local patterns from the MFCC features
- MaxPooling2D: spatial downsampling, reduces the number of parameters
- Flatten: flattens the multi-dimensional features into a 1-D vector
- Dropout: randomly drops some units to reduce overfitting
- Dense: fully connected layer, combines higher-level features and performs the classification
def get_model():
    model = Sequential()
    # Newer Keras recommends an explicit Input(shape) layer instead of input_shape on the first layer
    model.add(Input(shape=(20, 400, channel)))
    model.add(Conv2D(48, kernel_size=(2, 2), activation='relu'))
    model.add(Conv2D(120, kernel_size=(2, 2), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Flatten())
    model.add(Dropout(0.25))
    model.add(Dense(128, activation='relu'))
    model.add(Dropout(0.25))
    model.add(Dense(64, activation='relu'))
    model.add(Dense(num_classes, activation='softmax'))
    model.compile(loss=keras.losses.categorical_crossentropy,
                  optimizer=keras.optimizers.Adadelta(),
                  metrics=['accuracy'])
    return model
5. Train the Keras Sequential model
- Adadelta optimizer
- Categorical cross-entropy loss
- EarlyStopping and ModelCheckpoint callbacks
- 5-fold cross-validation
for idx, (tr_idx, val_idx) in enumerate(skf.split(X, Y)):
    feature_dim_1 = 20
    channel = 1
    epochs = 10
    batch_size = 6
    verbose = 1
    num_classes = 6
    X_train, X_test = X[tr_idx], X[val_idx]
    y_train, y_test = Y[tr_idx], Y[val_idx]
    X_train = X_train.reshape(X_train.shape[0], X_train.shape[1], X_train.shape[2], channel) / 255.0
    X_test = X_test.reshape(X_test.shape[0], X_test.shape[1], X_test.shape[2], channel) / 255.0
    y_train_hot = to_categorical(y_train)
    y_test_hot = to_categorical(y_test)
    model = get_model()
    my_callbacks = [
        keras.callbacks.EarlyStopping(patience=10),
        keras.callbacks.ModelCheckpoint(filepath='./model/model-{0}.keras'.format(idx), save_best_only=True),
    ]
    model.fit(X_train, y_train_hot,
              batch_size=batch_size,
              epochs=epochs,
              verbose=verbose,
              validation_data=(X_test, y_test_hot),
              callbacks=my_callbacks
              )
    # reload the best checkpoint of this fold and accumulate its test-set predictions
    model.load_weights('./model/model-{0}.keras'.format(idx))
    X_test = np.load(DATA_PATH + 'test.npy') / 255.0
    test_pred += model.predict(X_test.reshape(228, 20, 400, 1))
Training output:
Epoch 1/10
123/123 ━━━━━━━━━━━━━━━━━━━━ 49s 377ms/step - accuracy: 0.1802 - loss: 1.7917 - val_accuracy: 0.1793 - val_loss: 1.7907
Epoch 2/10
123/123 ━━━━━━━━━━━━━━━━━━━━ 45s 365ms/step - accuracy: 0.2151 - loss: 1.7899 - val_accuracy: 0.1848 - val_loss: 1.7892
Epoch 3/10
123/123 ━━━━━━━━━━━━━━━━━━━━ 42s 341ms/step - accuracy: 0.1918 - loss: 1.7880 - val_accuracy: 0.2065 - val_loss: 1.7880
Epoch 4/10
123/123 ━━━━━━━━━━━━━━━━━━━━ 45s 367ms/step - accuracy: 0.2385 - loss: 1.7872 - val_accuracy: 0.2391 - val_loss: 1.7867
Epoch 5/10
123/123 ━━━━━━━━━━━━━━━━━━━━ 44s 362ms/step - accuracy: 0.2256 - loss: 1.7879 - val_accuracy: 0.2391 - val_loss: 1.7856
Epoch 6/10
123/123 ━━━━━━━━━━━━━━━━━━━━ 41s 330ms/step - accuracy: 0.2500 - loss: 1.7823 - val_accuracy: 0.2337 - val_loss: 1.7840
Epoch 7/10
123/123 ━━━━━━━━━━━━━━━━━━━━ 37s 304ms/step - accuracy: 0.2651 - loss: 1.7817 - val_accuracy: 0.2283 - val_loss: 1.7823
Epoch 8/10
123/123 ━━━━━━━━━━━━━━━━━━━━ 38s 306ms/step - accuracy: 0.2603 - loss: 1.7816 - val_accuracy: 0.2500 - val_loss: 1.7806
Epoch 9/10
123/123 ━━━━━━━━━━━━━━━━━━━━ 42s 341ms/step - accuracy: 0.2758 - loss: 1.7771 - val_accuracy: 0.2554 - val_loss: 1.7792
Epoch 10/10
123/123 ━━━━━━━━━━━━━━━━━━━━ 41s 335ms/step - accuracy: 0.3108 - loss: 1.7707 - val_accuracy: 0.2446 - val_loss: 1.7778
8/8 ━━━━━━━━━━━━━━━━━━━━ 1s 85ms/step
Epoch 1/10
123/123 ━━━━━━━━━━━━━━━━━━━━ 46s 347ms/step - accuracy: 0.1755 - loss: 1.7920 - val_accuracy: 0.2011 - val_loss: 1.7907
Epoch 2/10
123/123 ━━━━━━━━━━━━━━━━━━━━ 43s 354ms/step - accuracy: 0.2042 - loss: 1.7904 - val_accuracy: 0.1957 - val_loss: 1.7895
Epoch 3/10
123/123 ━━━━━━━━━━━━━━━━━━━━ 37s 304ms/step - accuracy: 0.1844 - loss: 1.7907 - val_accuracy: 0.2500 - val_loss: 1.7880
Epoch 4/10
123/123 ━━━━━━━━━━━━━━━━━━━━ 44s 361ms/step - accuracy: 0.2018 - loss: 1.7878 - val_accuracy: 0.2717 - val_loss: 1.7865
Epoch 5/10
123/123 ━━━━━━━━━━━━━━━━━━━━ 42s 339ms/step - accuracy: 0.2336 - loss: 1.7871 - val_accuracy: 0.2772 - val_loss: 1.7847
Epoch 6/10
123/123 ━━━━━━━━━━━━━━━━━━━━ 42s 343ms/step - accuracy: 0.2276 - loss: 1.7852 - val_accuracy: 0.2826 - val_loss: 1.7829
Epoch 7/10
123/123 ━━━━━━━━━━━━━━━━━━━━ 48s 388ms/step - accuracy: 0.2678 - loss: 1.7819 - val_accuracy: 0.2826 - val_loss: 1.7814
Epoch 8/10
123/123 ━━━━━━━━━━━━━━━━━━━━ 46s 370ms/step - accuracy: 0.2471 - loss: 1.7817 - val_accuracy: 0.2826 - val_loss: 1.7801
Epoch 9/10
123/123 ━━━━━━━━━━━━━━━━━━━━ 45s 363ms/step - accuracy: 0.2569 - loss: 1.7799 - val_accuracy: 0.2772 - val_loss: 1.7787
Epoch 10/10
123/123 ━━━━━━━━━━━━━━━━━━━━ 43s 348ms/step - accuracy: 0.2699 - loss: 1.7769 - val_accuracy: 0.2772 - val_loss: 1.7763
8/8 ━━━━━━━━━━━━━━━━━━━━ 1s 107ms/step
Epoch 1/10
123/123 ━━━━━━━━━━━━━━━━━━━━ 50s 381ms/step - accuracy: 0.1456 - loss: 1.7924 - val_accuracy: 0.1902 - val_loss: 1.7902
Epoch 2/10
123/123 ━━━━━━━━━━━━━━━━━━━━ 43s 353ms/step - accuracy: 0.1899 - loss: 1.7900 - val_accuracy: 0.2337 - val_loss: 1.7890
Epoch 3/10
123/123 ━━━━━━━━━━━━━━━━━━━━ 44s 361ms/step - accuracy: 0.2296 - loss: 1.7895 - val_accuracy: 0.2826 - val_loss: 1.7878
Epoch 4/10
123/123 ━━━━━━━━━━━━━━━━━━━━ 43s 352ms/step - accuracy: 0.2337 - loss: 1.7881 - val_accuracy: 0.2880 - val_loss: 1.7870
Epoch 5/10
123/123 ━━━━━━━━━━━━━━━━━━━━ 40s 324ms/step - accuracy: 0.2557 - loss: 1.7858 - val_accuracy: 0.2880 - val_loss: 1.7858
Epoch 6/10
123/123 ━━━━━━━━━━━━━━━━━━━━ 42s 339ms/step - accuracy: 0.2423 - loss: 1.7860 - val_accuracy: 0.2880 - val_loss: 1.7846
Epoch 7/10
123/123 ━━━━━━━━━━━━━━━━━━━━ 42s 345ms/step - accuracy: 0.2620 - loss: 1.7803 - val_accuracy: 0.2880 - val_loss: 1.7830
Epoch 8/10
123/123 ━━━━━━━━━━━━━━━━━━━━ 43s 352ms/step - accuracy: 0.2608 - loss: 1.7818 - val_accuracy: 0.2935 - val_loss: 1.7816
Epoch 9/10
123/123 ━━━━━━━━━━━━━━━━━━━━ 44s 358ms/step - accuracy: 0.2618 - loss: 1.7799 - val_accuracy: 0.2826 - val_loss: 1.7798
Epoch 10/10
123/123 ━━━━━━━━━━━━━━━━━━━━ 43s 346ms/step - accuracy: 0.2693 - loss: 1.7791 - val_accuracy: 0.3098 - val_loss: 1.7782
WARNING:tensorflow:5 out of the last 17 calls to <function TensorFlowTrainer.make_predict_function.<locals>.one_step_on_data_distributed at 0x000001C496CB6CA0> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has reduce_retracing=True option that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for more details.
8/8 ━━━━━━━━━━━━━━━━━━━━ 1s 101ms/step
Epoch 1/10
123/123 ━━━━━━━━━━━━━━━━━━━━ 46s 351ms/step - accuracy: 0.1838 - loss: 1.7920 - val_accuracy: 0.1913 - val_loss: 1.7911
Epoch 2/10
123/123 ━━━━━━━━━━━━━━━━━━━━ 42s 343ms/step - accuracy: 0.1902 - loss: 1.7901 - val_accuracy: 0.2295 - val_loss: 1.7889
Epoch 3/10
123/123 ━━━━━━━━━━━━━━━━━━━━ 42s 338ms/step - accuracy: 0.2299 - loss: 1.7882 - val_accuracy: 0.2350 - val_loss: 1.7872
Epoch 4/10
123/123 ━━━━━━━━━━━━━━━━━━━━ 48s 388ms/step - accuracy: 0.2077 - loss: 1.7884 - val_accuracy: 0.2022 - val_loss: 1.7853
Epoch 5/10
123/123 ━━━━━━━━━━━━━━━━━━━━ 43s 353ms/step - accuracy: 0.2245 - loss: 1.7872 - val_accuracy: 0.2022 - val_loss: 1.7835
Epoch 6/10
123/123 ━━━━━━━━━━━━━━━━━━━━ 40s 328ms/step - accuracy: 0.2659 - loss: 1.7834 - val_accuracy: 0.2295 - val_loss: 1.7816
Epoch 7/10
123/123 ━━━━━━━━━━━━━━━━━━━━ 42s 341ms/step - accuracy: 0.2632 - loss: 1.7815 - val_accuracy: 0.2077 - val_loss: 1.7796
Epoch 8/10
123/123 ━━━━━━━━━━━━━━━━━━━━ 41s 331ms/step - accuracy: 0.2181 - loss: 1.7808 - val_accuracy: 0.1913 - val_loss: 1.7777
Epoch 9/10
123/123 ━━━━━━━━━━━━━━━━━━━━ 41s 330ms/step - accuracy: 0.2424 - loss: 1.7738 - val_accuracy: 0.2295 - val_loss: 1.7757
Epoch 10/10
123/123 ━━━━━━━━━━━━━━━━━━━━ 42s 341ms/step - accuracy: 0.2307 - loss: 1.7739 - val_accuracy: 0.2459 - val_loss: 1.7737
8/8 ━━━━━━━━━━━━━━━━━━━━ 1s 132ms/step
Epoch 1/10
123/123 ━━━━━━━━━━━━━━━━━━━━ 47s 348ms/step - accuracy: 0.1713 - loss: 1.7918 - val_accuracy: 0.1749 - val_loss: 1.7898
Epoch 2/10
123/123 ━━━━━━━━━━━━━━━━━━━━ 42s 340ms/step - accuracy: 0.1708 - loss: 1.7892 - val_accuracy: 0.2568 - val_loss: 1.7883
Epoch 3/10
123/123 ━━━━━━━━━━━━━━━━━━━━ 41s 335ms/step - accuracy: 0.2236 - loss: 1.7860 - val_accuracy: 0.2787 - val_loss: 1.7872
Epoch 4/10
123/123 ━━━━━━━━━━━━━━━━━━━━ 38s 309ms/step - accuracy: 0.2011 - loss: 1.7868 - val_accuracy: 0.2732 - val_loss: 1.7860
Epoch 5/10
123/123 ━━━━━━━━━━━━━━━━━━━━ 38s 313ms/step - accuracy: 0.2203 - loss: 1.7843 - val_accuracy: 0.2678 - val_loss: 1.7848
Epoch 6/10
123/123 ━━━━━━━━━━━━━━━━━━━━ 38s 306ms/step - accuracy: 0.2437 - loss: 1.7811 - val_accuracy: 0.2678 - val_loss: 1.7832
Epoch 7/10
123/123 ━━━━━━━━━━━━━━━━━━━━ 37s 305ms/step - accuracy: 0.2670 - loss: 1.7783 - val_accuracy: 0.2678 - val_loss: 1.7818
Epoch 8/10
123/123 ━━━━━━━━━━━━━━━━━━━━ 37s 304ms/step - accuracy: 0.2788 - loss: 1.7763 - val_accuracy: 0.2514 - val_loss: 1.7804
Epoch 9/10
123/123 ━━━━━━━━━━━━━━━━━━━━ 38s 306ms/step - accuracy: 0.2458 - loss: 1.7750 - val_accuracy: 0.2568 - val_loss: 1.7789
Epoch 10/10
123/123 ━━━━━━━━━━━━━━━━━━━━ 37s 302ms/step - accuracy: 0.2552 - loss: 1.7757 - val_accuracy: 0.2404 - val_loss: 1.7776
8/8 ━━━━━━━━━━━━━━━━━━━━ 1s 95ms/step
6. Prediction and output
- Predict on the test set
- Convert the predictions to class labels
- Write the results to a CSV file
test_pred = np.zeros((228, 6))
# only the first checkpoint is actually used here because of the [:1] slice
for path in ['./model/model-0.keras', './model/model-2.keras', './model/model-6.keras'][:1]:
    model.load_weights(path)
    X_test = np.load(DATA_PATH + 'test.npy') / 255.0
    test_pred += model.predict(X_test.reshape(228, 20, 400, 1))
Output:
8/8 ━━━━━━━━━━━━━━━━━━━━ 1s 76ms/step
7. Write the predictions
Write out the predictions for the baby-cry test set:
wavfiles = [wavfile for wavfile in os.listdir(DATA_TEST_PATH)]
df = pd.DataFrame()
df['id'] = wavfiles
# the index-to-label list below should match the order returned by get_labels()
df['label'] = [['hug', 'sleepy', 'uncomfortable', 'hungry', 'awake', 'diaper'][x] for x in test_pred.argmax(1)]
df.to_csv('submit.csv', index=None)
Sample rows written to submit.csv:
id label
test_0.wav uncomfortable
test_1.wav diaper
test_10.wav uncomfortable
test_100.wav diaper
test_101.wav diaper
test_102.wav diaper
test_103.wav uncomfortable
test_104.wav uncomfortable
Source code license
GPL-v3