Facial Age Recognition Based on Convolutional Neural Networks

2021-05-26 Machine Learning

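1. Training Model and Framework

1.1 The Adience Dataset

The Adience collection consists of albums from Flickr and similar services, assembled by automatic upload from smartphones such as the iPhone 5 (or later) and released to the public by their authors under a Creative Commons (CC) license. The dataset provides data and benchmarks for a total of 26,580 face photos and is intended to be as faithful as possible to real-world imaging conditions. Its labels include age group, gender, and outdoor setting, among others, and can be used for supervised learning in age-related face recognition research. The data captures the variation that occurs when images are acquired in practice, including the subject's appearance, pose, noise, and lighting, which makes the dataset adaptable to varied environments and widely applicable.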

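1.2 The Caffe Framework

Caffe (Convolutional Architecture for Fast Feature Embedding) supports CUDA and runs at high speed on both CPUs and GPUs. It is a convolutional neural network framework that combines speed, expressiveness, and modularity.

Caffe organizes its data structures as Blobs, Layers, and Nets.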

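Blobs are Caffe's core data format. They provide a unified memory interface and can synchronize data between the CPU and the GPU. A Blob is mainly a four-dimensional tensor (Number × Channel × Height × Width) stored in C-contiguous order (row-major, with each row stored contiguously and without gaps), and it is used to store and exchange the network's weights, activations, and forward and backward data.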
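To make the row-major layout concrete, here is a minimal sketch in plain Python, not Caffe's own code. It computes where element (n, c, h, w) sits in the flat memory of an N × C × H × W blob; the function name `blob_offset` and the example shape are hypothetical.

```python
def blob_offset(shape, n, c, h, w):
    """Flat index of element (n, c, h, w) in a row-major N x C x H x W blob."""
    num, channels, height, width = shape
    if not (0 <= n < num and 0 <= c < channels and 0 <= h < height and 0 <= w < width):
        raise IndexError("index out of range for blob shape")
    # The last axis (width) varies fastest, so neighbours along a row are adjacent.
    return ((n * channels + c) * height + h) * width + w


shape = (2, 3, 4, 5)  # hypothetical blob: 2 images, 3 channels, 4 rows, 5 columns
print(blob_offset(shape, 0, 0, 0, 1))  # next column: offset 1
print(blob_offset(shape, 0, 0, 1, 0))  # next row: offset 5 (one full row)
print(blob_offset(shape, 1, 2, 3, 4))  # last element: offset 119
```

Moving one step along the width axis moves one element in memory, while moving one step along the number axis skips a whole image.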

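Layers are the essential part of a Caffe model and the basis on which a neural network is composed and computed. Every Layer takes Blobs from the layer below as input and outputs Blobs to the layer above. Each Layer defines three key computations: setup (Setup), forward propagation (Forward), and backward propagation (Backward).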

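The operations they cover include:

· 1. Load data: loads the input data.

· 2. Convolve filters: convolutional layers, which perform convolution.

· 3. Pooling: pooling layers, which perform pooling.

· 4. Nonlinearities: nonlinear mappings, i.e. activation functions.

· 5. Inner products: inner product operations.

· 6. Normalize: normalization.

· 7. Compute losses: loss function computation, such as softmax and hinge.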

A Net is a directed acyclic graph (DAG) made up of a series of connected Layers. Caffe keeps track of every layer in the DAG so that forward and backward propagation are carried out correctly.
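The Setup/Forward/Backward contract and the way a Net chains layers can be illustrated with a toy example. This is a sketch in plain Python and not Caffe's API: the classes `ScaleLayer` and `ChainNet` are hypothetical, and the net is a simple linear chain, the simplest case of a DAG.

```python
class ScaleLayer:
    """Toy layer that multiplies every input value by a fixed factor."""

    def __init__(self, factor):
        self.factor = factor
        self.size = None

    def setup(self, bottom):
        # Setup: record the input size before any computation.
        self.size = len(bottom)

    def forward(self, bottom):
        # Forward: compute the top (output) from the bottom (input).
        return [self.factor * x for x in bottom]

    def backward(self, top_grad):
        # Backward: d(top)/d(bottom) is the factor, so scale the gradient by it.
        return [self.factor * g for g in top_grad]


class ChainNet:
    """Toy net: layers connected one after another."""

    def __init__(self, layers):
        self.layers = layers

    def forward(self, data):
        for layer in self.layers:
            layer.setup(data)
            data = layer.forward(data)
        return data

    def backward(self, grad):
        # Gradients flow through the layers in reverse order.
        for layer in reversed(self.layers):
            grad = layer.backward(grad)
        return grad


net = ChainNet([ScaleLayer(2.0), ScaleLayer(3.0)])
print(net.forward([1.0, 2.0]))   # [6.0, 12.0]
print(net.backward([1.0, 1.0]))  # [6.0, 6.0]
```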

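2. Face Recognition Based on Convolutional Neural Networks

2.1 Convolutional Neural Network Architecture

The architecture consists of three convolutional layers, two fully connected layers, and one final output layer. The convolutional layers are defined as follows: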

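1. Conv1: 96 filters of size 3×7×7 pixels are applied to the input in the first convolutional layer, followed by a rectified linear unit (ReLU, the activation function), a max-pooling layer that takes the maximum of each 3×3 region with a stride of two pixels, and a local response normalization (LRN) layer.

2. Conv2: The output of the previous layer (96×28×28) is processed by the second convolutional layer, which contains 256 filters of size 96×5×5 pixels. It is likewise followed by a ReLU, a max-pooling layer, and an LRN layer with the same parameters as before.

3. Conv3: The third convolutional layer processes the 256×14×14 Blob with a set of 384 filters of size 256×3×3 pixels, followed by a ReLU and a max-pooling layer.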
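The Blob sizes quoted in the list above can be checked with a little arithmetic. The sketch below rests on assumptions that the text does not state: a 227×227 input crop, a stride of 4 and no padding for Conv1, padding of 2 and 1 for Conv2 and Conv3, and ceil rounding in the pooling layers. Under these assumptions the computed sizes agree with the 96×28×28 and 256×14×14 figures given in the list.

```python
import math


def conv_out(size, kernel, stride=1, pad=0):
    """Spatial output size of a convolution (floor rounding)."""
    return (size + 2 * pad - kernel) // stride + 1


def pool_out(size, kernel=3, stride=2):
    """Spatial output size of 3x3, stride-2 pooling with ceil rounding (assumed)."""
    return math.ceil((size - kernel) / stride) + 1


size = 227  # assumed input crop size
# (name, filters, kernel, stride, pad); the stride and pad values are assumptions
layers = [("Conv1", 96, 7, 4, 0), ("Conv2", 256, 5, 1, 2), ("Conv3", 384, 3, 1, 1)]
shapes = []
for name, filters, kernel, stride, pad in layers:
    # ReLU and LRN leave the shape unchanged, so only conv and pooling matter here.
    size = pool_out(conv_out(size, kernel, stride, pad))
    shapes.append((filters, size, size))
    print(name, "output:", shapes[-1])
```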

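The fully connected layers are then defined as follows:

1. The first fully connected layer contains 512 neurons and receives the output of the third convolutional layer. It is followed by a ReLU and a Dropout layer (which guards the CNN against overfitting).

2. The second fully connected layer receives the 512-dimensional output of the first fully connected layer and likewise contains 512 neurons. It is again followed by a ReLU and a Dropout layer.

3. The third fully connected layer maps to the final classes.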

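Finally, the output of the last fully connected layer is fed to a Softmax layer that assigns a probability to each class. The prediction for a given test image is the class with the highest probability.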
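As an illustration of that final step, here is a minimal softmax in plain Python followed by the arg-max that picks the predicted class. The input scores are hypothetical stand-ins for the output of the last fully connected layer.

```python
import math


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


scores = [1.0, 3.0, 0.5]  # hypothetical outputs of the last fully connected layer
probs = softmax(scores)
predicted = max(range(len(probs)), key=probs.__getitem__)
print(predicted)  # index of the most probable class
```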

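2.2 Age Prediction

A person's facial features change subtly all the time, reflecting advancing age. In the ideal case facial features would vary monotonically with age, which would make age estimation a regression problem in the broad sense. In practice, however, judging a person's age by regression alone is unreliable: even a human observer can rarely infer someone's exact age.

The human eye can nevertheless make a rough judgment and predict fairly accurately which age range a person falls into, which gives an initial estimate. Age can therefore be treated as a classification over age ranges, as a basis for studying the relationship between faces and age.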

The Adience dataset divides ages into eight classes: [0-2], [4-6], [8-13], [15-20], [25-32], [38-43], [48-53], and [60-]. The final Softmax layer of the deep neural network therefore has 8 nodes, one for each age group.
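To show what treating age as classification means in practice, the sketch below maps an exact age to the index of its group, using the bounds listed above. The helper `age_to_class` is hypothetical; it returns None for ages that fall in the gaps between groups.

```python
# Age bounds as listed above; None marks the open-ended last group.
AGE_GROUPS = [(0, 2), (4, 6), (8, 13), (15, 20), (25, 32), (38, 43), (48, 53), (60, None)]


def age_to_class(age):
    """Return the class index for an exact age, or None if it falls in a gap."""
    for index, (low, high) in enumerate(AGE_GROUPS):
        if age >= low and (high is None or age <= high):
            return index
    return None


print(age_to_class(1))   # 0
print(age_to_class(28))  # 4
print(age_to_class(75))  # 7
print(age_to_class(22))  # None, since 22 lies between two groups
```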

MODEL_MEAN_VALUES = (78.4263377603, 87.7689143744, 114.895847746)
ageList = ['(0-2)', '(4-6)', '(8-12)', '(15-20)', '(25-32)', '(38-43)', '(48-53)', '(60-100)']

The blob is fed into the network for age detection, and a forward pass is run through the age network.


ageNet.setInput(blobs)
agepredction = ageNet.forward()
age = ageList[agepredction[0].argmax()]

3. Computer Vision Technology and System Run Tests

3.1 OpenCV

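OpenCV is an open-source, cross-platform computer vision library. It was originally released under the BSD (Berkeley Software Distribution) license; versions 4.5.0 and later use the Apache 2 license. The library is written in C++ with some C functions, provides interfaces for Python, Java, MATLAB, and other languages, and implements a wide range of computer vision algorithms for processing images.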

To use OpenCV in a Python environment, install the module with pip by running "pip install opencv-python".

Once it is installed, import the module in Python.

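3.2 Image Detection

Faces are detected and extracted with the DNN face detector in OpenCV. OpenCV ships a 16-bit floating-point version of this detector implemented in Caffe (the code in this article loads the 8-bit quantized TensorFlow version, opencv_face_detector_uint8.pb). Its advantages are: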

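· 1. It runs in real time on a CPU.

· 2. It works even under heavy occlusion.

· 3. It detects faces at a wide range of scales.

· 4. It handles different face orientations, such as up, down, left, right, and profile.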

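The whole face detection step is done by the function "getFaceBox", which takes the network "net", the image "frame", and the confidence threshold "threshold_conf". The image is first converted into a blob: the "blobFromImage" function performs mean subtraction, image scaling, and channel swapping (OpenCV stores images in BGR order, so the B and R channels are swapped to give the network RGB input).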
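The three preprocessing operations can be approximated in plain Python. This is a simplified sketch rather than OpenCV's implementation: the hypothetical helper `to_blob` skips resizing and cropping, handles a single image, and only shows the channel swap, the mean subtraction, the scaling, and the change from H × W × C pixels to C × H × W planes.

```python
def to_blob(image_bgr, mean, scale=1.0, swap_rb=True):
    """Convert an H x W image of (B, G, R) pixels into C x H x W planes.

    Simplified: no resizing or cropping, and only one image per blob.
    """
    planes = [[], [], []]
    for row in image_bgr:
        plane_rows = [[], [], []]
        for pixel in row:
            if swap_rb:
                pixel = pixel[::-1]  # swap the first and last channel: BGR -> RGB
            for channel in range(3):
                # Subtract the per-channel mean first, then apply the scale factor.
                plane_rows[channel].append((pixel[channel] - mean[channel]) * scale)
        for channel in range(3):
            planes[channel].append(plane_rows[channel])
    return planes


image = [[(10, 20, 30), (40, 50, 60)]]  # hypothetical 1 x 2 image in BGR order
blob = to_blob(image, mean=(1, 2, 3))
print(blob)  # [[[29.0, 59.0]], [[18.0, 48.0]], [[7.0, 37.0]]]
```

With the swap enabled, the mean values are applied in R, G, B order, which is why the first plane holds the original last channel minus the first mean value.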

import cv2

def getFaceBox(net, frame, threshold_conf=0.7):
    bounding_OpencvDnn = frame.copy()
    bounding_Height = bounding_OpencvDnn.shape[0]
    bounding_Width = bounding_OpencvDnn.shape[1]
    blobs = cv2.dnn.blobFromImage(bounding_OpencvDnn, 1.0, (300, 300), [104, 117, 123], True, False)

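A forward pass is run through the face detection network, and the bounding boxes are then drawn, each with corner coordinates (x1, y1) and (x2, y2). The detection output is a 4-D matrix: the third dimension iterates over the detected faces, with i as the face index, and the fourth dimension holds the bounding box and score of each face. Because the output box coordinates are normalized to [0, 1], they must be multiplied by the width and height of the original image to obtain the correct bounding box.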

    # (continuation of the body of getFaceBox)
    net.setInput(blobs)
    detections = net.forward()
    bounding_bboxes = []
    for i in range(detections.shape[2]):
        confidence = detections[0, 0, i, 2]
        if confidence > threshold_conf:
            x1 = int(detections[0, 0, i, 3] * bounding_Width)
            y1 = int(detections[0, 0, i, 4] * bounding_Height)
            x2 = int(detections[0, 0, i, 5] * bounding_Width)
            y2 = int(detections[0, 0, i, 6] * bounding_Height)
            bounding_bboxes.append([x1, y1, x2, y2])
            cv2.rectangle(bounding_OpencvDnn, (x1, y1), (x2, y2), (0, 255, 0), int(round(bounding_Height/150)), 8)
    return bounding_OpencvDnn, bounding_bboxes
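The thresholding and coordinate scaling can be tried without a model. The sketch below works on plain lists standing in for the last two dimensions of the detector output. The helper `decode_detections` and the sample rows are hypothetical; each row follows the layout the code above indexes, with the confidence at position 2 and the normalized box at positions 3 to 6.

```python
def decode_detections(rows, width, height, threshold=0.7):
    """Keep confident detections and scale their boxes to pixel coordinates.

    Each row mimics one entry of the detector output:
    [image_id, class_id, confidence, x1, y1, x2, y2], coordinates in [0, 1].
    """
    boxes = []
    for row in rows:
        confidence = row[2]
        if confidence > threshold:
            x1 = int(row[3] * width)
            y1 = int(row[4] * height)
            x2 = int(row[5] * width)
            y2 = int(row[6] * height)
            boxes.append([x1, y1, x2, y2])
    return boxes


rows = [
    [0, 1, 0.95, 0.25, 0.125, 0.5, 0.375],  # hypothetical confident face
    [0, 1, 0.30, 0.60, 0.60, 0.80, 0.90],   # below the threshold, discarded
]
print(decode_detections(rows, width=640, height=480))  # [[160, 60, 320, 180]]
```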

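3.3 Input

The path of the source image is stored in the string image_path, and OpenCV's imread() function reads the image to complete the input. Only the file name is kept from the path, for convenience when writing the output later.

import os

image_path = "C:\\Users\\elbadaernU9.9\\Desktop\\test.png"
frame = cv2.imread(image_path)
# Keep only the file name instead of slicing at a hard-coded index.
name = os.path.basename(image_path)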

The network definitions and the pretrained weights are loaded with OpenCV; the age model was trained on the Adience dataset under the Caffe framework.

faceProto = "opencv_face_detector.pbtxt"
faceModel = "opencv_face_detector_uint8.pb"

ageProto = "age_deploy.prototxt"
ageModel = "age_net.caffemodel"

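3.4 Test Output

Finally, imshow() displays the network output on the input image, including the bounding box and the final classification (prediction) result. For the image to render properly, a call to "cv2.waitKey(1)" is needed, and the display time is set to 5 seconds. The result is saved with imwrite(). This completes the main part of the age recognition system.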

label = "{},{}".format(gender, age)
cv2.putText(frameFace, label, (bounding_bbox[0], bounding_bbox[1]-10), cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 255, 255), 2, cv2.LINE_AA)
cv2.imshow("Age & Gender Prediction Demo", frameFace)
cv2.waitKey(1)  # let the window process events so the image is actually drawn
cv2.imwrite("age-gender-out-{}".format(args.input),frameFace)
time.sleep(5)  # keep the result on screen for 5 seconds

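3.5 Extensions

With the basic age estimation working, the system was further optimized and extended.

The original system takes its input from a saved image file, which is inconvenient in practice when no source image has been saved beforehand. The input was therefore extended from static images to live capture of faces from a camera.

The system calls OpenCV's camera capture function to capture faces automatically, while keeping the image-file input available through a command-line argument.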

cap = cv2.VideoCapture(args.input if args.input else 0)
padding = 20
while cv2.waitKey(1) < 0:
    t = time.time()
    hasFrame, frame = cap.read()
    if not hasFrame:
        cv2.waitKey()
        break
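The choice between a file and the camera comes down to one expression. The sketch below isolates it with argparse so that it can be tried without a camera; the helper `pick_source` is hypothetical, and its return value is what would be passed to cv2.VideoCapture.

```python
import argparse


def pick_source(argv):
    """Return the file path given via --input, or 0 for the default camera."""
    parser = argparse.ArgumentParser(description="Choose the frame source.")
    parser.add_argument("--input", help="Path to an image or video file.")
    args = parser.parse_args(argv)
    # An omitted --input is None (falsy), so the expression falls back to 0.
    return args.input if args.input else 0


print(pick_source([]))                       # 0
print(pick_source(["--input", "test.png"]))  # test.png
```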

Because the Adience dataset also carries gender labels, gender prediction can be added to the system alongside age estimation.

genderProto = "gender_deploy.prototxt"
genderModel = "gender_net.caffemodel"

genderList = ['Male', 'Female']
genderNet = cv2.dnn.readNet(genderModel, genderProto)

genderNet.setInput(blobs)
genderPredction = genderNet.forward()
gender = genderList[genderPredction[0].argmax()]  # most probable class

label = "{},{}".format(gender, age)

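Once this is done, the project is packaged into an .exe executable from PyCharm.

The final run result is as follows:

The complete face recognition program: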

import time
import argparse

import cv2

def getFaceBox(net, frame, threshold_conf=0.7):
    bounding_OpencvDnn = frame.copy()
    bounding_Height = bounding_OpencvDnn.shape[0]
    bounding_Width = bounding_OpencvDnn.shape[1]
    blobs = cv2.dnn.blobFromImage(bounding_OpencvDnn, 1.0, (300, 300), [104, 117, 123], True, False)

    net.setInput(blobs)
    detections = net.forward()
    bounding_bboxes = []
    for i in range(detections.shape[2]):
        confidence = detections[0, 0, i, 2]
        if confidence > threshold_conf:
            x1 = int(detections[0, 0, i, 3] * bounding_Width)
            y1 = int(detections[0, 0, i, 4] * bounding_Height)
            x2 = int(detections[0, 0, i, 5] * bounding_Width)
            y2 = int(detections[0, 0, i, 6] * bounding_Height)
            bounding_bboxes.append([x1, y1, x2, y2])
            cv2.rectangle(bounding_OpencvDnn, (x1, y1), (x2, y2), (0, 255, 0), int(round(bounding_Height/150)), 8)
    return bounding_OpencvDnn, bounding_bboxes

parser = argparse.ArgumentParser(description='Use this script to run age and gender recognition using OpenCV.')
parser.add_argument('--input', help='Path to input image or video file. Skip this argument to capture frames from a camera.')

args = parser.parse_args()

faceProto = "opencv_face_detector.pbtxt"
faceModel = "opencv_face_detector_uint8.pb"

ageProto = "age_deploy.prototxt"
ageModel = "age_net.caffemodel"

genderProto = "gender_deploy.prototxt"
genderModel = "gender_net.caffemodel"

MODEL_MEAN_VALUES = (78.4263377603, 87.7689143744, 114.895847746)
ageList = ['(0-2)', '(4-6)', '(8-12)', '(15-20)', '(25-32)', '(38-43)', '(48-53)', '(60-100)']
genderList = ['Male', 'Female']
#genderList.decode("utf-8")

# Load network
ageNet = cv2.dnn.readNet(ageModel, ageProto)
genderNet = cv2.dnn.readNet(genderModel, genderProto)
faceNet = cv2.dnn.readNet(faceModel, faceProto)

# Open a video file or an image file or a camera stream
cap = cv2.VideoCapture(args.input if args.input else 0)
padding = 20
while cv2.waitKey(1) < 0:
    # Read frame
    t = time.time()
    hasFrame, frame = cap.read()
    if not hasFrame:
        cv2.waitKey()
        break

    frameFace, bounding_bboxes = getFaceBox(faceNet, frame)
    if not bounding_bboxes:
        print("No face Detected, Checking next frame")
        continue

    for bounding_bbox in bounding_bboxes:
        print("=====================================Face Found=====================================")
        # print(bounding_bbox)
        face = frame[max(0,bounding_bbox[1]-padding):min(bounding_bbox[3]+padding,frame.shape[0]-1),max(0,bounding_bbox[0]-padding):min(bounding_bbox[2]+padding, frame.shape[1]-1)]

        blobs = cv2.dnn.blobFromImage(face, 1.0, (227, 227), MODEL_MEAN_VALUES, swapRB=False)
        genderNet.setInput(blobs)
        genderPredction = genderNet.forward()
        gender = genderList[genderPredction[0].argmax()]
        # print("Gender Output : {}".format(genderPredction))

        print("Gender : {}, conf = {:.3f}".format(gender, genderPredction[0].max()))

        ageNet.setInput(blobs)
        agepredction = ageNet.forward()
        age = ageList[agepredction[0].argmax()]

        print("Age Output : {}".format(agepredction))
        print("Age : {}, conf = {:.3f}".format(age, agepredction[0].max()))

        label = "{},{}".format(gender, age)
        cv2.putText(frameFace, label, (bounding_bbox[0], bounding_bbox[1]-10), cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 255, 255), 2, cv2.LINE_AA)
        cv2.imshow("Age & Gender Prediction Demo", frameFace)
        # cv2.imwrite("age-gender-out-{}".format(args.input),frameFace)

    print("time : {:.3f}".format(time.time() - t))
    print("=====================================Round Over=====================================")
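The padded crop inside the loop is the densest line of the program, so here is the same bounds calculation on its own. This is a sketch with a hypothetical helper name, `padded_crop_bounds`; it reproduces the max/min clamping of the listing above and returns the four slice bounds instead of slicing a frame.

```python
def padded_crop_bounds(box, frame_height, frame_width, padding=20):
    """Grow a box by `padding` pixels and clamp it to the frame.

    Returns (top, bottom, left, right), the bounds the loop uses for slicing.
    """
    x1, y1, x2, y2 = box
    top = max(0, y1 - padding)
    bottom = min(y2 + padding, frame_height - 1)
    left = max(0, x1 - padding)
    right = min(x2 + padding, frame_width - 1)
    return top, bottom, left, right


# Hypothetical box near the top-left corner of a 480 x 640 frame.
print(padded_crop_bounds([5, 10, 100, 120], 480, 640))     # (0, 140, 0, 120)
# Hypothetical box touching the bottom-right corner.
print(padded_crop_bounds([600, 400, 639, 479], 480, 640))  # (380, 479, 580, 639)
```

The padding gives the age and gender networks some context around the face, and the clamping keeps the slice inside the frame when a face sits at the edge.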

(This article is excerpted from my graduation thesis, with cuts and revisions.)
The code and training set are available on GitHub: https://github.com/elbadaernU404/AgeGenderPredictionDemo