Waverley Software’s R&D team developed a face recognition system using Computer Vision and Deep Learning techniques.
Our R&D department is currently working on a face detection system that will recognize people when they enter the room and address them with an audible personalized greeting.The goal of the project is to create a Computer Vision algorithm and train it to detect a human face on the basis of footage from surveillance cameras. Once detected, the face is matched with the identity of a specific person whose data is in the system, which then generates action items such as a personalized greeting.
The team selected Python development as the core programming language to implement the project. The OpenCV library was used to enable Computer Vision algorithms, and Deep Learning tools were based on the YOLO architecture, version 3. In order to set up the detection system, our first step was to build a database of faces to be recognized. The data we compiled included people’s names and job titles, as well as random faces pulled from publicly available images to use for model training. Every machine model needs to be trained, often through manual labeling. In the case of a face recognition software, a technician basically “shows” the system the face of the person it’s being asked to identify, thus allowing it to make distinctions and learn how to compare faces, recognize the differences, and “know” which face matches the data. The database was processed using a special Deep Learning algorithm which creates digital vector representations of faces, each one unique.
First we gathered all the corresponding vectors we used to create the database and assigned them to the people to whom they belong. To accomplish that we applied a Deep Learning algorithm that generates face embeddings to be compared with the vector embeddings already in the database. When a match is detected, the system sends a notification and launches an action item such as “play a personalized greeting” or “open the door.”
In order to cope with the poor quality video extracted from surveillance cameras and “clean up” the visual data, we used Computer Vision algorithms and various optical filters to improve contrast, remove shades, add more exposure,etc.
To optimize the system, we applied a high-speed detector for the initial processing and a slow-speed detector that requires more resources from the system but provides more accurate results. This model filtered the videos to separate footage that contained human faces from footage that did not. This combined, pre-filtering approach was introduced to prevent overloading the recognition server.