Computer-Pointer-Controller

(Note: A video of the result, showing the mouse pointer moving, can be downloaded from the "result_images" directory.) In this project, a gaze estimation model is used to control the mouse pointer of your computer. The mouse pointer changes position according to the estimated gaze of the user's eyes. The project demonstrates the ability to run multiple models on the same machine and coordinate the flow of data between those models.

How it works

The application uses the Inference Engine API from Intel's OpenVINO Toolkit. The gaze estimation model requires three inputs:

The head pose
The left eye image
The right eye image

To get these inputs for the gaze estimation model, the following pre-trained OpenVINO models are used: face detection, head pose estimation, and facial landmarks detection.

The Pipeline

The data flows between the models and through the application as follows:
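The flow can be sketched in code. This is a minimal sketch, assuming the usual arrangement for these models: the face crop from face detection feeds both head pose estimation and landmark detection, and their outputs feed gaze estimation. The `Pipeline` class and its field names are hypothetical placeholders, not the classes in src/.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Hypothetical stand-in types; the real application works on image arrays.
Image = list                          # a decoded frame or a crop of it
Angles = Tuple[float, float, float]   # head pose angles
Gaze = Tuple[float, float, float]     # gaze direction vector


@dataclass
class Pipeline:
    """Wires four model wrappers together (all callables are placeholders)."""
    detect_face: Callable[[Image], Image]              # frame -> face crop
    estimate_head_pose: Callable[[Image], Angles]      # face crop -> angles
    find_eyes: Callable[[Image], Tuple[Image, Image]]  # face crop -> (left, right)
    estimate_gaze: Callable[[Angles, Image, Image], Gaze]

    def process(self, frame: Image) -> Gaze:
        face = self.detect_face(frame)
        head_pose = self.estimate_head_pose(face)
        left_eye, right_eye = self.find_eyes(face)
        # The gaze model receives exactly the three inputs listed above.
        return self.estimate_gaze(head_pose, left_eye, right_eye)


# Usage with stub models, only to show the order of the calls.
stub_pipeline = Pipeline(
    detect_face=lambda frame: frame,
    estimate_head_pose=lambda face: (0.0, 0.0, 0.0),
    find_eyes=lambda face: (face, face),
    estimate_gaze=lambda pose, left, right: (0.1, -0.2, -0.9),
)
gaze = stub_pipeline.process([[0]])
```

Keeping each model behind a plain callable makes it easy to swap one precision or device for another without touching the rest of the flow.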

Project Setup and Installation

Step: Install OpenVINO Toolkit v2020.1, but be sure to download all prerequisites first (to save trouble).
Step: Clone this repository
Step: Set up a virtual environment, for example with the command: virtualenv venv. If you are not familiar with creating a virtual environment, I recommend following a guide first.
Step: Download the following 4 pre-trained models:

Face Detection
Head Pose Estimation
Facial Landmarks Detection
Gaze Estimation
Download example command: python3 <openvino dir>/deployment_tools/tools/model_downloader/downloader.py --name "face-detection-adas-binary-0001"
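To avoid typing the download command four times, the commands can be generated in a loop. This is a minimal sketch: the model names are taken from the run command in this README, the downloader path follows the example command above, and `downloader_commands` is a hypothetical helper name. The OpenVINO install directory is left as a parameter because it depends on your system.

```python
import sys

# Model names as they appear in the run command of this README.
MODEL_NAMES = [
    "face-detection-adas-binary-0001",
    "head-pose-estimation-adas-0001",
    "landmarks-regression-retail-0009",
    "gaze-estimation-adas-0002",
]


def downloader_commands(openvino_dir, python=sys.executable):
    """Return one downloader command (as an argument list) per model."""
    script = (
        f"{openvino_dir}/deployment_tools/tools/model_downloader/downloader.py"
    )
    return [[python, script, "--name", name] for name in MODEL_NAMES]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` to perform the actual download.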

Step: Install all the necessary libraries/dependencies with the command pip install -r requirements.txt (on macOS, use pip3 instead of pip).

Directory Structure

main-directory:

bin: currently only contains the example video, which was part of the starter project from Udacity
README.md: the document you are currently reading
requirements.txt: required libraries/dependencies that have to be installed
src:
- app.py: Main file of the application that loads, runs, connects all the models and calculates/displays the results
- face_detection.py
- gaze_estimation.py
- general_model.py: Contains several pre-/post-processing functions for different models
- head_pose_detection.py
- input_feeder.py: Used to load video or webcam stream
- landmark_detection.py
- models: Directory containing the above-mentioned models (not part of this GitHub repo; they need to be downloaded as described above)
- mouse_controller.py: Used to move the mouse based on the final results of the gaze estimation model
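The role of mouse_controller.py can be illustrated with a small calculation. This is a sketch under stated assumptions, not the code in src/: it assumes the gaze result is a direction vector whose x and y components are scaled into a relative pointer movement, and that the y axis is flipped because screen coordinates grow downward. The function name `gaze_to_offset` and the `scale` value are hypothetical.

```python
def gaze_to_offset(gaze_x, gaze_y, scale=500):
    """Convert gaze x/y components into a relative pointer move in pixels.

    `scale` is an assumed sensitivity factor (pixels per unit of gaze).
    """
    dx = round(gaze_x * scale)
    dy = round(-gaze_y * scale)  # flip y: screen coordinates grow downward
    return dx, dy
```

The resulting offset would then be handed to whatever library moves the pointer; a smaller `scale` gives finer but slower pointer control.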

Command line options

The file app.py has the following command line options available:

-fdm: The location of the face-detection model (required).
-lrm: The location of the landmark-regression model (required).
-hpm: The location of the head-pose-estimation model (required).
-gem: The location of the gaze-estimation model (required).
-i: The input type of the stream: either 'cam' or the path to a video file.
-d: The device name if not 'CPU'; can be GPU, FPGA or MYRIAD.
-ct: The confidence threshold to use with the models.
-flags: Select from the following flags: ffd, flr, fhp, fge (separate multiple flags with a single space). (ffd -> flagFaceDetection, flr -> flagLandmarkRegression, fhp -> flagHeadPose, fge -> flagGazeEstimation)
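The options above could be declared with argparse roughly as follows. This is a minimal sketch, not the parser in app.py: the option names come from the list above, while the default values for -i and -ct and the function name `build_parser` are assumptions made for illustration.

```python
import argparse


def build_parser():
    """Build a parser for the options listed in this README."""
    parser = argparse.ArgumentParser(description="Computer pointer controller")
    parser.add_argument("-fdm", required=True, help="face-detection model")
    parser.add_argument("-lrm", required=True, help="landmark-regression model")
    parser.add_argument("-hpm", required=True, help="head-pose-estimation model")
    parser.add_argument("-gem", required=True, help="gaze-estimation model")
    # The defaults for -i and -ct are assumptions, not taken from app.py.
    parser.add_argument("-i", default="cam", help="'cam' or path to a video file")
    parser.add_argument("-d", default="CPU",
                        choices=["CPU", "GPU", "FPGA", "MYRIAD"],
                        help="device name")
    parser.add_argument("-ct", type=float, default=0.5,
                        help="confidence threshold")
    parser.add_argument("-flags", nargs="+", default=[],
                        choices=["ffd", "flr", "fhp", "fge"],
                        help="one or more flags, separated by spaces")
    return parser
```

With `nargs="+"`, the flags are collected into a list until the next option starts, which matches the space-separated form described above.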

Run the application

If everything was installed correctly, the application can be run with the following command:

python3 app.py -fdm models/face-detection-adas-binary-0001.xml -lrm models/landmarks-regression-retail-0009.xml -hpm models/head-pose-estimation-adas-0001.xml -gem models/gaze-estimation-adas-0002.xml -flags ffd flr fhp fge -i ../bin/demo.mp4

(Also see the command line options above to achieve different results.)

Benchmarking

When comparing FP16, FP32 and FP32-INT8, I focused on Model-Load-Time, Inference Time and Frames per Second (note: the face-detection model is only offered in FP32-INT1).
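A minimal sketch of how these three metrics could be measured, assuming that inference time means the total time spent over all frames of the input. The names `benchmark`, `load_models` and `infer` are hypothetical and stand in for the application's own loading and inference code.

```python
import time


def benchmark(load_models, infer, frames):
    """Measure model load time, total inference time and frames per second."""
    start = time.perf_counter()
    models = load_models()
    model_load_time = time.perf_counter() - start

    frame_count = 0
    start = time.perf_counter()
    for frame in frames:
        infer(models, frame)
        frame_count += 1
    inference_time = time.perf_counter() - start

    # Guard against a zero elapsed time on very fast stub workloads.
    fps = frame_count / inference_time if inference_time > 0 else float("inf")
    return {
        "model_load_time": model_load_time,
        "inference_time": inference_time,
        "frames": frame_count,
        "fps": fps,
    }
```

`time.perf_counter()` is used because it is a monotonic, high-resolution clock intended for measuring short durations.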

Model-Load-Time

Inference Time

Frames per Second

Comparison

FP16

model-load-time: 0.44661 s
inference time: 23.5582 s
fps: 2.504

FP32

model-load-time: 0.5140 s
inference time: 23.2349 s
fps: 2.539

FP32-INT8

model-load-time: 2.1982 s
inference time: 23.5177 s
fps: 2.509
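The fps values above are consistent with dividing a fixed number of frames by the total inference time. The frame count of 59 used below is an assumption inferred from the reported numbers; it is not stated in this README.

```python
# Assumed frame count, inferred from the reported fps and inference times.
FRAMES = 59

# Inference times in seconds, copied from the comparison above.
inference_time_s = {"FP16": 23.5582, "FP32": 23.2349, "FP32-INT8": 23.5177}

fps = {name: round(FRAMES / t, 3) for name, t in inference_time_s.items()}
print(fps)  # {'FP16': 2.504, 'FP32': 2.539, 'FP32-INT8': 2.509}
```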

Results

First, we can see that the model-load-time is lowest for FP16, which is plausible because FP16 weights take half the space of FP32 weights, so the model file is smaller (lower precision can also mean lower accuracy).
FP32-INT8 takes by far the longest to load. Note that INT8 is a lower precision than FP16 or FP32, so this cannot be explained by higher precision or a heavier model; the measurements here do not show what causes it.
FPS and inference time are very similar for the three precisions (within error deviation).
