Face landmark detection in React — the right way

Magda Odrowąż-Żelezik
9 min read · Feb 17, 2023


I wrote a simple face landmark detection app in React based on TensorFlow.js. This one actually follows the good practices of React.

Finished product (gif)

Background

There are plenty of resources on how to implement face landmark detection in React using the TensorFlow.js library. I followed a few of them myself. Detecting face feature landmarks seems like magic, but the implementation is actually pretty simple, because the TensorFlow.js library and its extensions basically do everything for us. The difficult part of such a project lies in actually making it performant in React.

There is an abundance of tutorials and StackOverflow threads, but they have one thing in common: they don't follow React best practices.

Venting a bit

So here is the thing. React is not dead and is a very useful and robust tool, especially if you don't use Redux with it. It also has, in my personal judgement, a quite low difficulty threshold to get started. I think this is why there are so many bad practices lingering around the web, even though the React documentation explicitly advises against them.

I fell into that trap myself, and if any hater browsed through my older GitHub projects, oh, the horror! But the thing is, it is OK to make mistakes as long as you learn from them. After going through a lot of performance issues, impenetrable code spaghetti and not-so-rare infinite rendering loops, I can say I know a tiny bit better and would like to share that knowledge with you.

Actual tutorial

Getting started

Create a simple React app. You may use npx create-react-app or whatever you prefer. I have my own basic template to reuse because I do this so often.

Install your dependencies

You will need:

- @tensorflow/tfjs
- @tensorflow/tfjs-backend-webgl
- @mediapipe/face_mesh
- @tensorflow-models/face-landmarks-detection
- react-webcam

Basically, run:

yarn add @tensorflow/tfjs @tensorflow/tfjs-backend-webgl @mediapipe/face_mesh @tensorflow-models/face-landmarks-detection react-webcam

Create a component

Let's start with a simple React component. Let's call it App and render out a webcam and a canvas. The webcam will gather the image data for our face detection to run on. The canvas will be used to draw the actual mesh after the calculations.

import React, { useRef } from "react";
import "@tensorflow/tfjs"; // imported for side effects: registers the tfjs backends used by the detector
import Webcam from "react-webcam";

const inputResolution = {
  width: 1080,
  height: 900,
};

const videoConstraints = {
  width: inputResolution.width,
  height: inputResolution.height,
  facingMode: "user",
};

function App() {
  const canvasRef = useRef(null); // the mesh will be drawn on this canvas
  return (
    <div>
      <Webcam
        style={{ position: "absolute" }}
        videoConstraints={videoConstraints}
      />
      <canvas
        ref={canvasRef}
        width={inputResolution.width}
        height={inputResolution.height}
        style={{ position: "absolute" }}
      />
    </div>
  );
}

export default App;

You may note that I also added some position: absolute so that the camera output and, later, the mesh sit on top of one another. I also extracted the dimensions to a const, because DRY.

Detect the face

Let's start by writing a simple script to detect the face landmarks using a TensorFlow model. I would suggest creating a new file, say detector.js, to separate the logic from the view components.

You will need to do two things:

1. Create a detector from the faceLandmarksDetection model from TensorFlow:

const model = faceLandmarksDetection.SupportedModels.MediaPipeFaceMesh;
const detectorConfig = {
  runtime: "tfjs",
};
const detector = await faceLandmarksDetection.createDetector(
  model,
  detectorConfig
);

2. Write a simple detect function to estimate the faces using the model and the input (image, video, etc.):

const detect = async (net) => {
  const estimationConfig = { flipHorizontal: false };
  const faces = await net.estimateFaces(video, estimationConfig);
};

Now let’s wrap them both together in one function called runDetector

import * as faceLandmarksDetection from "@tensorflow-models/face-landmarks-detection";

export const runDetector = async (video) => {
  const model = faceLandmarksDetection.SupportedModels.MediaPipeFaceMesh;
  const detectorConfig = {
    runtime: "tfjs",
  };
  const detector = await faceLandmarksDetection.createDetector(
    model,
    detectorConfig
  );
  const detect = async (net) => {
    const estimationConfig = { flipHorizontal: false };
    const faces = await net.estimateFaces(video, estimationConfig);
  };
  detect(detector);
};

Nice. Now if we called the runDetector function on our video object, we could get our first estimate of the face landmark points! Before we do that though, since we are operating on a video input, let's make sure that the detect function attempts to calculate the points again and again after it finishes the previous calculation. Framing the problem provides the solution: let's make the function recursive!

const detect = async (net) => {
  const estimationConfig = { flipHorizontal: false };
  const faces = await net.estimateFaces(video, estimationConfig);
  detect(detector); // rerun the detect function after estimating
};
detect(detector); // first run of the detect function

Let's now plug the detect function into our App.js so we can "feed" it with the video input.

Skip useEffect on video load readyState

To do that, we will first have to figure out how to check whether the video stream from the camera has loaded. You might reach for useEffect immediately, but no! What we can do instead is create a handler function and hook it onto the HTML DOM event of the video element, just like this:

const [loaded, setLoaded] = useState(false);
const handleVideoLoad = (videoNode) => {
  const video = videoNode.target;
  if (video.readyState !== 4) return;
  if (loaded) return;
  runDetector(video); // running detection on video
  setLoaded(true);
};
return (
  <div>
    <Webcam
      width={inputResolution.width}
      height={inputResolution.height}
      style={{ visibility: "hidden", position: "absolute" }}
      videoConstraints={videoConstraints}
      onLoadedData={handleVideoLoad} // here we pass in the function
    />
    (...)

Voilà! You don't need a useEffect! Since the webcam runs automatically on first render, we just need to wait for the data to load, which is the default behavior of the video element. Using the default event we can simply check on that! I also added a small loaded state so we can rely on it later on.
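A small aside: the magic number 4 in the readyState check corresponds to the HTMLMediaElement.HAVE_ENOUGH_DATA constant, so you can make the check self-documenting if you prefer:

// 4 === HTMLMediaElement.HAVE_ENOUGH_DATA: enough data is buffered
// for the video to play through without interruption
if (video.readyState !== HTMLMediaElement.HAVE_ENOUGH_DATA) return;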

If you tried to log the output of the detect function, you would see the collection of points from the face landmarks. Let's now try drawing the face on the HTML canvas, so we can investigate the points a little further.
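For reference, estimateFaces returns an array of face predictions, roughly shaped like the sketch below; the exact fields may vary with the model version, and the numbers here are made up for illustration:

// Roughly what the logged output looks like (shape may vary by model version):
// [
//   {
//     keypoints: [{ x: 412.3, y: 280.1, z: -12.4, name: "lips" }, ...], // ~468 points
//     box: { xMin, yMin, xMax, yMax, width, height },
//   },
// ]
console.log(faces);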

Draw the face

We will now create a new function called drawMesh. Our function will be responsible for drawing the points and lines of the face mesh on the screen. We would like to call it every time we have an update on the face landmark points:

const detect = async (net) => {
  const estimationConfig = { flipHorizontal: false };
  const faces = await net.estimateFaces(video, estimationConfig);
  drawMesh(faces[0]); // <-- we want to draw the new set of points for the first face
  detect(detector); // rerun the detect function after estimating
};
detect(detector); // first run of the detect function

But! Before attempting anything, we need to keep in mind that drawing, moving or basically displaying anything on the screen is a very expensive operation for the browser.

In the example above, we are running detection in an infinite loop. If we plugged the drawing directly into it, we would create a very inefficient operation!

RequestAnimationFrame to the rescue

In various tutorials I have seen different approaches to this problem of too-frequent redrawing, mostly relying on setInterval. While this method is not wrong, there are better ways of handling web animations and, in general, the repaint of web content. Why requestAnimationFrame is a better idea is a subject for a whole article (or several), so make sure to do your homework. To keep things short, let's just say that it is an efficient way of letting the browser decide when it needs a new frame, instead of arbitrarily forcing it to repaint.

const detect = async (net) => {
  const estimationConfig = { flipHorizontal: false };
  const faces = await net.estimateFaces(video, estimationConfig);
  requestAnimationFrame(() => drawMesh(faces[0]));
  detect(detector);
};
detect(detector);
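As a side note, you could also schedule the whole detect cycle through requestAnimationFrame instead of recursing immediately, which naturally caps the loop at the display refresh rate. A minimal variation of the snippet above (we will stick with the version above for the rest of this tutorial):

// Alternative sketch (assumes video, drawMesh and detector are in scope
// as in the snippet above): run the whole cycle once per animation frame.
const detect = async (net) => {
  const estimationConfig = { flipHorizontal: false };
  const faces = await net.estimateFaces(video, estimationConfig);
  drawMesh(faces[0]);
  requestAnimationFrame(() => detect(net)); // schedule the next detection on the next frame
};
detect(detector);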

Canvas context

There is one more missing element before we approach the actual drawing: we need a canvas to draw on. We already defined one in App.js, so all we need to do is pass it through to our draw function.

We will pass it to the runDetector call in App.js:

const canvasRef = useRef(null);
const handleVideoLoad = (videoNode) => {
  const video = videoNode.target;
  if (video.readyState !== 4) return;
  if (loaded) return;
  runDetector(video, canvasRef.current);
  setLoaded(true);
};

Then we will receive it in the runDetector implementation as an argument and pass its context to the draw function:

export const runDetector = async (video, canvas) => {
  const model = faceLandmarksDetection.SupportedModels.MediaPipeFaceMesh;
  const detectorConfig = {
    runtime: "tfjs",
  };
  const detector = await faceLandmarksDetection.createDetector(
    model,
    detectorConfig
  );
  const detect = async (net) => {
    const estimationConfig = { flipHorizontal: false };
    const faces = await net.estimateFaces(video, estimationConfig);
    const ctx = canvas.getContext("2d");
    requestAnimationFrame(() => drawMesh(faces[0], ctx));
    detect(detector);
  };
  detect(detector);
};

Draw the points

Let's start with drawing the points. We have the canvas context now, so all we have to do is pick up the feature points, called keypoints, from the face prediction and loop over them, drawing a small circle at every point!

Each of the keypoints has an x, y and z coordinate. As we are using a 2D canvas, we will only need the x and y dimensions.

export const drawMesh = (prediction, ctx) => {
  if (!prediction) return; // do not draw if there is no mesh
  const keyPoints = prediction.keypoints;
  if (!keyPoints) return; // do not draw if there are no keypoints
  ctx.clearRect(0, 0, ctx.canvas.width, ctx.canvas.height); // clear the canvas before every drawing

  for (let keyPoint of keyPoints) {
    ctx.beginPath();
    ctx.arc(keyPoint.x, keyPoint.y, 1, 0, 2 * Math.PI); // a full circle at each point
    ctx.fillStyle = "aqua";
    ctx.fill();
  }
};

Triangulation

To draw the face, we need the triangulation array. You can find it in the TensorFlow face landmarks detection repo. This array is basically a list of which points lie next to which. Here is a fragment, for example:

export const TRIANGULATION = [
  127, 34, 139, 11, 0, 37, 232...]

When drawing the face, we will have to rely on this array to create the triangles. From the example above we can tell that there is a triangle 1 with vertices no. 127, 34 and 139, a triangle 2 with vertices 11, 0 and 37, and so on. Let's use that knowledge!
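If it helps to visualize, here is a tiny illustration-only snippet (not part of the final code) that groups the flat array into the triples we will read:

// Illustration only: group the flat TRIANGULATION array into triples
const triangles = [];
for (let i = 0; i < TRIANGULATION.length; i += 3) {
  triangles.push([TRIANGULATION[i], TRIANGULATION[i + 1], TRIANGULATION[i + 2]]);
}
// triangles[0] -> [127, 34, 139], triangles[1] -> [11, 0, 37], ...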

Draw triangles

To draw the triangles, we will use a simple for loop. We will jump to every third item of the triangulation array, pick its two nearest colleagues, and create a small array of this triple. Then we will map this array to find the actual mesh keypoint at each given index!

for (let i = 0; i < TRIANGULATION.length / 3; i++) {
  const points = [
    TRIANGULATION[i * 3],
    TRIANGULATION[i * 3 + 1],
    TRIANGULATION[i * 3 + 2],
  ].map((index) => keyPoints[index]);
}

Now all we need to do is write a drawPath function that draws a line between each of the points. This is a very simple canvas line-drawing function that you may find in many canvas drawing examples:

const drawPath = (ctx, points, closePath) => {
  const region = new Path2D();
  region.moveTo(points[0].x, points[0].y);
  for (let i = 1; i < points.length; i++) {
    const point = points[i];
    region.lineTo(point.x, point.y);
  }
  if (closePath) region.closePath();
  ctx.strokeStyle = "black";
  ctx.stroke(region);
};

We can now add this function to our triangles loop so we draw the triangle on every iteration. Here is the complete drawMesh function:

export const drawMesh = (prediction, ctx) => {
  if (!prediction) return;
  const keyPoints = prediction.keypoints;
  if (!keyPoints) return;
  ctx.clearRect(0, 0, ctx.canvas.width, ctx.canvas.height);
  for (let i = 0; i < TRIANGULATION.length / 3; i++) {
    const points = [
      TRIANGULATION[i * 3],
      TRIANGULATION[i * 3 + 1],
      TRIANGULATION[i * 3 + 2],
    ].map((index) => keyPoints[index]);
    drawPath(ctx, points, true);
  }
  for (let keyPoint of keyPoints) {
    ctx.beginPath();
    ctx.arc(keyPoint.x, keyPoint.y, 1, 0, 2 * Math.PI);
    ctx.fillStyle = "aqua";
    ctx.fill();
  }
};

const drawPath = (ctx, points, closePath) => {
  const region = new Path2D();
  region.moveTo(points[0].x, points[0].y);
  for (let i = 1; i < points.length; i++) {
    const point = points[i];
    region.lineTo(point.x, point.y);
  }
  if (closePath) region.closePath();
  ctx.strokeStyle = "black";
  ctx.stroke(region);
};

Wrap up

Congratulations, we built a complete working solution for detecting and drawing face landmarks in React!

To make things even more exciting, we did not use any useEffect incorrectly and we did not use any inefficient setInterval! We have a lightweight, robust and correct solution that you can use to kickstart your next creative project!
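For reference, here is roughly how the final App.js looks with all the fragments above assembled (a reconstruction from the snippets in this article; see the repo linked below for the exact source):

import React, { useRef, useState } from "react";
import "@tensorflow/tfjs"; // side-effect import: registers the tfjs backends
import Webcam from "react-webcam";
import { runDetector } from "./detector";

const inputResolution = {
  width: 1080,
  height: 900,
};

const videoConstraints = {
  width: inputResolution.width,
  height: inputResolution.height,
  facingMode: "user",
};

function App() {
  const canvasRef = useRef(null);
  const [loaded, setLoaded] = useState(false);

  const handleVideoLoad = (videoNode) => {
    const video = videoNode.target;
    if (video.readyState !== 4) return;
    if (loaded) return;
    runDetector(video, canvasRef.current); // feed the video element and canvas to the detector
    setLoaded(true);
  };

  return (
    <div>
      <Webcam
        width={inputResolution.width}
        height={inputResolution.height}
        style={{ visibility: "hidden", position: "absolute" }}
        videoConstraints={videoConstraints}
        onLoadedData={handleVideoLoad}
      />
      <canvas
        ref={canvasRef}
        width={inputResolution.width}
        height={inputResolution.height}
        style={{ position: "absolute" }}
      />
    </div>
  );
}

export default App;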

There are plenty of resources on the web, and that is an amazing thing! You can learn plenty of web development skills via free resources online. Please be mindful, though, that there are also a lot of not-entirely-correct resources out there. It is easy to pick up and repeat wrong patterns from one clickbait Medium article after another. Make sure never to blindly copy other people's solutions; instead, use them as inspiration to dive in deeper, look up the documentation, read the coding gurus' recipes or even MDN, and always confirm that what you are learning is actually correct.

It is also perfectly fine to make mistakes along the way! Let's make sure we (all) learn from them :)

Link to repo: https://github.com/magdazelena/face-landmark-detection

Demo: https://magdazelena.github.io/face-landmark-detection/
