Face recognition has a wide range of uses, from applying masks to biometric locks on mobile phones. When it comes to WebRTC or conferencing, face recognition is widely used for applying different masks and effects to the face. As a first step towards exploring ways to do this, we experimented with identifying faces using TensorFlow.js, a framework that makes it remarkably easy to run AI algorithms efficiently inside the browser.

So this article is about building a React app that enables face recognition in a WebRTC video call without using any media servers.

Introduction to the code

We are going to build a pluggable React component that we can easily integrate anywhere. This component will have 4 main HTML elements:

  1. Video tag - The media stream retrieved through the navigator API is displayed here.
  2. Canvas tag 1 - In order to process an image with TensorFlow, we have to provide an image element (or its id) as input. Since what we have is a video stream, we have to capture the frames and display them in an image tag. Then we reference this image tag from TensorFlow so that it can make the predictions. This is a 4-step process:

a. First, we draw the current video frame on a canvas.

b. Then we convert it into an image.

c. We feed it to TensorFlow and get the predictions.

d. We draw a square that covers each detected face location.

So this canvas element will be used to draw the video frame discussed in step (a). A minimal sketch of the whole pipeline follows this list.

  3. Image tag - This is used to display the frame drawn on the canvas as an image. This image tag is what we reference from TensorFlow for processing.

  4. Canvas tag 2 - This canvas is used to draw the squares around the face locations.
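
To make this concrete, the whole (a)-(d) pipeline can be sketched as a single plain JavaScript function. This is only an illustrative sketch with made-up variable names; the actual React implementation is built step by step below.

async function detectFacesOnce(videoElement, hiddenCanvas, imageTag, overlayContext, faceModel) {
  const hiddenContext = hiddenCanvas.getContext('2d');
  // (a) draw the current video frame onto the hidden canvas
  hiddenContext.drawImage(videoElement, 0, 0, hiddenCanvas.width, hiddenCanvas.height);
  // (b) convert the canvas contents into an image
  imageTag.src = hiddenCanvas.toDataURL('image/png');
  // (c) feed the image to the BlazeFace model and get the predictions
  const predictions = await faceModel.estimateFaces(imageTag, false);
  // (d) draw a square over each detected face on the overlay canvas
  predictions.forEach((prediction) => {
    const [x1, y1] = prediction.topLeft;
    const [x2, y2] = prediction.bottomRight;
    overlayContext.strokeRect(x1, y1, x2 - x1, y2 - y1);
  });
}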

In order to connect those 4 HTML elements, we are going to implement 4 main functions (a simplified sketch of how they fit together follows the list):

  1. takePhoto() - This function captures a frame from the video stream and draws it on the first canvas. The drawn frame is then converted to an image and set as the source of the image tag.
  2. loadDataProcessing() - This function is responsible for making the predictions. It feeds the captured image to the TensorFlow model, and the predictions are drawn on canvas element 2.
  3. initializeCanvas() - This function initializes the canvas elements with the required configuration.
  4. initiateResizeFunction() - The display size can change at runtime because of other layout changes, so we need to resize our HTML elements at the same time to avoid over-cropping or under-cropping. This function is attached to the window's resize event listener so that it runs whenever the window size changes.
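
At a high level, these functions are wired together as follows. This is only a simplified, non-React sketch of the flow; the actual hooks-based implementation comes in Step 3.

let model; // the BlazeFace model, loaded once

async function start() {
  // Load the BlazeFace model exposed on window by the CDN scripts from Step 1.
  model = await window.blazeface.load();
  initializeCanvas();         // configure both canvas contexts
  initiateResizeFunction();   // size everything to match the video element
  window.addEventListener('resize', initiateResizeFunction);

  // Run the capture-and-predict cycle roughly five times per second.
  // loadDataProcessing() calls takePhoto() internally before running the model.
  setInterval(loadDataProcessing, 200);
}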

Connecting the pieces

We are going to connect the pieces discussed above to build a face recognizer. We assume you already have a React project set up.

Step 1: Paste the code below into index.html, which is in the public folder.

<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs@latest"></script>
<script src="https://cdn.jsdelivr.net/npm/@tensorflow-models/blazeface"></script>

This will allow the browser to load TensorFlow.js and the BlazeFace model.
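
Since the libraries are loaded through script tags rather than npm imports, they attach themselves to the global window object, which is why the component later calls window.blazeface.load(). A quick sanity check you can run in the browser console once the page has loaded:

// Both globals are created by the CDN scripts above.
console.log(typeof window.tf);        // "object" once TensorFlow.js has loaded
window.blazeface.load().then(function (model) {
  console.log('BlazeFace model ready', model);
});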

Step 2: Use the navigator API to access the media devices and get the camera feed. Let's store the camera feed in a React ref. The video tag points to this ref so that it displays the camera feed. TFToVideo is the pluggable component that we are going to build in the next step.

import React, { useEffect, useRef } from 'react';
import TFToVideo from './TFToVideo';

let _stream;

function App() {
  const cameraRef = useRef();

  useEffect(() => {
    // Ask the browser for camera access and pipe the stream into the video tag.
    navigator.mediaDevices
      .getUserMedia({ video: true })
      .then(function (stream) {
        _stream = stream;
        cameraRef.current.srcObject = stream;
      })
      .catch(function (error) {
        console.log('Something went wrong!', error);
      });

    // Stop all camera tracks when the component unmounts.
    return () => {
      if (_stream) {
        _stream.getTracks().forEach(function (track) {
          track.stop();
        });
      }
    };
  }, []);

  return (
    <TFToVideo videoId="videoId">
      <video
        style={{ width: '700px', height: '500px' }}
        autoPlay
        muted
        controls
        id="videoId"
        ref={cameraRef}
      ></video>
    </TFToVideo>
  );
}

export default App;

Step 3: Build the TFToVideo component that connects all the pieces.

import React, { useEffect, useRef, useState } from 'react';
import './App.css';
let model;
export default function TFToVideo(props) {
  const [childElementSize, setChildElementSize] = useState({
    width: 100,
    height: 70,
  });
  const canvasRef = useRef();
  const contextRef = useRef();

  const canvasOnVideoRef = useRef();
  const contextOnVideoRef = useRef();

  useEffect(() => {
    window.addEventListener('resize', initiateResizeFunction, false);
    // Load the BlazeFace model exposed on window by the CDN scripts from Step 1.
    async function init() {
      model = await window.blazeface.load();
      console.log('model >', model);
    }
    init();
    initializeCanvas();
    initiateResizeFunction();
    return () => {
      window.removeEventListener('resize', initiateResizeFunction);
    };
  }, []);

  useEffect(() => {
    // Run the capture-and-predict cycle every 200 ms.
    const interval = setInterval(() => {
      loadDataProcessing();
    }, 200);

    return () => {
      clearInterval(interval);
    };
  }, [childElementSize]);

  const initiateResizeFunction = () => {
    const videoElement = document.getElementById(props.videoId);
    const width = videoElement.offsetWidth;
    const height = videoElement.offsetHeight;
    document.getElementById('hiddenCanvas').style.width = `${width}px`;
    document.getElementById('hiddenCanvas').style.height = `${height}px`;
    document.getElementById('onTopCanvas').style.width = `${width}px`;
    document.getElementById('onTopCanvas').style.height = `${height}px`;
    canvasRef.current.width = width;
    canvasRef.current.height = height;
    canvasRef.current.style.width = `${width}px`;
    canvasRef.current.style.height = `${height}px`;
    canvasOnVideoRef.current.width = width;
    canvasOnVideoRef.current.height = height;
    canvasOnVideoRef.current.style.width = `${width}px`;
    canvasOnVideoRef.current.style.height = `${height}px`;
    contextOnVideoRef.current.lineCap = 'round';
    contextOnVideoRef.current.strokeStyle = 'green';
    contextOnVideoRef.current.lineWidth = 5;
    setChildElementSize({ width, height });
  };
  const initializeCanvas = () => {
    const canvas = canvasRef.current;
    canvas.width = childElementSize.width;
    canvas.height = childElementSize.height;
    canvas.style.width = `${childElementSize.width}px`;
    canvas.style.height = `${childElementSize.height}px`;
    const context = canvas.getContext('2d');
    context.scale(2, 2);
    context.lineCap = 'round';
    context.strokeStyle = 'green';
    context.lineWidth = 5;
    contextRef.current = context;

    const canvasOnVideo = canvasOnVideoRef.current;
    canvasOnVideo.width = childElementSize.width * 2;
    canvasOnVideo.height = childElementSize.height * 2;
    canvasOnVideo.style.width = `${childElementSize.width}px`;
    canvasOnVideo.style.height = `${childElementSize.height}px`;
    const contextOnVideo = canvasOnVideo.getContext('2d');
    contextOnVideo.scale(2, 2);
    contextOnVideo.lineCap = 'round';
    contextOnVideo.strokeStyle = 'green';
    contextOnVideo.lineWidth = 5;
    contextOnVideoRef.current = contextOnVideo;
  };
  const takePhoto = async () => {
    // Grab the video element this component wraps and draw the current frame
    // onto the hidden canvas.
    const video = document.getElementById(props.videoId);
    contextRef.current.drawImage(
      video,
      0,
      0,
      childElementSize.width,
      childElementSize.height
    );
    // Convert the canvas contents to a data URL and show it in the image tag.
    const data = canvasRef.current.toDataURL('image/png');
    const photo = document.getElementById('imageTag');
    photo.setAttribute('src', data);
  };

  const loadDataProcessing = async () => {
    const returnTensors = false;
    await takePhoto();
    if (model) {
      // Run BlazeFace on the freshly captured frame.
      const predictions = await model.estimateFaces(
        document.getElementById('imageTag'),
        returnTensors
      );
      if (predictions.length > 0) {
        // Clear the previous rectangles once, before drawing the new ones.
        contextOnVideoRef.current.clearRect(
          0,
          0,
          childElementSize.width,
          childElementSize.height
        );
        for (let i = 0; i < predictions.length; i++) {
          const start = predictions[i].topLeft;
          const end = predictions[i].bottomRight;
          const size = [end[0] - start[0], end[1] - start[1]];
          // Render a rectangle over each detected face.
          contextOnVideoRef.current.beginPath();
          contextOnVideoRef.current.rect(start[0], start[1], size[0], size[1]);
          contextOnVideoRef.current.stroke();
          contextOnVideoRef.current.closePath();
        }
      }
    } else {
      console.log('no model');
    }
  };

  return (
    <div className="App">
      {props.children}
      <img
        src=""
        alt=""
        id="imageTag"
        width={`${childElementSize.width}`}
        height={`${childElementSize.height}`}
        hidden
      />
      <canvas id="hiddenCanvas" ref={canvasRef} hidden></canvas>
      <canvas
        id="onTopCanvas"
        className="topCanvas"
        ref={canvasOnVideoRef}
      ></canvas>
    </div>
  );
}
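
One detail that is easy to miss: the onTopCanvas element has to sit on top of the video for the rectangles to line up with the faces. The component imports App.css and gives the overlay canvas the topCanvas class, but the stylesheet itself is up to you. A minimal sketch, assuming the video is rendered at the top-left corner of the wrapping div, could look like this:

/* App.css (sketch) - overlay the drawing canvas on the video */
.App {
  position: relative; /* containing block for the absolutely positioned canvas */
}

.topCanvas {
  position: absolute;
  top: 0;
  left: 0;
  pointer-events: none; /* let clicks reach the video controls underneath */
}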

This is a simple implementation of a technology that can be used for more advanced purposes, which we will explore in future blog posts.

We develop custom video applications that use machine learning to make them more humanized. If you are interested, send us an email with your requirements to support@telzee.io and one of our experts will get in touch with you.