Building a Video/Audio Chat Web App with WebRTC

WebRTC State of the Art

WebRTC (Web Real-Time Communication) is a new web standard currently supported by Google, Mozilla and Opera. It allows peer-to-peer communication between browsers. Its mission is to enable rich, high-quality RTC applications for the browser, mobile platforms, and the Web of Things (WoT), and to allow them to communicate via a common set of protocols.

One of the last major challenges for the web is to enable human communication via voice and video without special plugins and without having to pay for these services. The first WebRTC implementation was built in May 2011 by Ericsson. WebRTC defines open standards for real-time, plugin-free video, audio and data communication. Currently, many web services already use RTC, but require downloads, native apps or plugins. These include Skype, Facebook (which uses Skype) and Google Hangouts (which uses the Google Talk plugin). Downloading, installing and updating plugins can be complex, error-prone and annoying, and it is often difficult to convince people to install plugins in the first place!

How does it work?

In general, a WebRTC-enabled application needs to:

  • obtain an audio, video or other data stream;
  • gather network information (e.g., IP addresses and ports), and exchange this with other WebRTC clients;
  • use a "signaling" channel to report errors and to initiate or close sessions;
  • exchange information about media, such as resolution and codecs, with the other clients;
  • stream the audio, video or data.

WebRTC implements three APIs:

  • MediaStream - allows the client (e.g., the web browser) to access the stream, such as the one from a WebCam or microphone;
  • RTCPeerConnection - enables audio or video data transfer, with support for encryption and bandwidth management;
  • RTCDataChannel - enables peer-to-peer communication for any generic data.

In theory, it is possible to create a simple WebRTC application without any server component for signaling. In practice, such an application does not make much sense, because it can only be used within a single page, so it merely exchanges data with itself.

MediaStream

A MediaStream represents one or more synchronized streams of media. Each MediaStream has an input and an output. A stream is obtained with the getUserMedia method, which has three parameters:

  • a constraints object;
  • a success callback method;
  • a failure callback method.

For example, a local WebCam stream can be shown in an HTML5 video element:

<!DOCTYPE html>
<html>
  <head>
    <script src="webrtc.js"></script>
    <title>WebRTC Test</title>
  </head>
  
  <body>
    <video id="localVideo" autoplay></video>
    <script>
      window.addEventListener("load", function (evt) {
        navigator.getUserMedia({ audio: true, video: true},
          function(stream) {
            var video = document.getElementById('localVideo');
            video.src = window.URL.createObjectURL(stream);
          },
          function(err) {
            console.log("The following error occurred: " + err.name);
          }
        );
      });
    </script>
  </body>
</html>
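
Current browsers expose a promise-based variant, navigator.mediaDevices.getUserMedia, and attach the stream through the video element's srcObject property instead of an object URL. A minimal sketch of the same example in that style follows. The helper name showLocalStream is hypothetical, and the media devices object and the video element are passed in as parameters only so that the function can be tried with stand-ins outside a browser.

```javascript
// Hypothetical helper: requests a stream and shows it in the given video element.
function showLocalStream(mediaDevices, videoElem, constraints) {
  return mediaDevices.getUserMedia(constraints).then(function (stream) {
    // srcObject accepts a MediaStream directly
    videoElem.srcObject = stream;
    return stream;
  });
}

// Browser usage (assumes a <video id="localVideo" autoplay> element exists):
// showLocalStream(navigator.mediaDevices,
//     document.getElementById("localVideo"), { audio: true, video: true })
//   .catch(function (err) { console.log("getUserMedia failed: " + err.name); });
```

Failures (for example, the user denying access to the camera) arrive through the rejected promise rather than through a separate failure callback.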

RTCPeerConnection

The RTCPeerConnection interface represents a WebRTC connection between the local computer and a remote peer. It is used to handle efficient streaming of data between the two peers. Both parties (the caller and the called party) need to set up their own RTCPeerConnection instances to represent their end of the peer-to-peer connection. In general, we use the onaddstream event callback of the RTCPeerConnection to deal with the incoming audio/video stream, e.g., by assigning it to an HTML5 video element:

var peerConn= new RTCPeerConnection();
peerConn.onaddstream = function (evt) {
  var videoElem = document.createElement("video");
  // elements must be appended to document.body, not to the document itself
  document.body.appendChild(videoElem);
  videoElem.src = URL.createObjectURL(evt.stream);
};

The initiator of the call (the caller) needs to create an offer and send it to the callee by using a signaling service (e.g., a NodeJS server application using WebSockets):

// videoElem (a video element) and error (an error callback) are defined elsewhere
navigator.getUserMedia({video: true}, function(stream) {
  videoElem.src = URL.createObjectURL(stream);
  peerConn.addStream(stream);

  peerConn.createOffer(function(offer) {
    peerConn.setLocalDescription(new RTCSessionDescription(offer), function() {
      // send the offer to a server to be forwarded to the other peer
    }, error);
  }, error);
}, error);

The callee, who receives the offer and needs to "answer" the call, has to create an answer and send it to the caller:

// offer is the session description received from the caller via the signaling service
navigator.getUserMedia({video: true}, function(stream) {
  videoElem.src = URL.createObjectURL(stream);
  peerConn.addStream(stream);

  peerConn.setRemoteDescription(new RTCSessionDescription(offer), function() {
    peerConn.createAnswer(function(answer) {
      peerConn.setLocalDescription(new RTCSessionDescription(answer), function() {
        // send the answer to a server to be forwarded back to the caller
      }, error);
    }, error);
  }, error);
}, error);

The setLocalDescription method takes three parameters: a session description, a success callback method and an error callback method. It changes the local description associated with a connection. A description defines the properties of the connection, for example the codec.

RTCPeerConnection and Servers

In a real application, WebRTC needs servers (in general simple ones) for the following purposes:

  • user management;
  • exchange of information between peers;
  • exchange of data about media, such as formats and video resolution;
  • traversal of NAT gateways and firewalls.

The STUN protocol and its extension TURN are used by the ICE framework to enable RTCPeerConnection to cope with NAT traversal and other network-specific details. ICE is a framework for connecting peers, such as two video chat clients. ICE tries to connect peers directly, with the lowest possible latency, via UDP. In this process, STUN servers have a single task: to enable a peer behind a NAT to find out its public address and port. Google and Mozilla provide a couple of STUN servers which can (for now) be used free of charge. For example, Google STUN servers are used to obtain ICE candidates, which are then forwarded to the other peer(s):

var peerConnCfg =  {'iceServers': [{'url': 'stun:stun.l.google.com:19302'}]},
    peerConn= new RTCPeerConnection(peerConnCfg),
    signalingChannel = new WebSocket('ws://my-websocket-server:port/');

peerConn.onicecandidate = function (evt) {
  // send any ice candidates to the other peer, i.e., evt.candidate
  signalingChannel.send(JSON.stringify({ "candidate": evt.candidate }));
};

signalingChannel.onmessage = function (evt) {
  var signal = JSON.parse(evt.data);
  if (signal.sdp)
    peerConn.setRemoteDescription(new RTCSessionDescription(signal.sdp));
  else if (signal.candidate)
    peerConn.addIceCandidate(new RTCIceCandidate(signal.candidate));
};

The signalingChannel represents the communication channel, based on WebSockets, XHR or something else, which is used to exchange the information required to initialize the peer-to-peer connection.

The setRemoteDescription method takes three parameters: a session description, a success callback method and an error callback method. It changes the remote description associated with a connection. A description defines the properties of the connection, for example the codec.
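
Current versions of the RTCPeerConnection configuration name the server address field urls (which may hold a string or an array of strings) rather than url, and a TURN entry additionally carries a username and a credential. The sketch below builds such a configuration. buildIceConfig is a hypothetical helper, and the TURN address and credentials in the usage comment are placeholders, not a real service.

```javascript
// Hypothetical helper: builds an RTCPeerConnection configuration object.
function buildIceConfig(stunUrls, turn) {
  var servers = [];
  if (stunUrls && stunUrls.length > 0) {
    servers.push({ urls: stunUrls });
  }
  if (turn) {
    // a TURN server relays the media when no direct connection is possible
    servers.push({ urls: turn.urls, username: turn.username, credential: turn.credential });
  }
  return { iceServers: servers };
}

// Browser usage (the TURN values are placeholders):
// var cfg = buildIceConfig(["stun:stun.l.google.com:19302"],
//   { urls: "turn:turn.example.org:3478", username: "user", credential: "secret" });
// var peerConn = new RTCPeerConnection(cfg);
```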

RTCDataChannel

The RTCDataChannel interface represents a bi-directional data channel between two peers of a connection. Objects of this type can be created using RTCPeerConnection.createDataChannel(), or are received in a datachannel event of type RTCDataChannelEvent on an existing RTCPeerConnection. Using the data channel capabilities feels "natural", since the communication is based on messaging-style events:

var peerConn= new RTCPeerConnection(),
    dc = peerConn.createDataChannel("my channel");

dc.onmessage = function (event) {
  console.log("received: " + event.data);
};

dc.onopen = function () {
  console.log("datachannel open");
};

dc.onclose = function () {
  console.log("datachannel close");
};
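
A data channel can only send once it is open, so messages produced earlier need to wait. The following sketch, with the hypothetical helper createChannelSender, queues outgoing messages until the readyState of the channel is "open". It relies only on the readyState property and the send method of the channel, and it takes over the onopen handler of the channel it is given.

```javascript
// Hypothetical helper: returns a send function that queues messages
// until the given data channel is open.
function createChannelSender(dc) {
  var pending = [];
  function flush() {
    while (pending.length > 0 && dc.readyState === "open") {
      dc.send(pending.shift());
    }
  }
  // deliver everything that was queued as soon as the channel opens
  dc.onopen = flush;
  return function send(text) {
    pending.push(text);
    flush();
  };
}

// Browser usage:
// var send = createChannelSender(peerConn.createDataChannel("my channel"));
// send("hello"); // delivered once the channel is open
```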

Build a Simple Audio/Video-Chat Web-Application

In this section we'll learn how to build a basic Audio/Video-Chat Web-Application. It allows two peers to have a video call and displays the local and the remote video. In a real application one has to deal with complex situations, user management, and all kinds of errors. In this tutorial we skip error situations and keep our application simple:

  • Two friends in different locations want to have a video call;
  • They are able to use a modern Web Browser, such as Google Chrome or Firefox;
  • They are able to access the web application URL using their available internet connection (DSL, 3G or any other type);
  • One of the users initiates the video call by clicking the "Video Call" button;
  • Both users allow the browser to access their WebCams and microphones;
  • Now they are able to see and hear each other until one of the users clicks the "End Call" button.

The HTML5 Web UI

The HTML5 code is fairly simple. We only define the relevant elements and, for simplicity, we don't use CSS to style them:

<!DOCTYPE html>
<html>
  <head>
    <script src="webrtc.js"></script>
    <title>WebRTC Audio/Video-Chat</title>
  </head>
  
  <body>
    <video id="remoteVideo" autoplay></video>
    <video id="localVideo" autoplay muted></video>
    <input id="videoCallButton" type="button" disabled value="Video Call"/>
    <input id="endCallButton" type="button" disabled value="End Call"/>
    <script type="text/javascript">
      window.addEventListener("load", pageReady);
    </script>
  </body>
</html>

Only four HTML elements are relevant here: the two video elements, used to display the remote and the local video, and the two input elements, used to create the "Video Call" and "End Call" buttons. The script element at the end of the code registers a load event listener (which executes when the page is fully loaded). The relevant code, including the content of the pageReady method, is part of the webrtc.js file, which is included with the help of a script element (see the head element).

The NodeJS WebSockets-based Signaling Server

The NodeJS server application has a very simple job: receive messages from one client and broadcast them to all the others. These messages are the signaling information required by the peers in order to initiate a peer-to-peer connection. For this, we use WebSockets, which are available as a built-in API in modern browsers, but require installing the ws module for NodeJS.

First, we need to install the required NodeJS modules (e.g., ws) by executing npm install in a shell, inside the root folder of the NodeJS application. More information about this module is available on the npm ws module page.

Next, create a file named server.js with the following content:

const WebSocketServer = require('ws').Server,
  express = require('express'),
  https = require('https'),
  app = express(),
  fs = require('fs');

const pkey = fs.readFileSync('./ssl/key.pem'),
  pcert = fs.readFileSync('./ssl/cert.pem'),
  options = {key: pkey, cert: pcert, passphrase: '123456789'};
var wss = null, sslSrv = null;
 
// use express static to deliver resources (HTML, CSS, JS, etc.)
// from the public folder 
app.use(express.static('public'));

// start server (listen on port 443 - SSL)
sslSrv = https.createServer(options, app).listen(443);
console.log("The HTTPS server is up and running");

// create the WebSocket server
wss = new WebSocketServer({server: sslSrv});  
console.log("WebSocket Secure server is up and running.");

/** successful connection */
wss.on('connection', function (client) {
  console.log("A new WebSocket client was connected.");
  /** incoming message */
  client.on('message', function (message) {
    /** broadcast message to all clients; newer ws versions deliver text
        messages as a Buffer, so convert to a string before forwarding */
    wss.broadcast(message.toString(), client);
  });
});
// broadcasting the message to all WebSocket clients.
wss.broadcast = function (data, exclude) {
  // forEach works whether this.clients is an array or a Set,
  // which depends on the installed ws version
  this.clients.forEach(function (client) {
    // don't send the message to the sender...
    if (client === exclude) return;
    if (client.readyState === client.OPEN) client.send(data);
    else console.error('Error: the client state is ' + client.readyState);
  });
};

Note: since WebRTC works ONLY with SSL, for your convenience, we provide a free, self-signed SSL certificate together with this application. This certificate shall not be used for purposes other than playing with the provided demo application. Also, web browsers will complain about the validity of the SSL certificate, because it is not signed by a recognized authority. This means that you have to add it to your exception list in order to be able to access the application. Otherwise, feel free to use your own certificate, meaning that you need to replace the two .pem files in the ssl subfolder.

The application communicates via Secure WebSockets on port 443. You can replace this port with another one if required. The above code simply accepts WebSocket connections and broadcasts each message received from one client to all other clients (excluding the sender).
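
Depending on the installed version of the ws module, the clients collection of the server may be an array or a Set. A version-independent way to select the recipients of a broadcast, sketched here with the hypothetical helper selectRecipients, is to iterate with forEach, which both collection types provide. The value 1 used for the open state is the numeric value of the WebSocket OPEN constant.

```javascript
var OPEN = 1; // numeric value of WebSocket.OPEN

// Hypothetical helper: returns all open clients except the sender.
function selectRecipients(clients, sender) {
  var recipients = [];
  clients.forEach(function (client) {
    if (client !== sender && client.readyState === OPEN) {
      recipients.push(client);
    }
  });
  return recipients;
}

// Server usage:
// selectRecipients(wss.clients, client).forEach(function (c) { c.send(data); });
```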

To start the server application, execute node server.js from the folder where you created the file with the above content. If all went fine, you should see no error message and the server waits for WebSocket connections. Finally, use a Web Browser and navigate to https://your.domain and you should see the application start page. Using localhost only works for playing locally with the application; to have a WebRTC connection between two peers connected via the internet, one needs to use a live server with a public IP address.

If you are behind a corporate firewall, it is possible that all ports except 80 (and maybe 443) are closed. In such a case, one can use the mod_proxy_wstunnel Apache module, which allows proxying WebSocket communication via port 80. This module is bundled with Apache starting from version 2.4.5. However, most of the stable Linux systems, including CentOS 6.x, provide only earlier Apache versions, such as 2.2.x. A pre-compiled version of this module (for Apache 2.2.15, available from the CentOS 6.7 repositories) is available for download on our server. Further, you have to modify the Apache configuration file, i.e., the httpd.conf file (usually located under /etc/httpd/conf/), and add the following lines:

LoadModule proxy_wstunnel_module modules/mod_proxy_wstunnel.so

ProxyPass /websocket/ ws://localhost:3434/
ProxyPassReverse /websocket/ ws://localhost:3434/

Last, restart the Apache Web Server by executing the service httpd restart command, for which you may need root privileges (i.e., you may have to use sudo or log in as root). The "websocket" path from the above configuration lines can be replaced with whatever you like, but keep in mind that this is the last part of the URL used by the WebSocket client app to access the server. Also remember that the port number in these lines (3434 in this example) must be the same as the one on which server.js listens.

Note: the above information and examples are provided for a CentOS 6.7 Linux distribution, running Apache Web Server 2.2.15 from the official CentOS 6.7 repository. A different Linux distribution or another Apache version may or may not work the same way, so we can't provide any guarantee on that.

The Client JavaScript Code

In this section we discuss the content of the webrtc.js file. The first part of this file defines the global variables:

var localVideo = null, remoteVideo = null, localVideoStream = null,
    videoCallButton = null, endCallButton = null,
    peerConn = null, wsc = new WebSocket('ws://my-web-domain.de/websocket/'),
    peerConnCfg = {'iceServers': 
      [{'url': 'stun:stun.services.mozilla.com'}, {'url': 'stun:stun.l.google.com:19302'}]
    };

The relevant variables are wsc, representing a new WebSocket connection (remember to replace ws://my-web-domain.de/websocket/ with your own URL, and to use the wss:// scheme when the page is served over HTTPS), and peerConnCfg, which specifies the configuration parameters used to initiate a new RTCPeerConnection. We use the Mozilla STUN service (and, as a fallback, the Google one).

The localVideo, remoteVideo, videoCallButton and endCallButton variables hold references to the HTML elements representing the local and remote video containers (HTML5 video elements) and the two buttons (HTML input elements with type="button") used to initiate and end a call. Last, localVideoStream keeps a reference to the local video stream, so we can close it (release the video and audio devices) when the call ends.
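
Since a page served over HTTPS may only open secure WebSocket connections, the scheme of the signaling URL can be derived from the location of the page itself instead of being hard-coded. A minimal sketch, where signalingUrl is a hypothetical helper and /websocket/ is the path used in the Apache proxy configuration shown earlier:

```javascript
// Hypothetical helper: picks wss: for HTTPS pages and ws: otherwise.
function signalingUrl(loc, path) {
  var scheme = loc.protocol === "https:" ? "wss:" : "ws:";
  return scheme + "//" + loc.host + path;
}

// Browser usage:
// var wsc = new WebSocket(signalingUrl(window.location, "/websocket/"));
```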

Further, we define the pageReady callback method assigned for the load event:

function pageReady() {
  // check browser WebRTC availability 
  if(navigator.getUserMedia) {
    videoCallButton = document.getElementById("videoCallButton");
    endCallButton = document.getElementById("endCallButton");
    localVideo = document.getElementById('localVideo');
    remoteVideo = document.getElementById('remoteVideo');
    videoCallButton.removeAttribute("disabled");
    videoCallButton.addEventListener("click", initiateCall);
    endCallButton.addEventListener("click", function (evt) {
      // notify the other peer, then clean up on this side as well
      // (the server does not send the message back to its sender)
      wsc.send(JSON.stringify({"closeConnection": true }));
      endCall();
    });
  } else {
    alert("Sorry, your browser does not support WebRTC!");
  }
}

Before taking any further actions, we need to check if the browser supports the required WebRTC features (to avoid strange situations where nothing seems to work without an obvious reason). We do that by checking for the existence of the getUserMedia method in the navigator global object. If no such method is found, the "Video Call" button remains disabled (no call can be initiated!) and we provide a warning/error message using alert. If WebRTC is supported, then we enable the "Video Call" button and assign a click event listener to it, so the initiateCall method is executed when the "Video Call" button is clicked. In the same way, a click event listener is assigned to the "End Call" button (more details about this are discussed later in this tutorial).
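
Current browsers expose getUserMedia on navigator.mediaDevices rather than directly on navigator, so a check that covers both locations, and also verifies that RTCPeerConnection exists, is more robust. A sketch with the hypothetical helper supportsWebRTC, which takes the window object as a parameter:

```javascript
// Hypothetical helper: true if both getUserMedia (legacy or promise-based)
// and RTCPeerConnection are available on the given window object.
function supportsWebRTC(win) {
  var nav = win.navigator || {};
  var hasGetUserMedia =
    typeof nav.getUserMedia === "function" ||
    !!(nav.mediaDevices && typeof nav.mediaDevices.getUserMedia === "function");
  var hasPeerConnection = typeof win.RTCPeerConnection === "function";
  return hasGetUserMedia && hasPeerConnection;
}

// Browser usage:
// if (supportsWebRTC(window)) { videoCallButton.removeAttribute("disabled"); }
```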

Next, we take care of the WebSocket message exchange between the caller and the callee peers:

wsc.onmessage = function (evt) {
  var signal = JSON.parse(evt.data);
  // an "end call" request only matters while a call is active
  if (signal.closeConnection) {
    if (peerConn) endCall();
    return;
  }
  // no peer connection yet: this is an incoming call, so answer it
  if (!peerConn)
    answerCall();

  if (signal.sdp) {
    peerConn.setRemoteDescription(new RTCSessionDescription(signal.sdp));
  } else if (signal.candidate) {
    peerConn.addIceCandidate(new RTCIceCandidate(signal.candidate));
  }
};

A peer connection is created (and assigned to the peerConn variable) when the "Video Call" button is clicked. If no such (RTCPeerConnection) object exists, we are dealing with the callee case, i.e., an incoming call, which in our simple application is answered automatically by invoking the answerCall method. In a more complex real-world application, a ring audio signal may be used and the callee may answer the call by clicking an "Answer Call" button, but in our example we keep it simple, so the calls are answered automatically. To be more exact, it is rather a semi-automatic answer, because the callee's web browser asks for permission to use the video and/or audio devices, so the human user can grant (or deny) these rights in order to answer (or reject) the call.

The two peers need to exchange local and remote audio and video media information, such as resolution and codec capabilities. This media configuration information is signaled by exchanging an offer and an answer using the Session Description Protocol (SDP).
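
A session description carries a type field ("offer" or "answer") next to the SDP text, so the receiving side can tell what kind of message arrived before acting on it. The sketch below classifies the messages used in this tutorial. classifySignal is a hypothetical helper, and the message shapes (sdp, candidate, closeConnection) are the ones sent by the client code in this section.

```javascript
// Hypothetical helper: names the kind of signaling message received.
function classifySignal(signal) {
  if (signal.closeConnection) return "close";
  if (signal.candidate) return "candidate";
  if (signal.sdp && (signal.sdp.type === "offer" || signal.sdp.type === "answer")) {
    return signal.sdp.type;
  }
  return "unknown";
}
```

With such a classification, the callee branch could be limited to "offer" messages, instead of treating every message that arrives without an existing peer connection as an incoming call.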

Initiating a Call

Let's now have a look at the initiateCall method:

function initiateCall() {
  prepareCall();
  navigator.getUserMedia({ "audio": true, "video": true }, function (stream) {
    // keep a reference, so the devices can be released when the call ends
    localVideoStream = stream;
    localVideo.src = URL.createObjectURL(stream);
    peerConn.addStream(stream);
    createAndSendOffer();
  }, function(error) { console.log(error);});
};

First we make some initial preparations for the call (we explain more about this a bit later). Then, using getUserMedia we obtain the local video stream and assign it to a video element where we like to display it on our page (e.g., the video element with id localVideo in our case). Last we create and send a connection offer to the other peer, by invoking the createAndSendOffer method, explained later in this tutorial.

The prepareCall method (see below) is responsible for creating the RTCPeerConnection instance and assigning the needed event listeners:

function prepareCall() {
  peerConn = new RTCPeerConnection(peerConnCfg);
  peerConn.onicecandidate = onIceCandidateHandler;
  peerConn.onaddstream = onAddStreamHandler;
};

function onIceCandidateHandler(evt) {
  if (!evt || !evt.candidate) return;
  wsc.send(JSON.stringify({"candidate": evt.candidate }));
};

function onAddStreamHandler(evt) {
  videoCallButton.setAttribute("disabled", true);
  endCallButton.removeAttribute("disabled"); 
  remoteVideo.src = URL.createObjectURL(evt.stream);
};

Every ICE candidate is forwarded to the signaling server, which sends it to the other peer (see onIceCandidateHandler). When a remote stream is received, we assign it to our video element for display (the video element with id remoteVideo in our case).
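
ICE candidates can arrive over the signaling channel before the remote description has been set, in which case adding them fails. One common remedy, sketched below under that assumption, is to hold early candidates back and add them once the description is in place. The helper createCandidateBuffer and its two methods are hypothetical names; addFn stands for a function that calls peerConn.addIceCandidate.

```javascript
// Hypothetical helper: buffers candidates until the remote description is set.
function createCandidateBuffer(addFn) {
  var ready = false, waiting = [];
  return {
    // call this for every candidate received from the signaling channel
    add: function (candidate) {
      if (ready) addFn(candidate);
      else waiting.push(candidate);
    },
    // call this once setRemoteDescription has succeeded
    remoteDescriptionSet: function () {
      ready = true;
      waiting.splice(0).forEach(function (candidate) { addFn(candidate); });
    }
  };
}

// Browser usage:
// var buffer = createCandidateBuffer(function (c) {
//   peerConn.addIceCandidate(new RTCIceCandidate(c));
// });
```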

One last step is required for the caller, that is, to create a connection offer and send it to the other peer:

function createAndSendOffer() {
  peerConn.createOffer(
    function (offer) {
      var off = new RTCSessionDescription(offer);
      peerConn.setLocalDescription(off, 
        function() {
          wsc.send(JSON.stringify({"sdp": off }));
        }, 
        function(error) { 
          console.log(error);
        }
      );
    }, 
    function (error) { 
      console.log(error);
    }
  );
};

The offer contains information about how the two peers are about to be connected. The offer message is forwarded by the signaling server to the other peer, which is informed about it via the onmessage event listener, as described earlier in this tutorial.
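
The callback-based forms of createOffer and setLocalDescription are the legacy variants; in current browsers the same methods also return promises. A sketch of the offer step in that style follows, with the connection and the send function passed in as parameters (sendOffer is a hypothetical name):

```javascript
// Hypothetical helper: creates an offer, applies it locally, then sends it.
function sendOffer(pc, send) {
  return pc.createOffer()
    .then(function (offer) { return pc.setLocalDescription(offer); })
    .then(function () {
      // localDescription now holds the offer that was just applied
      send(JSON.stringify({ sdp: pc.localDescription }));
    });
}

// Browser usage:
// sendOffer(peerConn, function (msg) { wsc.send(msg); })
//   .catch(function (error) { console.log(error); });
```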

Answering a Call

As with a call initiation, the RTCPeerConnection is created and the event listeners are assigned. Further, a local stream is obtained by using getUserMedia and assigned to a video element. Last, an answer is created and sent in response to the received offer:

function answerCall() {
  prepareCall();
  // get the local stream, show it in the local video element and send it
  navigator.getUserMedia({ "audio": true, "video": true }, function (stream) {
    // keep a reference, so the devices can be released when the call ends
    localVideoStream = stream;
    localVideo.src = URL.createObjectURL(stream);
    peerConn.addStream(stream);
    createAndSendAnswer();
  }, function(error) { console.log(error);});
};

The createAndSendAnswer method prepares the answer and sends it, via the WebSocket channel, to the signaling server, which then forwards it to the other peer, so the connection is completed:

function createAndSendAnswer() {
  peerConn.createAnswer(
    function (answer) {
      var ans = new RTCSessionDescription(answer);
      peerConn.setLocalDescription(ans, function() {
          wsc.send(JSON.stringify({"sdp": ans }));
        }, 
        function (error) { 
          console.log(error);
        }
      );
    },
    function (error) { 
      console.log(error);
    }
  );
}
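
On the callee side the order of the steps matters: the received offer has to be set as the remote description before an answer can be created. The promise-based sketch below makes that order explicit. answerOffer is a hypothetical name, and the connection, the received offer and the send function are passed in as parameters.

```javascript
// Hypothetical helper: applies the received offer, then creates,
// applies and sends the answer.
function answerOffer(pc, offer, send) {
  return pc.setRemoteDescription(offer)
    .then(function () { return pc.createAnswer(); })
    .then(function (answer) { return pc.setLocalDescription(answer); })
    .then(function () {
      send(JSON.stringify({ sdp: pc.localDescription }));
    });
}

// Browser usage (signal.sdp is the offer received via the signaling channel):
// answerOffer(peerConn, signal.sdp, function (msg) { wsc.send(msg); })
//   .catch(function (error) { console.log(error); });
```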

Ending a Call

Note: in theory, ending a WebRTC call may be slightly simpler: close the peer connection (i.e., call peerConn.close()), then use the callback method assigned to peerConn.oniceconnectionstatechange and check if peerConn.iceConnectionState === "closed". However, we've found two problems with this approach: 1) it does not seem to work (at least not all the time) with both Google Chrome and Firefox, and 2) a closed connection state may also occur when there is a temporary break in the peer connection (bad internet connection, high latency, etc.), which in many cases is restored automatically (no need for additional code or management), so it does not necessarily indicate a "call end". Because of this, we use the signaling server to notify the other peer about a "real end call" request.
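
The iceConnectionState property can take the values "new", "checking", "connected", "completed", "disconnected", "failed" and "closed". One possible way to separate a temporary interruption from a finished call, sketched with the hypothetical helper describeIceState, is to treat "disconnected" as possibly recoverable and only "failed" or "closed" as final. This illustrates the distinction only; it is not the approach taken in this tutorial.

```javascript
// Hypothetical helper: maps an ICE connection state to a coarse call status.
function describeIceState(state) {
  switch (state) {
    case "connected":
    case "completed":
      return "active";
    case "disconnected":
      return "interrupted"; // may recover without further action
    case "failed":
    case "closed":
      return "ended";
    default:
      return "connecting"; // "new" and "checking"
  }
}

// Browser usage:
// peerConn.oniceconnectionstatechange = function () {
//   console.log(describeIceState(peerConn.iceConnectionState));
// };
```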

In the pageReady method (called when the HTML page is fully loaded), we've added a click event listener to the "End Call" button. It sends a closeConnection signal to our signaling server, which forwards it to the other peer, and then ends the call on the local side as well:

function pageReady() {
  if(navigator.getUserMedia) {
    // ...some more code here...
    endCallButton.addEventListener("click", function (evt) {
      wsc.send(JSON.stringify({"closeConnection": true }));
      // the server does not send the message back to its sender
      endCall();
    });
  } else {
    alert("Sorry, your browser does not support WebRTC!");
  }
}

The endCall method has the following code:

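function endCall() {
  if (peerConn) peerConn.close();
  // reset, so the next incoming offer is treated as a new call
  peerConn = null;
  if (localVideoStream) {
    localVideoStream.getTracks().forEach(function (track) {
      track.stop();
    });
    localVideoStream = null;
  }
  localVideo.src = "";
  remoteVideo.src = "";
  videoCallButton.removeAttribute("disabled");
  endCallButton.setAttribute("disabled", true);
}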

The first step is to close the RTCPeerConnection by calling the close method. Further, we stop all the tracks of the local stream and reset the stream sources of the remote and local video, so nothing is displayed by the HTML5 video elements (the last image frame remains visible if the source is not reset). Last, we enable the "Video Call" button (allowing for a new call) and disable the "End Call" button.

Download the Code

The full client and server source code is available for download on GitHub.

Current Browsers Support

Not all browsers support WebRTC. Mainly, one can use Google Chrome, Firefox and Opera. For iOS, Bowser, an Open Source web browser with WebRTC support, is available. Partial support is also available in the Edge web browser, while at the time of writing this technology is not supported at all by Safari. The complete list of WebRTC features supported by each web browser is available at iswebrtcreadyyet.com

Note: starting with 01.01.2016, using Google Chrome and Opera with WebRTC-based applications is possible only via a secure layer, thus HTTPS must be used instead of HTTP!
