WebRTC State of the Art
WebRTC (Web Real-Time Communication) is a new web standard currently supported by Google, Mozilla and Opera. It allows peer-to-peer communication between browsers. Its mission is to enable rich, high-quality RTC applications for the browser, mobile platforms and the Web of Things (WoT), and to allow them to communicate via a common set of protocols.
One of the last major challenges for the web is to enable human communication via voice and video without using special plugins and without having to pay for these services. The first WebRTC implementation was built in May 2011 by Ericsson. WebRTC defines open standards for real-time, plugin-free video, audio and data communication. Currently, many web services already use RTC, but require downloads, native apps or plugins. These include Skype, Facebook (which uses Skype) and Google Hangouts (which uses the Google Talk plugin). Downloading, installing and updating plugins can be complex, error-prone and annoying, and it's often difficult to convince people to install plugins in the first place!
How does it work?
In general, a WebRTC-enabled application needs to:
- obtain an audio, video or other data stream;
- gather network information (e.g., IP addresses and ports), and exchange it with other WebRTC clients;
- use "signaling" communication to report errors and to initiate or close sessions;
- exchange information about media, such as resolution and codecs, with the other clients;
- stream the audio, video or data.
WebRTC implements three APIs:
- MediaStream - allows the client (e.g., the web browser) to access a media stream, such as the one from a webcam or microphone;
- RTCPeerConnection - enables audio or video data transfer, with support for encryption and bandwidth management;
- RTCDataChannel - enables peer-to-peer communication for any generic data.
In theory, it is possible to create a simple WebRTC application without any server component for signaling. In practice, such an application does not make much sense, because it can only be used on a single page, so it would share data among the very same peer.
MediaStream
The MediaStream interface represents a synchronized stream of media. Each MediaStream has an input and an output. The getUserMedia method has three parameters:
- a constraints object;
- a success callback method;
- a failure callback method.
For example, a local webcam stream can be shown in an HTML5 video element:
<!DOCTYPE html>
<html>
  <head>
    <script src="webrtc.js"></script>
    <title>WebRTC Test</title>
  </head>
  <body>
    <video id="localVideo" autoplay></video>
    <script>
      window.addEventListener("load", function (evt) {
        navigator.getUserMedia({audio: true, video: true},
          function (stream) {
            var video = document.getElementById('localVideo');
            video.src = window.URL.createObjectURL(stream);
          },
          function (err) {
            console.log("The following error occurred: " + err.name);
          }
        );
      });
    </script>
  </body>
</html>
RTCPeerConnection
The RTCPeerConnection interface represents a WebRTC connection between the local computer and a remote peer. It is used to handle efficient streaming of data between the two peers. Both parties (the caller and the called party) need to set up their own RTCPeerConnection instances to represent their end of the peer-to-peer connection. In general, we use an onaddstream event callback on the RTCPeerConnection to deal with the incoming audio/video stream, e.g., by assigning it to an HTML5 video element:
var peerConn = new RTCPeerConnection();
peerConn.onaddstream = function (evt) {
  var videoElem = document.createElement("video");
  document.body.appendChild(videoElem);
  videoElem.src = URL.createObjectURL(evt.stream);
};
The initiator of the call (the caller) needs to create an offer and, using a signaling service (e.g., a NodeJS server application using WebSockets), send it to the callee:
navigator.getUserMedia({video: true}, function (stream) {
  videoElem.src = URL.createObjectURL(stream);
  peerConn.addStream(stream);
  peerConn.createOffer(function (offer) {
    peerConn.setLocalDescription(new RTCSessionDescription(offer), function () {
      // send the offer to a server to be forwarded to the other peer
    }, error);
  }, error);
});
The callee, which receives the offer and needs to "answer" the call, has to create an answer and send it to the caller:
navigator.getUserMedia({video: true}, function (stream) {
  videoElem.src = URL.createObjectURL(stream);
  peerConn.addStream(stream);
  peerConn.setRemoteDescription(new RTCSessionDescription(offer), function () {
    peerConn.createAnswer(function (answer) {
      peerConn.setLocalDescription(new RTCSessionDescription(answer), function () {
        // send the answer to a server to be forwarded back to the caller
      }, error);
    }, error);
  }, error);
});
The setLocalDescription method takes three parameters: a session description, a success callback and an error callback. It changes the local description associated with a connection. A description defines the properties of the connection, such as the codec.
RTCPeerConnection and Servers
In a real application, WebRTC needs (in general, simple) servers for the following purposes:
- user management;
- exchange of signaling information between peers;
- exchange of data about media, such as formats and video resolution;
- traversal of NAT gateways and firewalls.
The STUN protocol and its extension TURN are used by the ICE framework to enable RTCPeerConnection to cope with NAT traversal and other network-specific details. ICE is a framework for connecting peers, such as two video chat clients. ICE tries to connect peers directly, with the lowest possible latency, via UDP. In this process, STUN servers have a single task: to enable a peer behind a NAT to find out its public address and port. Google and Mozilla provide a couple of STUN servers which can (for now) be used free of charge. For example, Google STUN servers are used to obtain ICE candidates, which are then forwarded to the other peer(s):
var peerConnCfg = {'iceServers': [{'url': 'stun:stun.l.google.com:19302'}]},
    peerConn = new RTCPeerConnection(peerConnCfg),
    signalingChannel = new WebSocket('ws://my-websocket-server:port/');
peerConn.onicecandidate = function (evt) {
  // send any ICE candidates to the other peer, i.e., evt.candidate
  signalingChannel.send(JSON.stringify({"candidate": evt.candidate}));
};
signalingChannel.onmessage = function (evt) {
  var signal = JSON.parse(evt.data);
  if (signal.sdp)
    peerConn.setRemoteDescription(new RTCSessionDescription(signal.sdp));
  else if (signal.candidate)
    peerConn.addIceCandidate(new RTCIceCandidate(signal.candidate));
};
The signalingChannel represents the communication channel, based on WebSockets, XHR or something else, used to exchange the information required for initializing the peer-to-peer connection.
The setRemoteDescription method takes three parameters: a session description, a success callback and an error callback. It changes the remote description associated with a connection. A description defines the properties of the connection, such as the codec.
RTCDataChannel
The RTCDataChannel interface represents a bi-directional data channel between two peers of a connection. Objects of this type can be created using RTCPeerConnection.createDataChannel(), or are received in a datachannel event of type RTCDataChannelEvent on an existing RTCPeerConnection. Using the data channel capabilities feels "natural", since it relies on message-style, event-based communication:
var peerConn = new RTCPeerConnection(),
    dc = peerConn.createDataChannel("my channel");
dc.onmessage = function (event) { console.log("received: " + event.data); };
dc.onopen = function () { console.log("datachannel open"); };
dc.onclose = function () { console.log("datachannel close"); };
Build a Simple Audio/Video-Chat Web-Application
In this section we'll learn how to build a basic audio/video chat web application. It allows two peers to perform a video call and displays the local and remote video streams. A real application has to deal with complex situations, user management and all kinds of errors. In this tutorial we skip the error handling and keep our application simple:
- Two friends located in different places need to have a video call;
- They are able to use a modern web browser, such as Google Chrome or Firefox;
- They are able to access the web application URL using their available internet connection (DSL, 3G or any other type);
- One of the users initiates the video call by clicking the "Video Call" button;
- Both users allow the browser to access their webcams and microphones;
- Now they are able to see and hear each other until one of the users clicks the "End Call" button.
The HTML5 Web UI
The HTML5 code is fairly simple. We only define the relevant elements, and for simplicity reasons we don't use CSS to style them:
<!DOCTYPE html>
<html>
  <head>
    <script src="webrtc.js"></script>
    <title>WebRTC Audio/Video-Chat</title>
  </head>
  <body>
    <video id="remoteVideo" autoplay></video>
    <video id="localVideo" autoplay muted></video>
    <input id="videoCallButton" type="button" disabled value="Video Call"/>
    <input id="endCallButton" type="button" disabled value="End Call"/>
    <script type="text/javascript">
      window.addEventListener("load", pageReady);
    </script>
  </body>
</html>
Only four HTML elements are relevant here: the two video elements, used to display the remote and the local video, and the two input elements, used to create the "Video Call" and "End Call" buttons. The script element at the end of the code registers a load event listener (executed when the page is fully loaded). The relevant code, including the pageReady method, is part of the webrtc.js file, included with the help of a script element (see the head element).
The NodeJS WebSockets-based Signaling Server
The NodeJS server application has a very simple job: receive messages from one client and broadcast them to all the others. These messages carry the signaling information required by the peers in order to initiate a peer-to-peer connection. For this, we use WebSockets, a built-in API in modern browsers; on the server side, this requires installing the ws module for NodeJS.
First, we need to install the required NodeJS modules (e.g., ws and express) by executing npm install in a shell, inside the root folder of the NodeJS application. More information about this module is available on the npm ws module page.
Next, create a file named server.js
with the following
content:
const WebSocketServer = require('ws').Server,
      express = require('express'),
      https = require('https'),
      app = express(),
      fs = require('fs');
const pkey = fs.readFileSync('./ssl/key.pem'),
      pcert = fs.readFileSync('./ssl/cert.pem'),
      options = {key: pkey, cert: pcert, passphrase: '123456789'};
var wss = null, sslSrv = null;

// use express.static to deliver resources (HTML, CSS, JS, etc.)
// from the public folder
app.use(express.static('public'));

// start the server (listen on port 443 - SSL)
sslSrv = https.createServer(options, app).listen(443);
console.log("The HTTPS server is up and running.");

// create the WebSocket server
wss = new WebSocketServer({server: sslSrv});
console.log("The WebSocket Secure server is up and running.");

/** successful connection */
wss.on('connection', function (client) {
  console.log("A new WebSocket client was connected.");
  /** incoming message */
  client.on('message', function (message) {
    /** broadcast the message to all other clients */
    wss.broadcast(message, client);
  });
});

// broadcast the message to all WebSocket clients, excluding the sender
wss.broadcast = function (data, exclude) {
  var i = 0, n = this.clients ? this.clients.length : 0, client = null;
  if (n < 1) return;
  console.log("Broadcasting message to all " + n + " WebSocket clients.");
  for (; i < n; i++) {
    client = this.clients[i];
    // don't send the message back to the sender...
    if (client === exclude) continue;
    if (client.readyState === client.OPEN) client.send(data);
    else console.error('Error: the client state is ' + client.readyState);
  }
};
Note: since WebRTC works ONLY with SSL, for your convenience we provide a free, self-signed SSL certificate together with this application. This certificate shall not be used for any purpose other than playing with the provided demo application. Also, web browsers will complain about the validity of the SSL certificate, because it is not signed by a recognized authority; this means that you have to add it to your exception list in order to be able to access the application. Otherwise, feel free to use your own certificate, in which case you need to replace the two .pem files in the ssl subfolder.
The application communicates via Secure WebSockets on port 443. You can use another port if required. The above code simply accepts WebSocket connections and broadcasts all the messages received from one client to all other clients (excluding the sender).
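The broadcast rule can also be isolated as a small pure function. The sketch below is for illustration only: broadcastTargets is our own hypothetical helper (not part of the ws API), shown with plain objects standing in for WebSocket clients, so it can also run outside the server:

```javascript
var OPEN = 1; // the WebSocket readyState value for an open connection

// Hypothetical helper: given all connected clients and the sender,
// return the clients that should receive the broadcast, i.e., every
// client in the OPEN state except the sender itself.
function broadcastTargets(clients, sender) {
  return clients.filter(function (client) {
    return client !== sender && client.readyState === OPEN;
  });
}

// Plain objects standing in for WebSocket clients:
var a = {readyState: 1}, b = {readyState: 1}, c = {readyState: 3}; // c is closed
console.log(broadcastTargets([a, b, c], a).length); // 1 (only b qualifies)
```

Keeping the selection logic separate from the send loop makes it easy to test without opening real sockets.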
To start the server application, execute node server.js from the folder where you created the file with the above content. If all went fine, you should see no error message, and the server waits for WebSocket connections.
Finally, use a web browser and navigate to https://your.domain, and you should see the application start page. Using localhost only works for playing with the application locally; to have a WebRTC connection between two peers over the internet, one needs to use a live server with a public IP address.
If you are behind a corporate firewall, it is possible that all ports except 80 (and maybe 443) are closed. In such a case, one can use the mod_proxy_wstunnel Apache module, which allows proxying WebSocket communication via port 80. This module is bundled with Apache starting from version 2.4.5. However, most of the stable Linux distributions, including CentOS 6.x, provide only earlier Apache versions, such as 2.2.x. A pre-compiled version of this module (for Apache 2.2.15, available from the CentOS 6.7 repositories) is available for download on our server. Further, you have to modify the Apache configuration file, i.e., the httpd.conf file (usually located under /etc/httpd/conf/), and add the following lines:
LoadModule proxy_wstunnel_module modules/mod_proxy_wstunnel.so
ProxyPass /websocket/ ws://localhost:3434/
ProxyPassReverse /websocket/ ws://localhost:3434/
Last, restart the Apache web server by executing the service httpd restart command, for which you may need root privileges (i.e., you may have to use sudo or log in as root). The "websocket" path from the above configuration lines can be replaced with whatever you like, but keep in mind that it is the last part of the URL used by the WebSocket client app to access the server. Also remember to use the same port number as the one used in server.js (e.g., 3434).
Note: the above information and examples are provided for a CentOS 6.7 Linux distribution, running the Apache web server 2.2.15 from the official CentOS 6.7 repository. Different Linux distributions or other Apache versions may or may not work the same way, so we can't provide any guarantee on that.
The Client JavaScript Code
In this section we discuss the content of the webrtc.js file. The first part of this file defines the global variables:
var localVideo = null, remoteVideo = null, localVideoStream = null,
    videoCallButton = null, endCallButton = null, peerConn = null,
    wsc = new WebSocket('ws://my-web-domain.de/websocket/'),
    peerConnCfg = {'iceServers': [
      {'url': 'stun:stun.services.mozilla.com'},
      {'url': 'stun:stun.l.google.com:19302'}
    ]};
The relevant variables are wsc, representing a new WebSocket connection (remember to replace ws://my-web-domain.de/websocket/ with your own URL), and peerConnCfg, which specifies the configuration parameters used to initiate a new RTCPeerConnection. We use the Mozilla (and as a fallback Google) STUN services.
The localVideo, remoteVideo, videoCallButton and endCallButton variables hold references to the HTML elements representing the local and remote video containers (HTML5 video elements) and the two buttons (HTML input elements with type="button") used to initiate and end a call. Last, localVideoStream will keep a reference to the local video stream, so we can close it (release the video and audio devices) when the call ends.
Further, we define the pageReady callback method assigned to the load event:
function pageReady() {
  videoCallButton = document.getElementById("videoCallButton");
  endCallButton = document.getElementById("endCallButton");
  localVideo = document.getElementById('localVideo');
  remoteVideo = document.getElementById('remoteVideo');
  // check browser WebRTC availability
  if (navigator.getUserMedia) {
    videoCallButton.removeAttribute("disabled");
    videoCallButton.addEventListener("click", initiateCall);
    endCallButton.addEventListener("click", function (evt) {
      wsc.send(JSON.stringify({"closeConnection": true}));
    });
  } else {
    alert("Sorry, your browser does not support WebRTC!");
  }
};
Before taking any further actions, we need to check whether the browser supports the required WebRTC features (to avoid strange situations where nothing seems to work without an obvious reason). We do that by checking for the existence of the getUserMedia method in the navigator global object. If no such method is found, the "Video Call" button remains disabled (no call can be initiated!) and we provide a warning/error message using alert. If WebRTC is supported, we enable the "Video Call" button and assign a click event listener to it, so the initiateCall method is executed when the "Video Call" button is clicked. In the same way, a click event listener is assigned to the "End Call" button (more details about this are discussed later in this tutorial).
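As a side note, some browsers ship getUserMedia under a vendor prefix (webkitGetUserMedia, mozGetUserMedia) or under navigator.mediaDevices. A more tolerant feature check could look like the following sketch; supportsGetUserMedia is our own hypothetical helper, shown here with plain objects standing in for the real navigator so it can also run outside the browser:

```javascript
// Hypothetical helper: checks a navigator-like object for any known
// variant of getUserMedia, including vendor-prefixed ones.
function supportsGetUserMedia(nav) {
  return !!(nav.getUserMedia || nav.webkitGetUserMedia ||
            nav.mozGetUserMedia ||
            (nav.mediaDevices && nav.mediaDevices.getUserMedia));
}

// Plain objects standing in for the real navigator:
console.log(supportsGetUserMedia({getUserMedia: function () {}})); // true
console.log(supportsGetUserMedia({}));                             // false
```

In the browser one would call supportsGetUserMedia(navigator) in place of the plain navigator.getUserMedia check.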
Next, we take care of the WebSocket message exchange between the caller and the callee peers:
wsc.onmessage = function (evt) {
  var signal = JSON.parse(evt.data);
  if (!peerConn) answerCall();
  if (signal.sdp) {
    peerConn.setRemoteDescription(new RTCSessionDescription(signal.sdp));
  } else if (signal.candidate) {
    peerConn.addIceCandidate(new RTCIceCandidate(signal.candidate));
  } else if (signal.closeConnection) {
    endCall();
  }
};
A peer connection is created (and assigned to the peerConn variable) when the "Video Call" button is clicked. If no such RTCPeerConnection object exists, it means we are dealing with the callee case, i.e., an incoming call, which in our simple application is automatically answered by invoking the answerCall method. In a more complex, real-world application, a ring audio signal may be used, and the callee may answer the call by clicking an "Answer Call" button, but in our example we keep it simple, so calls are automatically answered. To be more exact, it is a semi-automatic answer: the callee's web browser asks for permission to use the video and/or audio devices, so the human user can accept (or reject) these rights, and thereby answer (or reject) the call.
The two peers need to exchange local and remote audio and video media information, such as resolution and codec capabilities. Signaling to exchange media configuration information is done by exchanging an offer and an answer using the Session Description Protocol (SDP).
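To make the message format concrete, here is a sketch of the JSON envelope our application sends over the WebSocket for an SDP signal. The field names match the tutorial code; the SDP payload shown is a shortened, made-up fragment (real SDP blobs produced by createOffer() are much longer):

```javascript
// Made-up, shortened SDP payload, just to illustrate the envelope shape.
var offerSignal = {
  "sdp": {
    "type": "offer",
    "sdp": "v=0\r\no=- 46117392 2 IN IP4 127.0.0.1\r\ns=-\r\n"
  }
};

// This is what wsc.send(...) transmits, and what wsc.onmessage parses back:
var wireMessage = JSON.stringify(offerSignal);
var signal = JSON.parse(wireMessage);

// The receiver dispatches on whichever field is present, as in wsc.onmessage:
var action = signal.sdp ? "setRemoteDescription" :
             signal.candidate ? "addIceCandidate" : "other";
console.log(action); // setRemoteDescription
```

ICE candidate messages use the same envelope, with a "candidate" field instead of "sdp".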
Initiating a Call
Let's now have a look at the initiateCall method:
function initiateCall() {
  prepareCall();
  // get the local audio/video stream
  navigator.getUserMedia({"audio": true, "video": true}, function (stream) {
    localVideoStream = stream;  // keep a reference, so we can release the devices later
    localVideo.src = URL.createObjectURL(stream);
    peerConn.addStream(stream);
    createAndSendOffer();
  }, function (error) { console.log(error); });
};
First, we make some initial preparations for the call (we explain more about this a bit later). Then, using getUserMedia, we obtain the local video stream and assign it to the video element where we'd like to display it on our page (the video element with id localVideo in our case). Last, we create and send a connection offer to the other peer by invoking the createAndSendOffer method, explained later in this tutorial.
The prepareCall method (see below) is responsible for creating the RTCPeerConnection instance and assigning the needed event listeners:
function prepareCall() {
  peerConn = new RTCPeerConnection(peerConnCfg);
  peerConn.onicecandidate = onIceCandidateHandler;
  peerConn.onaddstream = onAddStreamHandler;
};
function onIceCandidateHandler(evt) {
  if (!evt || !evt.candidate) return;
  wsc.send(JSON.stringify({"candidate": evt.candidate}));
};
function onAddStreamHandler(evt) {
  videoCallButton.setAttribute("disabled", true);
  endCallButton.removeAttribute("disabled");
  remoteVideo.src = URL.createObjectURL(evt.stream);
};
Any ICE candidate is forwarded to the signaling server to be sent to the other peer (see onIceCandidateHandler), while on receiving a remote stream we assign it to the video element in which it is displayed (the video element with id remoteVideo in our case).
One last step is required for the caller: create a connection offer and send it to the other peer:
function createAndSendOffer() {
  peerConn.createOffer(
    function (offer) {
      var off = new RTCSessionDescription(offer);
      peerConn.setLocalDescription(off,
        function () {
          wsc.send(JSON.stringify({"sdp": off}));
        },
        function (error) { console.log(error); }
      );
    },
    function (error) { console.log(error); }
  );
};
The offer contains information about how the two peers are to be connected. The offer messages are forwarded by the signaling server to the other peer, which is informed about this by the onmessage event listener, as described earlier in this tutorial.
Answering a Call
Similar to initiating a call, the RTCPeerConnection is created and the event listeners are assigned. Further, a local stream is obtained by using getUserMedia and assigned to a video element. Last, an answer is created and sent in response to the received offer:
function answerCall() {
  prepareCall();
  // get the local stream, show it in the local video element and send it
  navigator.getUserMedia({"audio": true, "video": true}, function (stream) {
    localVideoStream = stream;  // keep a reference, so we can release the devices later
    localVideo.src = URL.createObjectURL(stream);
    peerConn.addStream(stream);
    createAndSendAnswer();
  }, function (error) { console.log(error); });
};
The createAndSendAnswer method prepares the answer and, using the WebSocket channel, sends it to the signaling server, which then forwards it to the other peer, so the connection can be completed:
function createAndSendAnswer() {
  peerConn.createAnswer(
    function (answer) {
      var ans = new RTCSessionDescription(answer);
      peerConn.setLocalDescription(ans,
        function () {
          wsc.send(JSON.stringify({"sdp": ans}));
        },
        function (error) { console.log(error); }
      );
    },
    function (error) { console.log(error); }
  );
}
Ending a Call
Note: in theory, ending a WebRTC call may be slightly simpler: close the peer connection (i.e., call peerConn.close()), then use the callback method assigned to peerConn.oniceconnectionstatechange and check whether peerConn.iceConnectionState === "closed". However, we've found two problems with this approach: 1) it does not seem to work (at least not all the time) with both Google Chrome and Firefox, and 2) a closed connection state may also occur when a temporary break in the peer connection appears (bad internet connection, high latency, etc.), which in many cases can be restored automatically (without any additional code or management), so a "call end" may or may not be the actual situation. Because of this, we use the signaling server to notify the other peer about a "real end call" request.
In the pageReady method (called when the HTML page is fully loaded), we've added a click event listener which sends a closeConnection signal to our signaling server, which forwards it to the other peer:
function pageReady() {
  if (navigator.getUserMedia) {
    // ...some more code here...
    endCallButton.addEventListener("click", function (evt) {
      wsc.send(JSON.stringify({"closeConnection": true}));
    });
  } else {
    alert("Sorry, your browser does not support WebRTC!");
  }
};
The endCall
method has the following code:
function endCall() {
  peerConn.close();
  localVideoStream.getTracks().forEach(function (track) {
    track.stop();
  });
  localVideo.src = "";
  remoteVideo.src = "";
  videoCallButton.removeAttribute("disabled");
  endCallButton.setAttribute("disabled", true);
};
The first step is to close the RTCPeerConnection by calling its close method. Further, we stop all the (audio and video) tracks and reset the stream sources of the remote and local video elements, so nothing is displayed by the HTML5 video elements (otherwise, the last image frame remains visible). Last, we take care to enable the "Video Call" button (allowing a new call) and to disable the "End Call" button.
Download the Code
The full client and server source code are available for download on GitHub.
Current Browser Support
Not all browsers support WebRTC. Mainly, one can use Google Chrome, Firefox and Opera. For iOS, Bowser, an open-source web browser with WebRTC support, is available. Partial support is also available in the Edge web browser, and currently this technology is not supported at all by Safari. The complete list of WebRTC features supported by each web browser is available at iswebrtcreadyyet.com
Note: starting with 01.01.2016, using Google Chrome and Opera with WebRTC-based applications is possible only via a secure layer, thus HTTPS must be used instead of HTTP!
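A page can check this requirement up front. The sketch below captures the secure-origin rule as we understand it (HTTPS origins are allowed, and localhost is exempt as a development exception); isAllowedOrigin is our own hypothetical helper, not a browser API, written so it can also run under NodeJS:

```javascript
// Hypothetical helper reflecting the secure-origin rule: getUserMedia is
// allowed on https: pages and, as a development exception, on localhost.
function isAllowedOrigin(protocol, hostname) {
  return protocol === "https:" ||
         hostname === "localhost" || hostname === "127.0.0.1";
}

console.log(isAllowedOrigin("https:", "my-web-domain.de")); // true
console.log(isAllowedOrigin("http:", "my-web-domain.de"));  // false
console.log(isAllowedOrigin("http:", "localhost"));         // true
```

In the browser, one would call isAllowedOrigin(location.protocol, location.hostname) before trying to access the camera and warn the user accordingly.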