WebRTC for newbies!

If you are a newcomer or beginner, try the RTCMultiConnection.js or DataChannel.js libraries first.

The exchange of real-time media between two browsers follows this process:

At the media source, input devices are opened for capture (getUserMedia).
Media from the input devices is encoded into packets that are transmitted across the network.
At the media destination, the packets are decoded and formed into a media stream.
The media stream is sent to output devices (onaddstream).
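
The first step above (opening the input devices) can be sketched as follows. This is only a sketch: it assumes the promise-based navigator.mediaDevices.getUserMedia found in current browsers, buildConstraints and openCapture are hypothetical helper names, and the 640x480 "ideal" size is an arbitrary illustrative choice.

```javascript
// Hypothetical helper: describe which input devices should be opened.
function buildConstraints(wantAudio, wantVideo) {
    return {
        audio: Boolean(wantAudio),
        // "ideal" values are hints; the browser picks the closest supported mode
        video: wantVideo ? { width: { ideal: 640 }, height: { ideal: 480 } } : false
    };
}

// Hypothetical helper: open the devices and show a local preview.
// `mediaDevices` is passed in so the function can be exercised outside a browser.
function openCapture(mediaDevices, constraints, videoElement) {
    return mediaDevices.getUserMedia(constraints).then(function (stream) {
        videoElement.srcObject = stream;
        return stream;
    });
}

// In a browser this would be called roughly like:
// openCapture(navigator.mediaDevices, buildConstraints(true, true), document.querySelector('video'));
```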

An application that wishes to enable two-way audio and video communications between peers can create four media streams (i.e. 4 RTP streams):

An audio stream in each direction (i.e. outgoing/incoming audio RTP streams)
A video stream in each direction (i.e. outgoing/incoming video RTP streams)

On the server you need a signaling gateway. The ICE server lies in that gateway.

A media server for large scale applications:

to store recorded media streams (audio/video/files/text/etc.) or data
to make SIP calls to PSTN (legacy) networks
to transcode/mix/merge streams

Obviously, for these tasks you need a media server and many installations!

So there are three kinds of concrete servers:

Signaling Server (SIP, XMPP, WebSocket, Socket.io, XHR, etc.)
ICE Servers (STUN, TURN)
Media Servers (Asterisk, etc.)

The first two are mandatory for every application.
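
As a sketch of where the ICE servers show up in application code: the browser's RTCPeerConnection constructor accepts a configuration object with an iceServers list. buildPeerConfig is a hypothetical helper, and the TURN host name and credentials below are placeholders (assumptions, not real servers); stun.l.google.com:19302 is a commonly used public STUN server.

```javascript
// Hypothetical helper: build the configuration passed to RTCPeerConnection.
function buildPeerConfig(turn) {
    var iceServers = [{ urls: 'stun:stun.l.google.com:19302' }];
    if (turn) {
        // TURN servers need credentials; these come from your own deployment
        iceServers.push({
            urls: 'turn:' + turn.host + ':' + turn.port,
            username: turn.username,
            credential: turn.credential
        });
    }
    return { iceServers: iceServers };
}

// Placeholder values, for illustration only
var config = buildPeerConfig({
    host: 'turn.example.org',
    port: 3478,
    username: 'demo-user',
    credential: 'demo-pass'
});

// In a browser: var peerConnection = new RTCPeerConnection(config);
```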

The signaling server can be your own custom signaling gateway, but that needs installation, server-side work and more!

Until you are more experienced, it is suggested to use Firebase, PubNub or Pusher for signaling.

Interoperable real-time media between browsers uses RTP.

RTP depends on the existence of a signaling channel to establish a common understanding of the meaning of packets.

This includes identification of different streams, codecs, and codec parameters.

Applications that establish peer-to-peer transports require that the IP addresses of a peer are signaled to the remote peer.

Each real-time port consists of an IP address, a port number, a username fragment and password. This information is exchanged with the remote peer using whatever signaling mechanism is chosen by the application.
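
To make those four fields concrete, here is a minimal sketch of how an application might package them for its signaling channel. The field names (ip, port, usernameFragment, password), the "port" message type and the JSON encoding are all assumptions chosen for illustration; the wire format is left to the application.

```javascript
var PORT_FIELDS = ['ip', 'port', 'usernameFragment', 'password'];

// Hypothetical helper: serialize a port description for the signaling channel.
function encodePort(description) {
    var message = { type: 'port' };
    PORT_FIELDS.forEach(function (field) {
        if (description[field] === undefined) {
            throw new Error('missing field: ' + field);
        }
        message[field] = description[field];
    });
    return JSON.stringify(message);
}

// Hypothetical helper: read a port description received from the remote peer.
function decodePort(text) {
    var message = JSON.parse(text);
    if (message.type !== 'port') {
        throw new Error('unexpected message type: ' + message.type);
    }
    var description = {};
    PORT_FIELDS.forEach(function (field) {
        description[field] = message[field];
    });
    return description;
}

// Illustrative values only
var wire = encodePort({
    ip: '192.168.1.102',
    port: 1816,
    usernameFragment: 'aB3d',
    password: 'not-a-real-password'
});
var received = decodePort(wire);
```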

In order to establish a transport between a local peer and remote peer, the following process is applied:

The local peer opens one or more real-time ports (RTP).
The local peer then has to learn of the ports that its remote peer has opened. This uses a signaling channel specific to the application. For instance, a web application could use HTTP requests or WebSocket connections for this purpose.
A process of discovery is used to find a local and remote port pair (a candidate pair) that can exchange UDP packets. One or more connectivity checks are made from different local ports toward different remote ports. A successful connectivity check indicates that packets can reach the peer and that the peer consents to receive packets.
Finally, a real-time transport is established on the pair of ports. A security context is established so that secured media packets are able to flow in both directions between peers. Real-time media streams can then be added to the transport.

The initial connection between peers must be accomplished via an application server that provides user discovery, communication, and Network Address Translation (NAT) traversal with data streaming.

Signaling is the mechanism by which peers send control messages to each other for the purpose of establishing the communication protocol, channel, and method. These are not specified in the WebRTC standard. Rather, the developer may choose any messaging protocol (such as SIP or XMPP) and any two-way communication channel (such as WebSocket or XMLHttpRequest), in tandem with a persistent connection server API (such as the Google Channel API for App Engine).
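
Because the transport is left to the developer, a signaling channel only has to deliver control messages to a named peer. The sketch below uses an in-memory relay as a stand-in for a WebSocket or XHR server; SignalingRelay, join and send are hypothetical names, not part of any WebRTC API, and the SDP string is a truncated placeholder.

```javascript
// Hypothetical in-memory stand-in for a signaling server.
function SignalingRelay() {
    this.peers = {};
}

// Register a peer and the callback that receives its messages.
SignalingRelay.prototype.join = function (peerId, onMessage) {
    this.peers[peerId] = onMessage;
};

// Deliver a message; returns false when the target peer is unknown.
SignalingRelay.prototype.send = function (from, to, payload) {
    var deliver = this.peers[to];
    if (!deliver) {
        return false;
    }
    // Round-trip through JSON text, as the message would travel over a real socket
    deliver(JSON.parse(JSON.stringify({ from: from, payload: payload })));
    return true;
};

var relay = new SignalingRelay();
var inbox = [];
relay.join('bob', function (message) {
    inbox.push(message);
});
relay.send('alice', 'bob', { type: 'offer', sdp: 'v=0...' });
```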

Session Description Protocol (SDP): A protocol that is used to announce sessions, manage session invitations, and perform other types of initiation tasks for multimedia sessions.

It is a well-defined format for conveying sufficient information to discover and participate in a multimedia session (such as a video conference).

A multimedia session is a set of multimedia senders and receivers and the data streams flowing from senders to receivers. A multimedia conference is an example of a multimedia session.

When initiating multimedia teleconferences, voice-over-IP calls, streaming video, or other sessions, there is a requirement to convey media details, transport addresses, and other session description metadata to the participants.

SDP provides a standard representation for such information, irrespective of how that information is transported.
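
An SDP message is plain text made of "type=value" lines: session-level lines come first, and each "m=" line starts a media section. A minimal sketch of splitting a description along those lines follows; splitSdp is a hypothetical helper, and the example description is hand-written for illustration (it is not browser output).

```javascript
// Hypothetical helper: separate session-level lines from media sections.
function splitSdp(sdp) {
    var session = [];
    var media = [];
    sdp.split(/\r?\n/).forEach(function (line) {
        if (!line) {
            return; // skip empty lines
        }
        if (line.indexOf('m=') === 0) {
            // "m=<kind> <port> <proto> <formats>" opens a new media section
            media.push({ kind: line.slice(2).split(' ')[0], lines: [line] });
        } else if (media.length) {
            media[media.length - 1].lines.push(line);
        } else {
            session.push(line);
        }
    });
    return { session: session, media: media };
}

// Hand-written example description (illustrative values)
var example = [
    'v=0',
    'o=- 20518 0 IN IP4 203.0.113.1',
    's=-',
    't=0 0',
    'm=audio 54400 RTP/AVP 0',
    'a=rtpmap:0 PCMU/8000',
    'm=video 55400 RTP/AVP 100',
    'a=rtpmap:100 VP8/90000'
].join('\r\n');

var sections = splitSdp(example);
```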

Here is a shortened SDP generated by Google Chrome (the full text message is about 1000 to 2200 characters long):

a=group:BUNDLE audio video data
...
a=rtpmap:103 ISAC/16000
a=rtpmap:111 opus/48000
a=rtpmap:0 PCMU/8000
...
a=rtpmap:100 VP8/90000
...
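
Each a=rtpmap line maps an RTP payload type number to a codec name and clock rate. A minimal sketch of reading those lines follows; parseRtpmaps is a hypothetical helper, and the regular expression only covers the payload/codec/clock-rate form seen in the sample above.

```javascript
// Hypothetical helper: collect the codecs announced in a=rtpmap lines.
function parseRtpmaps(sdp) {
    var codecs = [];
    sdp.split(/\r?\n/).forEach(function (line) {
        var match = /^a=rtpmap:(\d+) ([^\/]+)\/(\d+)/.exec(line.trim());
        if (match) {
            codecs.push({
                payloadType: Number(match[1]),
                name: match[2],
                clockRate: Number(match[3])
            });
        }
    });
    return codecs;
}

var sample = [
    'a=group:BUNDLE audio video data',
    'a=rtpmap:103 ISAC/16000',
    'a=rtpmap:111 opus/48000',
    'a=rtpmap:0 PCMU/8000',
    'a=rtpmap:100 VP8/90000'
].join('\r\n');

var codecs = parseRtpmaps(sample);
// codecs[1] is { payloadType: 111, name: 'opus', clockRate: 48000 }
```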

var SessionDescription = window.RTCSessionDescription || window.mozRTCSessionDescription;

Here is a simple cross-browser "peer.createAnswer" API implementation:

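offerSDP = new SessionDescription(offerSDP);
peerConnection.setRemoteDescription(offerSDP);

// create the answer only after the remote offer has been set
peerConnection.createAnswer(function (sessionDescription) {
    peerConnection.setLocalDescription(sessionDescription);
}, function (error) {
    console.error('createAnswer failed:', error);
}, constraints);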

First of all, create the "offer sdp" by calling peerConnection.createOffer:

offerer.createOffer(function (offerSDP) {
    offerer.setLocalDescription(offerSDP);
    // use XHR/WebSocket/etc. to exchange offer-sdp with other peer(s)
}, function (error) {
    console.error('createOffer failed:', error);
}, constraints);

On the "answerer" side, set the remote description using the "offer sdp":

offerSDP = new SessionDescription(offerSDP);
answerer.setRemoteDescription(offerSDP);

And then create "answer sdp":

answerer.createAnswer(function (answerSDP) {
    answerer.setLocalDescription(answerSDP);
    // use XHR/WebSocket/etc. to exchange answer-sdp with "offerer"
}, function (error) {
    console.error('createAnswer failed:', error);
}, constraints);

On the offerer side, set the remote description using the "answer sdp":

answerSDP = new SessionDescription(answerSDP);
offerer.setRemoteDescription(answerSDP);
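
The required ordering can be summarized as a small state machine. This sketch mirrors the signalingState values of RTCPeerConnection ("stable", "have-local-offer", "have-remote-offer") but deliberately ignores provisional answers, repeated offers and rollback; nextSignalingState and the action labels are hypothetical names used only for illustration.

```javascript
// Allowed steps of the offer/answer exchange, keyed by current state.
var TRANSITIONS = {
    'stable': {
        'setLocal(offer)': 'have-local-offer',
        'setRemote(offer)': 'have-remote-offer'
    },
    'have-local-offer': { 'setRemote(answer)': 'stable' },
    'have-remote-offer': { 'setLocal(answer)': 'stable' }
};

// Hypothetical helper: return the next state or throw on an out-of-order call.
function nextSignalingState(state, action) {
    var next = TRANSITIONS[state] && TRANSITIONS[state][action];
    if (!next) {
        throw new Error('"' + action + '" is not allowed in state "' + state + '"');
    }
    return next;
}

// Offerer: createOffer -> setLocalDescription, later setRemoteDescription(answer)
var offererState = ['setLocal(offer)', 'setRemote(answer)'].reduce(nextSignalingState, 'stable');

// Answerer: setRemoteDescription(offer), then createAnswer -> setLocalDescription
var answererState = ['setRemote(offer)', 'setLocal(answer)'].reduce(nextSignalingState, 'stable');
```

Both sequences end in "stable"; trying to set an answer before any offer exists throws, which is the mistake this ordering is meant to prevent.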

The purpose of the ICE protocol is to establish a media path.

"Making a call starts by sending a SIP INVITE message with an SDP describing on which IP address(es) and port(s) the application can receive audio and/or video packets. These addresses and ports are known as candidates."

Specifically, a candidate is an IP address and port at which one peer can receive data from another peer.

There are 3 types of candidates:

Local (host) candidate: A local IP address of the client.
Reflexive or STUN candidate: An IP address of the client's NAT (assuming the client is behind only a single NAT). It is determined by another entity and then communicated back to the client.
Relay or TURN candidate: An address on a relay server that has been allocated for use by the client.

"Traffic can always be sent successfully using relay candidates, unless a firewall blocks all traffic towards the client, in which case no legitimate firewall traversal technique can ever work. The problem with using relay candidates, however, is that they require server resources, and relayed traffic introduces additional delay, loss and jitter in the traffic stream."

You can use the peerConnection.onicecandidate event to get the ICE candidates generated for the local peer. You can send those candidates via XHR/WebSocket/WebSync to the remote peer.

a=candidate:1 1 UDP 2130706431 192.168.1.102 1816 typ host
a=candidate:2 1 UDP 2130706431 23.45.1.102 3456 typ srflx
a=candidate:3 1 UDP 2130706431 34.66.1.102 5678 typ relay
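
A minimal sketch of reading such candidate lines, together with the priority formula recommended by the ICE specification (RFC 8445): priority = 2^24 * type preference + 2^8 * local preference + (256 - component ID). parseCandidate and candidatePriority are hypothetical helpers; the type preferences (host 126, prflx 110, srflx 100, relay 0) are the values recommended by the specification, and the default local preference of 65535 assumes a single network interface. Under those assumptions the formula gives 2130706431 for a host candidate, the value used in the sample above; server reflexive and relay candidates come out lower.

```javascript
// Hypothetical helper: parse the fixed leading fields of a candidate line.
function parseCandidate(line) {
    var parts = line.trim().replace(/^a=/, '').replace(/^candidate:/, '').split(/\s+/);
    if (parts.length < 8 || parts[6] !== 'typ') {
        throw new Error('not a candidate line: ' + line);
    }
    return {
        foundation: parts[0],
        component: Number(parts[1]),
        protocol: parts[2].toLowerCase(),
        priority: Number(parts[3]),
        ip: parts[4],
        port: Number(parts[5]),
        type: parts[7]
    };
}

var TYPE_PREFERENCE = { host: 126, prflx: 110, srflx: 100, relay: 0 };

// Hypothetical helper: candidate priority per the RFC 8445 formula.
// Multiplication is used instead of "<<" because 126 << 24 overflows a signed 32-bit integer.
function candidatePriority(type, componentId, localPreference) {
    if (!Object.prototype.hasOwnProperty.call(TYPE_PREFERENCE, type)) {
        throw new Error('unknown candidate type: ' + type);
    }
    var local = localPreference === undefined ? 65535 : localPreference;
    return TYPE_PREFERENCE[type] * Math.pow(2, 24) + local * Math.pow(2, 8) + (256 - componentId);
}

var hostCandidate = parseCandidate('a=candidate:1 1 UDP 2130706431 192.168.1.102 1816 typ host');
var computed = candidatePriority(hostCandidate.type, hostCandidate.component);
// computed === hostCandidate.priority === 2130706431
```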

"Once the callee has sent its ICE candidates, and once the caller receives them, they each start the ICE connectivity checks. At this point, both the parties know about their peer’s potential candidates. Each possible pair of local and remote candidates is formed, creating a number of candidate pairs. A connectivity check is done by sending STUN messages from the local candidate to the remote candidate of each pair, starting with the highest priority (i.e. most preferred) candidate pair first. Both parties exchange STUN messages in this way to determine the best possible candidate pair that they can use to communicate. Once a valid (i.e. successful) message has been sent both ways on a single candidate pair, the connectivity check can stop and media can be sent/received using that candidate pair."

Once the call has been established, both the caller and callee send media to and from their successful candidate addresses (usually over RTP).
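
The pairing and ordering step of the connectivity checks quoted above can be sketched as follows. The pair priority formula is the one from the ICE specification (RFC 8445): 2^32 * MIN(G, D) + 2 * MAX(G, D) + (G > D ? 1 : 0), where G is the controlling agent's candidate priority and D the controlled agent's. pairPriority and formPairs are hypothetical helpers; the sketch assumes the local peer is the controlling agent and skips the matching by component and address family that a real ICE agent performs.

```javascript
// BigInt is used because the result does not fit in a 53-bit safe integer.
function pairPriority(controllingPriority, controlledPriority) {
    var g = BigInt(controllingPriority);
    var d = BigInt(controlledPriority);
    var min = g < d ? g : d;
    var max = g > d ? g : d;
    return (min << 32n) + 2n * max + (g > d ? 1n : 0n);
}

// Form every local/remote pair and sort with the most preferred pair first.
function formPairs(localCandidates, remoteCandidates) {
    var pairs = [];
    localCandidates.forEach(function (local) {
        remoteCandidates.forEach(function (remote) {
            pairs.push({
                local: local,
                remote: remote,
                priority: pairPriority(local.priority, remote.priority)
            });
        });
    });
    pairs.sort(function (a, b) {
        return a.priority < b.priority ? 1 : (a.priority > b.priority ? -1 : 0);
    });
    return pairs;
}

// Illustrative candidates (priorities follow the recommended RFC 8445 values)
var localCandidates = [
    { type: 'host', priority: 2130706431 },
    { type: 'relay', priority: 16777215 }
];
var remoteCandidates = [
    { type: 'host', priority: 2130706431 },
    { type: 'srflx', priority: 1694498815 }
];

var pairs = formPairs(localCandidates, remoteCandidates);
// The host-to-host pair is checked first; pairs involving the relay come last.
```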

peerConnection.onaddstream fires as soon as the local peer learns of the remote stream. Remember, it takes a few seconds for the remote stream to start flowing. You can check whether the remote stream has started flowing by using something like this:

if (!(video.readyState <= HTMLMediaElement.HAVE_CURRENT_DATA 
    || video.paused
    || video.currentTime <= 0))  {
    alert('cool, remote stream is visible!');
} else {
    alert('nope, remote stream is not visible. Try to renegotiate!');
    rtcMultiConnection.renegotiate();
}
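
Because the stream needs a few seconds to start, the same check can be repeated a limited number of times before giving up. This is a sketch only: isStreamFlowing and waitForStream are hypothetical helpers, and the retry count and delay are left to the caller.

```javascript
var HAVE_CURRENT_DATA = 2; // numeric value of HTMLMediaElement.HAVE_CURRENT_DATA

// Same condition as the check above, written as a reusable predicate.
function isStreamFlowing(video) {
    return video.readyState > HAVE_CURRENT_DATA && !video.paused && video.currentTime > 0;
}

// Poll the element until media flows or the attempts run out.
function waitForStream(video, attemptsLeft, delayMs, onResult) {
    if (isStreamFlowing(video)) {
        return onResult(true);
    }
    if (attemptsLeft <= 0) {
        return onResult(false);
    }
    setTimeout(function () {
        waitForStream(video, attemptsLeft - 1, delayMs, onResult);
    }, delayMs);
}

// In a browser, for example: check once per second, up to ten more times
// waitForStream(document.querySelector('video'), 10, 1000, function (flowing) {
//     console.log(flowing ? 'remote stream is visible' : 'still no media, try to renegotiate');
// });
```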

Are you interested in a "more" simple full-fledged guide? Read this tutorial.
Are you interested in a "beginners" guide? Read this tutorial.
You can find many tutorials here: https://www.webrtc-experiment.com/#documentations

WebRTC for newbies! ® Muaz Khan
