Telecom Made Simple: SDP

RTP only carries the voice, and there must be some associated way to signal the codecs which are supported by each end. This is fundamentally a property of signaling, but, unlike call progress messages and advanced PBX features, is tied specifically to the bearer channel.

SIP uses SDP to negotiate codecs and RTP endpoints, including transports, port numbers, and every other aspect necessary to start RTP streams flowing. SDP, defined in RFC 4566, is a text-based protocol, as SIP itself is, for setting up the various legs of media streams. Each line represents a different piece of information, in the format type=value, with no spaces around the equals sign.

Table 1 shows an example of an SDP description. This description is for a phone at IP address 192.168.0.10 that wishes to receive RTP on UDP port 9000. Let's go through each of the fields.

Type "v" represents the protocol version, which is 0.
Type "o" holds information about the originator of this request, and the session IDs. Specifically, it is divided up into the username, session ID, session version, network type, address type, and address. "7010" happens to be the dialing phone number. The two large numbers afterward are identifiers, to keep the SDP exchanges straight. The "IN" refers to the address being an Internet protocol address; specifically, "IP4" for IPv4, of "192.168.0.10". This is where the originator is.
Type "s" is the session name. The value given here, "A_conversation", is not particularly meaningful.
Type "c" specifies how the originator is to be reached, that is, its connection data. This is a repetition of the IP address and type specifications for the phone.
Type "t" is the timing for the leg of the call. The first "0" represents the start time, and the second represents the end time. Therefore, there are no particular timing bounds for this call.
The "m" line specifies the media needed. In this case, as with most voice calls, there is only one voice stream from the device, so there is only one media line. The parameters are the media type, the port, the transport profile, and then the list of RTP payload types. This call is an "audio" call, and the phone will be listening on port 9000. This is a UDP port, because the profile is "RTP/AVP", meaning that it is plain RTP. ("AVP" is the standard audio/video profile, carried over UDP with no encryption. There is an "RTP/SAVP" option, mentioned shortly.) Finally, the RTP payload types the phone can take are 0, 8, and 18.
The next three lines describe the supported codecs in detail. The "a" field specifies an attribute. The "a=rtpmap" attribute means that the sender wants to map RTP payload types to specific codec setups. The line is formatted as payload type, then encoding name/clock rate/parameters (for audio, the optional parameter is the number of channels). In the first line, RTP payload type "0" is mapped to "PCMU" at 8000 samples per second. The default mapping of "0" is already PCM (G.711) with μ-law at that rate, so the line simply makes the mapping explicit. The second line asks for A-law, mapping it to 8. The third line asks for G.729, asking for 18 as the mapping. Because the phone only listed those three types, those are the only types it supports.
The last line is also an attribute. "a=ptime" is requesting that the other party send 20ms packets. The other party is not required to submit to this request, as it is only a suggestion. However, this is a pretty good sign that the sender of the SDP message will also send at 20ms.
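To make the "a=ptime" value concrete, here is a minimal arithmetic sketch of what 20 ms means at the 8000 Hz clock rate used by the codecs above. The function names are made up for this illustration, and the byte count assumes G.711, which uses one byte per sample.

```python
def samples_per_packet(clock_rate_hz: int, ptime_ms: int) -> int:
    """Number of audio samples carried in one RTP packet."""
    return clock_rate_hz * ptime_ms // 1000


def packets_per_second(ptime_ms: int) -> int:
    """How many RTP packets per second a given packet time implies."""
    return 1000 // ptime_ms


samples = samples_per_packet(8000, 20)
# G.711 (PCMU/PCMA) encodes each sample in 8 bits, so one byte per sample.
g711_payload_bytes = samples * 1
rate = packets_per_second(20)

print(samples, g711_payload_bytes, rate)
```

A shorter packet time lowers delay but raises the packet rate, which is why 20 ms is a common middle ground.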

Table 1: Example of an SDP Description

v=0
o=7010 1352822030 1434897705 IN IP4 192.168.0.10
s=A_conversation
c=IN IP4 192.168.0.10
t=0 0
m=audio 9000 RTP/AVP 0 8 18
a=rtpmap:0 PCMU/8000/1
a=rtpmap:8 PCMA/8000/1
a=rtpmap:18 G729/8000/1
a=ptime:20
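Because every SDP line is just type=value, the description in Table 1 can be pulled apart with a few lines of code. The following is a rough sketch, not a complete SDP parser: the function names and the shape of the summary dictionary are invented for this example, and only the fields discussed above are handled.

```python
TABLE_1 = """\
v=0
o=7010 1352822030 1434897705 IN IP4 192.168.0.10
s=A_conversation
c=IN IP4 192.168.0.10
t=0 0
m=audio 9000 RTP/AVP 0 8 18
a=rtpmap:0 PCMU/8000/1
a=rtpmap:8 PCMA/8000/1
a=rtpmap:18 G729/8000/1
a=ptime:20
"""


def parse_sdp(text):
    """Split an SDP description into ordered (type, value) pairs."""
    fields = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        type_, sep, value = line.partition("=")
        if not sep or len(type_) != 1:
            raise ValueError(f"not a type=value line: {line!r}")
        fields.append((type_, value))
    return fields


def summarize_audio(fields):
    """Collect the address, port, payload types, and rtpmap entries."""
    info = {"address": None, "port": None, "profile": None,
            "payload_types": [], "rtpmap": {}, "ptime": None}
    for type_, value in fields:
        if type_ == "c":
            _nettype, _addrtype, address = value.split()
            info["address"] = address
        elif type_ == "m":
            media, port, profile, *formats = value.split()
            if media == "audio":
                info["port"] = int(port)
                info["profile"] = profile
                info["payload_types"] = [int(f) for f in formats]
        elif type_ == "a":
            name, _, attr_value = value.partition(":")
            if name == "rtpmap":
                payload_type, encoding = attr_value.split(None, 1)
                info["rtpmap"][int(payload_type)] = encoding
            elif name == "ptime":
                info["ptime"] = int(attr_value)
    return info


summary = summarize_audio(parse_sdp(TABLE_1))
print(summary)
```

The summary holds exactly what a peer needs in order to start sending: where to send (address and port) and which payload types are acceptable.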

The setup message in Table 1 was originally given in a SIP INVITE message. The responding SIP OK message from the other party gave its SDP settings.

Table 2 shows this example response. Here, the other party, at IP address 10.0.0.10, wants to receive on UDP port 11690 an RTP stream with the three codecs PCMU, GSM, and PCMA. It can also receive a format known as "telephone-event." This corresponds to the RTP payload format for sending digits while in the middle of a call (RFC 4733). Some codecs, like G.729, can't carry a dialed digit as the usual audio beep, because the beep gets distorted by the codec. Instead, the digits have to be sent over RTP, embedded in the stream. The sender of this SDP is stating that it supports this format and would like the digits to be sent as RTP payload type 101, a dynamic type that the sender was allowed to choose without restriction.

Table 2: Example of an SDP Responding Description

v=0
o=root 10871 10871 IN IP4 10.0.0.10
s=session
c=IN IP4 10.0.0.10
t=0 0
m=audio 11690 RTP/AVP 0 3 8 101
a=rtpmap:0 PCMU/8000
a=rtpmap:3 GSM/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:101 telephone-event/8000
a=fmtp:101 0-16
a=silenceSupp:off----
a=ptime:20
a=sendrecv

Corresponding to the telephone-event mapping is the attribute "a=fmtp", which applies to payload type 101, the type used for digits. "fmtp" lines don't mean anything specific to SDP; instead, the request of "0-16" gets forwarded to the telephone event protocol handler. It is not necessary to go into further details here on what "0-16" means. The "a=silenceSupp" line controls silence suppression, in which packets are not sent when the caller is not talking. Here, silence suppression has been turned off. Finally, the "a=sendrecv" line means that the originator can both send and receive streaming packets, meaning that the caller can both talk and listen. Some calls are intentionally one-way, such as lines into a voice conference where the listeners cannot speak. In that case, the listeners may have requested a flow with "a=recvonly".

After a device gets an SDP request, it knows enough information to send an RTP stream back to the requester. The receiver need only choose which media type it wishes to use. There is no requirement that both parties use the same codec; rather, if the receiver cannot handle the codec, the higher-layer signaling protocol needs to reject the setup. With SIP, the called party will not usually stream until it accepts the SIP INVITE, but there is no further handshaking necessary once the call is answered and there are packets to send.
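The choice described above can be sketched as a small function: given what the remote party advertised and what the local party can handle, pick the first advertised format that both know. This is only one plausible selection policy, assumed here for illustration; the function names are hypothetical, and the two dictionaries are copied from the rtpmap lines of Tables 1 and 2.

```python
# rtpmap entries from Table 1 (the calling phone) and Table 2 (the responder).
TABLE_1_RTPMAP = {0: "PCMU/8000/1", 8: "PCMA/8000/1", 18: "G729/8000/1"}
TABLE_2_RTPMAP = {0: "PCMU/8000", 3: "GSM/8000", 8: "PCMA/8000",
                  101: "telephone-event/8000"}


def normalize(encoding):
    """Turn 'PCMU/8000' or 'PCMU/8000/1' into a comparable tuple."""
    name, rate, *rest = encoding.split("/")
    channels = int(rest[0]) if rest else 1  # a missing channel count means 1
    return (name.upper(), int(rate), channels)


def pick_send_format(remote_rtpmap, local_rtpmap):
    """Pick the first payload type the remote accepts that we also know.

    Dictionaries keep insertion order, which here mirrors the order of
    the formats on the "m" line.
    """
    local = {normalize(enc) for enc in local_rtpmap.values()}
    for payload_type, encoding in remote_rtpmap.items():
        key = normalize(encoding)
        if key[0] == "TELEPHONE-EVENT":
            continue  # digits are not an audio codec
        if key in local:
            return payload_type, encoding
    return None  # nothing in common: signaling would have to reject the call


# What the phone from Table 1 could send toward 10.0.0.10:11690.
to_responder = pick_send_format(TABLE_2_RTPMAP, TABLE_1_RTPMAP)
# What the responder could send toward 192.168.0.10:9000.
to_caller = pick_send_format(TABLE_1_RTPMAP, TABLE_2_RTPMAP)
print(to_responder, to_caller)
```

Each direction is chosen independently, which reflects the point that the two parties are not required to end up on the same codec.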

For SRTP usage with SIPS, SDP allows for the SRTP key to be specified using a special attribute:

a=crypto:1 AES_CM_128_HMAC_SHA1_32 inline:c3bFaGA+Seagd117041az3g113geaG54aKgd50Gz

This specifies that SRTP with AES in counter mode and HMAC-SHA1 authentication is to be used, and supplies the keying material, encoded in base-64, that is to be used. Both sides of the call send their own randomly generated keys, under the cover of the TLS-protected link. This forms the basis of RTP/SAVP.
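As a rough illustration of how a receiver might read that attribute, the sketch below splits the line into its tag, crypto suite, and decoded keying material. The function name is invented for this example, and the 16-byte key plus 14-byte salt split is what the AES_CM_128 suites use as far as I know; treat it as an assumption rather than a reference implementation.

```python
import base64

CRYPTO_LINE = ("a=crypto:1 AES_CM_128_HMAC_SHA1_32 "
               "inline:c3bFaGA+Seagd117041az3g113geaG54aKgd50Gz")


def parse_crypto(line):
    """Return (tag, suite, key_material_bytes) from an a=crypto line."""
    prefix = "a=crypto:"
    if not line.startswith(prefix):
        raise ValueError("not an a=crypto line")
    tag, suite, key_params = line[len(prefix):].split(None, 2)
    method, _, key_info = key_params.partition(":")
    if method != "inline":
        raise ValueError(f"unsupported key method: {method!r}")
    # Optional parameters may follow after '|'; this sketch ignores them.
    key_b64 = key_info.split("|", 1)[0].strip()
    material = base64.b64decode(key_b64, validate=True)
    return int(tag), suite, material


tag, suite, material = parse_crypto(CRYPTO_LINE)
# Assumed layout for AES_CM_128 suites: 16-byte master key, 14-byte salt.
master_key, master_salt = material[:16], material[16:]
print(tag, suite, len(master_key), len(master_salt))
```

Since the key travels in the clear inside the SDP body, this only makes sense when the signaling itself is protected, which is why the text ties it to SIPS.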

The two standard protocols that govern session control are the Session Initiation Protocol (SIP) and Session Description Protocol (SDP). These standards were originally intended for loosely controlled multimedia conferencing over the Internet; however, they have developed into a functional alternative to the H.323 suite. In particular, the combination of SIP and SDP is functionally equivalent to that of H.225.0 and H.245.

SIP (see RFC 2543) was initially standardized by the Multiparty Multimedia Session Control (mmusic) working group (see www.ietf.org/html.charters/mmusic-charter.html) in the IETF Transport area. As the work grew, a specialized SIP working group was created (see www.ietf.org/html.charters/sip-charter.html).

SIP was designed to create and tear down multimedia sessions. In its syntax, SIP is similar to the Hypertext Transfer Protocol (HTTP)—defined in RFC 2616—and it reuses many HTTP header fields (such as authentication). Like HTTP, SIP is ASCII text encoded. Unlike HTTP, however, SIP was developed with the intention of addressing human users, for which reason the uniform resource identifier (URI) defined by SIP looks more like an e-mail address than the address of a World Wide Web page. For example, sip:hui-lan.lu@bell-labs.com is a SIP URI. For the purposes of integrating the PSTN and the Internet, it is important to note that SIP message headers can also carry other URIs (such as telephone URLs, defined by the IETF).

SIP is a client-server protocol: A client generates a request, to which a server sends one or more responses. A (potential) session participant can both generate and receive requests, which suggests that the end systems should have both the client and server capabilities. SIP also supports transaction capabilities. The RFC 2543 definition of a transaction is:

A SIP transaction occurs between a client and a server and comprises all messages from the first request sent from the client to the server up to a final . . . response sent from the server to the client.

Transactions are assigned command sequence (CSeq) numbers. The SIP nomenclature (similar to that of SNMP and HTTP) alludes to the object-oriented model by defining the following methods that are carried in SIP requests (one method per request):

§ INVITE. Conveys the information about the call to invited participants. It is issued in order to set up a call, and once the call is set up, it can be issued by any party to the call in order to change the call parameters or to add another party.

§ BYE. Terminates a connection.

§ OPTIONS. Solicits information about a user’s capabilities.

§ CANCEL. Terminates the search for the user.

§ REGISTER. Makes the user’s location known to a SIP server.

§ ACK. Invokes the reliable message exchange for invitations. (Note that SIP has its own mechanism for invitation exchange; thus, it can run on top of an unreliable transport layer protocol such as UDP.)

The invitation to a session is accompanied by the Session Description Protocol (SDP) defined in RFC 2327, also developed by the mmusic working group (WG). SDP provides the description format (not the protocol) of the multicast and unicast addresses, the number and types (that is, audio, video, data, control) of streams involved, the codecs involved (that is, the payload types to be carried by the transport protocol), the transport protocol itself (for example, RTP or H.320), the UDP port, the list of starting and stopping times of the session, encryption keys, and so on. Keep in mind that SDP is only one possible payload of SIP; SIP can also carry all Multipurpose Internet Mail Extensions (MIME) types, for example.
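To show how an SDP description rides inside an invitation, here is a hedged sketch that assembles an INVITE with the SDP as its body. The header set follows common practice from the later SIP revision (RFC 3261) as I understand it, and the caller address, host, Call-ID, tag, and branch values are placeholders made up for this example. The point is only the Content-Type and Content-Length pairing around the SDP payload.

```python
SDP_TEXT = """\
v=0
o=7010 1352822030 1434897705 IN IP4 192.168.0.10
s=A_conversation
c=IN IP4 192.168.0.10
t=0 0
m=audio 9000 RTP/AVP 0 8 18
a=ptime:20
"""


def build_invite(sdp_text, caller, callee, local_host, call_id, branch):
    """Assemble a SIP INVITE (as bytes) carrying an SDP body."""
    # Message lines end in CRLF on the wire, so the body is normalized.
    body = ("\r\n".join(sdp_text.splitlines()) + "\r\n").encode("ascii")
    headers = [
        f"INVITE sip:{callee} SIP/2.0",
        f"Via: SIP/2.0/UDP {local_host};branch={branch}",
        "Max-Forwards: 70",
        f"From: <sip:{caller}>;tag=1",       # tag value is a placeholder
        f"To: <sip:{callee}>",
        f"Call-ID: {call_id}",
        "CSeq: 1 INVITE",
        f"Contact: <sip:{caller}>",
        "Content-Type: application/sdp",     # marks the payload as SDP
        f"Content-Length: {len(body)}",
    ]
    # A blank line separates the headers from the body.
    return ("\r\n".join(headers) + "\r\n\r\n").encode("ascii") + body


message = build_invite(
    SDP_TEXT,
    caller="7010@192.168.0.10",              # hypothetical caller address
    callee="hui-lan.lu@bell-labs.com",       # the example URI from the text
    local_host="192.168.0.10",
    call_id="example-call-1@192.168.0.10",   # hypothetical identifier
    branch="z9hG4bKexample1",                # hypothetical branch value
)
print(message.decode("ascii"))
```

Swapping the Content-Type and body is all it would take to carry some other MIME payload, which is the sense in which SDP is only one possible payload of SIP.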

In the following section, we discuss location of clients and servers. In a perfectly valid degenerate case, both a client and server can be located in the same host. The opposite extreme, which is strategic to network-wide applications of SIP, is when several SIP servers located in different hosts act as proxies. In this case, server A, after having received a request from a client, may consult a local directory (by using LDAP, for example), only to find out that there is another SIP server, server B, which is better suited to respond to the request. Server A forwards the request to server B, which, from now on, will route its responses through server A.

The proxies can form a routing chain of any length. Figure 1 demonstrates such a chain. You have probably noticed a striking similarity between this figure and Figure 1. This similarity is actually profound, and it has been among the major factors that have influenced the pint working group to adopt SIP as the foundation of the PINT Protocol. SIP also offers a natural solution to the problem of gateway discovery, which is being worked on in the IP Telephony (iptel) working group (www.ietf.org/html.charters/iptel-charter.html).

Figure 1: SIP routing.

The use and applications of SIP are growing. The present specification of SIP, however, as far as size is concerned, is about one-tenth of that of the H.323 suite. In some cases (for example, session control), the protocols seem to be working to solve the same problem; other aspects (for example, definition of network functional elements and their respective roles) are different. For a good comparison of H.323 and SIP, please see “A Comparison of SIP and H.323 for Internet Telephony” (Schulzrinne and Rosenberg, 1998).
