Telecom Made Simple: Quality-of-Service Mechanisms and Provisioning

There are a few common ways for quality of service to be provided in networks, using enterprise-grade wireline infrastructure. The concepts all stem around handling the packets differently when it comes to queuing. Why? Most wireline networks can handle a fairly large amount of traffic, because the wireline technologies, such as Gigabit Ethernet, have enough throughput to make congestion be less of an issue. However, certain protocols are designed to take up as much bandwidth as they can—to specifically expand into the space that you give them. It will always be important on voice mobility networks to keep the voice traffic protected from these applications, especially if they cause changes in delay. Moreover, network congestion can cause loss rates to become problematic. All of the problems happen to the packets not as they are on the wire, but as they back up in queues within the choke points of the network, the routers or switches that connect the links together. What happens in those queues makes the difference.

Thankfully, using the packet classification capabilities from differentiated services, enterprise-grade wireline infrastructure can be used to both police flows that get out of hand and give the ones that are being squeezed out the help they need. These techniques go under the broad category of queuing disciplines, as they provide the discipline that is used to maintain order in the queues.

The idea is to take what was once one monolithic queue for the chokepoint, and to create possibly different queues, each queue leading to the same eventual chokepoint. As traffic heads towards the bottoms of the queues, an element called a scheduler chooses from which queues to take packets, and then provides those packets for transmission.

We'll take queuing disciplines and scheduling together for this discussion.

FIFO

The simplest behavior is to do no particularly new behavior at all. First-in, first-out (FIFO) queuing refers to using the one queue that is there, and to putting packets in with the same order in which they arrived, and pulling them out the same way. This sort of queuing is precisely what causes congestion and variable delays.

For the purposes of voice, the longer the queue gets, the longer the potential maximum delay the queue can cause the voice packet to suffer. The alternative is not much better: if the queue gets longer than it can handle, the packets will be dropped.

Classification

The first step is to determine whether there is any structure in the packets that can be used to differentiate them. Enterprise-grade classification techniques can use a wide, rich array of properties about the individual packet, including the sender, receiver, size, DSCP value, ports, applications, and routes. These can all be applied in a stateless manner, meaning that the router or switch need look at each packet only in isolation. An additional option exists for some routers and switches with a lot of memory and processing ability. They can use flow state to create stateful classification, in which previous packets that are related to the current one dictate the behavior. This distinction is identical to that used in firewalling. Once packets are classified, they can be placed into queues by their classes. These queues can be administratively created, or they can be created on the fly based on the class divisions, ensuring that packets from each class stay in separate queues.

Class-based queuing (CBQ) is an extension of this basic concept. Instead of having one level of discrimination, the concept can be extended to a hierarchy of queues, all set up by the administrator. This hierarchy can be powerful in preventing flows and users from stepping on each other, and for shaping the bursts and behavior of the traffic. Traffic shaping is a highly important function for variable bitrate, expansive applications, to prevent them from overwhelming other applications that may not deserve the highest prioritization, but still need to be metered.

Once the packets are classified into sibling queues, the schedulers need to be selected, to determine how to get the packets out of the queues.

Round-Robin

The simplest scheduler is the round-robin scheduler. As the name suggests, the round-robin scheduler takes packets in turn from each queue, wrapping around when it hits the last one. Queues with empty packets get skipped over, but otherwise, everyone gets a shot.

Round robin is good for creating packet fairness, were every class gets an equal shot at sending a packet. However, if some of the classes should have a higher priority than the other, then round robin will not suffice.

Strict Prioritization

Strict prioritization is a very simple scheduler. Classes are ordered, strictly, from highest to lowest. The scheduler always starts with the highest queue. If there are no packets in the highest-priority queue, it checks the one with the next highest priority. This continues until the scheduler finds a packet, which it then sends.

By draining the highest-priority queue before moving onto the others, strict prioritization ensures that the traffic with the highest prioritization moves right to the head of the line. Even if the lower-priority queues are heavily backed up and congested, if the highest-priority queue is empty and a highest-priority packet comes in, it will move right past the long lines and be sent first.

Strict prioritization is often good enough for voice, especially when the issue is preventing data from competing with voice. However, for elastic or variable applications where one should get more resources than the other, but not too much more, strict prioritization will not suffice either.

Weighted Fair Queuing

To provide a sense of both prioritization, of which strict prioritization may provide too much, and fairness, of which round robin may provide too little, there is the notion of fair queuing. In fair queuing, the goal is to provide a fair bitrate to each of the classes. Round robin provides a fair packet rate, which is the same only if the packets are all the same size. On top of fair queuing, however, the bitrate should be adjustable so that higher-quality flows get more throughput, without exhausting all the throughput available. This is the concept of weighted fair queuing (WFQ).

The idea behind WFQ is that each queue gets a relative weight. That relative weight is used to adjust the data rate that the queue gets. The amount of traffic that the queue gets is always based on how many other queues are active and for how long; the goal is not to tightly control throughputs or to ensure that no one queue gets ahead of the other, but that queues with equal amounts to send get their weight's worth of relative throughput.

The scheduler's goal is to give the appearance that each queue with a byte in it has a byte taken out fairly (as if, say, by round robin, though order does not matter). This gives rise to thinking about packets flowing through the queues like fluids. The output requires a given data rate, or velocity, and each of the packets are extruded through their queues a little at a time, in equal amounts. The first packet out, then, would be the one whose last byte gets drawn out first—that is, the one that finishes first. The problem, of course, is that packets are packets, not bytes, and cannot be drawn out in this manner.

What can be done is that the scheduler can do the math that simulates the bit-by-bit extraction, and make sure to dequeue packets, then, in that proportion. The scheduler calculates the expected time the packet at the end of each queue would get drawn out, in units of virtual time, that don't depend on real time but still flow forward. This gives the precise order of the packets that should come out. As a packet comes out, the new packet's virtual end time is calculated, and so on. This technique ensures that packets flow out in the order they should.

The weightings come into play by adjusting the velocity, in virtual time, that a queue extrudes its packets. Higher-weighted queues extrude packets more quickly, and thus those packets finish more quickly in virtual time, and hit the wire sooner.

It is important to observe that WFQ is a work-conserving process. Work conservation means that the scheduler never delays sending traffic. If there is a packet to send, in any queue, then at least one packet will be sent. At no time will a work-conserving process refuse to send traffic, or delay sending traffic, in hopes of getting a more even throughput. Work conservation is important for not wasting network resources for the sake of "quality."

Traffic Shaping

Traffic shaping is more severe than fair queuing. Whereas fair queuing is concerned with fairness, traffic shaping is concerned with ensuring that a precise rate of traffic is met by a given class.

Traffic shaping is usually performed through the use of some form of token bucket, first mentioned in the context of RSVP. To recap, the idea of a token bucket is that virtual tokens, corresponding to permission to send bytes, are deposited into the virtual bucket corresponding to the queue at a fixed rate. This rate is the goal at which traffic should be sent. The token bucket then requires that a packet from the queue have enough tokens before it can be let past. This requirement ensures a constant bit rate to the flow.

Token buckets are general ways of metering the flow of traffic. Using them to shape traffic, by holding up packets until there are enough tokens for them, is clearly not work-conserving, as the hold up will happen regardless of whether the line will go idle because of it. On the other hand, token buckets have a bucket depth for a reason. If traffic does happen to go idle for a while in the queue that owns the tokens, the queue is allowed to save up its backlog of tokens for when it might need it. Once the traffic resumes, it can use up all of its saved tokens without waiting. This allows for the average traffic rate to be more manageable, even if the incoming flow is not perfectly regular.

Traffic shaping holds an important place in keeping variable flows in check, so that they do not exceed specific service-level agreements (SLAs), which often specify a minimum available bandwidth. The goal of an SLA is to give a fat pipe that is shared among users the appearance that it really is a dedicated thin pipe for that one user. This is reminiscent of the reason we embarked on this journey, to make packet networks seem more like dedicated circuits. For voice, a constant, inelastic traffic, traffic shaping does not hold much interest in itself for what we need. However, traffic shaping does highlight one advantage of packet-based networks. They are flexible enough to provide circuit-like throughput guarantees for some services when needed while providing expandable prioritization for other services, all on the same wire.

Policing

Policing is the other side of the coin of scheduling and queuing discipline. Instead of deciding to hold onto the packet in a queue until it has met its criteria, classes are watched for the same criteria and their packets are dropped when they exceed it.

The point of policing is that it does not require building up the long lines of delayed packets as queuing would. Instead, the policer can just observe and drop packets that go over the mark. Policing is a lot less forgiving than queuing, but it requires fewer resources in the network.

Token buckets are often used for policing. With token bucket policing, when a packet comes by that does not have enough tokens, it is simply dropped. Packets never delay in this model.

Policing is a tough tactic to get right, because it works necessarily by dropping packets that could have been queued or sent. For voice networks, where the goal is to prevent data from interfering with voice, policing is useful only for preventing runaway or hijacked voice streams, being high priority, from taking over the network. Prioritization is a better method to keep data from affecting voice quality.

Random Early Detection

Along with policing comes the idea of how to drop a packet when the queue is filling. Congestion, for data, is a major issue, and as data backs up, it can cause major problems for any traffic that shares the link with it.

The concept behind random early detection (RED) is that congestion can be signaled to TCP, or any other elastic and responsive traffic protocol, before the congestion gets so bad that it caused unfair loss. Congestion causes that unfair loss by affecting whichever random flow whose packet happens to be the one too many for the queue and gets dropped first. As such a flow loses packets, it slows down, and other flows expand to fit their place. To bring back less broken symmetry between the flows, random early detect uses a sliding scale of random drop probabilities to keep the backup at bay. When the queue is nearly empty, nothing is dropped. As the queue fills, however, RED kicks in by increasing its drop probability. This slow but steady increase starts backing the flows off before the queue gets too full, but the fact that it stops dropping when the queue empties gives pressure when the queues are filling and permissiveness when there is plenty of room.

On top of RED is a concept called weighted random early detection (WRED). WRED uses weights, based on the classifications we have seen already, to alter the drop probabilities. Using the classifications for voice allows administrators to avoid having WRED kick in for voice, which is inelastic and will not respond to being dropped, if the administrator has no ability to place voice in a separate queue or route. For data, more critical data connections, such as TCP-based SIP needed in calls, can be given a higher probability by avoiding a higher drop probability, while allowing normal data to be slowed down.

The problem with RED is the problem with policing. Packets that may have been needed to prevent the queue from going idle even though there are resources for them, causing lost work and wasted resources.

Explicit Congestion Notification

Instead of using RED, routers have the option of marking the packets, rather than dropping them. TCP endpoints that know to read for the congestion-marked packets will consider it as if the packet had, somehow, been lost, and will back off or slow down, but without causing the packet's data to disappear. This increases the performance of the network and improves efficiency, though, needless to say, it does nothing if the endpoints are not aware of the congestion notification scheme.

On TCP, explicit congestion notification (ECN) works by the TCP endpoints negotiating that they support this protocol. Both sides need to support it, because the only way the sender can know if an intervening router has marked a packet is for the receiver to echo that fact back to the sender over TCP itself. Once a flow is established, the sender sets the ECN bit, bit 6, in the DSCP usage of the TOS field in IP. This lets routers know that the packet supports ECN. When a router uses RED to decide that the packet should be dropped early, but notices that the packet is marked for ECN support and the router supports ECN itself, it will not drop the packet. Instead, it will set the seventh bit in the ECN header, the CE or Congestion Experienced bit, marking that the packet should be handled as if it were to have been dropped.

The TCP receiver notices that the packet has been marked, and so needs to echo this fact back in the acknowledgment. The receiver sets the ECE, or ECN-echo bit in the TCP flags field in the acknowledgement. The sender gets the acknowledgement, and uses this flag to cut its congestion window in half, as if the original packet were lost.

Telecom Made Simple

Quality-of-Service Mechanisms and Provisioning