How to Measure Voice Quality Yourself


The Expensive, Accurate Approach: End-to-End Voice Quality Testers

As mentioned in the discussion of PESQ, existing tools can measure the quality of the voice network by directly pumping in prerecording voice samples andcomparing the output. These tools are either expensive or home-grown, and are used to test large networks as a part of a planning or predeployment phase.
This sort of testing is more of a tuning exercise, and-much like how piano tuning is a rare and complicated enough exercise that it is not performed frequently-direct end-to-end testing is not diagnostic. Telephone equipment testing companies do make the sort of equipment to perform this end-to-end inspection, and these tools can be rented. Unfortunately, it is very difficult to know where to invest in this sort of heavily proactive effort.
More likely, the voice quality is measured by having administrators walk around the network with some number of phones in question, ensuring themselves that whatever problems they may face will likely be manageable. The problem with both forms of proactive testing is that they normally occur on only lightly loaded networks, and thus are not able to measure the effect of network load on voice quality. Network load is generally the largest impact on voice quality, in fact, partly because voice mobility network managers do a good job of testing their networks before they launch them for basic problems, which they quickly correct, and partly because voice mobility networks are more likely to be robust enough out of the box for basic voice connectivity.

Network Specific: Packet Capture Tests

Most of the major packet capture tools, for wireline and for wireless, make modules that are able to indirectly infer the MOS values using E-model calculations. Sometimes, these work by tracing the voice setup protocols, such as SIP, and determining what RTP flows map to phone calls and the properties of the phone calls. Other times, these tools will just look directly at the RTP streams, and not try to find out what phone numbers the streams map to In both cases, the tools then use the sequence number and timestamp fields in the RTP stream to determine values such as loss, delay, and jitter. Using assumed values for the jitter buffer, with the option of having the user overwrite them, the tools then model the expected effect and produce a score.
The major issue with these tools is that they show quality only up to the point where they are inserted. An easy example of the problem is to look at wireless networks. On a Wi-Fi network, a packet capture tool may be able to directly determine what packets it sees and come up with a score. By looking at the Wi-Fi protocol, the tool may do a good job of inferring whether the mobile phone received the packet from the access point, and at what time, and may produce a reasonably close call quality number. On the other hand, the upstream flow is likely to look quite good from the point of view of the test tool, because there is only one network in between the client and the tool. The entirety of the network upstream from the client goes missing, and the upstream MOS value can be entirely misleading.
Some network infrastructure devices are able to do these inferences within themselves, as they pass the data through. This may be a reasonable thing to do, again depending on the point of insertion and how well they are able to capture information as late into the network as possible. It is important, when using all of these tools, for you to consult with the vendor or maker of the tools to find out where the tools are measuring. For a wireless controller with voice metric capabilities, for example, make sure that the downstream metrics are measured on the access point, based on what happened over the air, and not just passing through the controller. For wireless overlay monitoring, make sure that there is an option to do a similar capture using a wired mirror port on one of the switches, for cases in which voice quality might begin to suffer and the network needs direct attention. Overall, do not rely on just one tool, and believe what the users say-no matter what the tool tells you.


The Device Itself

The most accurate and reasonable way to measure voice quality is from the endpoints themselves. Both some handsets and PBXs offer the ability for the device to produce the one-way MOS value or R-value for the receive side at the device itself. These numbers are based entirely on E-model calculations, assuming best-case or known-default scenarios for the rest of the system, but are likely to be the most accurate. Of course, it is difficult to ask a user to determine what the voice quality is of a call while on it, especially given that voice quality is not something a user wants to measure. However, for diagnosing locations that are having troubles, this tool is valuable for the administrator herself, who is able to avoid having to guess as to whether the call sounds reasonable, and may be able to detect variations in the MOS value or R-value.
In the end, keep in mind that the absolute values produced by any of the methods deserve being taken with a grain of salt. As time goes on, the administrator of a voice mobility network should be able to learn what the real quality means for any given value the tool suggests, even when the tool is placing results a half a MOS point too high or too low. However, the variation of the scores, especially when the network has changed, can be a valuable tool for point the way towards the solution.

1 comment: