Evolution of Operation, Administration, and Maintenance (OA&M)

There are several functions performed in the PSTN under the common name OA&M. These functions include provisioning (that is, distributing all the necessary software to make systems available for delivering services), billing, maintenance, and ensuring the expected level of quality of service. The scope of the OA&M field is enormous: it deals with transmission facilities, switches, network databases, common channel signaling network elements, and so on. Because of this scope, referring to OA&M as a single task would be as much of a generalization as speaking of a single universal computer application. As we show later in this section, the development of PSTN OA&M has been evolutionary; as new equipment and new functions were added to the PSTN, new OA&M functions (and often new pieces of equipment) were created to administer them. This pattern has posed tremendous administrative problems for network operators. Many hope that as the PSTN and the Internet ultimately converge into one network, the operation of the new network will be simpler than it is today.

Initially, all OA&M functions were performed by humans, but they have progressively been automated. In the 1970s, each task associated with a piece of transmission or switching equipment was handled by a task-specific application developed solely for that purpose. As a result, the applications were developed separately from one another. They had simple text-based user interfaces; administrators used teletype terminals connected directly to the entities they administered.

In the 1980s, many tasks previously performed by humans became fully automated. The applications were developed to emulate humans (up to the point of programs exchanging text messages exactly as they would appear on the screen of a teletype terminal). These applications, called operations support systems (OSSs), were developed for the myriad OA&M functions. In most cases, a computer executing a particular OSS was connected to several systems (such as switches or pieces of transmission equipment) by RS-232 lines, and a crude ad hoc protocol was devised specifically for that OSS. Later, such computers served as concentrators and were in turn connected to the mainframes that executed the OSSs. Often, the introduction of a new OSS meant that more computers, and more lines connecting them to the managed elements, were needed.

You may ask why the common channel signaling network was not used for interconnection to operations systems. The answer is that this network was designed only for signaling and could not bear any additional (and unpredictable) load. As a matter of fact, a common network for interconnecting OSSs and managed elements has never been developed, although in the late 1980s and early 1990s there was a plan to develop such a network based on the OSI model. In some cases X.25 was used; in others, telephone companies used proprietary data networks developed by computer equipment manufacturers. A serious industry attempt to create a common mechanism to be used by all OA&M applications resulted in a standard called the Telecommunications Management Network (TMN) and, specifically, in its part known as the Common Management Information Protocol (CMIP), developed jointly by the International Organization for Standardization (ISO) and the ITU-T.

We could not possibly even list all existing OA&M tasks. Instead, we review one specific task called network traffic management (NTM). This task is important to the subject for three reasons. First, the very problem it deals with is a good illustration of the vulnerability of the PSTN to events it has not been engineered to handle. (One such event, the overload of PSTN circuits by Internet traffic, has resulted in significant reengineering of Internet access.) Second, the problems of switch overload and network overload are not peculiar to the PSTN; they exist (and are dealt with) today in data networks. Moreover, the very characteristics of voice traffic are likely to create exactly the same problems in the Internet and other IP networks once IP telephony takes off. Similar problems have similar solutions, so we expect network traffic management applications to be useful in IP telephony. Third, IN and NTM often work on the same problems, and it has long been recognized that they need to be integrated. This integration has not yet taken place in the PSTN, so it remains among the most important design tasks for the next-generation network.

NTM was developed to ensure quality of service (QoS) for PSTN voice calls. Traditionally, quality of service in the PSTN has been defined by factors such as postdial delay or the fraction of calls blocked by one of the network switches. The QoS problem exists because it would be prohibitively expensive to build switches and networks that could interconnect all telephone users all the time. On the other hand, it is not necessary to do so, because not all people use their telephones all the time. Studies have determined the proportion of users making calls at any given time of day and day of the week in a given time zone, and the PSTN has consequently been engineered to handle just as much traffic as needed. (Actually, the PSTN has been slightly overengineered to allow for fluctuations in traffic.) If a particular local switch is overloaded (that is, if all its trunks or interconnection facilities are busy), it is designed to block (that is, reject) calls.
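
The dimensioning behind "just as much traffic as needed" is classically done with the Erlang B formula, which gives the probability that a call is blocked when a given traffic load is offered to a trunk group of a given size. The text does not spell the formula out, so the following sketch is only illustrative; the traffic value, trunk counts, and blocking target are hypothetical.

```python
def erlang_b(offered_erlangs: float, trunks: int) -> float:
    """Blocking probability when `offered_erlangs` of traffic is offered to `trunks` circuits.

    Uses the standard recurrence B(0) = 1, B(n) = A*B(n-1) / (n + A*B(n-1)),
    which avoids the large factorials of the closed-form Erlang B expression.
    """
    b = 1.0
    for n in range(1, trunks + 1):
        b = offered_erlangs * b / (n + offered_erlangs * b)
    return b


if __name__ == "__main__":
    offered = 20.0  # hypothetical busy-hour load on one trunk group, in erlangs
    for trunks in (20, 25, 30, 32):
        print(f"{trunks} trunks -> blocking probability {erlang_b(offered, trunks):.3%}")
    # Engineering "just enough" capacity means picking the smallest trunk count
    # that keeps blocking below a chosen target, for example 1 percent.
```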

Initially, each switch was designed to block calls only when it could not handle them itself. By the end of the 1970s, however, the understanding of a peculiar phenomenon observed in the Bell Telephone System, called the Mother's Day phenomenon, resulted in a significant change in the way calls were blocked (as well as in other aspects of network operation).

Figure 1 demonstrates what happens to the toll network in peak circumstances. The network, engineered at the time to handle a maximum load of 1800 erlangs (an erlang is a unit of traffic load: 1 erlang corresponds to one circuit occupied continuously, that is, 3600 call-seconds per hour), was expected to respond to ever-increasing load as depicted by the top line in the graph: to approach the maximum load and more or less stay there. In reality, however, as the load increased the network's performance dropped inexplicably, well below the engineered level. What was especially puzzling was that only a small portion of the switches were overloaded at any given time. Similar problems occurred during natural disasters such as earthquakes and floods (which, fortunately, do not occur with great frequency). Detailed studies produced an explanation: as the network attempted to build circuits toward the overloaded switches, those circuits were held and could not be used by other callers, even callers whose calls would pass through or terminate at underutilized switches. The root of the problem, then, was that ineffective call attempts were tying up usable resources.

Figure 1: The Mother’s Day phenomenon.
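
A back-of-the-envelope calculation helps explain the collapse shown in Figure 1: every attempt toward an unreachable switch still holds a trunk during call setup, and redialing multiplies those attempts. The numbers below are hypothetical and the model is deliberately crude; it is only meant to show how quickly wasted setup time can consume a trunk group.

```python
# Toy model of focused overload: ineffective attempts toward a congested
# destination hold circuits on a shared trunk group. All figures are hypothetical.

TRUNKS = 100             # circuits in a shared intertoll trunk group
SETUP_SECONDS = 30.0     # time a trunk is held by an attempt that ultimately fails
ATTEMPTS_PER_CALLER = 3  # frustrated callers redial, multiplying ineffective attempts


def ineffective_load(callers_per_hour: float) -> float:
    """Erlangs of trunk time absorbed by attempts that never complete
    (1 erlang = 3600 call-seconds per hour)."""
    attempts = callers_per_hour * ATTEMPTS_PER_CALLER
    return attempts * SETUP_SECONDS / 3600.0


for callers in (1_000, 4_000, 8_000, 12_000):
    wasted = ineffective_load(callers)
    left = max(TRUNKS - wasted, 0.0)
    print(f"{callers:>6} callers/hour toward the congested area: "
          f"{wasted:6.1f} erlangs wasted, ~{left:5.1f} circuits left for effective calls")
```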

The only solution was to block the ineffective call attempts. To identify such attempts, the network needed to collect, in one place, information about the state of the whole network. For this purpose, an NTM system was developed. The system polled the switches periodically to determine their states; in addition, switches could themselves report certain extraordinary events (called alarms) asynchronously, outside the polling cycle. For example, every five minutes the NTM system collects the values of attempts per circuit per hour (ACH) and connections per circuit per hour (CCH) from all switches in the network. If ACH is much higher than CCH, it is clear that ineffective attempts are being made. NTM applications have used artificial intelligence technology to build inference engines that pinpoint network problems and suggest corrective actions, although they still rely on a human's ability to infer the cause of a problem.
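
As a rough illustration of the ACH/CCH surveillance just described, the sketch below shows what one NTM polling pass might look like. The data structures, threshold, switch names, and polling interface are all hypothetical, not drawn from any actual NTM OSS.

```python
from dataclasses import dataclass


@dataclass
class SwitchCounts:
    """Five-minute surveillance counts reported by one switch (hypothetical format)."""
    switch_id: str
    ach: float  # attempts per circuit per hour
    cch: float  # connections per circuit per hour


def ineffective_attempt_ratio(counts: SwitchCounts) -> float:
    """ACH/CCH; values well above 1 mean many attempts never become connections."""
    return counts.ach / counts.cch if counts.cch else float("inf")


def poll_cycle(poll_switches, alarm_threshold: float = 1.5):
    """One polling pass: flag switches whose attempts far exceed their connections."""
    return [c.switch_id for c in poll_switches()
            if ineffective_attempt_ratio(c) > alarm_threshold]


if __name__ == "__main__":
    # Stand-in for the real polling interface: three switches, one of them congested.
    sample = [SwitchCounts("switch-a", ach=22.0, cch=20.0),
              SwitchCounts("switch-b", ach=75.0, cch=18.0),  # many attempts, few connections
              SwitchCounts("switch-c", ach=30.0, cch=28.0)]
    print("suspect switches:", poll_cycle(lambda: sample))
    # A real NTM OSS would repeat this pass every five minutes and also accept
    # asynchronous alarms between passes.
```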

Overall, problems may arise because of a transmission facilities malfunction (as when rats or moles chew up a fiber link; sharks have been known to do the same at the bottom of the ocean) or a breakdown of the common channel signaling system. In a physically healthy network, however, the problems are caused by use above the engineered level (for example, on holidays) or by what is called focused overload, in which many calls are directed into the same geographical area. Natural disasters are not the only cause of overload: a PSTN service called televoting has been expected to cause just that, and so, for obvious reasons, has the freephone service (such as 800 numbers in the United States). (Televoting has typically been used by TV and radio stations to gauge the number of viewers or listeners: the audience is asked a question and invited to call one of two given numbers free of charge; one number corresponds to a "yes" answer, the other to "no." Fortunately, IN has built-in mechanisms for blocking such calls to prevent overload.)
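
One IN control of this kind, widely known as call gapping, admits at most one call toward the focused destination per gap interval and rejects the rest at the originating switch, before they can tie up trunks deeper in the network. The sketch below only illustrates the idea; the class, interface, and parameters are invented for this example.

```python
import time
from typing import Optional


class CallGap:
    """Minimal call-gapping control for one focused destination (illustrative only)."""

    def __init__(self, gapped_prefix: str, gap_seconds: float):
        self.gapped_prefix = gapped_prefix
        self.gap_seconds = gap_seconds
        self._last_admitted = float("-inf")

    def admit(self, dialed_number: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        if not dialed_number.startswith(self.gapped_prefix):
            return True  # the control applies only to the focused number
        if now - self._last_admitted >= self.gap_seconds:
            self._last_admitted = now
            return True  # one call per gap interval is let through
        return False     # gapped: rejected without seizing any trunk


if __name__ == "__main__":
    gap = CallGap(gapped_prefix="800555", gap_seconds=1.0)
    # Fifty attempts to a hypothetical televoting number over ten seconds.
    admitted = sum(gap.admit("8005550123", now=0.2 * i) for i in range(50))
    print(f"admitted {admitted} of 50 attempts")
```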

Once the cause of congestion in the network is detected, the NTM OSS deals with the problem by applying controls, that is, by sending switches and IN SCPs commands that affect their operation. Such controls can be restrictive (for example, directionalization of trunks, making them available only in the direction leading away from the congested switch; cancellation of alternative routes through congested switches; or blocking of calls directed to congested areas) or expansive (for example, overflowing traffic onto unusual routes in order to bypass congested areas). Although the idea of an expansive control appears strange at first glance, this type of control has been used systematically in the United States to relieve congestion in the Northeast Corridor between Washington, D.C., and Boston, which often occurs between 9 and 11 o'clock in the morning. Since most offices in California (which is three hours behind) are still closed during this period, it is not unusual for a call from Philadelphia to Boston to be routed through a toll switch in Oakland.
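
To make the restrictive/expansive distinction concrete, here is a deliberately simplified rule base in the spirit of an NTM inference engine. The conditions, control names, and mapping are hypothetical; a real NTM OSS works from far richer data.

```python
from enum import Enum, auto


class Control(Enum):
    DIRECTIONALIZE_TRUNKS = auto()    # restrictive: trunks usable only outbound from the congested switch
    CANCEL_ALTERNATE_ROUTES = auto()  # restrictive: stop overflow routing via the congested switch
    CODE_BLOCK = auto()               # restrictive: block calls destined for the congested area
    REROUTE_VIA_IDLE_REGION = auto()  # expansive: overflow traffic onto lightly loaded routes


def choose_controls(switch_congested: bool,
                    focused_destination: bool,
                    idle_capacity_elsewhere: bool) -> list[Control]:
    """Map observed conditions to candidate controls (hypothetical rules)."""
    controls: list[Control] = []
    if switch_congested:
        controls += [Control.DIRECTIONALIZE_TRUNKS, Control.CANCEL_ALTERNATE_ROUTES]
    if focused_destination:
        controls.append(Control.CODE_BLOCK)
    if idle_capacity_elsewhere:
        controls.append(Control.REROUTE_VIA_IDLE_REGION)  # e.g., Philadelphia to Boston via Oakland
    return controls


if __name__ == "__main__":
    print(choose_controls(switch_congested=True,
                          focused_destination=False,
                          idle_capacity_elsewhere=True))
```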

Overall, the applications of global network management (as opposed to specific protocols) have been at the center of attention in the PSTN industry. This trend continues today. The initial agent/manager paradigm, on which both the Open Systems Interconnection (OSI) and Internet models are based, has evolved into an agent-based approach, as described by Bieszad et al. (1999). In that paper, an (intelligent) agent is defined as a computational entity "which acts on behalf of others, is autonomous, . . . and exhibits a certain degree of capabilities to learn, cooperate and move." Most of the research on this subject applies artificial intelligence to network management problems. Agents communicate with one another using specially designed languages [such as the Agent Communication Language (ACL)] and specialized protocols [such as the Contract-Net Protocol (CNP)]. As a result of this intensive research, two agent frameworks, the Foundation for Intelligent Physical Agents (FIPA) specifications and the Mobile Agent System Interoperability Facilities (MASIF), have been proposed. These specifications, however, are not applicable to the products and services described here, for which reason they are not addressed further. Consider them, though, an important reference to a technology in the making.
