MASRR:
Multi-Agent System for Resource Reliability
Maintaining network resource reliability using a decentralized
system of autonomous agents
Background
The rapid growth of computer networks
over the past ten years has resulted in a highly complicated operating
environment. Networks are susceptible to a variety of attacks and malfunctions
capable of compromising reliability at all levels. Whether network
resource reliability is lost due to malicious behavior, device failure,
administrative misconfigurations, or even congestion from valid traffic,
the cost of this loss can be enormous: lost productivity, the possible theft or
destruction of sensitive documents, and risks to civilian or military health,
safety, and missions resulting from the loss of communication and control.
Computer networks are composed of an assortment
of computer platforms, network devices, communication protocols, architectures
and configurations. Networks are subject to a number of failures stemming
from hardware and software faults, power outages, or demands that exceed
capacity. The difficulty is compounded by vulnerabilities, known and unknown,
that allow new attacks to be found and exploited.
In order to maintain network reliability
in the face of this complexity, a software solution is needed that not
only works across heterogeneous networks and inter-networks, but is also
scalable to a variety of network sizes and able to continue functioning
even when portions of the network are unavailable or have become compromised.
SHAI introduces MASRR, a "Multi-Agent System for network Resource
Reliability".
MASRR focuses
on meeting the following criteria:
Scalability: The agents operate
autonomously or semi-autonomously within a decentralized architecture,
allowing MASRR to manage large networks with varying topologies. Additionally,
agents employ message passing protocols that minimize bandwidth consumed
by management operations.
Robustness: Each agent operates
by observing the state of its own network element(s) and those of neighboring
elements. Each agent continues operating even when other parts of the network
fail, adapting to and minimizing the effect of failures by providing alternate
routing, modifying security policies, or taking other remedial actions.
Interoperability:
Agents are developed using platform-independent code. They are designed
to operate within the current TCP/IP and SNMP framework. While they obtain
information from their neighbors, they make decisions based on their observations
and do not need to be concerned with implementation details of neighboring
elements.
Pro-active management: MASRR agents
use models of network element and user behavior and misbehavior to detect
both known and unknown network-degrading events. Agents reason about their
observations, assess the source of a fault, and select and execute actions
to mitigate, contain, and recover from problems in the network.
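As an illustration of the alternate-routing remedy mentioned under Robustness, the sketch below recomputes a least-cost path while skipping failed elements, using Dijkstra's algorithm. It is a minimal sketch, not MASRR's routing logic: the topology format, the `shortest_path` function, and the node names are hypothetical.

```python
import heapq
import math

def shortest_path(links, src, dst, failed=frozenset()):
    """Least-cost path from src to dst that avoids failed elements.

    links:  dict mapping (node_a, node_b) -> cost, treated as undirected.
    failed: set of nodes currently considered unusable.
    Returns (cost, path), or (math.inf, []) if dst is unreachable.
    """
    if src in failed or dst in failed:
        return math.inf, []
    adj = {}
    for (a, b), cost in links.items():
        if a in failed or b in failed:
            continue  # a failed element cannot carry traffic
        adj.setdefault(a, []).append((b, cost))
        adj.setdefault(b, []).append((a, cost))
    dist, prev = {src: 0}, {}
    heap = [(0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == dst:
            path = [dst]
            while path[-1] != src:  # walk predecessors back to the source
                path.append(prev[path[-1]])
            return d, path[::-1]
        if d > dist.get(node, math.inf):
            continue  # stale heap entry
        for nbr, cost in adj.get(node, []):
            nd = d + cost
            if nd < dist.get(nbr, math.inf):
                dist[nbr], prev[nbr] = nd, node
                heapq.heappush(heap, (nd, nbr))
    return math.inf, []

links = {("A", "B"): 1, ("B", "D"): 1, ("A", "C"): 2, ("C", "D"): 2}
print(shortest_path(links, "A", "D"))                # (2, ['A', 'B', 'D'])
print(shortest_path(links, "A", "D", failed={"B"}))  # (4, ['A', 'C', 'D'])
```

When router B is reported as failed, the agent's next computation simply excludes it and traffic shifts to the A-C-D path.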
Innovations:
In developing MASRR, SHAI contributes
to the development of these technologies:
- Decentralized network monitoring and maintenance.
Agents respond to network-degrading events at the local level, using as
much information as is available from peer MASRR agents. In addition,
MASRR agents can perform tasks autonomously or semi-autonomously.
- Reconciliation of differing local views.
Agents reconcile their own views of the network with the views from peer
agents, allowing each agent to utilize data from disparate sources in the
selection of appropriate network maintenance and security actions.
- Acting under uncertainty. MASRR agents
will be able to select reasonable responses to network-degrading events
even when available information is incomplete. This enables an agent,
unlike a signature-based intrusion detection system, to respond to
situations never previously observed.
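One simple way to reconcile peer views and still act on incomplete information is to treat each peer's report as evidence and combine the reports with Bayes' rule in log-odds form. The sketch below assumes peers report independently and that each has a known reliability; the names `fault_probability` and `select_response` and the action thresholds are illustrative assumptions, not part of MASRR.

```python
import math

def fault_probability(prior, reports):
    """Posterior probability that a network element is faulty.

    prior:   prior probability of a fault (0 < prior < 1).
    reports: (says_faulty, reliability) pairs from peers; reliability is the
             assumed probability that the peer's report is correct (0.5..1).
    Peers that sent nothing are simply absent, so the estimate uses whatever
    evidence exists. Reports are assumed conditionally independent.
    """
    log_odds = math.log(prior / (1 - prior))
    for says_faulty, reliability in reports:
        weight = math.log(reliability / (1 - reliability))
        log_odds += weight if says_faulty else -weight
    return 1 / (1 + math.exp(-log_odds))

def select_response(p_fault):
    """Hypothetical policy: act decisively only when the evidence is strong."""
    if p_fault >= 0.8:
        return "isolate and reroute"
    if p_fault >= 0.3:
        return "gather more evidence"
    return "no action"

# Two reliable peers report a fault; a less reliable one disagrees.
p = fault_probability(0.1, [(True, 0.9), (True, 0.9), (False, 0.6)])
print(round(p, 3), select_response(p))  # 0.857 isolate and reroute
```

Conflicting reports of equal reliability cancel out, leaving the prior, so an agent with little or contradictory evidence falls back to a cautious response rather than guessing.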
Recent Developments:
The contract for this project has been completed. We are currently
looking for potential partners who can provide domain expertise and are
interested in pursuing additional research funding with us.
Summary:
SHAI has developed an architecture to support distributed monitoring agents
that share information and perform localized correlation and decision making.
Lacking an actual network on which to deploy our agents, we have created a
prototype system that uses the Scalable Simulation Framework
(SSFNet) network simulator as its
runtime platform. We have created a simple scenario to illustrate detection
and resolution of a problem caused by an attack. The simulator also confirms
the potential of an important project innovation, change detection modeling.
Change Detection Modeling
A critical ability for MASRR
agents is change detection -- noting when network traffic and
application behaviors have departed from what is normal or expected. But
this expectation varies throughout the day, the week, the month; moreover,
the network itself changes as stations and devices are added, removed, and
reconfigured. The appearance of "normal" is unique to
each network element and thus differs from the perspective of individual
agents. We must, therefore, build a system in which each agent can learn
patterns for itself and adapt to changes, and yet detect when changes
signal faults or attacks. SHAI is using recent advances in data mining to
enable agents to model fluctuating, normal patterns of behavior and
to classify current observations. When the agent encounters
anomalous behaviors, additional diagnostics and reasoning are performed
as the agent decides on a course of action.
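The sketch below shows one simple form of adaptive change detection: a separate baseline for each (weekday, hour) slot, maintained with an exponentially weighted mean and variance so that old behavior gradually fades, and a z-score test against that baseline. It stands in for, and is much simpler than, the data-mining models SHAI describes; the class names and the `alpha`, `threshold`, and `min_samples` values are assumptions.

```python
import math
from collections import defaultdict
from datetime import datetime, timedelta

class SlotModel:
    """Exponentially weighted mean and variance for one time slot."""

    def __init__(self, alpha):
        self.alpha, self.n, self.mean, self.var = alpha, 0, 0.0, 0.0

    def zscore(self, x):
        sd = math.sqrt(self.var)
        if sd == 0:
            return 0.0 if x == self.mean else math.inf
        return abs(x - self.mean) / sd

    def update(self, x):
        if self.n == 0:
            self.mean = float(x)
        else:
            delta = x - self.mean
            self.mean += self.alpha * delta
            self.var = (1 - self.alpha) * (self.var + self.alpha * delta * delta)
        self.n += 1

class ChangeDetector:
    """Separate baseline per (weekday, hour); flags large deviations."""

    def __init__(self, alpha=0.2, threshold=3.0, min_samples=5):
        self.slots = defaultdict(lambda: SlotModel(alpha))
        self.threshold, self.min_samples = threshold, min_samples

    def observe(self, when, value):
        slot = self.slots[(when.weekday(), when.hour)]
        anomalous = slot.n >= self.min_samples and slot.zscore(value) > self.threshold
        slot.update(value)  # keep adapting so lasting changes become the new normal
        return anomalous

detector = ChangeDetector()
monday_10am = datetime(2024, 1, 1, 10)
for week, load in enumerate([100, 102, 98, 101, 99, 100, 500]):
    flagged = detector.observe(monday_10am + timedelta(weeks=week), load)
    print(week, load, flagged)
```

Because every observation updates its slot, a sustained change (for example, a newly added server) is absorbed after roughly 1/alpha observations in that slot, while an isolated spike is flagged when it first appears. Monday at 10:00 and Sunday at 03:00 never share a baseline.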
Technical report on SHAI's Change and Anomaly Detection (ChAD)
system [html].
Presentation on ChAD
[PowerPoint]. The presentation includes detailed
speaker notes; if you are viewing it as a PowerPoint Show, click the
control in the lower left corner and use the menu to view the notes.
Artificial Intelligence (AI) technologies:
Agents
MASRR is designed to scale to large networks by
replacing centralized management (a network bottleneck) with a
decentralized system of agents that observe network status and respond
accordingly. Agent communication resembles the link-state routing model:
agents share their own views of the network without excessive or redundant
polling. Agents act locally to correlate information and select appropriate
actions.
MASRR agents execute in secure environments; they are not mobile code.
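The link-state analogy can be made concrete: each agent stamps its own view with an increasing sequence number, and a peer stores and forwards an advertisement only if it is newer than what it already holds, so duplicates stop propagating instead of being re-polled. This is a minimal sketch in the spirit of link-state flooding; the `Agent` and `Advert` classes and the in-process delivery are hypothetical stand-ins for MASRR's actual message protocol.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Advert:
    origin: str  # agent whose view this is
    seq: int     # increases each time the origin publishes a new view
    view: tuple  # e.g. (("router7", "up"), ("link3", "congested"))

@dataclass(eq=False)  # identity comparison; neighbor lists are cyclic
class Agent:
    name: str
    neighbors: list = field(default_factory=list, repr=False)
    db: dict = field(default_factory=dict, repr=False)  # origin -> newest Advert
    seq: int = 0
    sent: int = 0  # messages this agent has forwarded

    def publish(self, view):
        """Advertise a new local view to the rest of the system."""
        self.seq += 1
        self.receive(Advert(self.name, self.seq, tuple(view)))

    def receive(self, adv, from_agent=None):
        held = self.db.get(adv.origin)
        if held is not None and held.seq >= adv.seq:
            return  # duplicate or stale: drop it and do not re-flood
        self.db[adv.origin] = adv
        for nbr in self.neighbors:
            if nbr is not from_agent:
                self.sent += 1
                nbr.receive(adv, self)

a, b, c = Agent("a"), Agent("b"), Agent("c")
a.neighbors, b.neighbors, c.neighbors = [b, c], [a, c], [a, b]
a.publish([("router7", "up")])
a.publish([("router7", "down")])
print({agent.name: agent.db["a"].view for agent in (a, b, c)})
```

Even though the three agents form a loop, each advertisement stops as soon as it reaches an agent that already holds it, and a late-arriving older view cannot overwrite a newer one.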
Case Based Reasoning
Agents use case based reasoning to determine
whether a fault or attack might be occurring and to select the best possible
action to repair, mitigate, or circumvent the problems. Such an approach is useful for
identifying anomalies of both known and unknown types, based on their symptoms.
Because a fault can cause more than one set of symptoms, and a set of symptoms
can have multiple causes, agents correlate evidence indicated by their
neighbors' views of the network. Scenarios form "cases", consisting of
the network status and the steps the agent should take in a similar situation.
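A minimal sketch of case retrieval, assuming cases are indexed by sets of symptom labels and ranked by Jaccard similarity. Pooling symptoms reported by neighbors with the agent's own can change which case matches best, which is how neighbor views help separate causes that share symptoms. The case base, symptom names, and functions here are invented for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Case:
    symptoms: frozenset  # symptom labels observed in the stored scenario
    diagnosis: str
    actions: tuple       # steps the agent should take in a similar situation

CASES = [
    Case(frozenset({"high_latency", "packet_loss", "route_flap"}),
         "compromised router", ("isolate router", "reroute traffic")),
    Case(frozenset({"high_latency", "packet_loss"}),
         "link congestion", ("rate-limit bulk flows",)),
    Case(frozenset({"syn_flood", "high_cpu"}),
         "SYN flood", ("enable SYN cookies",)),
]

def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def retrieve(case_base, local_symptoms, neighbor_symptoms=(), k=1):
    """Rank stored cases by similarity to the pooled local + neighbor symptoms."""
    observed = frozenset(local_symptoms).union(*neighbor_symptoms)
    ranked = sorted(case_base, key=lambda c: jaccard(observed, c.symptoms), reverse=True)
    return ranked[:k]

local = {"high_latency", "packet_loss"}
print(retrieve(CASES, local)[0].diagnosis)                    # link congestion
print(retrieve(CASES, local, [{"route_flap"}])[0].diagnosis)  # compromised router
```

Here the local symptoms alone best match link congestion; adding a neighbor's report of route flapping makes the compromised-router case the closest match, and its stored actions become the agent's candidate response.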
Data Mining
SHAI utilizes data mining tools to build
models of normal network usage and behavior. Our key innovation is the
ability to develop this model of normal behavior from real data collected in
real time. Typical systems seeking to classify normal behavior have required
extensive, and expensive, data collection and cleaning to create a set
of labeled examples that distinguish normal behaviors from
abnormal ones. Aside from the expense of creating this training data,
the models may fail when confronted with abnormalities not represented in
the data. In the network management and security domain, this means they
would fail to detect new attacks or faults.
MASRR
agents compare their current observations to the learned models and draw conclusions
about network performance. The models also serve as an explanation facility,
so that system administrators can not only learn about their current network
configurations, but also offer feedback to the agents.
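To illustrate learning a model of normal behavior from unlabeled data as it arrives, the sketch below uses simple leader-style incremental clustering: each observation joins the nearest cluster within a radius, otherwise it seeds a new one, and only clusters seen often enough count as normal. The centroids double as a readable summary for administrators, and `forget` models their feedback. This is an assumed, simplified technique, not SHAI's data-mining method; the feature choice, `radius`, and `min_support` are hypothetical.

```python
import math

class NormalModel:
    """Unlabeled, incremental model of normal feature vectors.

    Each cluster is [centroid, count]. An observation is judged normal only if
    it lies within `radius` of a cluster already seen `min_support` times, so a
    new pattern stays suspicious until it recurs.
    """

    def __init__(self, radius, min_support=3):
        self.radius, self.min_support = radius, min_support
        self.clusters = []

    def observe(self, x):
        x = tuple(float(v) for v in x)
        nearest = min(self.clusters, key=lambda c: math.dist(c[0], x), default=None)
        if nearest is None or math.dist(nearest[0], x) > self.radius:
            self.clusters.append([x, 1])  # seed a candidate pattern
            return False
        centroid, n = nearest
        normal = n >= self.min_support  # judged before x is absorbed
        nearest[0] = tuple(c + (v - c) / (n + 1) for c, v in zip(centroid, x))
        nearest[1] = n + 1
        return normal

    def explain(self):
        """Readable summary of learned patterns for administrators."""
        return [(tuple(round(v, 1) for v in c), n) for c, n in self.clusters]

    def forget(self, index):
        """Administrator feedback: discard a pattern judged to be an attack."""
        del self.clusters[index]

# Features (hypothetical): connections per minute, megabytes per minute.
model = NormalModel(radius=10)
for sample in [(100, 50), (102, 51), (98, 49), (101, 50), (900, 5)]:
    print(sample, model.observe(sample))
print(model.explain())
```

No labeled training set is needed: the fourth observation is accepted as normal because similar traffic has already recurred, while the burst of connections at (900, 5) is reported as unfamiliar and left for further diagnosis or administrator review.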
Machine Learning
Each agent keeps a history of the network
observations and its actions. This information will be used for periodic,
off-line learning of which actions were the most successful under which
circumstances. Thus the agent can become more effective over time.
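The off-line learning step could be as simple as tallying, for each circumstance, how often each action succeeded. The sketch below assumes the history is a list of (circumstance, action, succeeded) records and uses Laplace-style smoothing so an action tried only once does not look better than a well-tested one; the record format and circumstance labels are hypothetical.

```python
from collections import defaultdict

def learn_policy(history, prior_successes=1, prior_trials=2):
    """Off-line pass over an agent's (circumstance, action, succeeded) history.

    Returns {circumstance: (best_action, smoothed_success_rate)}. The Laplace-
    style prior pulls rarely tried actions toward 50% so that a single lucky
    success does not outrank a well-tested action.
    """
    counts = defaultdict(lambda: [0, 0])  # (circumstance, action) -> [successes, trials]
    for circumstance, action, succeeded in history:
        tally = counts[(circumstance, action)]
        tally[0] += int(succeeded)
        tally[1] += 1
    policy = {}
    for (circumstance, action), (successes, trials) in counts.items():
        rate = (successes + prior_successes) / (trials + prior_trials)
        if circumstance not in policy or rate > policy[circumstance][1]:
            policy[circumstance] = (action, rate)
    return policy

history = (
    [("router_compromised", "reroute", True)] * 4
    + [("router_compromised", "reroute", False)]
    + [("router_compromised", "reboot", True)]
    + [("router_compromised", "reboot", False)] * 2
    + [("router_compromised", "reset_sessions", True)]
    + [("congestion", "rate_limit", True)] * 3
)
for circumstance, (action, rate) in learn_policy(history).items():
    print(f"{circumstance}: {action} ({rate:.2f})")
```

With this history, rerouting (4 of 5 successes, smoothed to about 0.71) is preferred over the single successful session reset (smoothed to about 0.67).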
Time Series Prediction
In order to respond rapidly to an attack
or other failure, each agent uses correlated evidence to predict the most
likely cause and head off damage. For example, agents detecting a compromised
router can quickly reroute traffic, remedying the situation before further
network degradation occurs, and can isolate the compromised element from the
rest of the network to contain the damage.
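Forecasting is one way to act before degradation occurs. The sketch below applies Holt's linear (double exponential) smoothing to a series of link-utilization readings and reports how many steps remain before the forecast crosses a capacity limit, giving the agent time to reroute. Holt's method, the smoothing constants, the sample readings, and the function names are assumptions chosen for illustration; the page does not say which prediction model MASRR uses.

```python
def holt_forecast(series, alpha=0.5, beta=0.3, horizon=3):
    """Holt's linear (double exponential) smoothing forecast.

    alpha smooths the level, beta smooths the trend; returns forecasts for
    the next `horizon` steps.
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")
    level, trend = series[0], series[1] - series[0]
    for x in series[1:]:
        prev_level = level
        level = alpha * x + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return [level + h * trend for h in range(1, horizon + 1)]

def steps_until_overload(series, capacity, **kwargs):
    """First forecast step at which utilization reaches capacity, else None."""
    for step, value in enumerate(holt_forecast(series, **kwargs), start=1):
        if value >= capacity:
            return step
    return None

# Hypothetical link utilization (%) sampled once per minute.
utilization = [40, 48, 55, 63, 70]
print(holt_forecast(utilization))
print(steps_until_overload(utilization, capacity=90))  # 3
```

A trend-aware forecast is used here rather than a plain moving average because a steadily climbing load would otherwise be predicted to stay at its current level, leaving no warning before saturation.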
Development of MASRR was funded
by a DARPA contract under the Small Business Innovation Research (SBIR)
program.
For more information on MASRR, please contact the project manager,
Lynn Jones.