rfc9940.original   rfc9940.txt 
Network Working Group N. Davis, Ed. Internet Engineering Task Force (IETF) N. Davis, Ed.
Internet-Draft Ciena Request for Comments: 9940 Ciena
Intended status: Informational A. Farrel, Ed. Category: Informational A. Farrel, Ed.
Expires: 19 February 2026 Old Dog Consulting ISSN: 2070-1721 Old Dog Consulting
T. Graf T. Graf
Swisscom Swisscom
Q. Wu Q. Wu
Huawei Huawei
C. Yu C. Yu
Huawei Technologies Huawei Technologies
18 August 2025 February 2026
Some Key Terms for Network Fault and Problem Management Some Key Terms for Network Fault and Problem Management
draft-ietf-nmop-terminology-23
Abstract Abstract
This document sets out some terms that are fundamental to a common This document sets out some terms that are fundamental to a common
understanding of network fault and problem management within the understanding of network fault and problem management within the
IETF. IETF.
The purpose of this document is to bring clarity to discussions and The purpose of this document is to bring clarity to discussions and
other work related to network fault and problem management, in other work related to network fault and problem management -- in
particular to YANG data models and management protocols that report, particular, to YANG data models and management protocols that report,
make visible, or manage network faults and problems. make visible, or manage network faults and problems.
Status of This Memo Status of This Memo
This Internet-Draft is submitted in full conformance with the This document is not an Internet Standards Track specification; it is
provisions of BCP 78 and BCP 79. published for informational purposes.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months This document is a product of the Internet Engineering Task Force
and may be updated, replaced, or obsoleted by other documents at any (IETF). It represents the consensus of the IETF community. It has
time. It is inappropriate to use Internet-Drafts as reference received public review and has been approved for publication by the
material or to cite them other than as "work in progress." Internet Engineering Steering Group (IESG). Not all documents
approved by the IESG are candidates for any level of Internet
Standard; see Section 2 of RFC 7841.
This Internet-Draft will expire on 19 February 2026. Information about the current status of this document, any errata,
and how to provide feedback on it may be obtained at
https://www.rfc-editor.org/info/rfc9940.
Copyright Notice Copyright Notice
Copyright (c) 2025 IETF Trust and the persons identified as the Copyright (c) 2026 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents (https://trustee.ietf.org/ Provisions Relating to IETF Documents
license-info) in effect on the date of publication of this document. (https://trustee.ietf.org/license-info) in effect on the date of
Please review these documents carefully, as they describe your rights publication of this document. Please review these documents
and restrictions with respect to this document. Code Components carefully, as they describe your rights and restrictions with respect
extracted from this document must include Revised BSD License text as to this document. Code Components extracted from this document must
described in Section 4.e of the Trust Legal Provisions and are include Revised BSD License text as described in Section 4.e of the
provided without warranty as described in the Revised BSD License. Trust Legal Provisions and are provided without warranty as described
in the Revised BSD License.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 1. Introduction
2. Usage of Terms . . . . . . . . . . . . . . . . . . . . . . . 3 2. Usage of Terms
3. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 4 3. Terminology
3.1. Context Terminology . . . . . . . . . . . . . . . . . . . 4 3.1. Context Terminology
3.2. Core Terms . . . . . . . . . . . . . . . . . . . . . . . 5 3.2. Core Terms
3.3. Other Terms . . . . . . . . . . . . . . . . . . . . . . . 9 3.3. Other Terms
4. Workflow Explanations . . . . . . . . . . . . . . . . . . . . 9 4. Workflow Explanations
5. Security Considerations . . . . . . . . . . . . . . . . . . . 14 5. Security Considerations
6. Privacy Considerations . . . . . . . . . . . . . . . . . . . 14 6. Privacy Considerations
7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 14 7. IANA Considerations
Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . 15 8. Informative References
Informative References . . . . . . . . . . . . . . . . . . . . . 15 Acknowledgments
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 16 Authors' Addresses
1. Introduction 1. Introduction
Successful operation of large networks depends on effective network Successful operation of large networks depends on effective network
management. This requires a virtuous circle of network control, management. This requires a virtuous circle of network control,
network observability, network analytics, network assurance, and back network observability, network analytics, network assurance, and back
to network control. Network fault and problem management [RFC6632] to network control. Network fault and problem management [RFC6632]
is an important aspect of network management and control solutions. is an important aspect of network management and control solutions.
It deals with the detection, reporting, inspection, isolation, It deals with the detection, reporting, inspection, isolation,
correlation, and management of events within the network. The correlation, and management of events within the network. The
skipping to change at page 3, line 7 skipping to change at line 94
negative effect on the network's ability to forward traffic according negative effect on the network's ability to forward traffic according
to expected behavior and so deliver services, the ability to control to expected behavior and so deliver services, the ability to control
and operate the network, and other faults that reduce the quality or and operate the network, and other faults that reduce the quality or
reliability of the delivered service. The concept of fault and reliability of the delivered service. The concept of fault and
problem management extends to include actions taken to determine the problem management extends to include actions taken to determine the
causes of problems and to work toward recovery of expected network causes of problems and to work toward recovery of expected network
behavior. behavior.
A number of work efforts within the IETF seek to provide components A number of work efforts within the IETF seek to provide components
of a fault management system, such as YANG data models or management of a fault management system, such as YANG data models or management
protocols. It is important that a common terminology is used so that protocols. It is important that a common terminology be used so that
there is a clear understanding of how the elements of the management there is a clear understanding of how the elements of the management
and control solutions fit together, and how faults and problems will and control solutions fit together and how faults and problems will
be handled. be handled.
This document sets out some terms that are fundamental to a common This document sets out some terms that are fundamental to a common
understanding of network fault and problem management. While understanding of network fault and problem management. While
"faults" and "problems" are concepts that apply at all levels of "faults" and "problems" are concepts that apply at all levels of
technology in the Internet, the scope of this document is restricted technology in the Internet, the scope of this document is restricted
to the network layer and below, hence this document is specifically to the network layer and below; hence, this document is specifically
about "network fault and problem management." The concept of about "network fault and problem management." The concept of
"incidents" is also touched on in this document, where an incident "incidents" is also touched on in this document, where an incident
results from one or more problems and is the disruption of a network results from one or more problems and is the disruption of a network
service. service.
Note that some useful terms are defined in [RFC3877] and [RFC8632]. Note that some useful terms are defined in [RFC3877] and [RFC8632].
The definitions in this document are informed by those documents, but The definitions in this document are informed by those documents, but
they are not dependent on that prior work. they are not dependent on that prior work.
2. Usage of Terms 2. Usage of Terms
The terms defined in this document are intended for consistent use The terms defined in this document are intended for consistent use
within the IETF in the scope of network fault and problem management. within the IETF in the scope of network fault and problem management.
Where similar concepts are described in other bodies, an attempt has Where similar concepts are described in other bodies, an attempt has
been made to harmonize with those other descriptions, but there is been made to harmonize with those other descriptions, but care is
care needed where terms are not used consistently between bodies or needed where terms are not used consistently between bodies or where
where terms are applied outside the network layer. If other bodies terms are applied outside the network layer. If other bodies find
find the terminology defined in this document useful, they are free the terminology defined in this document useful, they are free to use
to use it. it.
The purpose of this document is to define the following terms for use The purpose of this document is to define the following terms for use
in other documents. Other terms are defined to enable those in other documents. Other terms are defined to enable those
definitions and may also be used by other documents, although that is definitions and may also be used by other documents, although that is
not the principal purpose of their definitions here. not the principal purpose of their definitions here.
* Event * Event
* State * State
* Fault * Fault
* Problem * Problem
* Symptom * Symptom
* Cause * Cause
* Alert * Alert
* Alarm * Alarm
When other documents make use of the terms as defined in this When other documents make use of the terms as defined in this
document, it is suggested here that such uses should use document, it is suggested here that such uses should use
capitalization of the terms as in this document to help distinguish capitalization of the terms as in this document to help distinguish
them from colloquial uses, and should include an early section them from colloquial uses and should include an early section listing
listing the terms inherited from this document with a citation. the terms inherited from this document with a citation.
3. Terminology 3. Terminology
This section contains key terms. It is split into three subsections. This section contains key terms. It is split into three subsections.
* Section 3.1 contains terms that help to set the context for * Section 3.1 contains terms that help set the context for network
network fault and problem management systems. fault and problem management systems.
* Section 3.2 includes specific and detailed core terms that will be * Section 3.2 includes specific and detailed core terms that will be
used in other documents that describe elements of the network used in other documents that describe elements of the network
fault and problem management systems. fault and problem management systems.
* Section 3.3 provides three further terms that may be helpful. * Section 3.3 provides three further terms that may be helpful.
3.1. Context Terminology 3.1. Context Terminology
This section includes some terminology that helps describe the This section includes some terminology that helps describe the
context for the rest of this work. The terms may be viewed as a context for the rest of this work. The terms may be viewed as a
cascaded sequence of processes, starting with Network Telemetry and cascaded sequence of processes, starting with Network Telemetry and
building to Network Observability. The definitions are deliberately building to Network Observability. The definitions are deliberately
kept relatively terse. Further documents may expand on these terms kept relatively terse. Further documents may expand on these terms
without loss of specificity. Such contextualization (if any) should without loss of specificity. Such contextualization (if any) should
be highlighted clearly in those documents. be highlighted clearly in those documents.
Network Telemetry: This is defined in [RFC9232] and describes the Network Telemetry: This is defined in [RFC9232] and describes the
process of collecting operational network data categorized process of collecting operational network data categorized
according to the network plane (e.g., layer 3, layer 2, and layer according to the network plane (e.g., Layer 3, Layer 2, and Layer
1) from which it was derived. Data collected through the Network 1) from which it was derived. Data collected through the Network
Telemetry process does not contain any data related to service Telemetry process does not contain any data related to service
definitions (i.e., "intent" per Section 3.1 of [RFC9315]). definitions (i.e., "intent" per Section 3.1 of [RFC9315]).
Network Monitoring: This is the process of keeping a continuous Network Monitoring: This is the process of keeping a continuous
record of functions related to a network topology. It involves record of functions related to a network topology. It involves
tracking various aspects such as traffic patterns, device health, tracking various aspects such as traffic patterns, device health,
performance metrics, and overall network behaviour. This approach performance metrics, and overall network behavior. This approach
differentiates network monitoring from resource or device differentiates network monitoring from resource or device
monitoring, which focuses on individual components or resources monitoring, which focuses on individual resources or components
(Section 3.2). (Section 3.2).
Network Analytics: This is the process of deriving analytical Network Analytics: This is the process of deriving analytical
insights from operational network data. A process could be insights from operational network data. A process could be
executed by a piece of software, a system, or a human that executed by a piece of software, a system, or a human that
analyzes operational data and outputs new analytical data related analyzes operational data and outputs new analytical data related
to the operational data, for example, a symptom. to the operational data -- for example, a symptom.
Network Observability: This is the process of enabling network Network Observability: This is the process of enabling network
behavioral assessment through analysis of observed operational behavioral assessment through analysis of observed operational
network data (logs, alarms, traces, etc.) with the aim of network data (logs, alarms, traces, etc.) with the aim of
detecting symptoms of network behavior, and to identify anomalies detecting symptoms of network behavior, and to identify anomalies
and their causes. Network Observability begins with information and their causes. Network Observability begins with information
gathered using Network Monitoring tools and that may be further gathered using Network Monitoring tools and that may be further
enriched with other operational data. The expected outcome of the enriched with other operational data. The expected outcome of the
observability processes is identification and analysis of observability processes is identification and analysis of
deviations in observed state versus the expected state of a deviations in observed state versus the expected state of a
skipping to change at page 5, line 37 skipping to change at line 216
data gathered in Network Telemetry. data gathered in Network Telemetry.
* Network Analytics is the process of deriving insight through the * Network Analytics is the process of deriving insight through the
data recorded in Network Monitoring. data recorded in Network Monitoring.
* Network Observability is the process of enabling behavioral * Network Observability is the process of enabling behavioral
assessment of a network through Network Analytics. assessment of a network through Network Analytics.
3.2. Core Terms 3.2. Core Terms
The terms are presented below in an order that is intended to flow The terms in this section are presented in an order that is intended
such that it is possible to gain understanding reading top to bottom. to flow such that it is possible to gain understanding reading top to
The figures and explanations in Section 4 may aid understanding the bottom. The figures and explanations in Section 4 may aid
terms set out here. understanding the terms set out here.
Resource: An element of a network system. Resource: An element of a network system.
Resource is a recursive concept so that a Resource may be a * Resource is a recursive concept so that a Resource may be a
collection of other Resources (for example, a network node collection of other Resources (for example, a network node
comprises a collection of network interfaces). comprises a collection of network interfaces).
Characteristic: Observable or measurable aspect or behavior Characteristic: Observable or measurable aspect or behavior
associated with a Resource. associated with a Resource.
* A Characteristic may be considered to be built on facts (see * A Characteristic may be considered to be built on facts (see
'Value', below) and the contexts and descriptors that identify 'Value', below) and the contexts and descriptors that identify
and give meaning to the facts. and give meaning to the facts.
* The term "Metric" [RFC9417] is another word for a measurable * The term "Metric" [RFC9417] is another word for a measurable
Characteristic which may also be thought of as analogous to a Characteristic which may also be thought of as analogous to a
'variable'. 'variable'.
Value: A Value is a measure of a Characteristic associated with a Value: A measure of a Characteristic associated with a Resource. It
Resource. It may be in the form of a categorization (e.g., high may be in the form of a categorization (e.g., high or low), an
or low), an integer (e.g., a count or gauge), or a reading of a integer (e.g., a count or gauge), or a reading of a continuous
continuous variable (e.g., an analog measurement), etc. variable (e.g., an analog measurement), etc.
Change: In the context of Network Monitoring, a Change is the Change: In the context of Network Monitoring, the variation in the
variation in the Value of a Characteristic associated with a Value of a Characteristic associated with a Resource. A Change
Resource and may arise over a period of time. may arise over a period of time.
* Not all Changes are noteworthy (i.e., they do not have * Not all Changes are noteworthy (i.e., they do not have
Relevance). Relevance).
* Perception of Change depends upon Detection, the sampling * Perception of Change depends upon Detection, the sampling
rate/accuracy/detail, and perspective. rate/accuracy/detail, and perspective.
* It may be helpful to qualify this as "Value Change" because the * It may be helpful to qualify this as "Value Change" because the
English word "change" is often heavily used. English word "change" is often heavily used.
Event: The variation in Value of a Characteristic of a Resource at a Event: The variation in Value of a Characteristic of a Resource at a
distinct moment in time (i.e., the period is negligible). distinct moment in time (i.e., the period is negligible).
* Compared with a Change, which may be over a period of time, an * Compared with a Change, which may be over a period of time, an
Event happens at a distinct moment in time. Thus, an Event may Event happens at a distinct moment in time. Thus, an Event may
be the observation of a Change. be the observation of a Change.
Condition: A Condition is an interpretation of the Values of a set Condition: An interpretation of the Values of a set of one or more
of one or more Characteristics of a Resource (with respect to Characteristics of a Resource (with respect to working order or
working order or some other aspect relevant to the Resource some other aspect relevant to the Resource purpose/application) --
purpose/application), for example "low available memory". Thus, for example, "low available memory". Thus, it is the output of a
it is the output of a function applied to a set of one or more function applied to a set of one or more variables.
variables.
State: A particular Condition that a Resource has (i.e., it is in a State: A particular Condition that a Resource has (i.e., it is in a
State) at a specific time. For example, a router may report the State) at a specific time. For example, a router may report the
total amount of memory it has, and how much is free. These are total amount of memory it has and how much is free. These are the
the Values of two Characteristics of a Resource. These Values can Values of two Characteristics of a Resource. These Values can be
be interpreted to determine the Condition of the Resource, and interpreted to determine the Condition of the Resource, and that
that may determine the State of the router, such as shortage of may determine the State of the router, such as shortage of memory.
memory.
* While a State may be observed at a specific moment in time, it * While a State may be observed at a specific moment in time, it
is actually determined by summarizing measurement over time in is actually determined by summarizing measurement over time in
a process sometimes called State compression. a process sometimes called State compression.
* It may be helpful to qualify this as "Resource State" to make * It may be helpful to qualify this as "Resource State" to make
clear the distinction between this and other uses of "state" clear the distinction between this and other uses of "state"
such as "protocol state". such as "protocol state".
* This term may be contrasted with "Operational State" as used in * This term may be contrasted with "Operational State" as used in
[RFC8342]. For example, the state of a link might be up/down/ [RFC8342]. For example, the state of a link might be up/down/
degraded, but the operational state of link would include a degraded, but the operational state of the link would include a
collection of Values of Characteristics of the link. collection of Values of Characteristics of the link.
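The router memory example given under "State" can be sketched in Python as follows. This is only an illustration under stated assumptions: the class and function names, the Characteristic names "memory_total" and "memory_free", the 10% threshold, and the State label "normal" are hypothetical and are not defined by this document.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """An element of a network system (hypothetical representation)."""
    name: str
    # Maps a Characteristic name to its current Value.
    values: dict = field(default_factory=dict)


def memory_condition(resource, low_fraction=0.10):
    """Interpret the Values of two Characteristics as a Condition.

    The 10% threshold is an assumption chosen for illustration only.
    """
    total = resource.values["memory_total"]
    free = resource.values["memory_free"]
    if free / total < low_fraction:
        return "low available memory"
    return "sufficient memory"


def resource_state(resource):
    """Derive the State of the Resource from the Condition that holds now."""
    if memory_condition(resource) == "low available memory":
        return "shortage of memory"
    return "normal"


router = Resource("router-1", {"memory_total": 8192, "memory_free": 512})
healthy = Resource("router-2", {"memory_total": 8192, "memory_free": 4096})
print(router.name, memory_condition(router), resource_state(router))
print(healthy.name, memory_condition(healthy), resource_state(healthy))
```

The Condition is kept separate from the State so that the sketch mirrors the definitions: the Condition is the output of a function applied to Values, and the State is the particular Condition that the Resource has at a specific time.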
Detect (hence Detected, Detection): To notice the presence of Detect (hence Detected, Detection): To notice the presence of
something (State, Change, Event, activity, etc.). something (State, Change, Event, activity, etc.) and hence also to
notice a Change (from the perspective of an observer such as a
* Hence also to notice a Change (from the perspective of an monitoring system).
observer such as a monitoring system).
Relevance: Consideration of an Event, State, or Value (through the Relevance: Consideration of an Event, State, or Value (through the
application of policy, relative to a specific perspective, intent, application of policy, relative to a specific perspective, intent,
and in relation to other Events, States, and Values) to determine and in relation to other Events, States, and Values) to determine
whether it is of note to the system that controls or manages the whether it is of note to the system that controls or manages the
network. Note, for example, that not all Changes are Relevant. network. Note, for example, that not all Changes are Relevant.
* This term may also be used as "Relevant Event", "Relevant * This term may also be used as "Relevant Event", "Relevant
State", or "Relevant Value". State", or "Relevant Value".
Occurrence: A Relevant Event or a particular Relevant Change. Occurrence: A Relevant Event or a particular Relevant Change.
* An Occurrence may be an aggregation or abstraction of multiple * An Occurrence may be an aggregation or abstraction of multiple
fine-grain Events or Changes. fine-grained Events or Changes.
* An Occurrence may occur at any macro or micro scale because * An Occurrence may occur at any macro or micro scale because
Resources are a recursive concept, and may be perceived Resources are a recursive concept, and may be perceived,
depending on the scope of observation (i.e., according to the depending on the scope of observation (i.e., according to the
level of Resource recursion that is examined). That is, level of Resource recursion that is examined). That is,
Occurrences, themselves are a recursive concept. Occurrences, themselves, are a recursive concept.
Fault: An Occurrence (i.e., an Event or a Change) that is not Fault: An Occurrence (i.e., an Event or a Change) that is not
desired/required (as it may be indicative of a current or future desired/required (as it may be indicative of a current or future
undesired State). Thus, a Fault happens at a moment in time. A undesired State). Thus, a Fault happens at a moment in time. A
Fault can potentially be associated with a Cause. See [RFC8632] Fault can potentially be associated with a Cause. See [RFC8632]
for a more detailed discussion of network faults. for a more detailed discussion of network faults.
* Note that there is a distinction between a Fault and a Problem * Note that there is a distinction between a Fault and a Problem
that depends on context. For example, in a connectivity that depends on context. For example, in a connectivity
service where redundancy is present, a link down is a Problem, service where redundancy is present, a link down is a Problem,
skipping to change at page 8, line 23 skipping to change at line 342
* Note that there is a historic aspect to the concept of a * Note that there is a historic aspect to the concept of a
Problem. The current State may be operational, but there could Problem. The current State may be operational, but there could
have been a Fault that is unexplained, and the fact of that have been a Fault that is unexplained, and the fact of that
unexplained recent Fault is a Problem. unexplained recent Fault is a Problem.
* Note that while a Problem is unresolved it may continue to * Note that while a Problem is unresolved it may continue to
require attention. A record of resolved Problems may be require attention. A record of resolved Problems may be
maintained in a log. maintained in a log.
* Note that there may be a State which is considered to be a * Note that there may be a State that is considered to be a
Problem from several perspectives. For example, consider a Problem from several perspectives. For example, consider a
"loss of light" State that may cause multiple services to fail. "loss of light" State that may cause multiple services to fail.
In this example, a new State (the light recovers) may cause the In this example, a new State (the light recovers) may cause the
Problem to be resolved from one perspective (the services are Problem to be resolved from one perspective (the services are
operational once more), but may leave the Problem as unresolved operational once more) but may leave the Problem as unresolved
(because the loss of light has not been explained). Further, (because the loss of light has not been explained). Further,
in this example, there could be another development (the reason in this example, there could be another development (the reason
for the temporary loss of light is traced to a microbend in the for the temporary loss of light is traced to a microbend in the
fiber that is repaired) resulting in that unresolved Problem fiber that is repaired) resulting in that unresolved Problem
now being resolved. But, in this example, this still leaves a now being resolved. But, in this example, this still leaves a
further Problem unresolved (a microbend occurred, and that further Problem unresolved (a microbend occurred, and that
Problem is not resolved until it is understood how it occurred Problem is not resolved until it is understood how it occurred
and a remedy is put in place to prevent recurrence). and a remedy is put in place to prevent recurrence).
Cause: The Events (Detected or otherwise) that gave rise to a Fault/ Cause: The Events (Detected or otherwise) that gave rise to a Fault/
Problem. Problem.
Incident: A (Network) Incident is an undesired Occurrence such as an Incident: Also referred to as "Network Incident". An Incident is an
unexpected interruption of a network service, degradation of the undesired Occurrence such as an unexpected interruption of a
quality of a network service, or the below-target performance of a network service, degradation of the quality of a network service,
network service. An Incident results from one or more Problems, or the below-target performance of a network service. An Incident
and a Problem may give rise to or contribute to one or more results from one or more Problems, and a Problem may give rise to
Incidents. Greater discussion of Network Incident relationships, or contribute to one or more Incidents. Greater discussion of
including Customer Incidents and Incident management, can be found Network Incident relationships, including Customer Incidents and
in [I-D.ietf-nmop-network-incident-yang]. Incident management, can be found in [Net-Incident-Mgmt-YANG].
Symptom: An observable Value, Change, State, Event, or Condition Symptom: An observable Value, Change, State, Event, or Condition
considered as an indication of a Problem or potential Problem. considered as an indication of a Problem or potential Problem.
Anomaly: A (Network) Anomaly is an unusual or unexpected Event or Anomaly: Also referred to as "Network Anomaly". An Anomaly is an
pattern in network data in the forwarding plane, control plane, or unusual or unexpected Event or pattern in network data in the
management plane that deviates from the normal, expected behavior. forwarding plane, control plane, or management plane that deviates
See [I-D.ietf-nmop-network-anomaly-architecture] for more details. from the normal, expected behavior. See [Net-Anomaly-Arch] for
more details.
Alert: An indication of a Fault. Alert: An indication of a Fault.
Alarm: As specified in [RFC8632], an Alarm signifies an undesirable Alarm: As specified in [RFC8632], signifies an undesirable State in
State in a Resource that requires corrective action. From a a Resource that requires corrective action. From a management
management point of view, an Alarm can be seen as a State in its point of view, an Alarm can be seen as a State in its own right
own right and the transition to this State may result in an Alert and the transition to this State may result in an Alert being
being issued. The receipt of this Alert may give rise to a issued. The receipt of this Alert may give rise to a continuous
continuous indication (to a human operator) highlighting the indication (to a human operator) highlighting the potential or
potential or actual presence of a Problem. actual presence of a Problem.
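The relationship between an Alarm and an Alert described above can be sketched as a small state tracker. This is a minimal sketch, not a definitive implementation: the AlarmTracker class, its method names, and the Alert text are hypothetical, and the choice to issue an Alert only on the transition into the Alarm State is one possible reading of "may result in an Alert being issued".

```python
class AlarmTracker:
    """Tracks whether one Resource is in an Alarm State (illustrative only)."""

    def __init__(self, resource_name):
        self.resource_name = resource_name
        self.in_alarm = False
        self.alerts = []

    def update(self, undesirable):
        """Record the latest observation of the Resource.

        An Alert is issued only on the transition into the Alarm State,
        not on every observation made while the Alarm persists.
        """
        if undesirable and not self.in_alarm:
            self.alerts.append(
                f"ALERT: {self.resource_name} entered Alarm State"
            )
        self.in_alarm = undesirable
        return self.in_alarm


tracker = AlarmTracker("link-1")
for observed in [False, True, True, False, True]:
    tracker.update(observed)
print(len(tracker.alerts), tracker.in_alarm)
```

Here the Alarm is the persistent State (the in_alarm flag), while each Alert is a one-off indication generated when that State is entered.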
3.3. Other Terms 3.3. Other Terms
Three other terms may be helpful: Three other terms may be helpful:
Intermittent: A State that is not continuous, but keeps recurring in Intermittent: A State that is not continuous but that keeps
some time frame. recurring in some time frame.
Transient: A State that is not continuous, and occurs once in some Transient: A State that is not continuous and that occurs once in
time frame. some time frame.
Recurrent: A Problem that is actively resolved, but returns. Recurrent: A Problem that is actively resolved but returns.
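The distinction between Intermittent and Transient can be sketched as a count of separate occurrences of a State within one time frame. The function name classify_state, the boolean sampling model, and the extra labels "absent" and "continuous" are assumptions made for this example only. Recurrent is not covered because it depends on a Problem having been actively resolved, which cannot be inferred from samples alone.

```python
def classify_state(samples):
    """Classify the presence of a State over one time frame.

    `samples` is a time-ordered sequence of booleans, one per observation,
    where True means the State was present at that observation.
    """
    occurrences = 0
    previous = False
    for present in samples:
        # Count each separate run of consecutive True samples once.
        if present and not previous:
            occurrences += 1
        previous = present
    if occurrences == 0:
        return "absent"
    if all(samples):
        return "continuous"
    return "transient" if occurrences == 1 else "intermittent"


print(classify_state([False, True, True, False]))
print(classify_state([True, False, True, False, True]))
```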
4. Workflow Explanations 4. Workflow Explanations
This section aims to add information about the relationship between This section aims to add information about the relationship between
the terms defined in Section 3.2 in the context of network fault and the terms defined in Section 3.2 in the context of network fault and
problem management. The text and figures here are for explanation problem management. The text and figures here are for explanation
and are not normative for the definition of terms. and are not normative for the definition of terms.
The relationship between Resources and Characteristics is shown in The relationship between Resources and Characteristics is shown in
Figure 1. Note that there is a 1:n relationship between Network Figure 1. Note that there is a 1:n relationship between a Network
system and Resources, and between Resources and Characteristics: this system and Resources and between Resources and Characteristics: For
is not shown on the figure for clarity. clarity, this is not shown in the figure.
Characteristics Characteristics
^ ^
| |
Resources Resources
^ ^
| |
Network system Network system
Figure 1: Resources and Characteristics Figure 1: Resources and Characteristics
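The 1:n relationships that Figure 1 omits could be modeled as nested collections, as in the minimal sketch below. The class names, field names, and sample values are hypothetical and are not part of the terminology defined in this document.

```python
from dataclasses import dataclass, field


@dataclass
class Characteristic:
    name: str
    value: float


@dataclass
class Resource:
    name: str
    # One Resource has many Characteristics (1:n).
    characteristics: list = field(default_factory=list)


@dataclass
class NetworkSystem:
    name: str
    # One Network system has many Resources (1:n).
    resources: list = field(default_factory=list)


system = NetworkSystem("example-network", [
    Resource("router-a", [Characteristic("cpu-load", 0.42),
                          Characteristic("temperature-c", 51.0)]),
    Resource("link-a-b", [Characteristic("utilization", 0.73)]),
])
total = sum(len(r.characteristics) for r in system.resources)
print(total)
```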
The Value of a Characteristic of a Resource may change over time. The Value of a Characteristic of a Resource may change over time.
Specific Changes in Value may be noticed at a specific time (as Specific Changes in Value may be noticed at a specific time (as
digital Changes), Detected, and treated as Events. This is shown on digital Changes), Detected, and treated as Events. This is shown on
the left of Figure 2. the left-hand side of Figure 2.
The center of Figure 2 shows how the Value of a Characteristic may The center of Figure 2 shows how the Value of a Characteristic may
change over time. The Value may be Detected at specific times or change over time. The Value may be Detected at specific times or
periodically and give rise to Conditions that are States (and periodically and give rise to Conditions that are States (and
consequently State Changes). consequently State Changes).
In practice, the Characteristic may vary in an analog manner over In practice, the Characteristic may vary in an analog manner over
time as shown on the right-hand side of Figure 2. The Value can be time as shown on the right-hand side of Figure 2. The Value can be
read or reported (i.e., Detected) periodically leading to analog read or reported (i.e., Detected) periodically leading to analog
Values that may be deemed Relevant Values, or may be evaluated over Values that may be deemed Relevant Values, or it may be evaluated
time as shown in Figure 6. over time as shown in Figure 6.
Event State Value Event State Value
Condition Condition
^ ^ ^ ^ ^ ^
Detect : Detect : Detect : Detect : Detect : Detect :
: : : : : :
^ ^ ^ ^ ^ /\ ^ ^ ^ ^ ^ /\
: : : : : / \ : : : : : / \
: : : : : /\ / \ : : : : : /\ / \
skipping to change at page 11, line 33 skipping to change at line 494
| |
Event Event
Figure 3: Event and Dependent Terms Figure 3: Event and Dependent Terms
Parallel to the workflow for Events, Figure 4 shows the workflow Parallel to the workflow for Events, Figure 4 shows the workflow
progress for States. As shown in Figure 2, Change noted at a progress for States. As shown in Figure 2, Change noted at a
particular time gives rise to State. The State may be deemed to have particular time gives rise to State. The State may be deemed to have
Relevance considering policy, relative to a specific perspective, Relevance considering policy, relative to a specific perspective,
with a view to intent, and in relation to other Events, States, and with a view to intent, and in relation to other Events, States, and
Values. A Relevant State may be deemed a Problem, or may indicate a Values. A Relevant State may be deemed a Problem, or it may indicate
Problem or potential Problem. a Problem or potential Problem.
Problems may be considered based on Symptoms and may map directly or Problems may be considered based on Symptoms and may map directly or
indirectly to Causes. An Incident results from one or more Problems. indirectly to Causes. An Incident results from one or more Problems.
An Alarm may be raised as the result of a Problem, and the transition An Alarm may be raised as the result of a Problem, and the transition
to an Alarmed state may give rise to an Alert. to an Alarmed state may give rise to an Alert.
Alarm - - -> Alert Alarm - - -> Alert
^ ^
| ------> Incident | ------> Incident
| | | |
skipping to change at page 12, line 26 skipping to change at line 523
| |
State State
Figure 4: State and Dependent Terms Figure 4: State and Dependent Terms
Figure 5 shows how Faults and Problems may be consolidated to Figure 5 shows how Faults and Problems may be consolidated to
determine the Causes. The arrows show how one item may give rise to determine the Causes. The arrows show how one item may give rise to
another. another.
A Cause can be indicated by or determined from Faults, Problems, and A Cause can be indicated by or determined from Faults, Problems, and
Symptoms. It may be that one Cause points to another, and can also Symptoms. It may be that one Cause points to another, and it can
be considered as a Symptom. The determination of Causes can consider also be considered as a Symptom. The determination of Causes can
multiple inputs. An Incident results from one or more Problems. consider multiple inputs. An Incident results from one or more
Problems.
--------- ---------
------------- | | ------------- | |
| ----------> | Symptom | | ----------> | Symptom |
| | | | | | | |
| | --------- | | ---------
v | ^ v | ^
--------- | --------- |
------->| Cause |<--------- | ------->| Cause |<--------- |
| --------- | | | --------- | |
skipping to change at page 13, line 13 skipping to change at line 555
Figure 5: Consolidation of Symptoms and Causes Figure 5: Consolidation of Symptoms and Causes
Figure 6 shows how thresholds are important in the consideration of Figure 6 shows how thresholds are important in the consideration of
analog Values and Events. The arrows in the figure show how one item analog Values and Events. The arrows in the figure show how one item
may give rise to or utilize another. The use of threshold-driven may give rise to or utilize another. The use of threshold-driven
Events and States (and the Alerts that they might give rise to) must Events and States (and the Alerts that they might give rise to) must
be treated with caution to dampen any "flapping" (so that consistent be treated with caution to dampen any "flapping" (so that consistent
States may be observed) and to avoid overwhelming management States may be observed) and to avoid overwhelming management
processes or systems. Analog Values may be read or notified from the processes or systems. Analog Values may be read or notified from the
Resource and could transition a threshold, be deemed Relevant Values, Resource and could transition a threshold, be deemed Relevant Values,
or evaluated over time. Events may be counted, and the Count may or be evaluated over time. Events may be counted, and the Count may
cross a threshold or reach a Relevant Value. cross a threshold or reach a Relevant Value.
The Threshold Process may be implementation-specific and subject to The Threshold Process may be implementation specific and subject to
policies. When a threshold is crossed and any other conditions are policies. When a threshold is crossed and any other conditions are
matched, an Event may be determined, and treated like any other matched, an Event may be determined and may be treated like any other
Event. Event.
Occurrence Occurrence
^ ^
| |
|---------------------> State |---------------------> State
| |
| ------- Relevance | ------- Relevance
|------>| Count |-----------------------------> Value |------>| Count |-----------------------------> Value
| ------- | ^ | ------- | ^
skipping to change at page 14, line 13 skipping to change at line 599
Figure 6: Counts, Thresholds, and Values Figure 6: Counts, Thresholds, and Values
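One common way to dampen "flapping" in a threshold-driven process is hysteresis, where the level that raises an Event is higher than the level that clears it. The sketch below is an assumption about how a Threshold Process might be implemented; this document does not specify one, and the function name threshold_events, its parameters, and the sample readings are hypothetical.

```python
def threshold_events(values, raise_at, clear_at):
    """Return (index, event) pairs for a Value series, using hysteresis.

    An Event is determined when the Value reaches `raise_at`, and a
    clearing Event only once the Value has fallen to `clear_at` or below.
    """
    if clear_at >= raise_at:
        raise ValueError("clear_at must be lower than raise_at")
    exceeded = False
    events = []
    for i, v in enumerate(values):
        if not exceeded and v >= raise_at:
            exceeded = True
            events.append((i, "raised"))
        elif exceeded and v <= clear_at:
            exceeded = False
            events.append((i, "cleared"))
    return events


readings = [70, 81, 79, 82, 78, 60, 85]
events = threshold_events(readings, raise_at=80, clear_at=65)
print(events)
```

For these readings, a single threshold at 80 would be crossed five times, whereas the hysteresis band yields three Events, so a more consistent State is observed and fewer Alerts would reach the management system.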
5. Security Considerations 5. Security Considerations
This document specifies terminology and has no direct effect on the This document specifies terminology and has no direct effect on the
security of implementations or deployments. However, protocol security of implementations or deployments. However, protocol
solutions and management models need to be aware of several aspects: solutions and management models need to be aware of several aspects:
* The exposure of information pertaining to Faults and Problems may * The exposure of information pertaining to Faults and Problems may
make available knowledge of the internal workings of a network (in make available knowledge of the internal workings of a network (in
particular its vulnerabilities) that may be of use to an attacker. particular, its vulnerabilities) that may be of use to an
attacker.
* Systems that generate management information (messages, * Systems that generate management information (messages,
notifications, etc.) when Faults occur, may be attacked by causing notifications, etc.) when Faults occur may be attacked by causing
them to generate so much information that the system that manages them to generate so much information that the system that manages
the network is swamped and unable to properly manage the network. the network is swamped and unable to properly manage the network.
* Reporting false information about Faults (or masking reports of * Reporting false information about Faults (or masking reports of
Faults) may cause the system that manages the network to function Faults) may cause the system that manages the network to function
incorrectly. incorrectly.
6. Privacy Considerations 6. Privacy Considerations
Network fault and problem management should preserve user privacy by Network fault and problem management should preserve user privacy by
not exposing user data or information about end-user activities. not exposing user data or information about end-user activities.
Network Telemetry involves observing network traffic and collecting Network Telemetry involves observing network traffic and collecting
operational data from the network, while Network Monitoring is the operational data from the network, while Network Monitoring is the
process of keeping records of data gathered in Network Telemetry. process of keeping records of data gathered in Network Telemetry.
Therefore, it is possible that the data observed and collected Therefore, it is possible that the data observed and collected
includes users' privacy information. Such information must be includes users' privacy information. Such information must be
protected and controlled to avoid exposure to unauthorised parties. protected and controlled to avoid exposure to unauthorized parties.
Particular care may need to be exercised over stores of such Particular care may need to be exercised over stores of such
information which might be accessed at any time (including far into information that might be accessed at any time (including far into
the future). the future).
Additionally, a network operator will be concerned to keep control of Additionally, a network operator will be concerned about keeping
all information about Faults to protect their own privacy and the control of all information about Faults to protect their own privacy
details of how they operate their network. and the details of how they operate their network.
7. IANA Considerations 7. IANA Considerations
This document makes no requests for IANA action. This document has no IANA actions.
Acknowledgments
The authors would like to thank Med Boucadair, Wanting Du, Joe
Clarke, Javier Antich, Benoit Claise, Christopher Janz, Sherif
Mostafa, Kristian Larsson, Dirk Hugo, Carsten Bormann, Hilarie Orman,
Stewart Bryant, Bo Wu, Paul Kyzivat, Jouni Korhonen, Reshad Rahman,
Rob Wilton, Mahesh Jethanandani, Tim Bray, Paul Aitken, and Deb
Cooley for their helpful comments.
Special thanks to the team that met at a side meeting at IETF-120 to
discuss some of the thorny issues:
* Benoit Claise
* Watson Ladd
* Brad Peters
* Bo Wu
* Georgios Karagiannis
* Olga Havel
* Vincenzo Riccobene
* Yi Lin
* Jie Dong
* Aihua Guo
* Thomas Graf
* Qin Wu
* Chaode Yu
* Adrian Farrel
Informative References 8. Informative References
[I-D.ietf-nmop-network-anomaly-architecture] [Net-Anomaly-Arch]
Graf, T., Du, W., Francois, P., and A. H. Feng, "A Graf, T., Du, W., Francois, P., and A. Huang Feng, "A
Framework for a Network Anomaly Detection Architecture", Framework for a Network Anomaly Detection Architecture",
Work in Progress, Internet-Draft, draft-ietf-nmop-network- Work in Progress, Internet-Draft, draft-ietf-nmop-network-
anomaly-architecture-04, 4 July 2025, anomaly-architecture-06, 21 November 2025,
<https://datatracker.ietf.org/doc/html/draft-ietf-nmop- <https://datatracker.ietf.org/doc/html/draft-ietf-nmop-
network-anomaly-architecture-04>. network-anomaly-architecture-06>.
[I-D.ietf-nmop-network-incident-yang] [Net-Incident-Mgmt-YANG]
Hu, T., Contreras, L. M., Wu, Q., Davis, N., and C. Feng, Hu, T., Contreras, L. M., Wu, Q., Davis, N., and C. Feng,
"A YANG Data Model for Network Incident Management", Work "A YANG Data Model for Network Incident Management", Work
in Progress, Internet-Draft, draft-ietf-nmop-network- in Progress, Internet-Draft, draft-ietf-nmop-network-
incident-yang-05, 6 July 2025, incident-yang-08, 13 February 2026,
<https://datatracker.ietf.org/doc/html/draft-ietf-nmop- <https://datatracker.ietf.org/doc/html/draft-ietf-nmop-
network-incident-yang-05>. network-incident-yang-08>.
[RFC3877] Chisholm, S. and D. Romascanu, "Alarm Management [RFC3877] Chisholm, S. and D. Romascanu, "Alarm Management
Information Base (MIB)", RFC 3877, DOI 10.17487/RFC3877, Information Base (MIB)", RFC 3877, DOI 10.17487/RFC3877,
September 2004, <https://www.rfc-editor.org/info/rfc3877>. September 2004, <https://www.rfc-editor.org/info/rfc3877>.
[RFC6632] Ersue, M., Ed. and B. Claise, "An Overview of the IETF [RFC6632] Ersue, M., Ed. and B. Claise, "An Overview of the IETF
Network Management Standards", RFC 6632, Network Management Standards", RFC 6632,
DOI 10.17487/RFC6632, June 2012, DOI 10.17487/RFC6632, June 2012,
<https://www.rfc-editor.org/info/rfc6632>. <https://www.rfc-editor.org/info/rfc6632>.
skipping to change at page 16, line 34 skipping to change at line 685
[RFC9315] Clemm, A., Ciavaglia, L., Granville, L. Z., and J. [RFC9315] Clemm, A., Ciavaglia, L., Granville, L. Z., and J.
Tantsura, "Intent-Based Networking - Concepts and Tantsura, "Intent-Based Networking - Concepts and
Definitions", RFC 9315, DOI 10.17487/RFC9315, October Definitions", RFC 9315, DOI 10.17487/RFC9315, October
2022, <https://www.rfc-editor.org/info/rfc9315>. 2022, <https://www.rfc-editor.org/info/rfc9315>.
[RFC9417] Claise, B., Quilbeuf, J., Lopez, D., Voyer, D., and T. [RFC9417] Claise, B., Quilbeuf, J., Lopez, D., Voyer, D., and T.
Arumugam, "Service Assurance for Intent-Based Networking Arumugam, "Service Assurance for Intent-Based Networking
Architecture", RFC 9417, DOI 10.17487/RFC9417, July 2023, Architecture", RFC 9417, DOI 10.17487/RFC9417, July 2023,
<https://www.rfc-editor.org/info/rfc9417>. <https://www.rfc-editor.org/info/rfc9417>.
Acknowledgments
The authors would like to thank Med Boucadair, Wanting Du, Joe
Clarke, Javier Antich, Benoit Claise, Christopher Janz, Sherif
Mostafa, Kristian Larsson, Dirk Hugo, Carsten Bormann, Hilarie Orman,
Stewart Bryant, Bo Wu, Paul Kyzivat, Jouni Korhonen, Reshad Rahman,
Rob Wilton, Mahesh Jethanandani, Tim Bray, Paul Aitken, and Deb
Cooley for their helpful comments.
Special thanks to the team that met at a side meeting at IETF 120 to
discuss some of the thorny issues:
* Benoit Claise
* Watson Ladd
* Brad Peters
* Bo Wu
* Georgios Karagiannis
* Olga Havel
* Vincenzo Riccobene
* Yi Lin
* Jie Dong
* Aihua Guo
* Thomas Graf
* Qin Wu
* Chaode Yu
* Adrian Farrel
Authors' Addresses Authors' Addresses
Nigel Davis (editor) Nigel Davis (editor)
Ciena Ciena
United Kingdom United Kingdom
Email: ndavis@ciena.com Email: ndavis@ciena.com
Adrian Farrel (editor) Adrian Farrel (editor)
Old Dog Consulting Old Dog Consulting
United Kingdom United Kingdom
 End of changes. 63 change blocks. 
180 lines changed or deleted 180 lines changed or added
