| rfc9940.original | rfc9940.txt | |||
|---|---|---|---|---|
| Network Working Group N. Davis, Ed. | Internet Engineering Task Force (IETF) N. Davis, Ed. | |||
| Internet-Draft Ciena | Request for Comments: 9940 Ciena | |||
| Intended status: Informational A. Farrel, Ed. | Category: Informational A. Farrel, Ed. | |||
| Expires: 19 February 2026 Old Dog Consulting | ISSN: 2070-1721 Old Dog Consulting | |||
| T. Graf | T. Graf | |||
| Swisscom | Swisscom | |||
| Q. Wu | Q. Wu | |||
| Huawei | Huawei | |||
| C. Yu | C. Yu | |||
| Huawei Technologies | Huawei Technologies | |||
| 18 August 2025 | February 2026 | |||
| Some Key Terms for Network Fault and Problem Management | Some Key Terms for Network Fault and Problem Management | |||
| draft-ietf-nmop-terminology-23 | ||||
| Abstract | Abstract | |||
| This document sets out some terms that are fundamental to a common | This document sets out some terms that are fundamental to a common | |||
| understanding of network fault and problem management within the | understanding of network fault and problem management within the | |||
| IETF. | IETF. | |||
| The purpose of this document is to bring clarity to discussions and | The purpose of this document is to bring clarity to discussions and | |||
| other work related to network fault and problem management, in | other work related to network fault and problem management -- in | |||
| particular to YANG data models and management protocols that report, | particular, to YANG data models and management protocols that report, | |||
| make visible, or manage network faults and problems. | make visible, or manage network faults and problems. | |||
| Status of This Memo | Status of This Memo | |||
| This Internet-Draft is submitted in full conformance with the | This document is not an Internet Standards Track specification; it is | |||
| provisions of BCP 78 and BCP 79. | published for informational purposes. | |||
| Internet-Drafts are working documents of the Internet Engineering | ||||
| Task Force (IETF). Note that other groups may also distribute | ||||
| working documents as Internet-Drafts. The list of current Internet- | ||||
| Drafts is at https://datatracker.ietf.org/drafts/current/. | ||||
| Internet-Drafts are draft documents valid for a maximum of six months | This document is a product of the Internet Engineering Task Force | |||
| and may be updated, replaced, or obsoleted by other documents at any | (IETF). It represents the consensus of the IETF community. It has | |||
| time. It is inappropriate to use Internet-Drafts as reference | received public review and has been approved for publication by the | |||
| material or to cite them other than as "work in progress." | Internet Engineering Steering Group (IESG). Not all documents | |||
| approved by the IESG are candidates for any level of Internet | ||||
| Standard; see Section 2 of RFC 7841. | ||||
| This Internet-Draft will expire on 19 February 2026. | Information about the current status of this document, any errata, | |||
| and how to provide feedback on it may be obtained at | ||||
| https://www.rfc-editor.org/info/rfc9940. | ||||
| Copyright Notice | Copyright Notice | |||
| Copyright (c) 2025 IETF Trust and the persons identified as the | Copyright (c) 2026 IETF Trust and the persons identified as the | |||
| document authors. All rights reserved. | document authors. All rights reserved. | |||
| This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
| Provisions Relating to IETF Documents (https://trustee.ietf.org/ | Provisions Relating to IETF Documents | |||
| license-info) in effect on the date of publication of this document. | (https://trustee.ietf.org/license-info) in effect on the date of | |||
| Please review these documents carefully, as they describe your rights | publication of this document. Please review these documents | |||
| and restrictions with respect to this document. Code Components | carefully, as they describe your rights and restrictions with respect | |||
| extracted from this document must include Revised BSD License text as | to this document. Code Components extracted from this document must | |||
| described in Section 4.e of the Trust Legal Provisions and are | include Revised BSD License text as described in Section 4.e of the | |||
| provided without warranty as described in the Revised BSD License. | Trust Legal Provisions and are provided without warranty as described | |||
| in the Revised BSD License. | ||||
| Table of Contents | Table of Contents | |||
| 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 | 1. Introduction | |||
| 2. Usage of Terms . . . . . . . . . . . . . . . . . . . . . . . 3 | 2. Usage of Terms | |||
| 3. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 4 | 3. Terminology | |||
| 3.1. Context Terminology . . . . . . . . . . . . . . . . . . . 4 | 3.1. Context Terminology | |||
| 3.2. Core Terms . . . . . . . . . . . . . . . . . . . . . . . 5 | 3.2. Core Terms | |||
| 3.3. Other Terms . . . . . . . . . . . . . . . . . . . . . . . 9 | 3.3. Other Terms | |||
| 4. Workflow Explanations . . . . . . . . . . . . . . . . . . . . 9 | 4. Workflow Explanations | |||
| 5. Security Considerations . . . . . . . . . . . . . . . . . . . 14 | 5. Security Considerations | |||
| 6. Privacy Considerations . . . . . . . . . . . . . . . . . . . 14 | 6. Privacy Considerations | |||
| 7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 14 | 7. IANA Considerations | |||
| Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . 15 | 8. Informative References | |||
| Informative References . . . . . . . . . . . . . . . . . . . . . 15 | Acknowledgments | |||
| Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 16 | Authors' Addresses | |||
| 1. Introduction | 1. Introduction | |||
| Successful operation of large networks depends on effective network | Successful operation of large networks depends on effective network | |||
| management. This requires a virtuous circle of network control, | management. This requires a virtuous circle of network control, | |||
| network observability, network analytics, network assurance, and back | network observability, network analytics, network assurance, and back | |||
| to network control. Network fault and problem management [RFC6632] | to network control. Network fault and problem management [RFC6632] | |||
| is an important aspect of network management and control solutions. | is an important aspect of network management and control solutions. | |||
| It deals with the detection, reporting, inspection, isolation, | It deals with the detection, reporting, inspection, isolation, | |||
| correlation, and management of events within the network. The | correlation, and management of events within the network. The | |||
| skipping to change at page 3, line 7 | skipping to change at line 94 | |||
| negative effect on the network's ability to forward traffic according | negative effect on the network's ability to forward traffic according | |||
| to expected behavior and so deliver services, the ability to control | to expected behavior and so deliver services, the ability to control | |||
| and operate the network, and other faults that reduce the quality or | and operate the network, and other faults that reduce the quality or | |||
| reliability of the delivered service. The concept of fault and | reliability of the delivered service. The concept of fault and | |||
| problem management extends to include actions taken to determine the | problem management extends to include actions taken to determine the | |||
| causes of problems and to work toward recovery of expected network | causes of problems and to work toward recovery of expected network | |||
| behavior. | behavior. | |||
| A number of work efforts within the IETF seek to provide components | A number of work efforts within the IETF seek to provide components | |||
| of a fault management system, such as YANG data models or management | of a fault management system, such as YANG data models or management | |||
| protocols. It is important that a common terminology is used so that | protocols. It is important that a common terminology be used so that | |||
| there is a clear understanding of how the elements of the management | there is a clear understanding of how the elements of the management | |||
| and control solutions fit together, and how faults and problems will | and control solutions fit together and how faults and problems will | |||
| be handled. | be handled. | |||
| This document sets out some terms that are fundamental to a common | This document sets out some terms that are fundamental to a common | |||
| understanding of network fault and problem management. While | understanding of network fault and problem management. While | |||
| "faults" and "problems" are concepts that apply at all levels of | "faults" and "problems" are concepts that apply at all levels of | |||
| technology in the Internet, the scope of this document is restricted | technology in the Internet, the scope of this document is restricted | |||
| to the network layer and below, hence this document is specifically | to the network layer and below; hence, this document is specifically | |||
| about "network fault and problem management." The concept of | about "network fault and problem management." The concept of | |||
| "incidents" is also touched on in this document, where an incident | "incidents" is also touched on in this document, where an incident | |||
| results from one or more problems and is the disruption of a network | results from one or more problems and is the disruption of a network | |||
| service. | service. | |||
| Note that some useful terms are defined in [RFC3877] and [RFC8632]. | Note that some useful terms are defined in [RFC3877] and [RFC8632]. | |||
| The definitions in this document are informed by those documents, but | The definitions in this document are informed by those documents, but | |||
| they are not dependent on that prior work. | they are not dependent on that prior work. | |||
| 2. Usage of Terms | 2. Usage of Terms | |||
| The terms defined in this document are intended for consistent use | The terms defined in this document are intended for consistent use | |||
| within the IETF in the scope of network fault and problem management. | within the IETF in the scope of network fault and problem management. | |||
| Where similar concepts are described in other bodies, an attempt has | Where similar concepts are described in other bodies, an attempt has | |||
| been made to harmonize with those other descriptions, but there is | been made to harmonize with those other descriptions, but care is | |||
| care needed where terms are not used consistently between bodies or | needed where terms are not used consistently between bodies or where | |||
| where terms are applied outside the network layer. If other bodies | terms are applied outside the network layer. If other bodies find | |||
| find the terminology defined in this document useful, they are free | the terminology defined in this document useful, they are free to use | |||
| to use it. | it. | |||
| The purpose of this document is to define the following terms for use | The purpose of this document is to define the following terms for use | |||
| in other documents. Other terms are defined to enable those | in other documents. Other terms are defined to enable those | |||
| definitions and may also be used by other documents, although that is | definitions and may also be used by other documents, although that is | |||
| not the principal purpose of their definitions here. | not the principal purpose of their definitions here. | |||
| * Event | * Event | |||
| * State | * State | |||
| * Fault | * Fault | |||
| * Problem | * Problem | |||
| * Symptom | * Symptom | |||
| * Cause | * Cause | |||
| * Alert | * Alert | |||
| * Alarm | * Alarm | |||
| When other documents make use of the terms as defined in this | When other documents make use of the terms as defined in this | |||
| document, it is suggested here that such uses should use | document, it is suggested here that such uses should use | |||
| capitalization of the terms as in this document to help distinguish | capitalization of the terms as in this document to help distinguish | |||
| them from colloquial uses, and should include an early section | them from colloquial uses and should include an early section listing | |||
| listing the terms inherited from this document with a citation. | the terms inherited from this document with a citation. | |||
| 3. Terminology | 3. Terminology | |||
| This section contains key terms. It is split into three subsections. | This section contains key terms. It is split into three subsections. | |||
| * Section 3.1 contains terms that help to set the context for | * Section 3.1 contains terms that help set the context for network | |||
| network fault and problem management systems. | fault and problem management systems. | |||
| * Section 3.2 includes specific and detailed core terms that will be | * Section 3.2 includes specific and detailed core terms that will be | |||
| used in other documents that describe elements of the network | used in other documents that describe elements of the network | |||
| fault and problem management systems. | fault and problem management systems. | |||
| * Section 3.3 provides three further terms that may be helpful. | * Section 3.3 provides three further terms that may be helpful. | |||
| 3.1. Context Terminology | 3.1. Context Terminology | |||
| This section includes some terminology that helps describe the | This section includes some terminology that helps describe the | |||
| context for the rest of this work. The terms may be viewed as a | context for the rest of this work. The terms may be viewed as a | |||
| cascaded sequence of processes, starting with Network Telemetry and | cascaded sequence of processes, starting with Network Telemetry and | |||
| building to Network Observability. The definitions are deliberately | building to Network Observability. The definitions are deliberately | |||
| kept relatively terse. Further documents may expand on these terms | kept relatively terse. Further documents may expand on these terms | |||
| without loss of specificity. Such contextualization (if any) should | without loss of specificity. Such contextualization (if any) should | |||
| be highlighted clearly in those documents. | be highlighted clearly in those documents. | |||
| Network Telemetry: This is defined in [RFC9232] and describes the | Network Telemetry: This is defined in [RFC9232] and describes the | |||
| process of collecting operational network data categorized | process of collecting operational network data categorized | |||
| according to the network plane (e.g., layer 3, layer 2, and layer | according to the network plane (e.g., Layer 3, Layer 2, and Layer | |||
| 1) from which it was derived. Data collected through the Network | 1) from which it was derived. Data collected through the Network | |||
| Telemetry process does not contain any data related to service | Telemetry process does not contain any data related to service | |||
| definitions (i.e., "intent" per Section 3.1 of [RFC9315]). | definitions (i.e., "intent" per Section 3.1 of [RFC9315]). | |||
| Network Monitoring: This is the process of keeping a continuous | Network Monitoring: This is the process of keeping a continuous | |||
| record of functions related to a network topology. It involves | record of functions related to a network topology. It involves | |||
| tracking various aspects such as traffic patterns, device health, | tracking various aspects such as traffic patterns, device health, | |||
| performance metrics, and overall network behaviour. This approach | performance metrics, and overall network behavior. This approach | |||
| differentiates network monitoring from resource or device | differentiates network monitoring from resource or device | |||
| monitoring, which focuses on individual components or resources | monitoring, which focuses on individual resources or components | |||
| (Section 3.2). | (Section 3.2). | |||
| Network Analytics: This is the process of deriving analytical | Network Analytics: This is the process of deriving analytical | |||
| insights from operational network data. A process could be | insights from operational network data. A process could be | |||
| executed by a piece of software, a system, or a human that | executed by a piece of software, a system, or a human that | |||
| analyzes operational data and outputs new analytical data related | analyzes operational data and outputs new analytical data related | |||
| to the operational data, for example, a symptom. | to the operational data -- for example, a symptom. | |||
| Network Observability: This is the process of enabling network | Network Observability: This is the process of enabling network | |||
| behavioral assessment through analysis of observed operational | behavioral assessment through analysis of observed operational | |||
| network data (logs, alarms, traces, etc.) with the aim of | network data (logs, alarms, traces, etc.) with the aim of | |||
| detecting symptoms of network behavior, and to identify anomalies | detecting symptoms of network behavior, and to identify anomalies | |||
| and their causes. Network Observability begins with information | and their causes. Network Observability begins with information | |||
| gathered using Network Monitoring tools and that may be further | gathered using Network Monitoring tools and that may be further | |||
| enriched with other operational data. The expected outcome of the | enriched with other operational data. The expected outcome of the | |||
| observability processes is identification and analysis of | observability processes is identification and analysis of | |||
| deviations in observed state versus the expected state of a | deviations in observed state versus the expected state of a | |||
| skipping to change at page 5, line 37 | skipping to change at line 216 | |||
| data gathered in Network Telemetry. | data gathered in Network Telemetry. | |||
| * Network Analytics is the process of deriving insight through the | * Network Analytics is the process of deriving insight through the | |||
| data recorded in Network Monitoring. | data recorded in Network Monitoring. | |||
| * Network Observability is the process of enabling behavioral | * Network Observability is the process of enabling behavioral | |||
| assessment of a network through Network Analytics. | assessment of a network through Network Analytics. | |||
| 3.2. Core Terms | 3.2. Core Terms | |||
| The terms are presented below in an order that is intended to flow | The terms in this section are presented in an order that is intended | |||
| such that it is possible to gain understanding reading top to bottom. | to flow such that it is possible to gain understanding reading top to | |||
| The figures and explanations in Section 4 may aid understanding the | bottom. The figures and explanations in Section 4 may aid | |||
| terms set out here. | understanding the terms set out here. | |||
| Resource: An element of a network system. | Resource: An element of a network system. | |||
| Resource is a recursive concept so that a Resource may be a | * Resource is a recursive concept so that a Resource may be a | |||
| collection of other Resources (for example, a network node | collection of other Resources (for example, a network node | |||
| comprises a collection of network interfaces). | comprises a collection of network interfaces). | |||
| Characteristic: Observable or measurable aspect or behavior | Characteristic: Observable or measurable aspect or behavior | |||
| associated with a Resource. | associated with a Resource. | |||
| * A Characteristic may be considered to be built on facts (see | * A Characteristic may be considered to be built on facts (see | |||
| 'Value', below) and the contexts and descriptors that identify | 'Value', below) and the contexts and descriptors that identify | |||
| and give meaning to the facts. | and give meaning to the facts. | |||
| * The term "Metric" [RFC9417] is another word for a measurable | * The term "Metric" [RFC9417] is another word for a measurable | |||
| Characteristic which may also be thought of as analogous to a | Characteristic which may also be thought of as analogous to a | |||
| 'variable'. | 'variable'. | |||
| Value: A Value is a measure of a Characteristic associated with a | Value: A measure of a Characteristic associated with a Resource. It | |||
| Resource. It may be in the form of a categorization (e.g., high | may be in the form of a categorization (e.g., high or low), an | |||
| or low), an integer (e.g., a count or gauge), or a reading of a | integer (e.g., a count or gauge), or a reading of a continuous | |||
| continuous variable (e.g., an analog measurement), etc. | variable (e.g., an analog measurement), etc. | |||
| Change: In the context of Network Monitoring, a Change is the | Change: In the context of Network Monitoring, the variation in the | |||
| variation in the Value of a Characteristic associated with a | Value of a Characteristic associated with a Resource. A Change | |||
| Resource and may arise over a period of time. | may arise over a period of time. | |||
| * Not all Changes are noteworthy (i.e., they do not have | * Not all Changes are noteworthy (i.e., they do not have | |||
| Relevance). | Relevance). | |||
| * Perception of Change depends upon Detection, the sampling | * Perception of Change depends upon Detection, the sampling | |||
| rate/accuracy/detail, and perspective. | rate/accuracy/detail, and perspective. | |||
| * It may be helpful to qualify this as "Value Change" because the | * It may be helpful to qualify this as "Value Change" because the | |||
| English word "change" is often heavily used. | English word "change" is often heavily used. | |||
| Event: The variation in Value of a Characteristic of a Resource at a | Event: The variation in Value of a Characteristic of a Resource at a | |||
| distinct moment in time (i.e., the period is negligible). | distinct moment in time (i.e., the period is negligible). | |||
| * Compared with a Change, which may be over a period of time, an | * Compared with a Change, which may be over a period of time, an | |||
| Event happens at a distinct moment in time. Thus, an Event may | Event happens at a distinct moment in time. Thus, an Event may | |||
| be the observation of a Change. | be the observation of a Change. | |||
| Condition: A Condition is an interpretation of the Values of a set | Condition: An interpretation of the Values of a set of one or more | |||
| of one or more Characteristics of a Resource (with respect to | Characteristics of a Resource (with respect to working order or | |||
| working order or some other aspect relevant to the Resource | some other aspect relevant to the Resource purpose/application) -- | |||
| purpose/application), for example "low available memory". Thus, | for example, "low available memory". Thus, it is the output of a | |||
| it is the output of a function applied to a set of one or more | function applied to a set of one or more variables. | |||
| variables. | ||||
| State: A particular Condition that a Resource has (i.e., it is in a | State: A particular Condition that a Resource has (i.e., it is in a | |||
| State) at a specific time. For example, a router may report the | State) at a specific time. For example, a router may report the | |||
| total amount of memory it has, and how much is free. These are | total amount of memory it has and how much is free. These are the | |||
| the Values of two Characteristics of a Resource. These Values can | Values of two Characteristics of a Resource. These Values can be | |||
| be interpreted to determine the Condition of the Resource, and | interpreted to determine the Condition of the Resource, and that | |||
| that may determine the State of the router, such as shortage of | may determine the State of the router, such as shortage of memory. | |||
| memory. | ||||
| * While a State may be observed at a specific moment in time, it | * While a State may be observed at a specific moment in time, it | |||
| is actually determined by summarizing measurement over time in | is actually determined by summarizing measurement over time in | |||
| a process sometimes called State compression. | a process sometimes called State compression. | |||
| * It may be helpful to qualify this as "Resource State" to make | * It may be helpful to qualify this as "Resource State" to make | |||
| clear the distinction between this and other uses of "state" | clear the distinction between this and other uses of "state" | |||
| such as "protocol state". | such as "protocol state". | |||
| * This term may be contrasted with "Operational State" as used in | * This term may be contrasted with "Operational State" as used in | |||
| [RFC8342]. For example, the state of a link might be up/down/ | [RFC8342]. For example, the state of a link might be up/down/ | |||
| degraded, but the operational state of link would include a | degraded, but the operational state of the link would include a | |||
| collection of Values of Characteristics of the link. | collection of Values of Characteristics of the link. | |||
| Detect (hence Detected, Detection): To notice the presence of | Detect (hence Detected, Detection): To notice the presence of | |||
| something (State, Change, Event, activity, etc.). | something (State, Change, Event, activity, etc.) and hence also to | |||
| notice a Change (from the perspective of an observer such as a | ||||
| * Hence also to notice a Change (from the perspective of an | monitoring system). | |||
| observer such as a monitoring system). | ||||
| Relevance: Consideration of an Event, State, or Value (through the | Relevance: Consideration of an Event, State, or Value (through the | |||
| application of policy, relative to a specific perspective, intent, | application of policy, relative to a specific perspective, intent, | |||
| and in relation to other Events, States, and Values) to determine | and in relation to other Events, States, and Values) to determine | |||
| whether it is of note to the system that controls or manages the | whether it is of note to the system that controls or manages the | |||
| network. Note, for example, that not all Changes are Relevant. | network. Note, for example, that not all Changes are Relevant. | |||
| * This term may also be used as "Relevant Event", "Relevant | * This term may also be used as "Relevant Event", "Relevant | |||
| State", or "Relevant Value". | State", or "Relevant Value". | |||
| Occurrence: A Relevant Event or a particular Relevant Change. | Occurrence: A Relevant Event or a particular Relevant Change. | |||
| * An Occurrence may be an aggregation or abstraction of multiple | * An Occurrence may be an aggregation or abstraction of multiple | |||
| fine-grain Events or Changes. | fine-grained Events or Changes. | |||
| * An Occurrence may occur at any macro or micro scale because | * An Occurrence may occur at any macro or micro scale because | |||
| Resources are a recursive concept, and may be perceived | Resources are a recursive concept, and may be perceived, | |||
| depending on the scope of observation (i.e., according to the | depending on the scope of observation (i.e., according to the | |||
| level of Resource recursion that is examined). That is, | level of Resource recursion that is examined). That is, | |||
| Occurrences, themselves are a recursive concept. | Occurrences, themselves, are a recursive concept. | |||
| Fault: An Occurrence (i.e., an Event or a Change) that is not | Fault: An Occurrence (i.e., an Event or a Change) that is not | |||
| desired/required (as it may be indicative of a current or future | desired/required (as it may be indicative of a current or future | |||
| undesired State). Thus, a Fault happens at a moment in time. A | undesired State). Thus, a Fault happens at a moment in time. A | |||
| Fault can potentially be associated with a Cause. See [RFC8632] | Fault can potentially be associated with a Cause. See [RFC8632] | |||
| for a more detailed discussion of network faults. | for a more detailed discussion of network faults. | |||
| * Note that there is a distinction between a Fault and a Problem | * Note that there is a distinction between a Fault and a Problem | |||
| that depends on context. For example, in a connectivity | that depends on context. For example, in a connectivity | |||
| service where redundancy is present, a link down is a Problem, | service where redundancy is present, a link down is a Problem, | |||
| skipping to change at page 8, line 23 | skipping to change at line 342 | |||
| * Note that there is a historic aspect to the concept of a | * Note that there is a historic aspect to the concept of a | |||
| Problem. The current State may be operational, but there could | Problem. The current State may be operational, but there could | |||
| have been a Fault that is unexplained, and the fact of that | have been a Fault that is unexplained, and the fact of that | |||
| unexplained recent Fault is a Problem. | unexplained recent Fault is a Problem. | |||
| * Note that while a Problem is unresolved it may continue to | * Note that while a Problem is unresolved it may continue to | |||
| require attention. A record of resolved Problems may be | require attention. A record of resolved Problems may be | |||
| maintained in a log. | maintained in a log. | |||
| * Note that there may be a State which is considered to be a | * Note that there may be a State that is considered to be a | |||
| Problem from several perspectives. For example, consider a | Problem from several perspectives. For example, consider a | |||
| "loss of light" State that may cause multiple services to fail. | "loss of light" State that may cause multiple services to fail. | |||
| In this example, a new State (the light recovers) may cause the | In this example, a new State (the light recovers) may cause the | |||
| Problem to be resolved from one perspective (the services are | Problem to be resolved from one perspective (the services are | |||
| operational once more), but may leave the Problem as unresolved | operational once more) but may leave the Problem as unresolved | |||
| (because the loss of light has not been explained). Further, | (because the loss of light has not been explained). Further, | |||
| in this example, there could be another development (the reason | in this example, there could be another development (the reason | |||
| for the temporary loss of light is traced to a microbend in the | for the temporary loss of light is traced to a microbend in the | |||
| fiber that is repaired) resulting in that unresolved Problem | fiber that is repaired) resulting in that unresolved Problem | |||
| now being resolved. But, in this example, this still leaves a | now being resolved. But, in this example, this still leaves a | |||
| further Problem unresolved (a microbend occurred, and that | further Problem unresolved (a microbend occurred, and that | |||
| Problem is not resolved until it is understood how it occurred | Problem is not resolved until it is understood how it occurred | |||
| and a remedy is put in place to prevent recurrence). | and a remedy is put in place to prevent recurrence). | |||
| Cause: The Events (Detected or otherwise) that gave rise to a Fault/ | Cause: The Events (Detected or otherwise) that gave rise to a Fault/ | |||
| Problem. | Problem. | |||
| Incident: A (Network) Incident is an undesired Occurrence such as an | Incident: Also referred to as "Network Incident". An Incident is an | |||
| unexpected interruption of a network service, degradation of the | undesired Occurrence such as an unexpected interruption of a | |||
| quality of a network service, or the below-target performance of a | network service, degradation of the quality of a network service, | |||
| network service. An Incident results from one or more Problems, | or the below-target performance of a network service. An Incident | |||
| and a Problem may give rise to or contribute to one or more | results from one or more Problems, and a Problem may give rise to | |||
| Incidents. Greater discussion of Network Incident relationships, | or contribute to one or more Incidents. Greater discussion of | |||
| including Customer Incidents and Incident management, can be found | Network Incident relationships, including Customer Incidents and | |||
| in [I-D.ietf-nmop-network-incident-yang]. | Incident management, can be found in [Net-Incident-Mgmt-YANG]. | |||
| Symptom: An observable Value, Change, State, Event, or Condition | Symptom: An observable Value, Change, State, Event, or Condition | |||
| considered as an indication of a Problem or potential Problem. | considered as an indication of a Problem or potential Problem. | |||
| Anomaly: A (Network) Anomaly is an unusual or unexpected Event or | Anomaly: Also referred to as "Network Anomaly". An Anomaly is an | |||
| pattern in network data in the forwarding plane, control plane, or | unusual or unexpected Event or pattern in network data in the | |||
| management plane that deviates from the normal, expected behavior. | forwarding plane, control plane, or management plane that deviates | |||
| See [I-D.ietf-nmop-network-anomaly-architecture] for more details. | from the normal, expected behavior. See [Net-Anomaly-Arch] for | |||
| more details. | ||||
| Alert: An indication of a Fault. | Alert: An indication of a Fault. | |||
| Alarm: As specified in [RFC8632], an Alarm signifies an undesirable | Alarm: As specified in [RFC8632], signifies an undesirable State in | |||
| State in a Resource that requires corrective action. From a | a Resource that requires corrective action. From a management | |||
| management point of view, an Alarm can be seen as a State in its | point of view, an Alarm can be seen as a State in its own right | |||
| own right and the transition to this State may result in an Alert | and the transition to this State may result in an Alert being | |||
| being issued. The receipt of this Alert may give rise to a | issued. The receipt of this Alert may give rise to a continuous | |||
| continuous indication (to a human operator) highlighting the | indication (to a human operator) highlighting the potential or | |||
| potential or actual presence of a Problem. | actual presence of a Problem. | |||
| 3.3. Other Terms | 3.3. Other Terms | |||
| Three other terms may be helpful: | Three other terms may be helpful: | |||
| Intermittent: A State that is not continuous, but keeps recurring in | Intermittent: A State that is not continuous but that keeps | |||
| some time frame. | recurring in some time frame. | |||
| Transient: A State that is not continuous, and occurs once in some | Transient: A State that is not continuous and that occurs once in | |||
| time frame. | some time frame. | |||
| Recurrent: A Problem that is actively resolved, but returns. | Recurrent: A Problem that is actively resolved but returns. | |||
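The distinction between Intermittent and Transient can be illustrated with a minimal sketch. It assumes that the onsets of a non-continuous State have already been recorded as timestamps, and that "keeps recurring" can be approximated as "more than one onset within the time frame". The function name, the labels, and that approximation are assumptions made for illustration and are not defined by the document.

```python
from typing import Optional, Sequence


def classify_state(onsets: Sequence[float],
                   frame_start: float,
                   frame_end: float) -> Optional[str]:
    """Label a non-continuous State by how often it starts within a time frame.

    Assumption (not from the document): one onset in the frame is treated as
    Transient, and more than one onset is treated as Intermittent.
    """
    # Keep only the onsets that fall inside the time frame of interest.
    in_frame = [t for t in onsets if frame_start <= t <= frame_end]
    if not in_frame:
        return None  # The State was not seen in this time frame.
    return "transient" if len(in_frame) == 1 else "intermittent"
```

The same set of onsets may be labelled differently under a different time frame, which matches the "in some time frame" qualifier in both definitions.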
| 4. Workflow Explanations | 4. Workflow Explanations | |||
| This section aims to add information about the relationship between | This section aims to add information about the relationship between | |||
| the terms defined in Section 3.2 in the context of network fault and | the terms defined in Section 3.2 in the context of network fault and | |||
| problem management. The text and figures here are for explanation | problem management. The text and figures here are for explanation | |||
| and are not normative for the definition of terms. | and are not normative for the definition of terms. | |||
| The relationship between Resources and Characteristics is shown in | The relationship between Resources and Characteristics is shown in | |||
| Figure 1. Note that there is a 1:n relationship between Network | Figure 1. Note that there is a 1:n relationship between a Network | |||
| system and Resources, and between Resources and Characteristics: this | system and Resources and between Resources and Characteristics: For | |||
| is not shown on the figure for clarity. | clarity, this is not shown in the figure. | |||
| Characteristics | Characteristics | |||
| ^ | ^ | |||
| | | | | |||
| Resources | Resources | |||
| ^ | ^ | |||
| | | | | |||
| Network system | Network system | |||
| Figure 1: Resources and Characteristics | Figure 1: Resources and Characteristics | |||
| The Value of a Characteristic of a Resource may change over time. | The Value of a Characteristic of a Resource may change over time. | |||
| Specific Changes in Value may be noticed at a specific time (as | Specific Changes in Value may be noticed at a specific time (as | |||
| digital Changes), Detected, and treated as Events. This is shown on | digital Changes), Detected, and treated as Events. This is shown on | |||
| the left of Figure 2. | the left-hand side of Figure 2. | |||
| The center of Figure 2 shows how the Value of a Characteristic may | The center of Figure 2 shows how the Value of a Characteristic may | |||
| change over time. The Value may be Detected at specific times or | change over time. The Value may be Detected at specific times or | |||
| periodically and give rise to Conditions that are States (and | periodically and give rise to Conditions that are States (and | |||
| consequently State Changes). | consequently State Changes). | |||
| In practice, the Characteristic may vary in an analog manner over | In practice, the Characteristic may vary in an analog manner over | |||
| time as shown on the right-hand side of Figure 2. The Value can be | time as shown on the right-hand side of Figure 2. The Value can be | |||
| read or reported (i.e., Detected) periodically leading to analog | read or reported (i.e., Detected) periodically leading to analog | |||
| Values that may be deemed Relevant Values, or may be evaluated over | Values that may be deemed Relevant Values, or it may be evaluated | |||
| time as shown in Figure 6. | over time as shown in Figure 6. | |||
| Event State Value | Event State Value | |||
| Condition | Condition | |||
| ^ ^ ^ | ^ ^ ^ | |||
| Detect : Detect : Detect : | Detect : Detect : Detect : | |||
| : : : | : : : | |||
| ^ ^ ^ ^ ^ /\ | ^ ^ ^ ^ ^ /\ | |||
| : : : : : / \ | : : : : : / \ | |||
| : : : : : /\ / \ | : : : : : /\ / \ | |||
| skipping to change at page 11, line 33 ¶ | skipping to change at line 494 ¶ | |||
| | | | | |||
| Event | Event | |||
| Figure 3: Event and Dependent Terms | Figure 3: Event and Dependent Terms | |||
| Parallel to the workflow for Events, Figure 4 shows the workflow | Parallel to the workflow for Events, Figure 4 shows the workflow | |||
| progress for States. As shown in Figure 2, Change noted at a | progress for States. As shown in Figure 2, Change noted at a | |||
| particular time gives rise to State. The State may be deemed to have | particular time gives rise to State. The State may be deemed to have | |||
| Relevance considering policy, relative to a specific perspective, | Relevance considering policy, relative to a specific perspective, | |||
| with a view to intent, and in relation to other Events, States, and | with a view to intent, and in relation to other Events, States, and | |||
| Values. A Relevant State may be deemed a Problem, or may indicate a | Values. A Relevant State may be deemed a Problem, or it may indicate | |||
| Problem or potential Problem. | a Problem or potential Problem. | |||
| Problems may be considered based on Symptoms and may map directly or | Problems may be considered based on Symptoms and may map directly or | |||
| indirectly to Causes. An Incident results from one or more Problems. | indirectly to Causes. An Incident results from one or more Problems. | |||
| An Alarm may be raised as the result of a Problem, and the transition | An Alarm may be raised as the result of a Problem, and the transition | |||
| to an Alarmed state may give rise to an Alert. | to an Alarmed state may give rise to an Alert. | |||
| Alarm - - -> Alert | Alarm - - -> Alert | |||
| ^ | ^ | |||
| | ------> Incident | | ------> Incident | |||
| | | | | | | |||
| skipping to change at page 12, line 26 ¶ | skipping to change at line 523 ¶ | |||
| | | | | |||
| State | State | |||
| Figure 4: State and Dependent Terms | Figure 4: State and Dependent Terms | |||
| Figure 5 shows how Faults and Problems may be consolidated to | Figure 5 shows how Faults and Problems may be consolidated to | |||
| determine the Causes. The arrows show how one item may give rise to | determine the Causes. The arrows show how one item may give rise to | |||
| another. | another. | |||
| A Cause can be indicated by or determined from Faults, Problems, and | A Cause can be indicated by or determined from Faults, Problems, and | |||
| Symptoms. It may be that one Cause points to another, and can also | Symptoms. It may be that one Cause points to another, and it can | |||
| be considered as a Symptom. The determination of Causes can consider | also be considered as a Symptom. The determination of Causes can | |||
| multiple inputs. An Incident results from one or more Problems. | consider multiple inputs. An Incident results from one or more | |||
| Problems. | ||||
| --------- | --------- | |||
| ------------- | | | ------------- | | | |||
| | ----------> | Symptom | | | ----------> | Symptom | | |||
| | | | | | | | | | | |||
| | | --------- | | | --------- | |||
| v | ^ | v | ^ | |||
| --------- | | --------- | | |||
| ------->| Cause |<--------- | | ------->| Cause |<--------- | | |||
| | --------- | | | | --------- | | | |||
| skipping to change at page 13, line 13 ¶ | skipping to change at line 555 ¶ | |||
| Figure 5: Consolidation of Symptoms and Causes | Figure 5: Consolidation of Symptoms and Causes | |||
| Figure 6 shows how thresholds are important in the consideration of | Figure 6 shows how thresholds are important in the consideration of | |||
| analog Values and Events. The arrows in the figure show how one item | analog Values and Events. The arrows in the figure show how one item | |||
| may give rise to or utilize another. The use of threshold-driven | may give rise to or utilize another. The use of threshold-driven | |||
| Events and States (and the Alerts that they might give rise to) must | Events and States (and the Alerts that they might give rise to) must | |||
| be treated with caution to dampen any "flapping" (so that consistent | be treated with caution to dampen any "flapping" (so that consistent | |||
| States may be observed) and to avoid overwhelming management | States may be observed) and to avoid overwhelming management | |||
| processes or systems. Analog Values may be read or notified from the | processes or systems. Analog Values may be read or notified from the | |||
| Resource and could transition a threshold, be deemed Relevant Values, | Resource and could transition a threshold, be deemed Relevant Values, | |||
| or evaluated over time. Events may be counted, and the Count may | or be evaluated over time. Events may be counted, and the Count may | |||
| cross a threshold or reach a Relevant Value. | cross a threshold or reach a Relevant Value. | |||
| The Threshold Process may be implementation-specific and subject to | The Threshold Process may be implementation specific and subject to | |||
| policies. When a threshold is crossed and any other conditions are | policies. When a threshold is crossed and any other conditions are | |||
| matched, an Event may be determined, and treated like any other | matched, an Event may be determined and may be treated like any other | |||
| Event. | Event. | |||
| Occurrence | Occurrence | |||
| ^ | ^ | |||
| | | | | |||
| |---------------------> State | |---------------------> State | |||
| | | | | |||
| | ------- Relevance | | ------- Relevance | |||
| |------>| Count |-----------------------------> Value | |------>| Count |-----------------------------> Value | |||
| | ------- | ^ | | ------- | ^ | |||
| skipping to change at page 14, line 13 ¶ | skipping to change at line 599 ¶ | |||
| Figure 6: Counts, Thresholds, and Values | Figure 6: Counts, Thresholds, and Values | |||
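One possible way to dampen "flapping" in a Threshold Process is hysteresis: an Event is determined when the Value crosses one level, and the clearing Event is determined only once the Value falls back past a second, lower level. The sketch below is illustrative only. The class name, the Event labels, and the two-level scheme are assumptions, since the document leaves the Threshold Process implementation specific and subject to policy.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class ThresholdProcess:
    """Hypothetical Threshold Process using two levels (hysteresis)."""
    raise_at: float     # Value at or above which the "raised" State is entered
    clear_at: float     # Value at or below which the "raised" State is left
    raised: bool = False

    def observe(self, value: float) -> Optional[str]:
        """Return an Event label if this Value causes a State Change."""
        if not self.raised and value >= self.raise_at:
            self.raised = True
            return "threshold-raised"
        if self.raised and value <= self.clear_at:
            self.raised = False
            return "threshold-cleared"
        # Values between the two levels do not produce an Event.
        return None


def events_from(values: Iterable[float],
                process: ThresholdProcess) -> List[Tuple[int, str]]:
    """Feed Detected Values to the process and collect (index, Event) pairs."""
    events = []
    for index, value in enumerate(values):
        event = process.observe(value)
        if event is not None:
            events.append((index, event))
    return events
```

With `raise_at=80` and `clear_at=60`, the Values 81, 79, 82, 78 produce a single "threshold-raised" Event rather than one Event per crossing of 80, so a consistent State can be observed and the management system receives fewer notifications.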
| 5. Security Considerations | 5. Security Considerations | |||
| This document specifies terminology and has no direct effect on the | This document specifies terminology and has no direct effect on the | |||
| security of implementations or deployments. However, protocol | security of implementations or deployments. However, protocol | |||
| solutions and management models need to be aware of several aspects: | solutions and management models need to be aware of several aspects: | |||
| * The exposure of information pertaining to Faults and Problems may | * The exposure of information pertaining to Faults and Problems may | |||
| make available knowledge of the internal workings of a network (in | make available knowledge of the internal workings of a network (in | |||
| particular its vulnerabilities) that may be of use to an attacker. | particular, its vulnerabilities) that may be of use to an | |||
| attacker. | ||||
| * Systems that generate management information (messages, | * Systems that generate management information (messages, | |||
| notifications, etc.) when Faults occur, may be attacked by causing | notifications, etc.) when Faults occur may be attacked by causing | |||
| them to generate so much information that the system that manages | them to generate so much information that the system that manages | |||
| the network is swamped and unable to properly manage the network. | the network is swamped and unable to properly manage the network. | |||
| * Reporting false information about Faults (or masking reports of | * Reporting false information about Faults (or masking reports of | |||
| Faults) may cause the system that manages the network to function | Faults) may cause the system that manages the network to function | |||
| incorrectly. | incorrectly. | |||
| 6. Privacy Considerations | 6. Privacy Considerations | |||
| Network fault and problem management should preserve user privacy by | Network fault and problem management should preserve user privacy by | |||
| not exposing user data or information about end-user activities. | not exposing user data or information about end-user activities. | |||
| Network Telemetry involves observing network traffic and collecting | Network Telemetry involves observing network traffic and collecting | |||
| operational data from the network, while Network Monitoring is the | operational data from the network, while Network Monitoring is the | |||
| process of keeping records of data gathered in Network Telemetry. | process of keeping records of data gathered in Network Telemetry. | |||
| Therefore, it is possible that the data observed and collected | Therefore, it is possible that the data observed and collected | |||
| includes users' privacy information. Such information must be | includes users' privacy information. Such information must be | |||
| protected and controlled to avoid exposure to unauthorised parties. | protected and controlled to avoid exposure to unauthorized parties. | |||
| Particular care may need to be exercised over stores of such | Particular care may need to be exercised over stores of such | |||
| information which might be accessed at any time (including far into | information that might be accessed at any time (including far into | |||
| the future). | the future). | |||
| Additionally, a network operator will be concerned to keep control of | Additionally, a network operator will be concerned about keeping | |||
| all information about Faults to protect their own privacy and the | control of all information about Faults to protect their own privacy | |||
| details of how they operate their network. | and the details of how they operate their network. | |||
| 7. IANA Considerations | 7. IANA Considerations | |||
| This document makes no requests for IANA action. | This document has no IANA actions. | |||
| Acknowledgments | ||||
| The authors would like to thank Med Boucadair, Wanting Du, Joe | ||||
| Clarke, Javier Antich, Benoit Claise, Christopher Janz, Sherif | ||||
| Mostafa, Kristian Larsson, Dirk Hugo, Carsten Bormann, Hilarie Orman, | ||||
| Stewart Bryant, Bo Wu, Paul Kyzivat, Jouni Korhonen, Reshad Rahman, | ||||
| Rob Wilton, Mahesh Jethanandani, Tim Bray, Paul Aitken, and Deb | ||||
| Cooley for their helpful comments. | ||||
| Special thanks to the team that met at a side meeting at IETF-120 to | ||||
| discuss some of the thorny issues: | ||||
| * Benoit Claise | ||||
| * Watson Ladd | ||||
| * Brad Peters | ||||
| * Bo Wu | ||||
| * Georgios Karagiannis | ||||
| * Olga Havel | ||||
| * Vincenzo Riccobene | ||||
| * Yi Lin | ||||
| * Jie Dong | ||||
| * Aihua Guo | ||||
| * Thomas Graf | ||||
| * Qin Wu | ||||
| * Chaode Yu | ||||
| * Adrian Farrel | ||||
| Informative References | 8. Informative References | |||
| [I-D.ietf-nmop-network-anomaly-architecture] | [Net-Anomaly-Arch] | |||
| Graf, T., Du, W., Francois, P., and A. H. Feng, "A | Graf, T., Du, W., Francois, P., and A. Huang Feng, "A | |||
| Framework for a Network Anomaly Detection Architecture", | Framework for a Network Anomaly Detection Architecture", | |||
| Work in Progress, Internet-Draft, draft-ietf-nmop-network- | Work in Progress, Internet-Draft, draft-ietf-nmop-network- | |||
| anomaly-architecture-04, 4 July 2025, | anomaly-architecture-06, 21 November 2025, | |||
| <https://datatracker.ietf.org/doc/html/draft-ietf-nmop- | <https://datatracker.ietf.org/doc/html/draft-ietf-nmop- | |||
| network-anomaly-architecture-04>. | network-anomaly-architecture-06>. | |||
| [I-D.ietf-nmop-network-incident-yang] | [Net-Incident-Mgmt-YANG] | |||
| Hu, T., Contreras, L. M., Wu, Q., Davis, N., and C. Feng, | Hu, T., Contreras, L. M., Wu, Q., Davis, N., and C. Feng, | |||
| "A YANG Data Model for Network Incident Management", Work | "A YANG Data Model for Network Incident Management", Work | |||
| in Progress, Internet-Draft, draft-ietf-nmop-network- | in Progress, Internet-Draft, draft-ietf-nmop-network- | |||
| incident-yang-05, 6 July 2025, | incident-yang-08, 13 February 2026, | |||
| <https://datatracker.ietf.org/doc/html/draft-ietf-nmop- | <https://datatracker.ietf.org/doc/html/draft-ietf-nmop- | |||
| network-incident-yang-05>. | network-incident-yang-08>. | |||
| [RFC3877] Chisholm, S. and D. Romascanu, "Alarm Management | [RFC3877] Chisholm, S. and D. Romascanu, "Alarm Management | |||
| Information Base (MIB)", RFC 3877, DOI 10.17487/RFC3877, | Information Base (MIB)", RFC 3877, DOI 10.17487/RFC3877, | |||
| September 2004, <https://www.rfc-editor.org/info/rfc3877>. | September 2004, <https://www.rfc-editor.org/info/rfc3877>. | |||
| [RFC6632] Ersue, M., Ed. and B. Claise, "An Overview of the IETF | [RFC6632] Ersue, M., Ed. and B. Claise, "An Overview of the IETF | |||
| Network Management Standards", RFC 6632, | Network Management Standards", RFC 6632, | |||
| DOI 10.17487/RFC6632, June 2012, | DOI 10.17487/RFC6632, June 2012, | |||
| <https://www.rfc-editor.org/info/rfc6632>. | <https://www.rfc-editor.org/info/rfc6632>. | |||
| skipping to change at page 16, line 34 ¶ | skipping to change at line 685 ¶ | |||
| [RFC9315] Clemm, A., Ciavaglia, L., Granville, L. Z., and J. | [RFC9315] Clemm, A., Ciavaglia, L., Granville, L. Z., and J. | |||
| Tantsura, "Intent-Based Networking - Concepts and | Tantsura, "Intent-Based Networking - Concepts and | |||
| Definitions", RFC 9315, DOI 10.17487/RFC9315, October | Definitions", RFC 9315, DOI 10.17487/RFC9315, October | |||
| 2022, <https://www.rfc-editor.org/info/rfc9315>. | 2022, <https://www.rfc-editor.org/info/rfc9315>. | |||
| [RFC9417] Claise, B., Quilbeuf, J., Lopez, D., Voyer, D., and T. | [RFC9417] Claise, B., Quilbeuf, J., Lopez, D., Voyer, D., and T. | |||
| Arumugam, "Service Assurance for Intent-Based Networking | Arumugam, "Service Assurance for Intent-Based Networking | |||
| Architecture", RFC 9417, DOI 10.17487/RFC9417, July 2023, | Architecture", RFC 9417, DOI 10.17487/RFC9417, July 2023, | |||
| <https://www.rfc-editor.org/info/rfc9417>. | <https://www.rfc-editor.org/info/rfc9417>. | |||
| Acknowledgments | ||||
| The authors would like to thank Med Boucadair, Wanting Du, Joe | ||||
| Clarke, Javier Antich, Benoit Claise, Christopher Janz, Sherif | ||||
| Mostafa, Kristian Larsson, Dirk Hugo, Carsten Bormann, Hilarie Orman, | ||||
| Stewart Bryant, Bo Wu, Paul Kyzivat, Jouni Korhonen, Reshad Rahman, | ||||
| Rob Wilton, Mahesh Jethanandani, Tim Bray, Paul Aitken, and Deb | ||||
| Cooley for their helpful comments. | ||||
| Special thanks to the team that met at a side meeting at IETF 120 to | ||||
| discuss some of the thorny issues: | ||||
| * Benoit Claise | ||||
| * Watson Ladd | ||||
| * Brad Peters | ||||
| * Bo Wu | ||||
| * Georgios Karagiannis | ||||
| * Olga Havel | ||||
| * Vincenzo Riccobene | ||||
| * Yi Lin | ||||
| * Jie Dong | ||||
| * Aihua Guo | ||||
| * Thomas Graf | ||||
| * Qin Wu | ||||
| * Chaode Yu | ||||
| * Adrian Farrel | ||||
| Authors' Addresses | Authors' Addresses | |||
| Nigel Davis (editor) | Nigel Davis (editor) | |||
| Ciena | Ciena | |||
| United Kingdom | United Kingdom | |||
| Email: ndavis@ciena.com | Email: ndavis@ciena.com | |||
| Adrian Farrel (editor) | Adrian Farrel (editor) | |||
| Old Dog Consulting | Old Dog Consulting | |||
| United Kingdom | United Kingdom | |||
| End of changes. 63 change blocks. | ||||
| 180 lines changed or deleted | 180 lines changed or added | |||