Diff: rfc9940.original

	rfc9940.original	rfc9940.txt


	Network Working Group N. Davis, Ed.	Internet Engineering Task Force (IETF) N. Davis, Ed.
	Internet-Draft Ciena	Request for Comments: 9940 Ciena
	Intended status: Informational A. Farrel, Ed.	Category: Informational A. Farrel, Ed.
	Expires: 19 February 2026 Old Dog Consulting	ISSN: 2070-1721 Old Dog Consulting
	T. Graf	T. Graf
	Swisscom	Swisscom
	Q. Wu	Q. Wu
	Huawei	Huawei
	C. Yu	C. Yu
	Huawei Technologies	Huawei Technologies

	18 August 2025	February 2026

	Some Key Terms for Network Fault and Problem Management	Some Key Terms for Network Fault and Problem Management

	draft-ietf-nmop-terminology-23

	Abstract	Abstract

	This document sets out some terms that are fundamental to a common	This document sets out some terms that are fundamental to a common
	understanding of network fault and problem management within the	understanding of network fault and problem management within the
	IETF.	IETF.

	The purpose of this document is to bring clarity to discussions and	The purpose of this document is to bring clarity to discussions and

	other work related to network fault and problem management, in	other work related to network fault and problem management -- in
	particular to YANG data models and management protocols that report,	particular, to YANG data models and management protocols that report,
	make visible, or manage network faults and problems.	make visible, or manage network faults and problems.

	Status of This Memo	Status of This Memo


	This Internet-Draft is submitted in full conformance with the	This document is not an Internet Standards Track specification; it is
	provisions of BCP 78 and BCP 79.	published for informational purposes.

	Internet-Drafts are working documents of the Internet Engineering
	Task Force (IETF). Note that other groups may also distribute
	working documents as Internet-Drafts. The list of current Internet-
	Drafts is at https://datatracker.ietf.org/drafts/current/.


	Internet-Drafts are draft documents valid for a maximum of six months	This document is a product of the Internet Engineering Task Force
	and may be updated, replaced, or obsoleted by other documents at any	(IETF). It represents the consensus of the IETF community. It has
	time. It is inappropriate to use Internet-Drafts as reference	received public review and has been approved for publication by the
	material or to cite them other than as "work in progress."	Internet Engineering Steering Group (IESG). Not all documents
		approved by the IESG are candidates for any level of Internet
		Standard; see Section 2 of RFC 7841.


	This Internet-Draft will expire on 19 February 2026.	Information about the current status of this document, any errata,
		and how to provide feedback on it may be obtained at
		https://www.rfc-editor.org/info/rfc9940.

	Copyright Notice	Copyright Notice


	Copyright (c) 2025 IETF Trust and the persons identified as the	Copyright (c) 2026 IETF Trust and the persons identified as the
	document authors. All rights reserved.	document authors. All rights reserved.

	This document is subject to BCP 78 and the IETF Trust's Legal	This document is subject to BCP 78 and the IETF Trust's Legal

	Provisions Relating to IETF Documents (https://trustee.ietf.org/	Provisions Relating to IETF Documents
	license-info) in effect on the date of publication of this document.	(https://trustee.ietf.org/license-info) in effect on the date of
	Please review these documents carefully, as they describe your rights	publication of this document. Please review these documents
	and restrictions with respect to this document. Code Components	carefully, as they describe your rights and restrictions with respect
	extracted from this document must include Revised BSD License text as	to this document. Code Components extracted from this document must
	described in Section 4.e of the Trust Legal Provisions and are	include Revised BSD License text as described in Section 4.e of the
	provided without warranty as described in the Revised BSD License.	Trust Legal Provisions and are provided without warranty as described
		in the Revised BSD License.

	Table of Contents	Table of Contents


	1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2	1. Introduction
	2. Usage of Terms . . . . . . . . . . . . . . . . . . . . . . . 3	2. Usage of Terms
	3. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 4	3. Terminology
	3.1. Context Terminology . . . . . . . . . . . . . . . . . . . 4	3.1. Context Terminology
	3.2. Core Terms . . . . . . . . . . . . . . . . . . . . . . . 5	3.2. Core Terms
	3.3. Other Terms . . . . . . . . . . . . . . . . . . . . . . . 9	3.3. Other Terms
	4. Workflow Explanations . . . . . . . . . . . . . . . . . . . . 9	4. Workflow Explanations
	5. Security Considerations . . . . . . . . . . . . . . . . . . . 14	5. Security Considerations
	6. Privacy Considerations . . . . . . . . . . . . . . . . . . . 14	6. Privacy Considerations
	7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 14	7. IANA Considerations
	Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . 15	8. Informative References
	Informative References . . . . . . . . . . . . . . . . . . . . . 15	Acknowledgments
	Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 16	Authors' Addresses

	1. Introduction	1. Introduction

	Successful operation of large networks depends on effective network	Successful operation of large networks depends on effective network
	management. This requires a virtuous circle of network control,	management. This requires a virtuous circle of network control,
	network observability, network analytics, network assurance, and back	network observability, network analytics, network assurance, and back
	to network control. Network fault and problem management [RFC6632]	to network control. Network fault and problem management [RFC6632]
	is an important aspect of network management and control solutions.	is an important aspect of network management and control solutions.
	It deals with the detection, reporting, inspection, isolation,	It deals with the detection, reporting, inspection, isolation,
	correlation, and management of events within the network. The	correlation, and management of events within the network. The

	skipping to change at page 3, line 7	skipping to change at line 94
	negative effect on the network's ability to forward traffic according	negative effect on the network's ability to forward traffic according
	to expected behavior and so deliver services, the ability to control	to expected behavior and so deliver services, the ability to control
	and operate the network, and other faults that reduce the quality or	and operate the network, and other faults that reduce the quality or
	reliability of the delivered service. The concept of fault and	reliability of the delivered service. The concept of fault and
	problem management extends to include actions taken to determine the	problem management extends to include actions taken to determine the
	causes of problems and to work toward recovery of expected network	causes of problems and to work toward recovery of expected network
	behavior.	behavior.

	A number of work efforts within the IETF seek to provide components	A number of work efforts within the IETF seek to provide components
	of a fault management system, such as YANG data models or management	of a fault management system, such as YANG data models or management

	protocols. It is important that a common terminology is used so that	protocols. It is important that a common terminology be used so that
	there is a clear understanding of how the elements of the management	there is a clear understanding of how the elements of the management

	and control solutions fit together, and how faults and problems will	and control solutions fit together and how faults and problems will
	be handled.	be handled.

	This document sets out some terms that are fundamental to a common	This document sets out some terms that are fundamental to a common
	understanding of network fault and problem management. While	understanding of network fault and problem management. While
	"faults" and "problems" are concepts that apply at all levels of	"faults" and "problems" are concepts that apply at all levels of
	technology in the Internet, the scope of this document is restricted	technology in the Internet, the scope of this document is restricted

	to the network layer and below, hence this document is specifically	to the network layer and below; hence, this document is specifically
	about "network fault and problem management." The concept of	about "network fault and problem management." The concept of
	"incidents" is also touched on in this document, where an incident	"incidents" is also touched on in this document, where an incident
	results from one or more problems and is the disruption of a network	results from one or more problems and is the disruption of a network
	service.	service.

	Note that some useful terms are defined in [RFC3877] and [RFC8632].	Note that some useful terms are defined in [RFC3877] and [RFC8632].
	The definitions in this document are informed by those documents, but	The definitions in this document are informed by those documents, but
	they are not dependent on that prior work.	they are not dependent on that prior work.

	2. Usage of Terms	2. Usage of Terms

	The terms defined in this document are intended for consistent use	The terms defined in this document are intended for consistent use
	within the IETF in the scope of network fault and problem management.	within the IETF in the scope of network fault and problem management.
	Where similar concepts are described in other bodies, an attempt has	Where similar concepts are described in other bodies, an attempt has

	been made to harmonize with those other descriptions, but there is	been made to harmonize with those other descriptions, but care is
	care needed where terms are not used consistently between bodies or	needed where terms are not used consistently between bodies or where
	where terms are applied outside the network layer. If other bodies	terms are applied outside the network layer. If other bodies find
	find the terminology defined in this document useful, they are free	the terminology defined in this document useful, they are free to use
	to use it.	it.

	The purpose of this document is to define the following terms for use	The purpose of this document is to define the following terms for use
	in other documents. Other terms are defined to enable those	in other documents. Other terms are defined to enable those
	definitions and may also be used by other documents, although that is	definitions and may also be used by other documents, although that is
	not the principal purpose of their definitions here.	not the principal purpose of their definitions here.

	* Event	* Event
	* State	* State
	* Fault	* Fault
	* Problem	* Problem

	skipping to change at page 4, line 4	skipping to change at line 137
	not the principal purpose of their definitions here.	not the principal purpose of their definitions here.

	* Event	* Event
	* State	* State
	* Fault	* Fault
	* Problem	* Problem
	* Symptom	* Symptom
	* Cause	* Cause
	* Alert	* Alert
	* Alarm	* Alarm


	When other documents make use of the terms as defined in this	When other documents make use of the terms as defined in this
	document, it is suggested here that such uses should use	document, it is suggested here that such uses should use
	capitalization of the terms as in this document to help distinguish	capitalization of the terms as in this document to help distinguish

	them from colloquial uses, and should include an early section	them from colloquial uses and should include an early section listing
	listing the terms inherited from this document with a citation.	the terms inherited from this document with a citation.

	3. Terminology	3. Terminology

	This section contains key terms. It is split into three subsections.	This section contains key terms. It is split into three subsections.


	* Section 3.1 contains terms that help to set the context for	* Section 3.1 contains terms that help set the context for network
	network fault and problem management systems.	fault and problem management systems.

	* Section 3.2 includes specific and detailed core terms that will be	* Section 3.2 includes specific and detailed core terms that will be
	used in other documents that describe elements of the network	used in other documents that describe elements of the network
	fault and problem management systems.	fault and problem management systems.

	* Section 3.3 provides three further terms that may be helpful.	* Section 3.3 provides three further terms that may be helpful.

	3.1. Context Terminology	3.1. Context Terminology

	This section includes some terminology that helps describe the	This section includes some terminology that helps describe the
	context for the rest of this work. The terms may be viewed as a	context for the rest of this work. The terms may be viewed as a
	cascaded sequence of processes, starting with Network Telemetry and	cascaded sequence of processes, starting with Network Telemetry and
	building to Network Observability. The definitions are deliberately	building to Network Observability. The definitions are deliberately
	kept relatively terse. Further documents may expand on these terms	kept relatively terse. Further documents may expand on these terms
	without loss of specificity. Such contextualization (if any) should	without loss of specificity. Such contextualization (if any) should
	be highlighted clearly in those documents.	be highlighted clearly in those documents.

	Network Telemetry: This is defined in [RFC9232] and describes the	Network Telemetry: This is defined in [RFC9232] and describes the
	process of collecting operational network data categorized	process of collecting operational network data categorized

	according to the network plane (e.g., layer 3, layer 2, and layer	according to the network plane (e.g., Layer 3, Layer 2, and Layer
	1) from which it was derived. Data collected through the Network	1) from which it was derived. Data collected through the Network
	Telemetry process does not contain any data related to service	Telemetry process does not contain any data related to service
	definitions (i.e., "intent" per Section 3.1 of [RFC9315]).	definitions (i.e., "intent" per Section 3.1 of [RFC9315]).

	Network Monitoring: This is the process of keeping a continuous	Network Monitoring: This is the process of keeping a continuous
	record of functions related to a network topology. It involves	record of functions related to a network topology. It involves
	tracking various aspects such as traffic patterns, device health,	tracking various aspects such as traffic patterns, device health,

	performance metrics, and overall network behaviour. This approach	performance metrics, and overall network behavior. This approach
	differentiates network monitoring from resource or device	differentiates network monitoring from resource or device

	monitoring, which focuses on individual components or resources	monitoring, which focuses on individual resources or components
	(Section 3.2).	(Section 3.2).

	Network Analytics: This is the process of deriving analytical	Network Analytics: This is the process of deriving analytical
	insights from operational network data. A process could be	insights from operational network data. A process could be
	executed by a piece of software, a system, or a human that	executed by a piece of software, a system, or a human that
	analyzes operational data and outputs new analytical data related	analyzes operational data and outputs new analytical data related

	to the operational data, for example, a symptom.	to the operational data -- for example, a symptom.

	Network Observability: This is the process of enabling network	Network Observability: This is the process of enabling network
	behavioral assessment through analysis of observed operational	behavioral assessment through analysis of observed operational
	network data (logs, alarms, traces, etc.) with the aim of	network data (logs, alarms, traces, etc.) with the aim of
	detecting symptoms of network behavior, and to identify anomalies	detecting symptoms of network behavior, and to identify anomalies
	and their causes. Network Observability begins with information	and their causes. Network Observability begins with information
	gathered using Network Monitoring tools and that may be further	gathered using Network Monitoring tools and that may be further
	enriched with other operational data. The expected outcome of the	enriched with other operational data. The expected outcome of the
	observability processes is identification and analysis of	observability processes is identification and analysis of
	deviations in observed state versus the expected state of a	deviations in observed state versus the expected state of a

	skipping to change at page 5, line 37	skipping to change at line 216
	data gathered in Network Telemetry.	data gathered in Network Telemetry.

	* Network Analytics is the process of deriving insight through the	* Network Analytics is the process of deriving insight through the
	data recorded in Network Monitoring.	data recorded in Network Monitoring.

	* Network Observability is the process of enabling behavioral	* Network Observability is the process of enabling behavioral
	assessment of a network through Network Analytics.	assessment of a network through Network Analytics.

	3.2. Core Terms	3.2. Core Terms


	The terms are presented below in an order that is intended to flow	The terms in this section are presented in an order that is intended
	such that it is possible to gain understanding reading top to bottom.	to flow such that it is possible to gain understanding reading top to
	The figures and explanations in Section 4 may aid understanding the	bottom. The figures and explanations in Section 4 may aid
	terms set out here.	understanding the terms set out here.

	Resource: An element of a network system.	Resource: An element of a network system.


	Resource is a recursive concept so that a Resource may be a	* Resource is a recursive concept so that a Resource may be a
	collection of other Resources (for example, a network node	collection of other Resources (for example, a network node
	comprises a collection of network interfaces).	comprises a collection of network interfaces).

	Characteristic: Observable or measurable aspect or behavior	Characteristic: Observable or measurable aspect or behavior
	associated with a Resource.	associated with a Resource.

	* A Characteristic may be considered to be built on facts (see	* A Characteristic may be considered to be built on facts (see
	'Value', below) and the contexts and descriptors that identify	'Value', below) and the contexts and descriptors that identify
	and give meaning to the facts.	and give meaning to the facts.

	* The term "Metric" [RFC9417] is another word for a measurable	* The term "Metric" [RFC9417] is another word for a measurable
	Characteristic which may also be thought of as analogous to a	Characteristic which may also be thought of as analogous to a
	'variable'.	'variable'.


	Value: A Value is a measure of a Characteristic associated with a	Value: A measure of a Characteristic associated with a Resource. It
	Resource. It may be in the form of a categorization (e.g., high	may be in the form of a categorization (e.g., high or low), an
	or low), an integer (e.g., a count or gauge), or a reading of a	integer (e.g., a count or gauge), or a reading of a continuous
	continuous variable (e.g., an analog measurement), etc.	variable (e.g., an analog measurement), etc.


	Change: In the context of Network Monitoring, a Change is the	Change: In the context of Network Monitoring, the variation in the
	variation in the Value of a Characteristic associated with a	Value of a Characteristic associated with a Resource. A Change
	Resource and may arise over a period of time.	may arise over a period of time.

	* Not all Changes are noteworthy (i.e., they do not have	* Not all Changes are noteworthy (i.e., they do not have
	Relevance).	Relevance).

	* Perception of Change depends upon Detection, the sampling	* Perception of Change depends upon Detection, the sampling
	rate/accuracy/detail, and perspective.	rate/accuracy/detail, and perspective.

	* It may be helpful to qualify this as "Value Change" because the	* It may be helpful to qualify this as "Value Change" because the
	English word "change" is often heavily used.	English word "change" is often heavily used.

	Event: The variation in Value of a Characteristic of a Resource at a	Event: The variation in Value of a Characteristic of a Resource at a
	distinct moment in time (i.e., the period is negligible).	distinct moment in time (i.e., the period is negligible).

	* Compared with a Change, which may be over a period of time, an	* Compared with a Change, which may be over a period of time, an
	Event happens at a distinct moment in time. Thus, an Event may	Event happens at a distinct moment in time. Thus, an Event may
	be the observation of a Change.	be the observation of a Change.
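The following minimal Python sketch makes the Change/Event distinction concrete. It assumes that a Characteristic is sampled periodically as (time, Value) pairs. The names `Event` and `detect_events` are hypothetical and are not defined by either version of the document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Variation in the Value of a Characteristic at a distinct moment."""
    resource: str
    characteristic: str
    time: float
    old_value: object
    new_value: object


def detect_events(resource, characteristic, samples):
    """Turn time-ordered (time, value) samples into Events.

    A Change may unfold at any point between two samples. It is only
    observed when consecutive Values differ, and it is then recorded as an
    Event stamped with the later sample time.
    """
    events = []
    for (_, prev_value), (time, value) in zip(samples, samples[1:]):
        if value != prev_value:
            events.append(Event(resource, characteristic, time, prev_value, value))
    return events
```

Suppose samples are taken every 10 seconds and a link goes down and comes back up between two samples. In that case, the sketch produces no Event at all. This illustrates why perception of Change depends on the sampling rate.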


	Condition: A Condition is an interpretation of the Values of a set	Condition: An interpretation of the Values of a set of one or more
	of one or more Characteristics of a Resource (with respect to	Characteristics of a Resource (with respect to working order or
	working order or some other aspect relevant to the Resource	some other aspect relevant to the Resource purpose/application) --
	purpose/application), for example "low available memory". Thus,	for example, "low available memory". Thus, it is the output of a
	it is the output of a function applied to a set of one or more	function applied to a set of one or more variables.
	variables.

	State: A particular Condition that a Resource has (i.e., it is in a	State: A particular Condition that a Resource has (i.e., it is in a
	State) at a specific time. For example, a router may report the	State) at a specific time. For example, a router may report the

	total amount of memory it has, and how much is free. These are	total amount of memory it has and how much is free. These are the
	the Values of two Characteristics of a Resource. These Values can	Values of two Characteristics of a Resource. These Values can be
	be interpreted to determine the Condition of the Resource, and	interpreted to determine the Condition of the Resource, and that
	that may determine the State of the router, such as shortage of	may determine the State of the router, such as shortage of memory.
	memory.

	* While a State may be observed at a specific moment in time, it	* While a State may be observed at a specific moment in time, it
	is actually determined by summarizing measurement over time in	is actually determined by summarizing measurement over time in
	a process sometimes called State compression.	a process sometimes called State compression.

	* It may be helpful to qualify this as "Resource State" to make	* It may be helpful to qualify this as "Resource State" to make
	clear the distinction between this and other uses of "state"	clear the distinction between this and other uses of "state"
	such as "protocol state".	such as "protocol state".

	* This term may be contrasted with "Operational State" as used in	* This term may be contrasted with "Operational State" as used in
	[RFC8342]. For example, the state of a link might be up/down/	[RFC8342]. For example, the state of a link might be up/down/

	degraded, but the operational state of link would include a	degraded, but the operational state of the link would include a
	collection of Values of Characteristics of the link.	collection of Values of Characteristics of the link.
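The router memory example above can be sketched in Python in two steps. First, a function maps two Values to a Condition. Second, a summarizing step turns Conditions observed over a window into a State. The 10% threshold and the majority rule used for State compression are assumptions chosen only for illustration. The document does not prescribe any particular rule.

```python
def memory_condition(total_mb, free_mb, low_ratio=0.1):
    """Interpret two Values (total and free memory) as a Condition."""
    if total_mb <= 0:
        raise ValueError("total memory must be positive")
    if free_mb / total_mb < low_ratio:
        return "low available memory"
    return "normal"


def memory_state(readings, low_ratio=0.1):
    """Summarize Conditions over a window of readings into one State.

    Assumed rule: the State is "shortage of memory" only if more than half
    of the (total_mb, free_mb) readings show low available memory.
    """
    conditions = [memory_condition(total, free, low_ratio) for total, free in readings]
    low_count = sum(1 for c in conditions if c == "low available memory")
    if low_count * 2 > len(conditions):
        return "shortage of memory"
    return "normal"
```

Because the State is summarized over a window, a single low reading does not flip it. This mirrors the note that a State is determined by summarizing measurement over time.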

	Detect (hence Detected, Detection): To notice the presence of	Detect (hence Detected, Detection): To notice the presence of

	something (State, Change, Event, activity, etc.).	something (State, Change, Event, activity, etc.) and hence also to
		notice a Change (from the perspective of an observer such as a
	* Hence also to notice a Change (from the perspective of an	monitoring system).
	observer such as a monitoring system).

	Relevance: Consideration of an Event, State, or Value (through the	Relevance: Consideration of an Event, State, or Value (through the
	application of policy, relative to a specific perspective, intent,	application of policy, relative to a specific perspective, intent,
	and in relation to other Events, States, and Values) to determine	and in relation to other Events, States, and Values) to determine
	whether it is of note to the system that controls or manages the	whether it is of note to the system that controls or manages the
	network. Note, for example, that not all Changes are Relevant.	network. Note, for example, that not all Changes are Relevant.

	* This term may also be used as "Relevant Event", "Relevant	* This term may also be used as "Relevant Event", "Relevant
	State", or "Relevant Value".	State", or "Relevant Value".

	Occurrence: A Relevant Event or a particular Relevant Change.	Occurrence: A Relevant Event or a particular Relevant Change.

	* An Occurrence may be an aggregation or abstraction of multiple	* An Occurrence may be an aggregation or abstraction of multiple

	fine-grain Events or Changes.	fine-grained Events or Changes.

	* An Occurrence may occur at any macro or micro scale because	* An Occurrence may occur at any macro or micro scale because

	Resources are a recursive concept, and may be perceived	Resources are a recursive concept, and may be perceived,
	depending on the scope of observation (i.e., according to the	depending on the scope of observation (i.e., according to the
	level of Resource recursion that is examined). That is,	level of Resource recursion that is examined). That is,

	Occurrences, themselves are a recursive concept.	Occurrences, themselves, are a recursive concept.
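Below is a minimal Python sketch of Relevance and Occurrence. It rests on two assumptions, neither of which comes from the document. First, a hypothetical policy makes only Events on a watched set of (Resource, Characteristic) pairs Relevant. Second, Relevant Events on the same Resource that fall within a fixed time window are aggregated into one Occurrence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    resource: str
    characteristic: str
    time: float
    old_value: object
    new_value: object


def is_relevant(event, watched):
    """Hypothetical policy: only Events on watched (resource, characteristic) pairs matter."""
    return (event.resource, event.characteristic) in watched


def occurrences(events, watched, window):
    """Aggregate Relevant Events on the same Resource into Occurrences.

    A Relevant Event within `window` time units of the previous Relevant
    Event on the same Resource joins that Occurrence (a list of Events);
    otherwise it starts a new Occurrence.
    """
    result = []
    last_by_resource = {}  # resource -> (index into result, time of last Event)
    for ev in sorted(events, key=lambda e: e.time):
        if not is_relevant(ev, watched):
            continue
        prev = last_by_resource.get(ev.resource)
        if prev is not None and ev.time - prev[1] <= window:
            result[prev[0]].append(ev)
            last_by_resource[ev.resource] = (prev[0], ev.time)
        else:
            result.append([ev])
            last_by_resource[ev.resource] = (len(result) - 1, ev.time)
    return result
```

For example, an interface flapping down and up within a few seconds yields one Occurrence built from two fine-grained Events. A counter update on the same interface is dropped because the assumed policy does not consider it Relevant.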

	Fault: An Occurrence (i.e., an Event or a Change) that is not	Fault: An Occurrence (i.e., an Event or a Change) that is not
	desired/required (as it may be indicative of a current or future	desired/required (as it may be indicative of a current or future
	undesired State). Thus, a Fault happens at a moment in time. A	undesired State). Thus, a Fault happens at a moment in time. A
	Fault can potentially be associated with a Cause. See [RFC8632]	Fault can potentially be associated with a Cause. See [RFC8632]
	for a more detailed discussion of network faults.	for a more detailed discussion of network faults.

	* Note that there is a distinction between a Fault and a Problem	* Note that there is a distinction between a Fault and a Problem
	that depends on context. For example, in a connectivity	that depends on context. For example, in a connectivity
	service where redundancy is present, a link down is a Problem,	service where redundancy is present, a link down is a Problem,

	skipping to change at page 8, line 23	skipping to change at line 342

	* Note that there is a historic aspect to the concept of a	* Note that there is a historic aspect to the concept of a
	Problem. The current State may be operational, but there could	Problem. The current State may be operational, but there could
	have been a Fault that is unexplained, and the fact of that	have been a Fault that is unexplained, and the fact of that
	unexplained recent Fault is a Problem.	unexplained recent Fault is a Problem.

	* Note that while a Problem is unresolved it may continue to	* Note that while a Problem is unresolved it may continue to
	require attention. A record of resolved Problems may be	require attention. A record of resolved Problems may be
	maintained in a log.	maintained in a log.


	* Note that there may be a State which is considered to be a	* Note that there may be a State that is considered to be a
	Problem from several perspectives. For example, consider a	Problem from several perspectives. For example, consider a
	"loss of light" State that may cause multiple services to fail.	"loss of light" State that may cause multiple services to fail.
	In this example, a new State (the light recovers) may cause the	In this example, a new State (the light recovers) may cause the
	Problem to be resolved from one perspective (the services are	Problem to be resolved from one perspective (the services are

	operational once more), but may leave the Problem as unresolved	operational once more) but may leave the Problem as unresolved
	(because the loss of light has not been explained). Further,	(because the loss of light has not been explained). Further,
	in this example, there could be another development (the reason	in this example, there could be another development (the reason
	for the temporary loss of light is traced to a microbend in the	for the temporary loss of light is traced to a microbend in the
	fiber that is repaired) resulting in that unresolved Problem	fiber that is repaired) resulting in that unresolved Problem
	now being resolved. But, in this example, this still leaves a	now being resolved. But, in this example, this still leaves a
	further Problem unresolved (a microbend occurred, and that	further Problem unresolved (a microbend occurred, and that
	Problem is not resolved until it is understood how it occurred	Problem is not resolved until it is understood how it occurred
	and a remedy is put in place to prevent recurrence).	and a remedy is put in place to prevent recurrence).

	Cause: The Events (Detected or otherwise) that gave rise to a Fault/	Cause: The Events (Detected or otherwise) that gave rise to a Fault/
	Problem.	Problem.


	Incident: A (Network) Incident is an undesired Occurrence such as an	Incident: Also referred to as "Network Incident". An Incident is an
	unexpected interruption of a network service, degradation of the	undesired Occurrence such as an unexpected interruption of a
	quality of a network service, or the below-target performance of a	network service, degradation of the quality of a network service,
	network service. An Incident results from one or more Problems,	or the below-target performance of a network service. An Incident
	and a Problem may give rise to or contribute to one or more	results from one or more Problems, and a Problem may give rise to
	Incidents. Greater discussion of Network Incident relationships,	or contribute to one or more Incidents. Greater discussion of
	including Customer Incidents and Incident management, can be found	Network Incident relationships, including Customer Incidents and
	in [I-D.ietf-nmop-network-incident-yang].	Incident management, can be found in [Net-Incident-Mgmt-YANG].

	Symptom: An observable Value, Change, State, Event, or Condition	Symptom: An observable Value, Change, State, Event, or Condition
	considered as an indication of a Problem or potential Problem.	considered as an indication of a Problem or potential Problem.


	Anomaly: A (Network) Anomaly is an unusual or unexpected Event or	Anomaly: Also referred to as "Network Anomaly". An Anomaly is an
	pattern in network data in the forwarding plane, control plane, or	unusual or unexpected Event or pattern in network data in the
	management plane that deviates from the normal, expected behavior.	forwarding plane, control plane, or management plane that deviates
	See [I-D.ietf-nmop-network-anomaly-architecture] for more details.	from the normal, expected behavior. See [Net-Anomaly-Arch] for
		more details.
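The Python sketch below illustrates "deviates from the normal, expected behavior" in one simple way. It compares a Value against a baseline using a standard-deviation threshold. This z-score test is a swapped-in technique chosen for brevity. It is not taken from the anomaly detection architecture cited above, and the threshold of 3 is an arbitrary assumption.

```python
import statistics


def deviates(baseline, value, threshold=3.0):
    """Flag a Value as unusual if it lies more than `threshold` population
    standard deviations away from the mean of the baseline samples."""
    mean = statistics.fmean(baseline)
    stdev = statistics.pstdev(baseline)
    if stdev == 0:
        # A perfectly flat baseline: any different Value is unexpected.
        return value != mean
    return abs(value - mean) / stdev > threshold
```

Real anomaly detection typically has to account for seasonality, trends, and patterns across many Characteristics. A single-Value threshold like this one only illustrates the idea of deviation from expected behavior.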

	Alert: An indication of a Fault.	Alert: An indication of a Fault.


	Alarm: As specified in [RFC8632], an Alarm signifies an undesirable	Alarm: As specified in [RFC8632], signifies an undesirable State in
	State in a Resource that requires corrective action. From a	a Resource that requires corrective action. From a management
	management point of view, an Alarm can be seen as a State in its	point of view, an Alarm can be seen as a State in its own right
	own right and the transition to this State may result in an Alert	and the transition to this State may result in an Alert being
	being issued. The receipt of this Alert may give rise to a	issued. The receipt of this Alert may give rise to a continuous
	continuous indication (to a human operator) highlighting the	indication (to a human operator) highlighting the potential or
	potential or actual presence of a Problem.	actual presence of a Problem.
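
The Alarm definition treats an Alarm as a State whose onset may give rise to an Alert. The Python sketch below illustrates that reading under stated assumptions; it is not taken from either document. The class AlarmState, its update() method, and the Alert text are hypothetical. The sketch also assumes that an Alert is issued only on the transition into the alarmed State, so clearing notifications are not modelled.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AlarmState:
    """Hypothetical model of an Alarm as a State of one Resource."""

    resource: str
    raised: bool = False  # True while the Resource is in the alarmed State
    alerts: List[str] = field(default_factory=list)

    def update(self, problem_present: bool) -> None:
        """Re-evaluate the State; issue an Alert only on entry to it."""
        if problem_present and not self.raised:
            # Transition clear -> raised: this is where an Alert may be issued.
            self.raised = True
            self.alerts.append(f"Alert: alarm raised on {self.resource}")
        elif not problem_present and self.raised:
            # Transition raised -> clear: the Alarm State ends (no Alert here).
            self.raised = False
        # Otherwise the State is unchanged and nothing new is issued.


alarm = AlarmState("port-1")
for present in [False, True, True, True, False, True]:
    alarm.update(present)
print(alarm.alerts)
```

In this example the Problem is reported as present four times, but only two Alerts result, one for each transition into the alarmed State.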

	3.3. Other Terms	3.3. Other Terms

	Three other terms may be helpful:	Three other terms may be helpful:

	Intermittent: A State that is not continuous, but keeps recurring in	Intermittent: A State that is not continuous but that keeps
	some time frame.	recurring in some time frame.

	Transient: A State that is not continuous, and occurs once in some	Transient: A State that is not continuous and that occurs once in
	time frame.	some time frame.

	Recurrent: A Problem that is actively resolved, but returns.	Recurrent: A Problem that is actively resolved but returns.
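
One way to apply these three terms is to classify how a State or Problem occurs within a chosen time frame. The Python sketch below illustrates this and is not a definition from either document. It assumes that occurrences are supplied as non-overlapping (start, end) intervals. The functions classify() and is_recurrent() and the extra labels "absent" and "continuous" are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start, end) of one occurrence of a State


def classify(occurrences: List[Interval], window: Interval) -> str:
    """Classify a State within one time frame (hypothetical helper)."""
    w_start, w_end = window
    # Keep only the occurrences that overlap the time frame.
    seen = [(s, e) for s, e in occurrences if s < w_end and e > w_start]
    if not seen:
        return "absent"
    if len(seen) == 1 and seen[0][0] <= w_start and seen[0][1] >= w_end:
        return "continuous"  # held for the whole time frame
    if len(seen) == 1:
        return "transient"  # not continuous, occurs once
    return "intermittent"  # not continuous, keeps recurring


def is_recurrent(resolved_at: float, occurrences: List[Interval]) -> bool:
    """A Problem is Recurrent if it returns after being actively resolved."""
    return any(start > resolved_at for start, _ in occurrences)


print(classify([(10, 20)], (0, 60)))            # transient
print(classify([(10, 20), (30, 40)], (0, 60)))  # intermittent
print(is_recurrent(50, [(10, 20), (60, 70)]))   # True
```

The choice of time frame is a policy decision. The same occurrences may be Transient in a short window and Intermittent in a longer one.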

	4. Workflow Explanations	4. Workflow Explanations

	This section aims to add information about the relationship between	This section aims to add information about the relationship between
	the terms defined in Section 3.2 in the context of network fault and	the terms defined in Section 3.2 in the context of network fault and
	problem management. The text and figures here are for explanation	problem management. The text and figures here are for explanation
	and are not normative for the definition of terms.	and are not normative for the definition of terms.

	The relationship between Resources and Characteristics is shown in	The relationship between Resources and Characteristics is shown in
	Figure 1. Note that there is a 1:n relationship between Network	Figure 1. Note that there is a 1:n relationship between a Network
	system and Resources, and between Resources and Characteristics: this	system and Resources and between Resources and Characteristics: For
	is not shown on the figure for clarity.	clarity, this is not shown in the figure.

	Characteristics	Characteristics
	^	^
	|	|
	Resources	Resources
	^	^
	|	|
	Network system	Network system

	Figure 1: Resources and Characteristics	Figure 1: Resources and Characteristics
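
The 1:n relationships that Figure 1 omits can be made explicit in a data structure. The Python sketch below is illustrative only. The class names, field names, and example Resources and Characteristics are hypothetical. Values are modelled as plain floats, although real Characteristics may carry other types.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Characteristic:
    name: str
    value: float  # current Value; it may change over time


@dataclass
class Resource:
    name: str
    # 1:n - one Resource has many Characteristics
    characteristics: List[Characteristic] = field(default_factory=list)


@dataclass
class NetworkSystem:
    # 1:n - one Network system has many Resources
    resources: List[Resource] = field(default_factory=list)

    def values(self) -> Dict[str, float]:
        """Flatten to "resource/characteristic" -> current Value."""
        return {
            f"{r.name}/{c.name}": c.value
            for r in self.resources
            for c in r.characteristics
        }


net = NetworkSystem([
    Resource("port-1", [Characteristic("rx-power-dbm", -7.5),
                        Characteristic("error-rate", 0.0)]),
    Resource("port-2", [Characteristic("rx-power-dbm", -9.1)]),
])
print(net.values())
```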

	The Value of a Characteristic of a Resource may change over time.	The Value of a Characteristic of a Resource may change over time.
	Specific Changes in Value may be noticed at a specific time (as	Specific Changes in Value may be noticed at a specific time (as
	digital Changes), Detected, and treated as Events. This is shown on	digital Changes), Detected, and treated as Events. This is shown on
	the left of Figure 2.	the left-hand side of Figure 2.

	The center of Figure 2 shows how the Value of a Characteristic may	The center of Figure 2 shows how the Value of a Characteristic may
	change over time. The Value may be Detected at specific times or	change over time. The Value may be Detected at specific times or
	periodically and give rise to Conditions that are States (and	periodically and give rise to Conditions that are States (and
	consequently State Changes).	consequently State Changes).

	In practice, the Characteristic may vary in an analog manner over	In practice, the Characteristic may vary in an analog manner over
	time as shown on the right-hand side of Figure 2. The Value can be	time as shown on the right-hand side of Figure 2. The Value can be
	read or reported (i.e., Detected) periodically leading to analog	read or reported (i.e., Detected) periodically leading to analog
	Values that may be deemed Relevant Values, or may be evaluated over	Values that may be deemed Relevant Values, or it may be evaluated
	time as shown in Figure 6.	over time as shown in Figure 6.

	Event State Value	Event State Value
	Condition	Condition
	^ ^ ^	^ ^ ^
	Detect : Detect : Detect :	Detect : Detect : Detect :
	: : :	: : :

	^ ^ ^ ^ ^ /\	^ ^ ^ ^ ^ /\
	: : : : : / \	: : : : : / \
	: : : : : /\ / \	: : : : : /\ / \

	skipping to change at page 11, line 33	skipping to change at line 494
	|	|
	Event	Event

	Figure 3: Event and Dependent Terms	Figure 3: Event and Dependent Terms

	Parallel to the workflow for Events, Figure 4 shows the workflow	Parallel to the workflow for Events, Figure 4 shows the workflow
	progress for States. As shown in Figure 2, Change noted at a	progress for States. As shown in Figure 2, Change noted at a
	particular time gives rise to State. The State may be deemed to have	particular time gives rise to State. The State may be deemed to have
	Relevance considering policy, relative to a specific perspective,	Relevance considering policy, relative to a specific perspective,
	with a view to intent, and in relation to other Events, States, and	with a view to intent, and in relation to other Events, States, and
	Values. A Relevant State may be deemed a Problem, or may indicate a	Values. A Relevant State may be deemed a Problem, or it may indicate
	Problem or potential Problem.	a Problem or potential Problem.

	Problems may be considered based on Symptoms and may map directly or	Problems may be considered based on Symptoms and may map directly or
	indirectly to Causes. An Incident results from one or more Problems.	indirectly to Causes. An Incident results from one or more Problems.
	An Alarm may be raised as the result of a Problem, and the transition	An Alarm may be raised as the result of a Problem, and the transition
	to an Alarmed state may give rise to an Alert.	to an Alarmed state may give rise to an Alert.

	Alarm - - -> Alert	Alarm - - -> Alert
	^	^
	| ------> Incident	| ------> Incident
	| |	| |

	skipping to change at page 12, line 26	skipping to change at line 523
	|	|
	State	State

	Figure 4: State and Dependent Terms	Figure 4: State and Dependent Terms

	Figure 5 shows how Faults and Problems may be consolidated to	Figure 5 shows how Faults and Problems may be consolidated to
	determine the Causes. The arrows show how one item may give rise to	determine the Causes. The arrows show how one item may give rise to
	another.	another.

	A Cause can be indicated by or determined from Faults, Problems, and	A Cause can be indicated by or determined from Faults, Problems, and
	Symptoms. It may be that one Cause points to another, and can also	Symptoms. It may be that one Cause points to another, and it can
	be considered as a Symptom. The determination of Causes can consider	also be considered as a Symptom. The determination of Causes can
	multiple inputs. An Incident results from one or more Problems.	consider multiple inputs. An Incident results from one or more
		Problems.

	---------	---------
	------------- | |	------------- | |
	| ----------> | Symptom |	| ----------> | Symptom |
	| | | |	| | | |
	| | ---------	| | ---------
	v | ^	v | ^
	--------- |	--------- |
	------->| Cause |<--------- |	------->| Cause |<--------- |
	| --------- | |	| --------- | |

	skipping to change at page 13, line 13	skipping to change at line 555
	Figure 5: Consolidation of Symptoms and Causes	Figure 5: Consolidation of Symptoms and Causes

	Figure 6 shows how thresholds are important in the consideration of	Figure 6 shows how thresholds are important in the consideration of
	analog Values and Events. The arrows in the figure show how one item	analog Values and Events. The arrows in the figure show how one item
	may give rise to or utilize another. The use of threshold-driven	may give rise to or utilize another. The use of threshold-driven
	Events and States (and the Alerts that they might give rise to) must	Events and States (and the Alerts that they might give rise to) must
	be treated with caution to dampen any "flapping" (so that consistent	be treated with caution to dampen any "flapping" (so that consistent
	States may be observed) and to avoid overwhelming management	States may be observed) and to avoid overwhelming management
	processes or systems. Analog Values may be read or notified from the	processes or systems. Analog Values may be read or notified from the
	Resource and could transition a threshold, be deemed Relevant Values,	Resource and could transition a threshold, be deemed Relevant Values,
	or evaluated over time. Events may be counted, and the Count may	or be evaluated over time. Events may be counted, and the Count may
	cross a threshold or reach a Relevant Value.	cross a threshold or reach a Relevant Value.

	The Threshold Process may be implementation-specific and subject to	The Threshold Process may be implementation specific and subject to
	policies. When a threshold is crossed and any other conditions are	policies. When a threshold is crossed and any other conditions are
	matched, an Event may be determined, and treated like any other	matched, an Event may be determined and may be treated like any other
	Event.	Event.

	Occurrence	Occurrence
	^	^
	|	|
	|---------------------> State	|---------------------> State
	|	|
	| ------- Relevance	| ------- Relevance
	|------>| Count |-----------------------------> Value	|------>| Count |-----------------------------> Value
	| ------- | ^	| ------- | ^

	skipping to change at page 14, line 13	skipping to change at line 599
	Figure 6: Counts, Thresholds, and Values	Figure 6: Counts, Thresholds, and Values
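
The threshold handling described for Figure 6 can be sketched with hysteresis. This is one common way to dampen flapping, but neither document mandates it. A crossing Event is raised at one level, and a clearing Event is raised only at a lower level. The function threshold_events(), its raise_at and clear_at parameters, and the readings are hypothetical sample data for illustration.

```python
from typing import Iterable, List, Tuple


def threshold_events(values: Iterable[float], raise_at: float,
                     clear_at: float) -> List[Tuple[int, str]]:
    """Turn periodically read Values into threshold-crossing Events.

    Hysteresis (clear_at below raise_at) keeps a Value that hovers near
    one level from generating a stream of alternating Events.
    """
    if clear_at >= raise_at:
        raise ValueError("clear_at must be below raise_at")
    events: List[Tuple[int, str]] = []
    above = False  # current State derived from the Values
    for i, v in enumerate(values):
        if not above and v >= raise_at:
            above = True
            events.append((i, "threshold-crossed"))
        elif above and v <= clear_at:
            above = False
            events.append((i, "threshold-cleared"))
    return events


readings = [70, 79, 81, 79, 81, 79, 74, 81]
print(threshold_events(readings, raise_at=80, clear_at=75))
```

With the 75/80 band, these readings produce three Events (at indices 2, 6, and 7). A single threshold at 80 that clears whenever a reading drops below it would produce five Events for the same readings. The same function could also be applied to a Count of Events rather than to an analog Value.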

	5. Security Considerations	5. Security Considerations

	This document specifies terminology and has no direct effect on the	This document specifies terminology and has no direct effect on the
	security of implementations or deployments. However, protocol	security of implementations or deployments. However, protocol
	solutions and management models need to be aware of several aspects:	solutions and management models need to be aware of several aspects:

	* The exposure of information pertaining to Faults and Problems may	* The exposure of information pertaining to Faults and Problems may
	make available knowledge of the internal workings of a network (in	make available knowledge of the internal workings of a network (in
	particular its vulnerabilities) that may be of use to an attacker.	particular, its vulnerabilities) that may be of use to an
		attacker.

	* Systems that generate management information (messages,	* Systems that generate management information (messages,
	notifications, etc.) when Faults occur, may be attacked by causing	notifications, etc.) when Faults occur may be attacked by causing
	them to generate so much information that the system that manages	them to generate so much information that the system that manages
	the network is swamped and unable to properly manage the network.	the network is swamped and unable to properly manage the network.

	* Reporting false information about Faults (or masking reports of	* Reporting false information about Faults (or masking reports of
	Faults) may cause the system that manages the network to function	Faults) may cause the system that manages the network to function
	incorrectly.	incorrectly.

	6. Privacy Considerations	6. Privacy Considerations

	Network fault and problem management should preserve user privacy by	Network fault and problem management should preserve user privacy by
	not exposing user data or information about end-user activities.	not exposing user data or information about end-user activities.

	Network Telemetry involves observing network traffic and collecting	Network Telemetry involves observing network traffic and collecting
	operational data from the network, while Network Monitoring is the	operational data from the network, while Network Monitoring is the
	process of keeping records of data gathered in Network Telemetry.	process of keeping records of data gathered in Network Telemetry.
	Therefore, it is possible that the data observed and collected	Therefore, it is possible that the data observed and collected
	includes users' privacy information. Such information must be	includes users' privacy information. Such information must be
	protected and controlled to avoid exposure to unauthorised parties.	protected and controlled to avoid exposure to unauthorized parties.
	Particular care may need to be exercised over stores of such	Particular care may need to be exercised over stores of such
	information which might be accessed at any time (including far into	information that might be accessed at any time (including far into
	the future).	the future).

	Additionally, a network operator will be concerned to keep control of	Additionally, a network operator will be concerned about keeping
	all information about Faults to protect their own privacy and the	control of all information about Faults to protect their own privacy
	details of how they operate their network.	and the details of how they operate their network.

	7. IANA Considerations	7. IANA Considerations

	This document makes no requests for IANA action.	This document has no IANA actions.

	Acknowledgments

	The authors would like to thank Med Boucadair, Wanting Du, Joe
	Clarke, Javier Antich, Benoit Claise, Christopher Janz, Sherif
	Mostafa, Kristian Larsson, Dirk Hugo, Carsten Bormann, Hilarie Orman,
	Stewart Bryant, Bo Wu, Paul Kyzivat, Jouni Korhonen, Reshad Rahman,
	Rob Wilton, Mahesh Jethanandani, Tim Bray, Paul Aitken, and Deb
	Cooley for their helpful comments.

	Special thanks to the team that met at a side meeting at IETF-120 to
	discuss some of the thorny issues:

	* Benoit Claise
	* Watson Ladd
	* Brad Peters
	* Bo Wu
	* Georgios Karagiannis
	* Olga Havel
	* Vincenzo Riccobene
	* Yi Lin
	* Jie Dong
	* Aihua Guo
	* Thomas Graf
	* Qin Wu
	* Chaode Yu
	* Adrian Farrel

	Informative References	8. Informative References

	[I-D.ietf-nmop-network-anomaly-architecture]	[Net-Anomaly-Arch]
	Graf, T., Du, W., Francois, P., and A. H. Feng, "A	Graf, T., Du, W., Francois, P., and A. Huang Feng, "A
	Framework for a Network Anomaly Detection Architecture",	Framework for a Network Anomaly Detection Architecture",
	Work in Progress, Internet-Draft, draft-ietf-nmop-network-	Work in Progress, Internet-Draft, draft-ietf-nmop-network-
	anomaly-architecture-04, 4 July 2025,	anomaly-architecture-06, 21 November 2025,
	<https://datatracker.ietf.org/doc/html/draft-ietf-nmop-	<https://datatracker.ietf.org/doc/html/draft-ietf-nmop-
	network-anomaly-architecture-04>.	network-anomaly-architecture-06>.

	[I-D.ietf-nmop-network-incident-yang]	[Net-Incident-Mgmt-YANG]
	Hu, T., Contreras, L. M., Wu, Q., Davis, N., and C. Feng,	Hu, T., Contreras, L. M., Wu, Q., Davis, N., and C. Feng,
	"A YANG Data Model for Network Incident Management", Work	"A YANG Data Model for Network Incident Management", Work
	in Progress, Internet-Draft, draft-ietf-nmop-network-	in Progress, Internet-Draft, draft-ietf-nmop-network-
	incident-yang-05, 6 July 2025,	incident-yang-08, 13 February 2026,
	<https://datatracker.ietf.org/doc/html/draft-ietf-nmop-	<https://datatracker.ietf.org/doc/html/draft-ietf-nmop-
	network-incident-yang-05>.	network-incident-yang-08>.

	[RFC3877] Chisholm, S. and D. Romascanu, "Alarm Management	[RFC3877] Chisholm, S. and D. Romascanu, "Alarm Management
	Information Base (MIB)", RFC 3877, DOI 10.17487/RFC3877,	Information Base (MIB)", RFC 3877, DOI 10.17487/RFC3877,
	September 2004, <https://www.rfc-editor.org/info/rfc3877>.	September 2004, <https://www.rfc-editor.org/info/rfc3877>.

	[RFC6632] Ersue, M., Ed. and B. Claise, "An Overview of the IETF	[RFC6632] Ersue, M., Ed. and B. Claise, "An Overview of the IETF
	Network Management Standards", RFC 6632,	Network Management Standards", RFC 6632,
	DOI 10.17487/RFC6632, June 2012,	DOI 10.17487/RFC6632, June 2012,
	<https://www.rfc-editor.org/info/rfc6632>.	<https://www.rfc-editor.org/info/rfc6632>.

	skipping to change at page 16, line 34	skipping to change at line 685
	[RFC9315] Clemm, A., Ciavaglia, L., Granville, L. Z., and J.	[RFC9315] Clemm, A., Ciavaglia, L., Granville, L. Z., and J.
	Tantsura, "Intent-Based Networking - Concepts and	Tantsura, "Intent-Based Networking - Concepts and
	Definitions", RFC 9315, DOI 10.17487/RFC9315, October	Definitions", RFC 9315, DOI 10.17487/RFC9315, October
	2022, <https://www.rfc-editor.org/info/rfc9315>.	2022, <https://www.rfc-editor.org/info/rfc9315>.

	[RFC9417] Claise, B., Quilbeuf, J., Lopez, D., Voyer, D., and T.	[RFC9417] Claise, B., Quilbeuf, J., Lopez, D., Voyer, D., and T.
	Arumugam, "Service Assurance for Intent-Based Networking	Arumugam, "Service Assurance for Intent-Based Networking
	Architecture", RFC 9417, DOI 10.17487/RFC9417, July 2023,	Architecture", RFC 9417, DOI 10.17487/RFC9417, July 2023,
	<https://www.rfc-editor.org/info/rfc9417>.	<https://www.rfc-editor.org/info/rfc9417>.

		Acknowledgments

		The authors would like to thank Med Boucadair, Wanting Du, Joe
		Clarke, Javier Antich, Benoit Claise, Christopher Janz, Sherif
		Mostafa, Kristian Larsson, Dirk Hugo, Carsten Bormann, Hilarie Orman,
		Stewart Bryant, Bo Wu, Paul Kyzivat, Jouni Korhonen, Reshad Rahman,
		Rob Wilton, Mahesh Jethanandani, Tim Bray, Paul Aitken, and Deb
		Cooley for their helpful comments.

		Special thanks to the team that met at a side meeting at IETF 120 to
		discuss some of the thorny issues:

		* Benoit Claise
		* Watson Ladd
		* Brad Peters
		* Bo Wu
		* Georgios Karagiannis
		* Olga Havel
		* Vincenzo Riccobene
		* Yi Lin
		* Jie Dong
		* Aihua Guo
		* Thomas Graf
		* Qin Wu
		* Chaode Yu
		* Adrian Farrel

	Authors' Addresses	Authors' Addresses

	Nigel Davis (editor)	Nigel Davis (editor)
	Ciena	Ciena
	United Kingdom	United Kingdom
	Email: ndavis@ciena.com	Email: ndavis@ciena.com

	Adrian Farrel (editor)	Adrian Farrel (editor)
	Old Dog Consulting	Old Dog Consulting
	United Kingdom	United Kingdom

End of changes. 63 change blocks.
	180 lines changed or deleted	180 lines changed or added
This html diff was produced by rfcdiff 1.48.