From monitoring to observability – the essential shift in analytics

Cloud-native network functions cannot just be monitored in the traditional way. Disaggregated, containerized microservices mean there’s a ton of new information to consider about how a CNF is performing and how services are delivered. This means we have to move from monitoring – which has served us well – to observability. What do we mean by that?

Disaggregation brings us a host of new sources of information

The thing about disaggregated networks is that, well, they’re just that. As we move to more widespread deployment of cloud-native networks, it’s time to rethink what we mean when we consider network monitoring. We need to move – conceptually and practically – to observability. In this blog, we’ll explain why that’s necessary and what we mean by observability from the Elisa Polystar perspective.

Mainly, we need to make this transition because a lot has changed regarding how networks are built and how data can be collected from the elements involved in service delivery.

With previous approaches to monitoring, we captured (mostly passively) a series of inputs and outputs from a single entity or network function. That approach remained suitable when we transitioned from physical to virtual functions, because they were still essentially self-contained, but it’s no longer tenable in the cloud-native era.

Containerization brings more than simple virtualization

That’s because not only have we virtualized network functions, but we’ve also disaggregated and decomposed them, thanks to the rise of containerization and microservices. Let’s give an example. Take the PCF, or Policy Control Function. This has evolved from the PCRF – Policy and Charging Rules Function – which in turn evolved from previous AAA solutions. Let’s not get stuck on abbreviations – but, as you know, this is really about the services to which users have access, when, under which scenarios, in what context, and how they are billed.

Pretty important stuff. The point is that, previously, we could only really look at the control interfaces to which this entity connected, as well as any OSS reporting information it provided, whether as a monolithic server or as a virtual NF.

But the 5G PCF will likely be built from different microservices, assembled in containers. Now, we can still look at the control interfaces, but we can also consider the internal interfaces between the microservices that, collectively, comprise the full NF. That’s another level of information – so, we need to be able to look inside the PCF and capture information regarding how these microservices interact and communicate while executing the overall tasks of the PCF.
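To make "looking inside" a little more concrete, here's a minimal sketch in Python of how calls between microservices could be recorded together with their caller. Everything in it is an assumption for illustration: the service names (policy-api, rules-engine, session-store), the `observed` decorator and the `recorded` list are hypothetical, and all three "microservices" run in one process. In a real CNF they would be separate containers, and the caller's identity would have to travel with each request.

```python
import contextvars
import functools
import itertools
import time

# Tracks which span is currently executing, so a nested call knows its caller.
_current_span = contextvars.ContextVar("current_span", default=None)
_ids = itertools.count(1)

# Each entry: (span_id, parent_span_id, service_name, duration_seconds)
recorded = []


def observed(service):
    """Hypothetical decorator: record one span per call made to `service`."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            span_id = next(_ids)
            parent_id = _current_span.get()
            token = _current_span.set(span_id)
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                # Restore the caller's span and store this call's record.
                _current_span.reset(token)
                duration = time.perf_counter() - start
                recorded.append((span_id, parent_id, service, duration))
        return inner
    return wrap


# Three illustrative microservices that together act as one network function.
@observed("session-store")
def load_session(subscriber):
    return {"subscriber": subscriber, "plan": "gold"}


@observed("rules-engine")
def evaluate(subscriber):
    session = load_session(subscriber)
    return "allow" if session["plan"] == "gold" else "deny"


@observed("policy-api")
def handle_request(subscriber):
    return evaluate(subscriber)


decision = handle_request("sub-001")
for span_id, parent_id, service, duration in recorded:
    print(f"span={span_id} parent={parent_id} service={service}")
```

The external interface only ever sees the call to `handle_request`. The two internal hops, and which service called which, are visible only because each microservice reports on itself.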

Enter observability

To do this, we need to consider observability – that is, as Wikipedia puts it, “the ability to collect data about programs’ execution…internal states…and the communication among components”. Put simply, we can’t monitor these internal processes using traditional means – instead, we have to use new techniques to observe them in operation.

Now, when we think about all of the 5G NFs (and new, microservices-based instantiations of other processes), we can see that the scale of the task at hand is huge. We must be able to observe all processes in the network and collect information from all of the (many) microservices that are included in cloud-native network functions (CNFs).

But that’s not all. There’s a lot of other stuff happening too. We have automation platforms (not CNFs themselves, but rather components that may coordinate activities between them), applications, compute and storage platforms, operating systems, virtualization platforms, and much, much more.

Observability brings end-to-end visibility – for internal and external interfaces

It is observability – not monitoring – that will allow us to make sense of all of the things that are happening across this estate, enabling us to collect data from inter-dependent processes, microservices, CNFs – and everything else. In fact, observability effectively unlocks a newly converged monitoring solution, including end-to-end service visibility, dependency mapping, correlative intelligence, and automatic root cause diagnosis.

What’s more, once we feed all this information into a DataOps platform, we can also combine network, CNF and microservice data with data from other sources: KPIs, CDRs, user metrics and so on (more or less anything that’s relevant). But that’s another story for another article.

For now, we’re merely focused on the concept – and the point that observability is the only way to obtain the data we need from cloudified networks to enable us to run them efficiently and at the scale and velocity required – particularly when we consider that automation is also key to their operation.

What are the four pillars of observability and what do they give me?

So, with that in mind, what are we looking for from an observability platform today? Well, here’s a list:

  • Events: structured logs, e.g., one for each 3GPP interface signaling transaction, such as those from the PCF or a newly containerized PCRF
  • Metrics: indicate that there is a problem
  • Traces: identify the source of the problem
  • Logs: provide the forensic detail that reveals the root cause of the problem
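As a rough illustration of how the four pillars divide the work, here's a minimal Python sketch that records a single transaction in all four forms. The `Telemetry` container, the `record_transaction` and `failing_services` helpers, the metric names and the service names are all hypothetical, chosen only for this example; a real platform would collect this data from the CNFs themselves.

```python
import uuid
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Span:
    trace_id: str
    service: str        # hypothetical microservice name
    duration_ms: float
    ok: bool


@dataclass
class Telemetry:
    events: list = field(default_factory=list)         # one structured record per transaction
    metrics: Counter = field(default_factory=Counter)  # counters that show something is wrong
    spans: list = field(default_factory=list)          # traces that show where it went wrong
    logs: list = field(default_factory=list)           # detail that shows why


def record_transaction(telemetry, steps):
    """Record one transaction; `steps` is a list of (service, duration_ms, ok)."""
    trace_id = uuid.uuid4().hex
    for service, duration_ms, ok in steps:
        telemetry.spans.append(Span(trace_id, service, duration_ms, ok))
        if not ok:
            telemetry.logs.append(
                f"trace={trace_id} service={service} msg='step failed'"
            )
    success = all(ok for _, _, ok in steps)
    telemetry.metrics["transactions_total"] += 1
    if not success:
        telemetry.metrics["transactions_failed"] += 1
    telemetry.events.append({
        "trace_id": trace_id,
        "success": success,
        "total_ms": sum(duration for _, duration, _ in steps),
    })
    return trace_id


def failing_services(telemetry, trace_id):
    """Use the trace to find which services failed in one transaction."""
    return [s.service for s in telemetry.spans
            if s.trace_id == trace_id and not s.ok]


telemetry = Telemetry()
ok_id = record_transaction(
    telemetry, [("policy-api", 2.0, True), ("rules-engine", 5.0, True)])
bad_id = record_transaction(
    telemetry, [("policy-api", 2.0, True), ("rules-engine", 15.0, False)])

print(telemetry.metrics["transactions_failed"])   # the metric flags a problem
print(failing_services(telemetry, bad_id))        # the trace locates it
print(telemetry.logs)                             # the log explains it
```

The shared `trace_id` is what ties the pillars together here: the failure counter prompts the question, the spans answer "where", and the log line with the same identifier answers "why".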

Full-stack observability, backed by these pillars, enables us to unlock a range of applications that are the foundation of both service assurance (and customer experience management) and network automation. These include:

  • Automated discovery
  • Dependency mapping
  • Topology visualization
  • Correlative intelligence
  • Root cause diagnosis
  • Auto-baselining
  • 3GPP call trace
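Two of the applications above, dependency mapping and auto-baselining, can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not a description of how any particular product works: the span tuples, service names and latency samples are invented, and the baseline is a plain mean plus or minus three standard deviations, which is only one of many possible techniques.

```python
from collections import defaultdict
from statistics import mean, stdev


def dependency_map(spans):
    """Derive caller -> callees edges from (span_id, parent_id, service) tuples."""
    service_of = {span_id: service for span_id, _, service in spans}
    edges = defaultdict(set)
    for _, parent_id, service in spans:
        parent_service = service_of.get(parent_id)
        if parent_service is not None and parent_service != service:
            edges[parent_service].add(service)
    return {caller: sorted(callees) for caller, callees in edges.items()}


def baseline(samples, k=3.0):
    """Return (low, high) bounds: mean +/- k sample standard deviations."""
    m, s = mean(samples), stdev(samples)
    return m - k * s, m + k * s


def is_anomalous(value, samples, k=3.0):
    """Flag a value that falls outside the baseline learned from `samples`."""
    low, high = baseline(samples, k)
    return value < low or value > high


# Invented trace data: which span was started by which parent span.
spans = [
    ("a1", None, "policy-api"),
    ("a2", "a1", "rules-engine"),
    ("a3", "a1", "session-store"),
    ("a4", "a2", "session-store"),
]
deps = dependency_map(spans)
print(deps)

# Invented latency history (ms) for one service, used to learn "normal".
latencies_ms = [10.0, 11.0, 9.0, 10.5, 9.5, 10.0]
print(is_anomalous(25.0, latencies_ms))
print(is_anomalous(10.2, latencies_ms))
```

Neither function needs a hand-maintained inventory or a manually configured threshold. Both the topology and the definition of "normal" are derived from the observed data, which is the reason these applications depend on observability in the first place.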

By aggregating data from this diverse range of sources, we can shift from legacy approaches to monitoring and unlock end-to-end observability, covering both traditional parameters (Diameter signaling) and new, essential information (information exchange between microservices in a CNF). That’s what Elisa Polystar gives you – so, to find out more about how we can help you on your transition to cloud-native networks and on to observability, get in touch.