Sunday, January 31, 2021

Monitoring Microservices

  • Monitoring is an umbrella term covering the various ways we check the well-being of services and applications: general health checks, latency, logging, and resource usage.
  • Profiling is about observing delays and measuring how much time each service takes. It helps us see which services are slow and pinpoint the problem areas.
  • Tracing is about tracking the flow of control when a request is fired. It is similar to profiling but records different details. Distributed tracing extends the concept to a distributed system in which multiple microservices are deployed independently. For example, when the user hits a URL, we want to know which API gets executed, which other services it calls, and so on. We would like to know how the flow moves and the health of each service along the way: is a service slow or not responding? Distributed tracing helps us answer this.
  • Logging means recording events together with their parameter details. We can log every event or only the critical areas of the service or application; either way, the logs later help us understand what happened behind the scenes. Log monitoring can be done manually or automated with log monitoring tools.
  • Metrics are another important way to look at the health of your system. They can be produced from logs or tracing data, for example to show how much time various services are taking and whether something needs special attention. Different types of metrics can be generated as needed, and they provide information about the general health of the system at a glance.
  • Health checks are automated scripts or tools that keep tabs on the health of the services. This includes the health of the hardware infrastructure as well as the availability of the individual services.
  • Alerting is the mechanism that triggers an action when an unwanted or error condition is observed in the system. Email or messaging alerts can be sent as needed, and an escalation policy can be set up according to the system's requirements.
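
To make a few of these terms concrete, here is a minimal Python sketch that ties tracing, profiling, metrics, and alerting together. It is an illustration under stated assumptions, not how any particular tracing tool works: the `Trace` class, the `slow_services` helper, the service names, and the 30 ms threshold are all made up for this example, and the "services" are simulated with `time.sleep` inside a single process. A real distributed tracer would also pass the trace ID between services over the network.

```python
import time
import uuid
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class Trace:
    """All spans recorded while handling one request (hypothetical structure)."""
    trace_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    spans: list = field(default_factory=list)  # (service_name, seconds) pairs

    @contextmanager
    def span(self, service_name):
        # Profiling: time one service call and record it under this trace's ID.
        start = time.perf_counter()
        try:
            yield
        finally:
            self.spans.append((service_name, time.perf_counter() - start))


def slow_services(trace, threshold_seconds):
    """Simple metric and alert rule: services whose span exceeded the threshold."""
    return [name for name, seconds in trace.spans if seconds > threshold_seconds]


def handle_request(trace):
    # Stand-ins for real downstream calls; the service names are invented.
    with trace.span("orders-api"):
        with trace.span("inventory-service"):
            time.sleep(0.05)   # simulated slow dependency
        with trace.span("pricing-service"):
            time.sleep(0.001)  # simulated fast dependency


trace = Trace()
handle_request(trace)

# Logging: one line per span, tagged with the trace ID so lines can be correlated.
for name, seconds in trace.spans:
    print(f"trace={trace.trace_id} service={name} took={seconds * 1000:.1f}ms")

# Alerting: a real system would send an email or message here instead of printing.
print("needs attention:", slow_services(trace, threshold_seconds=0.03))
```

Because every span carries the same trace ID, the log lines for one request can be grouped together, which is the core idea that distributed tracing applies across independently deployed services.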
