Monitoring & Observability

CORTEX exposes health checks, structured logs, metrics, alerts, dashboards, and request tracing for operational visibility.

Health Checks

Endpoints

Endpoint	Purpose	Returns
`GET /health`	Basic health	Service status
`GET /health/ready`	Readiness probe	Ready for traffic
`GET /health/live`	Liveness probe	Process alive

Response Format

{
  "status": "ok",
  "service": "cortex-core",
  "version": "0.1.0",
  "timestamp": "2024-01-15T10:30:00.000Z",
  "environment": "production"
}
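As a rough illustration of the response format above, the payload could be assembled by a small typed helper. This is a minimal sketch, not the CORTEX implementation: the `HealthPayload` interface and `buildHealthPayload` function are hypothetical names, and the values are taken from the sample response.

```typescript
// Shape of the health payload shown in the sample response.
interface HealthPayload {
  status: 'ok';
  service: string;
  version: string;
  timestamp: string; // ISO 8601, UTC
  environment: string;
}

// Hypothetical helper: builds the payload from values supplied by the caller.
// `now` is injectable so the output is deterministic in tests.
function buildHealthPayload(
  service: string,
  version: string,
  environment: string,
  now: Date = new Date(),
): HealthPayload {
  return {
    status: 'ok',
    service,
    version,
    timestamp: now.toISOString(),
    environment,
  };
}

const examplePayload = buildHealthPayload(
  'cortex-core',
  '0.1.0',
  'production',
  new Date('2024-01-15T10:30:00.000Z'),
);
```

Passing the clock in as a parameter keeps the helper pure, which makes the timestamp field easy to assert on.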

Kubernetes Configuration

apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      containers:
        - name: cortex-core
          livenessProbe:
            httpGet:
              path: /health/live
              port: 8091
            initialDelaySeconds: 15
            periodSeconds: 20
          readinessProbe:
            httpGet:
              path: /health/ready
              port: 8091
            initialDelaySeconds: 5
            periodSeconds: 10
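The probe settings above determine how quickly Kubernetes reacts to a failing container. The sketch below estimates that delay, assuming the Kubernetes default `failureThreshold` of 3 (the manifest does not set it) and ignoring probe timeouts. `ProbeTiming` and `secondsToDetectFailure` are illustrative names only.

```typescript
interface ProbeTiming {
  initialDelaySeconds: number;
  periodSeconds: number;
  failureThreshold?: number; // Kubernetes uses 3 when this is omitted
}

// Approximate upper bound on the time between a container starting to fail
// its probe and Kubernetes acting on it (restart for liveness, removal from
// service endpoints for readiness). Probe timeouts are ignored.
function secondsToDetectFailure(probe: ProbeTiming): number {
  const failureThreshold = probe.failureThreshold ?? 3;
  return probe.periodSeconds * failureThreshold;
}

// Values from the manifest above.
const livenessDetection = secondsToDetectFailure({
  initialDelaySeconds: 15,
  periodSeconds: 20,
}); // 60
const readinessDetection = secondsToDetectFailure({
  initialDelaySeconds: 5,
  periodSeconds: 10,
}); // 30
```

Under these assumptions a hung process is restarted after roughly a minute, while an unready pod stops receiving traffic after about 30 seconds.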

Azure Monitor Integration

Application Insights

// src/main.ts
import * as appInsights from 'applicationinsights';

if (process.env.APPLICATIONINSIGHTS_CONNECTION_STRING) {
  appInsights.setup()
    .setAutoDependencyCorrelation(true)
    .setAutoCollectRequests(true)
    .setAutoCollectPerformance(true)
    .setAutoCollectExceptions(true)
    .start();
}

Custom Metrics

import { TelemetryClient } from 'applicationinsights';

const client = new TelemetryClient();

// Track custom metric
client.trackMetric({
  name: 'ActiveSessions',
  value: await sessionService.getActiveCount(),
});

// Track custom event
client.trackEvent({
  name: 'UserRegistration',
  properties: { tenantId: user.tenantId },
});
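Latency-style metrics can be reported through the same `trackMetric` call. The following is a minimal sketch under stated assumptions: `MetricSink` is a hypothetical interface covering only the `name` and `value` fields used above, and `trackDuration` is an illustrative helper, not part of the Application Insights SDK.

```typescript
// Minimal subset of a telemetry client used by this sketch. An object with a
// trackMetric({ name, value }) method, such as the client above, satisfies it.
interface MetricSink {
  trackMetric(metric: { name: string; value: number }): void;
}

// Hypothetical helper: runs an async operation and reports how long it took,
// in milliseconds, whether the operation succeeds or throws.
async function trackDuration<T>(
  sink: MetricSink,
  metricName: string,
  operation: () => Promise<T>,
): Promise<T> {
  const startedAt = Date.now();
  try {
    return await operation();
  } finally {
    sink.trackMetric({ name: metricName, value: Date.now() - startedAt });
  }
}
```

A call such as `trackDuration(client, 'database_query_duration_ms', () => repo.find())` would then emit one data point per query; `repo.find` is a placeholder for any async operation.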

Logging

Log Levels

Level	Usage
`error`	Errors that need attention
`warn`	Potential issues
`log`	Important events
`debug`	Debugging information
`verbose`	Detailed tracing
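The levels in the table run from most to least severe, so choosing a verbosity amounts to picking a cut-off. A small sketch of that selection, using the hypothetical names `LEVELS` and `enabledLevels`:

```typescript
// Levels from the table above, ordered from most to least severe.
const LEVELS = ['error', 'warn', 'log', 'debug', 'verbose'] as const;
type Level = (typeof LEVELS)[number];

// Returns every level that stays enabled when `mostDetailed` is the most
// detailed level the operator wants to see.
function enabledLevels(mostDetailed: Level): Level[] {
  return LEVELS.slice(0, LEVELS.indexOf(mostDetailed) + 1);
}

const productionLevels = enabledLevels('log'); // ['error', 'warn', 'log']
```

In a NestJS application the resulting array could be passed as the `logger` option when the app is created, assuming CORTEX uses the built-in logger configuration.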

Structured Logging

// NestJS Logger
import { Logger } from '@nestjs/common';

const logger = new Logger('UserService');

logger.log('User created', {
  userId: user.id,
  tenantId: user.tenantId,
});

logger.error('Failed to create user', error.stack, {
  email: dto.email,
  tenantId: dto.tenantId,
});

Log Output (JSON)

{
  "timestamp": "2024-01-15T10:30:00.000Z",
  "level": "info",
  "context": "UserService",
  "message": "User created",
  "userId": "550e8400-e29b-41d4-a716-446655440000",
  "tenantId": "f47ac10b-58cc-4372-a567-0e02b2c3d479"
}
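One way a line like the one above could be produced is sketched below. It assumes the formatter maps the NestJS `log` level to `info`, since the sample output shows `"level": "info"` for a `logger.log()` call. That mapping, and the names `outputLevel` and `formatLogLine`, are assumptions rather than documented CORTEX behavior.

```typescript
type NestLevel = 'error' | 'warn' | 'log' | 'debug' | 'verbose';

// Assumed mapping: `log` is emitted as "info"; other levels pass through.
function outputLevel(level: NestLevel): string {
  return level === 'log' ? 'info' : level;
}

// Serializes one log entry as a single JSON line. Extra fields are flattened
// into the top-level object, matching the sample output.
function formatLogLine(
  level: NestLevel,
  context: string,
  message: string,
  fields: Record<string, unknown> = {},
  now: Date = new Date(),
): string {
  return JSON.stringify({
    timestamp: now.toISOString(),
    level: outputLevel(level),
    context,
    message,
    ...fields,
  });
}
```

Keeping every entry on one line with flat keys makes the output straightforward to filter by `tenantId` or `userId` in a log aggregator.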

Metrics

Key Metrics

Metric	Description
`http_requests_total`	Total HTTP requests
`http_request_duration_ms`	Request latency
`active_sessions`	Current active sessions
`failed_login_attempts`	Failed authentications
`database_query_duration_ms`	DB query latency

Prometheus Format

# HELP http_requests_total Total HTTP requests
# TYPE http_requests_total counter
http_requests_total{method="GET",path="/users",status="200"} 1234

# HELP http_request_duration_ms Request duration in milliseconds
# TYPE http_request_duration_ms histogram
http_request_duration_ms_bucket{method="GET",path="/users",le="50"} 100
http_request_duration_ms_bucket{method="GET",path="/users",le="100"} 150
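Histogram buckets in the Prometheus text format are cumulative: each `le` bucket counts every observation at or below its bound, and a final `+Inf` bucket counts all observations. The sketch below illustrates that rule; `cumulativeBuckets` and `renderBucketLines` are hypothetical helpers, not a replacement for a Prometheus client library.

```typescript
interface Bucket {
  le: string;    // upper bound label, "+Inf" for the catch-all bucket
  count: number; // cumulative number of observations <= le
}

// Counts observations cumulatively for each upper bound (in milliseconds).
function cumulativeBuckets(durationsMs: number[], boundsMs: number[]): Bucket[] {
  const buckets: Bucket[] = boundsMs.map((bound) => ({
    le: String(bound),
    count: durationsMs.filter((d) => d <= bound).length,
  }));
  buckets.push({ le: '+Inf', count: durationsMs.length });
  return buckets;
}

// Renders the bucket lines in the text exposition format used above.
function renderBucketLines(metric: string, labels: string, buckets: Bucket[]): string[] {
  return buckets.map((b) => `${metric}_bucket{${labels},le="${b.le}"} ${b.count}`);
}
```

Because the counts are cumulative, a dashboard can derive percentiles from the bucket series without access to individual request timings.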

Alerting

Alert Rules

Alert	Condition	Severity
High Error Rate	Error rate > 5% for 5 min	Critical
High Latency	P95 latency > 2s for 5 min	Warning
Service Down	Health check fails for 1 min	Critical
High Memory	Memory > 80% for 10 min	Warning
Failed Logins	> 10 failures in 5 min	Warning
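To make the "for 5 min" wording concrete, the sketch below evaluates the High Error Rate rule over per-minute samples. It assumes one interpretation of the rule, namely that the threshold must be exceeded in every minute of the window. `MinuteSample` and `errorRateAlertFires` are illustrative names.

```typescript
interface MinuteSample {
  total: number;  // requests received in that minute
  failed: number; // failed requests in that minute
}

// Returns true when each of the most recent `windowMinutes` samples has an
// error rate strictly above `thresholdPercent`. Minutes with no traffic do
// not count as breaching.
function errorRateAlertFires(
  samples: MinuteSample[],
  thresholdPercent = 5,
  windowMinutes = 5,
): boolean {
  if (samples.length < windowMinutes) return false;
  const recent = samples.slice(-windowMinutes);
  // Compare with integer arithmetic to avoid floating-point rounding.
  return recent.every(
    (s) => s.total > 0 && s.failed * 100 > thresholdPercent * s.total,
  );
}
```

Requiring the condition to hold for the whole window suppresses alerts on short spikes, at the cost of a few minutes of detection delay.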

Azure Monitor Alert

{
  "alertRule": {
    "name": "HighErrorRate",
    "condition": {
      "allOf": [
        {
          "metricName": "requests/failed",
          "operator": "GreaterThan",
          "threshold": 5,
          "timeAggregation": "Percentage"
        }
      ]
    },
    "actions": [
      {
        "actionGroupId": "/subscriptions/.../actionGroups/ops-team"
      }
    ]
  }
}

Dashboard

Key Dashboard Panels

Request Volume — Requests per minute by endpoint
Error Rate — Errors as percentage of total requests
Latency Distribution — P50, P95, P99 response times
Active Users — Current logged-in users
Database Performance — Query times and connection pool
Authentication — Login success/failure rates
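The latency panel's P50, P95, and P99 values are percentiles of the response-time distribution. As a reference point, here is a minimal nearest-rank percentile over raw samples. Dashboards typically compute these from histogram buckets instead, so this sketch only illustrates what the numbers mean.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it.
function percentile(valuesMs: number[], p: number): number {
  if (valuesMs.length === 0) {
    throw new Error('percentile requires at least one sample');
  }
  const sorted = valuesMs.slice().sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1];
}

// 20 response times: 10 ms, 20 ms, ..., 200 ms.
const responseTimesMs = Array.from({ length: 20 }, (_, i) => (i + 1) * 10);
const p50 = percentile(responseTimesMs, 50); // 100
const p95 = percentile(responseTimesMs, 95); // 190
const p99 = percentile(responseTimesMs, 99); // 200
```

The gap between P50 and P99 is usually the more useful signal: a stable median with a rising P99 points to a slow subset of requests rather than a general slowdown.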

Grafana Dashboard JSON

{
  "dashboard": {
    "title": "CORTEX Overview",
    "panels": [
      {
        "title": "Request Rate",
        "type": "graph",
        "targets": [
          {
            "expr": "rate(http_requests_total[5m])",
            "legendFormat": "{{method}} {{path}}"
          }
        ]
      }
    ]
  }
}

Tracing

Distributed Tracing

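// Correlation ID middleware
import { Injectable, NestMiddleware } from '@nestjs/common';
import { NextFunction, Request, Response } from 'express';
import { v4 as uuidv4 } from 'uuid';

@Injectable()
export class CorrelationIdMiddleware implements NestMiddleware {
  use(req: Request, res: Response, next: NextFunction) {
    // Reuse the caller's ID when present; otherwise generate a new one.
    const header = req.headers['x-correlation-id'];
    const correlationId = (Array.isArray(header) ? header[0] : header) || uuidv4();
    req['correlationId'] = correlationId;
    res.setHeader('x-correlation-id', correlationId);
    next();
  }
}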

The correlation ID is included in:

HTTP response headers
Log entries
Audit logs
External service calls
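Carrying the ID into external service calls could look roughly like the sketch below. Both helpers are hypothetical and free of framework dependencies. The header name `x-correlation-id` comes from the middleware, while the ID generator is passed in by the caller.

```typescript
// Picks the incoming ID when one is present, otherwise generates a new one.
// Node may expose a repeated header as an array, so the first value is used.
function resolveCorrelationId(
  headerValue: string | string[] | undefined,
  generate: () => string,
): string {
  const first = Array.isArray(headerValue) ? headerValue[0] : headerValue;
  return first && first.trim() !== '' ? first : generate();
}

// Returns a copy of the outbound headers with the correlation ID attached,
// leaving the caller's object unmodified.
function withCorrelationId(
  headers: Record<string, string>,
  correlationId: string,
): Record<string, string> {
  return { ...headers, 'x-correlation-id': correlationId };
}
```

With the same ID attached to every outbound request, log entries from downstream services can be joined to the originating request.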
