Monitoring & Observability

CORTEX exposes health check endpoints, structured logs, metrics, alerts, dashboards, and request tracing for operational visibility.

Health Checks

Endpoints

| Endpoint | Purpose | Returns |
|---|---|---|
| GET /health | Basic health | Service status |
| GET /health/ready | Readiness probe | Ready for traffic |
| GET /health/live | Liveness probe | Process alive |
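The difference between the liveness and readiness endpoints can be sketched as follows. This is an illustrative outline, not the CORTEX implementation: the `DependencyCheck` type, the dependency names, and the choice of HTTP 503 for "not ready" are assumptions made for the example.

```typescript
// A dependency check resolves to true when the dependency is usable.
type DependencyCheck = () => Promise<boolean>;

interface ProbeResult {
  httpStatus: 200 | 503; // 503 for "not ready" is an assumption of this sketch
  failed: string[];      // names of dependencies that failed their check
}

// Liveness: the process can answer at all, so no dependencies are consulted.
function liveness(): ProbeResult {
  return { httpStatus: 200, failed: [] };
}

// Readiness: every dependency must pass before traffic is accepted.
async function readiness(checks: Record<string, DependencyCheck>): Promise<ProbeResult> {
  const entries = Object.entries(checks);
  const results = await Promise.all(
    entries.map(([, check]) => check().catch(() => false)), // a rejected check counts as failed
  );
  const failed = entries.filter((_, i) => !results[i]).map(([name]) => name);
  return { httpStatus: failed.length === 0 ? 200 : 503, failed };
}

// Example with hypothetical dependencies: the cache check rejects.
readiness({
  database: async () => true,
  cache: async () => {
    throw new Error('connection refused');
  },
}).then((result) => console.log(result));
```

Keeping liveness free of dependency checks avoids restarting a healthy process just because a downstream service is temporarily unavailable.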

Response Format

{
  "status": "ok",
  "service": "cortex-core",
  "version": "0.1.0",
  "timestamp": "2024-01-15T10:30:00.000Z",
  "environment": "production"
}
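A typed builder for this payload might look like the sketch below. The `HealthResponse` interface and the `buildHealthResponse` helper are hypothetical names introduced for illustration; only the field names and sample values come from the response above.

```typescript
// Shape of the health payload shown above.
interface HealthResponse {
  status: 'ok' | 'error';
  service: string;
  version: string;
  timestamp: string; // ISO 8601, UTC
  environment: string;
}

// Hypothetical helper: takes its inputs as arguments so it is easy to test.
function buildHealthResponse(
  version: string,
  environment: string,
  now: Date = new Date(),
): HealthResponse {
  return {
    status: 'ok',
    service: 'cortex-core',
    version,
    timestamp: now.toISOString(),
    environment,
  };
}

const sample = buildHealthResponse('0.1.0', 'production', new Date('2024-01-15T10:30:00.000Z'));
console.log(JSON.stringify(sample, null, 2));
```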

Kubernetes Configuration

apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      containers:
        - name: cortex-core
          livenessProbe:
            httpGet:
              path: /health/live
              port: 8091
            initialDelaySeconds: 15
            periodSeconds: 20
          readinessProbe:
            httpGet:
              path: /health/ready
              port: 8091
            initialDelaySeconds: 5
            periodSeconds: 10
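The probe settings above do not set `failureThreshold`, so the Kubernetes default of 3 consecutive failures applies. The following sketch estimates how long a probe must keep failing before Kubernetes acts on it. The `ProbeTiming` type and `secondsToAct` function are illustrative names, and the result is an approximation that ignores the probe timeout.

```typescript
interface ProbeTiming {
  initialDelaySeconds: number;
  periodSeconds: number;
  failureThreshold?: number; // Kubernetes defaults to 3 when omitted
}

// Approximate seconds of consecutive failures before Kubernetes acts on the probe.
function secondsToAct(probe: ProbeTiming): number {
  return probe.periodSeconds * (probe.failureThreshold ?? 3);
}

const livenessProbe: ProbeTiming = { initialDelaySeconds: 15, periodSeconds: 20 };
const readinessProbe: ProbeTiming = { initialDelaySeconds: 5, periodSeconds: 10 };

console.log(secondsToAct(livenessProbe));  // 60: container is restarted after about a minute
console.log(secondsToAct(readinessProbe)); // 30: pod stops receiving traffic after about 30 s
```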

Azure Monitor Integration

Application Insights

// src/main.ts
import * as appInsights from 'applicationinsights';

if (process.env.APPLICATIONINSIGHTS_CONNECTION_STRING) {
  appInsights
    .setup()
    .setAutoDependencyCorrelation(true)
    .setAutoCollectRequests(true)
    .setAutoCollectPerformance(true)
    .setAutoCollectExceptions(true)
    .start();
}

Custom Metrics

import { TelemetryClient } from 'applicationinsights';

const client = new TelemetryClient();

// Track custom metric
client.trackMetric({
  name: 'ActiveSessions',
  value: await sessionService.getActiveCount(),
});

// Track custom event
client.trackEvent({
  name: 'UserRegistration',
  properties: { tenantId: user.tenantId },
});

Logging

Log Levels

| Level | Usage |
|---|---|
| error | Errors that need attention |
| warn | Potential issues |
| log | Important events |
| debug | Debugging information |
| verbose | Detailed tracing |

Structured Logging

// NestJS Logger
import { Logger } from '@nestjs/common';

const logger = new Logger('UserService');

logger.log('User created', {
  userId: user.id,
  tenantId: user.tenantId,
});

logger.error('Failed to create user', error.stack, {
  email: dto.email,
  tenantId: dto.tenantId,
});

Log Output (JSON)

{
  "timestamp": "2024-01-15T10:30:00.000Z",
  "level": "info",
  "context": "UserService",
  "message": "User created",
  "userId": "550e8400-e29b-41d4-a716-446655440000",
  "tenantId": "f47ac10b-58cc-4372-a567-0e02b2c3d479"
}
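One way to produce a line like the one above is a small formatter that merges the structured fields into a single JSON object. This is a sketch, not the CORTEX logger: `formatLogEntry` and `outputLevel` are hypothetical helpers, and mapping the `log` level to `"info"` is an assumption inferred from the sample output, which shows `"info"` for a `logger.log` call.

```typescript
type NestLogLevel = 'error' | 'warn' | 'log' | 'debug' | 'verbose';

// Assumption: `log` is emitted as "info"; the other levels keep their names.
function outputLevel(level: NestLogLevel): string {
  return level === 'log' ? 'info' : level;
}

// Hypothetical formatter: one JSON object per log line.
function formatLogEntry(
  level: NestLogLevel,
  context: string,
  message: string,
  fields: Record<string, unknown> = {},
  now: Date = new Date(),
): string {
  const entry: Record<string, unknown> = {
    timestamp: now.toISOString(),
    level: outputLevel(level),
    context,
    message,
  };
  for (const [key, value] of Object.entries(fields)) {
    // Reserved keys win, so caller-supplied fields cannot overwrite them.
    if (!(key in entry)) entry[key] = value;
  }
  return JSON.stringify(entry);
}

console.log(
  formatLogEntry('log', 'UserService', 'User created', {
    userId: '550e8400-e29b-41d4-a716-446655440000',
    tenantId: 'f47ac10b-58cc-4372-a567-0e02b2c3d479',
  }),
);
```

Emitting one JSON object per line keeps the entries easy to parse and query in log aggregation tools.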

Metrics

Key Metrics

| Metric | Description |
|---|---|
| http_requests_total | Total HTTP requests |
| http_request_duration_ms | Request latency |
| active_sessions | Current active sessions |
| failed_login_attempts | Failed authentications |
| database_query_duration_ms | DB query latency |

Prometheus Format

# HELP http_requests_total Total HTTP requests
# TYPE http_requests_total counter
http_requests_total{method="GET",path="/users",status="200"} 1234

# HELP http_request_duration_ms Request duration in milliseconds
# TYPE http_request_duration_ms histogram
http_request_duration_ms_bucket{method="GET",path="/users",le="50"} 100
http_request_duration_ms_bucket{method="GET",path="/users",le="100"} 150
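Histogram buckets in this format are cumulative: each `le` bucket counts every observation less than or equal to its bound, and a complete series also carries a `+Inf` bucket plus `_sum` and `_count` lines. The sketch below illustrates that bookkeeping with a hand-rolled `Histogram` class. The class is hypothetical and written only for this example; a production service would normally use a Prometheus client library instead.

```typescript
// Minimal cumulative histogram that renders Prometheus-style text lines.
class Histogram {
  private readonly name: string;
  private readonly upperBounds: number[]; // must be sorted ascending
  private readonly counts: number[];
  private sum = 0;
  private total = 0;

  constructor(name: string, upperBounds: number[]) {
    this.name = name;
    this.upperBounds = upperBounds;
    this.counts = upperBounds.map(() => 0);
  }

  observe(value: number): void {
    this.sum += value;
    this.total += 1;
    // Cumulative: a value counts toward every bucket whose bound is >= value.
    this.upperBounds.forEach((bound, i) => {
      if (value <= bound) this.counts[i] += 1;
    });
  }

  render(labels: string): string {
    const lines = this.upperBounds.map(
      (bound, i) => `${this.name}_bucket{${labels},le="${bound}"} ${this.counts[i]}`,
    );
    lines.push(`${this.name}_bucket{${labels},le="+Inf"} ${this.total}`);
    lines.push(`${this.name}_sum{${labels}} ${this.sum}`);
    lines.push(`${this.name}_count{${labels}} ${this.total}`);
    return lines.join('\n');
  }
}

const durations = new Histogram('http_request_duration_ms', [50, 100]);
[12, 40, 75, 250].forEach((ms) => durations.observe(ms));
console.log(durations.render('method="GET",path="/users"'));
```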

Alerting

Alert Rules

| Alert | Condition | Severity |
|---|---|---|
| High Error Rate | Error rate > 5% for 5 min | Critical |
| High Latency | P95 latency > 2s for 5 min | Warning |
| Service Down | Health check fails for 1 min | Critical |
| High Memory | Memory > 80% for 10 min | Warning |
| Failed Logins | > 10 failures in 5 min | Warning |
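To make the "for 5 min" wording concrete, the sketch below evaluates the High Error Rate rule against per-minute samples. It assumes the condition must hold in every one of the last five samples, which is one common reading of such rules; the `MinuteSample` type and `errorRateAlertFiring` function are hypothetical names introduced for the example.

```typescript
interface MinuteSample {
  requests: number;
  errors: number;
}

// True when every one of the last `forMinutes` samples breaches the threshold.
function errorRateAlertFiring(
  samples: MinuteSample[], // one sample per minute, oldest first
  thresholdPercent = 5,
  forMinutes = 5,
): boolean {
  if (samples.length < forMinutes) return false; // not enough history yet
  return samples
    .slice(-forMinutes)
    // Integer comparison avoids floating-point noise; zero traffic never breaches.
    .every((s) => s.errors * 100 > thresholdPercent * s.requests);
}

const breaching: MinuteSample[] = Array.from({ length: 5 }, () => ({ requests: 100, errors: 6 }));
console.log(errorRateAlertFiring(breaching)); // true: 6% in each of the last 5 minutes

const recovered = [...breaching.slice(1), { requests: 100, errors: 5 }];
console.log(errorRateAlertFiring(recovered)); // false: the latest minute is exactly 5%, not above it
```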

Azure Monitor Alert

{
  "alertRule": {
    "name": "HighErrorRate",
    "condition": {
      "allOf": [
        {
          "metricName": "requests/failed",
          "operator": "GreaterThan",
          "threshold": 5,
          "timeAggregation": "Percentage"
        }
      ]
    },
    "actions": [
      {
        "actionGroupId": "/subscriptions/.../actionGroups/ops-team"
      }
    ]
  }
}

Dashboard

Key Dashboard Panels

  1. Request Volume — Requests per minute by endpoint
  2. Error Rate — Errors as percentage of total requests
  3. Latency Distribution — P50, P95, P99 response times
  4. Active Users — Current logged-in users
  5. Database Performance — Query times and connection pool
  6. Authentication — Login success/failure rates
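For readers unfamiliar with the P50, P95, and P99 figures in the Latency Distribution panel, the sketch below computes them from raw samples with the nearest-rank method. It is only meant to show what the numbers mean. The `percentile` function is a hypothetical helper, and a dashboard would typically derive percentiles from histogram buckets rather than from raw samples.

```typescript
// Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
function percentile(values: number[], p: number): number {
  if (values.length === 0) throw new Error('no samples');
  if (p <= 0 || p > 100) throw new Error('p must be in (0, 100]');
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based rank
  return sorted[rank - 1];
}

// Hypothetical response times in milliseconds.
const latenciesMs = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100];

console.log(percentile(latenciesMs, 50)); // 50
console.log(percentile(latenciesMs, 95)); // 100
console.log(percentile(latenciesMs, 99)); // 100
```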

Grafana Dashboard JSON

{
  "dashboard": {
    "title": "CORTEX Overview",
    "panels": [
      {
        "title": "Request Rate",
        "type": "graph",
        "targets": [
          {
            "expr": "rate(http_requests_total[5m])",
            "legendFormat": "{{method}} {{path}}"
          }
        ]
      }
    ]
  }
}

Tracing

Distributed Tracing

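// Correlation ID middleware
import { Injectable, NestMiddleware } from '@nestjs/common';
import { NextFunction, Request, Response } from 'express';
import { v4 as uuidv4 } from 'uuid';

@Injectable()
export class CorrelationIdMiddleware implements NestMiddleware {
  use(req: Request, res: Response, next: NextFunction) {
    // Reuse the caller's ID when present; a repeated header arrives as an array.
    const header = req.headers['x-correlation-id'];
    const correlationId = (Array.isArray(header) ? header[0] : header) || uuidv4();
    req['correlationId'] = correlationId;
    res.setHeader('x-correlation-id', correlationId);
    next();
  }
}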

The correlation ID is included in:

  • HTTP response headers
  • Log entries
  • Audit logs
  • External service calls
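Making the ID available to log entries and outbound calls without passing it through every function can be done with Node's `AsyncLocalStorage`. The sketch below shows one possible approach and is not confirmed to be how CORTEX does it; `withCorrelationId`, `currentCorrelationId`, and `outboundHeaders` are hypothetical helpers.

```typescript
import { AsyncLocalStorage } from 'node:async_hooks';
import { randomUUID } from 'node:crypto';

// Holds the correlation ID for the duration of one request's call chain.
const correlationStore = new AsyncLocalStorage<string>();

// Run `handler` with a correlation ID: reuse the incoming one or generate a new one.
function withCorrelationId<T>(incoming: string | undefined, handler: () => T): T {
  return correlationStore.run(incoming ?? randomUUID(), handler);
}

// Returns undefined when called outside of withCorrelationId.
function currentCorrelationId(): string | undefined {
  return correlationStore.getStore();
}

// Headers for an outbound call, carrying the current ID when there is one.
function outboundHeaders(base: Record<string, string> = {}): Record<string, string> {
  const id = currentCorrelationId();
  return id ? { ...base, 'x-correlation-id': id } : base;
}

withCorrelationId('req-123', () => {
  // Any code reached from here, including async callbacks, sees the same ID.
  console.log(outboundHeaders({ accept: 'application/json' }));
});
```

A middleware like the one above could call `withCorrelationId` around `next()`, so that loggers and HTTP clients read the ID from the store instead of from the request object.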