Monitoring & Observability
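CORTEX exposes health-check endpoints, Azure Monitor telemetry, structured logs, Prometheus-style metrics, alert rules, dashboards, and correlation IDs for operational visibility.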
Health Checks
Endpoints
| Endpoint | Purpose | Returns |
|---|---|---|
| GET /health | Basic health | Service status |
| GET /health/ready | Readiness probe | Ready for traffic |
| GET /health/live | Liveness probe | Process alive |
Response Format
{
  "status": "ok",
  "service": "cortex-core",
  "version": "0.1.0",
  "timestamp": "2024-01-15T10:30:00.000Z",
  "environment": "production"
}
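As an illustration of how a payload of this shape could be assembled, here is a minimal sketch. The `HealthResponse` interface and `buildHealthResponse` helper are hypothetical names introduced for this example and are not taken from the CORTEX codebase; only the field names and sample values come from the response above.

```typescript
// Hypothetical shape of the GET /health body, mirroring the sample response.
interface HealthResponse {
  status: 'ok';
  service: string;
  version: string;
  timestamp: string; // ISO 8601 in UTC
  environment: string;
}

// Hypothetical helper. `now` is injectable so the output is reproducible in tests.
function buildHealthResponse(
  version: string,
  environment: string,
  now: Date = new Date(),
): HealthResponse {
  return {
    status: 'ok',
    service: 'cortex-core',
    version,
    timestamp: now.toISOString(),
    environment,
  };
}

const healthBody = buildHealthResponse(
  '0.1.0',
  'production',
  new Date('2024-01-15T10:30:00.000Z'),
);
console.log(JSON.stringify(healthBody, null, 2));
```

Passing the clock in as a parameter keeps the helper pure, which makes the timestamp format easy to assert on.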
Kubernetes Configuration
apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      containers:
        - name: cortex-core
          livenessProbe:
            httpGet:
              path: /health/live
              port: 8091
            initialDelaySeconds: 15
            periodSeconds: 20
          readinessProbe:
            httpGet:
              path: /health/ready
              port: 8091
            initialDelaySeconds: 5
            periodSeconds: 10
Azure Monitor Integration
Application Insights
// src/main.ts
import * as appInsights from 'applicationinsights';

// Telemetry is enabled only when a connection string is configured.
// setup() with no argument reads it from the environment.
if (process.env.APPLICATIONINSIGHTS_CONNECTION_STRING) {
  appInsights
    .setup()
    .setAutoDependencyCorrelation(true)
    .setAutoCollectRequests(true)
    .setAutoCollectPerformance(true)
    .setAutoCollectExceptions(true)
    .start();
}
Custom Metrics
import { TelemetryClient } from 'applicationinsights';

const client = new TelemetryClient();

// Track custom metric (the await requires an async context)
client.trackMetric({
  name: 'ActiveSessions',
  value: await sessionService.getActiveCount(),
});

// Track custom event
client.trackEvent({
  name: 'UserRegistration',
  properties: { tenantId: user.tenantId },
});
Logging
Log Levels
| Level | Usage |
|---|---|
error | Errors that need attention |
warn | Potential issues |
log | Important events |
debug | Debugging information |
verbose | Detailed tracing |
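To show how the levels in the table relate to each other, the sketch below derives the set of enabled levels from a single minimum level. The `resolveLevels` helper and the idea of a single configured minimum are assumptions made for this example; the document does not say how CORTEX selects its levels.

```typescript
// Levels from the table above, ordered from most to least severe.
const LEVELS = ['error', 'warn', 'log', 'debug', 'verbose'] as const;
type Level = (typeof LEVELS)[number];

// Returns every level at or above the given minimum severity.
function levelsAtOrAbove(minimum: Level): Level[] {
  return LEVELS.slice(0, LEVELS.indexOf(minimum) + 1);
}

// Hypothetical convention: a raw setting (for example from an environment
// variable) that falls back to 'log' when missing or unrecognised.
function resolveLevels(raw: string | undefined): Level[] {
  const candidate = (raw ?? 'log').toLowerCase();
  const known = (LEVELS as readonly string[]).indexOf(candidate) >= 0;
  return levelsAtOrAbove(known ? (candidate as Level) : 'log');
}

console.log(resolveLevels('warn')); // [ 'error', 'warn' ]
```

An array like this can be handed to NestJS through the `logger` option of `NestFactory.create`, which accepts a list of enabled log levels.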
Structured Logging
// NestJS Logger
import { Logger } from '@nestjs/common';

const logger = new Logger('UserService');

logger.log('User created', {
  userId: user.id,
  tenantId: user.tenantId,
});

logger.error('Failed to create user', error.stack, {
  email: dto.email,
  tenantId: dto.tenantId,
});
Log Output (JSON)
{
  "timestamp": "2024-01-15T10:30:00.000Z",
  "level": "info",
  "context": "UserService",
  "message": "User created",
  "userId": "550e8400-e29b-41d4-a716-446655440000",
  "tenantId": "f47ac10b-58cc-4372-a567-0e02b2c3d479"
}
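A flat JSON line like the one above could be produced by a small formatter that merges the standard fields with caller-supplied ones. This is a sketch under assumptions: `formatLogEntry` is a hypothetical function, not the formatter CORTEX actually uses, and the rule that extra fields may not overwrite the standard ones is a design choice made for the example.

```typescript
type LogFields = Record<string, string | number | boolean>;

// Hypothetical formatter producing one JSON object per log line.
function formatLogEntry(
  level: string,
  context: string,
  message: string,
  fields: LogFields = {},
  now: Date = new Date(),
): string {
  const entry: Record<string, unknown> = {
    timestamp: now.toISOString(),
    level,
    context,
    message,
  };
  // Copy extra fields, but never let them overwrite the standard keys.
  for (const key of Object.keys(fields)) {
    if (!(key in entry)) {
      entry[key] = fields[key];
    }
  }
  return JSON.stringify(entry);
}

// Example values below are made up for illustration.
console.log(
  formatLogEntry('warn', 'AuthService', 'Login failed', { tenantId: 'tenant-a' }),
);
```

Keeping every field at the top level, rather than nesting them under a sub-object, makes the entries straightforward to filter in a log query tool.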
Metrics
Key Metrics
| Metric | Description |
|---|---|
| http_requests_total | Total HTTP requests |
| http_request_duration_ms | Request latency |
| active_sessions | Current active sessions |
| failed_login_attempts | Failed authentications |
| database_query_duration_ms | DB query latency |
Prometheus Format
# HELP http_requests_total Total HTTP requests
# TYPE http_requests_total counter
http_requests_total{method="GET",path="/users",status="200"} 1234
# HELP http_request_duration_ms Request duration in milliseconds
# TYPE http_request_duration_ms histogram
http_request_duration_ms_bucket{method="GET",path="/users",le="50"} 100
http_request_duration_ms_bucket{method="GET",path="/users",le="100"} 150
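Histogram buckets in this format are cumulative: the `le="100"` bucket also counts every observation already counted under `le="50"`. The sketch below shows that relationship with made-up latencies. The helper names are hypothetical, and a complete histogram would also expose `_sum` and `_count` series, which are omitted here.

```typescript
// Cumulative histogram: each bucket counts observations <= its upper bound.
function bucketCounts(observationsMs: number[], upperBounds: number[]): number[] {
  return upperBounds.map(
    (bound) => observationsMs.filter((value) => value <= bound).length,
  );
}

// Renders bucket lines in the text exposition format, ending with le="+Inf".
function renderBuckets(
  metric: string,
  labels: string,
  observationsMs: number[],
  upperBounds: number[],
): string[] {
  const counts = bucketCounts(observationsMs, upperBounds);
  const lines = upperBounds.map(
    (bound, i) => `${metric}_bucket{${labels},le="${bound}"} ${counts[i]}`,
  );
  // The +Inf bucket always equals the total number of observations.
  lines.push(`${metric}_bucket{${labels},le="+Inf"} ${observationsMs.length}`);
  return lines;
}

const sampleDurationsMs = [12, 48, 51, 75, 99, 240]; // made-up latencies
const bucketLines = renderBuckets(
  'http_request_duration_ms',
  'method="GET",path="/users"',
  sampleDurationsMs,
  [50, 100],
);
for (const line of bucketLines) {
  console.log(line);
}
```

With these six samples the buckets come out as 2, 5, and 6, so each count is at least as large as the one before it.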
Alerting
Alert Rules
| Alert | Condition | Severity |
|---|---|---|
| High Error Rate | Error rate > 5% for 5 min | Critical |
| High Latency | P95 latency > 2s for 5 min | Warning |
| Service Down | Health check fails for 1 min | Critical |
| High Memory | Memory > 80% for 10 min | Warning |
| Failed Logins | > 10 failures in 5 min | Warning |
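The High Error Rate condition can be read as a small predicate over recent request counts. The sketch below assumes one sample per minute and interprets "for 5 min" as the threshold being exceeded in each of the last five samples; both are assumptions for illustration, since the exact evaluation is decided by the alerting backend. `MinuteSample` and `highErrorRate` are hypothetical names.

```typescript
interface MinuteSample {
  total: number; // requests observed in that minute
  failed: number; // requests that ended in an error
}

// Error rate as a percentage; a minute with no traffic counts as 0%.
function errorRatePercent(sample: MinuteSample): number {
  return sample.total === 0 ? 0 : (sample.failed * 100) / sample.total;
}

// True when the rate exceeded the threshold in each of the last `minutes` samples.
function highErrorRate(
  samples: MinuteSample[],
  thresholdPercent = 5,
  minutes = 5,
): boolean {
  if (samples.length < minutes) {
    return false; // not enough history to decide
  }
  return samples
    .slice(-minutes)
    .every((sample) => errorRatePercent(sample) > thresholdPercent);
}

// Made-up data: five consecutive minutes at 6% errors.
const sustainedErrors: MinuteSample[] = [
  { total: 100, failed: 6 },
  { total: 100, failed: 6 },
  { total: 100, failed: 6 },
  { total: 100, failed: 6 },
  { total: 100, failed: 6 },
];
console.log(highErrorRate(sustainedErrors)); // true
```

Requiring every sample in the window to breach the threshold avoids paging on a single bad minute, at the cost of reacting a few minutes later.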
Azure Monitor Alert
{
  "alertRule": {
    "name": "HighErrorRate",
    "condition": {
      "allOf": [
        {
          "metricName": "requests/failed",
          "operator": "GreaterThan",
          "threshold": 5,
          "timeAggregation": "Percentage"
        }
      ]
    },
    "actions": [
      {
        "actionGroupId": "/subscriptions/.../actionGroups/ops-team"
      }
    ]
  }
}
Dashboard
Key Dashboard Panels
- Request Volume — Requests per minute by endpoint
- Error Rate — Errors as percentage of total requests
- Latency Distribution — P50, P95, P99 response times
- Active Users — Current logged-in users
- Database Performance — Query times and connection pool
- Authentication — Login success/failure rates
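For the Latency Distribution panel, P50, P95, and P99 are percentiles of the observed response times. The sketch below computes them from raw samples with the nearest-rank method. This is an illustration only: `percentile` is a hypothetical helper, and dashboards backed by Prometheus usually estimate percentiles from histogram buckets with `histogram_quantile` rather than from raw samples.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of
// all samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) {
    throw new Error('percentile requires at least one sample');
  }
  const sorted = samples.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

// Made-up latencies: 1 ms, 2 ms, ..., 100 ms.
const latenciesMs = Array.from({ length: 100 }, (_, i) => i + 1);

console.log({
  p50: percentile(latenciesMs, 50),
  p95: percentile(latenciesMs, 95),
  p99: percentile(latenciesMs, 99),
});
```

Showing all three together is useful because a healthy median can hide a slow tail that only appears at P95 or P99.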
Grafana Dashboard JSON
{
  "dashboard": {
    "title": "CORTEX Overview",
    "panels": [
      {
        "title": "Request Rate",
        "type": "graph",
        "targets": [
          {
            "expr": "rate(http_requests_total[5m])",
            "legendFormat": "{{method}} {{path}}"
          }
        ]
      }
    ]
  }
}
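The `rate(http_requests_total[5m])` expression in the panel above yields the per-second average increase of the counter over the last five minutes. The sketch below is a simplified model of that idea: it handles counter resets but skips the extrapolation to the window boundaries that Prometheus performs, so its results can differ slightly from the real function. `CounterSample` and `simpleRate` are hypothetical names.

```typescript
interface CounterSample {
  timestampSec: number; // scrape time in seconds
  value: number; // counter value at that time
}

// Simplified per-second rate over the samples inside one window.
function simpleRate(samples: CounterSample[]): number {
  if (samples.length < 2) {
    return 0; // a rate needs at least two points
  }
  let increase = 0;
  for (let i = 1; i < samples.length; i++) {
    const delta = samples[i].value - samples[i - 1].value;
    // A drop means the counter was reset (for example by a restart),
    // so the new value is counted as growth from zero.
    increase += delta >= 0 ? delta : samples[i].value;
  }
  const elapsedSec =
    samples[samples.length - 1].timestampSec - samples[0].timestampSec;
  return elapsedSec > 0 ? increase / elapsedSec : 0;
}

// Made-up scrapes: 300 requests over 300 seconds.
const steadyTraffic: CounterSample[] = [
  { timestampSec: 0, value: 1000 },
  { timestampSec: 150, value: 1150 },
  { timestampSec: 300, value: 1300 },
];
console.log(simpleRate(steadyTraffic)); // 1
```

Because the result is per second, the Request Volume panel's "requests per minute" would need the value multiplied by 60.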
Tracing
Distributed Tracing
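// Correlation ID middleware
import { Injectable, NestMiddleware } from '@nestjs/common';
import { NextFunction, Request, Response } from 'express';
import { v4 as uuidv4 } from 'uuid';

@Injectable()
export class CorrelationIdMiddleware implements NestMiddleware {
  use(req: Request, res: Response, next: NextFunction) {
    // Reuse the caller's ID when present; otherwise generate a new one.
    // A repeated header arrives as an array, so take its first value.
    const header = req.headers['x-correlation-id'];
    const correlationId = (Array.isArray(header) ? header[0] : header) || uuidv4();
    req['correlationId'] = correlationId;
    res.setHeader('x-correlation-id', correlationId);
    next();
  }
}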
The correlation ID is included in:
- HTTP response headers
- Log entries
- Audit logs
- External service calls
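One way to make the same ID available to log entries and outbound calls without passing it through every function is Node's `AsyncLocalStorage`. The sketch below is an assumption about how this could be wired, not a description of how CORTEX does it; `correlationStore`, `runWithCorrelationId`, `currentCorrelationId`, and `outboundHeaders` are hypothetical names.

```typescript
import { AsyncLocalStorage } from 'node:async_hooks';
import { randomUUID } from 'node:crypto';

// Holds the correlation ID for the current asynchronous call chain.
const correlationStore = new AsyncLocalStorage<string>();

// Runs `fn` with a correlation ID bound to its async context.
// A missing or empty incoming ID is replaced with a fresh UUID.
function runWithCorrelationId<T>(incoming: string | undefined, fn: () => T): T {
  return correlationStore.run(incoming || randomUUID(), fn);
}

// Returns the ID of the current context, or undefined outside of one.
function currentCorrelationId(): string | undefined {
  return correlationStore.getStore();
}

// Headers for an outbound call so the downstream service sees the same ID.
function outboundHeaders(
  base: Record<string, string> = {},
): Record<string, string> {
  const id = currentCorrelationId();
  return id ? { ...base, 'x-correlation-id': id } : { ...base };
}

runWithCorrelationId('req-123', () => {
  console.log(outboundHeaders({ accept: 'application/json' }));
});
```

In a middleware, the call to `next()` would be placed inside `runWithCorrelationId` so that everything handling the request runs within the same context.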