# Log Management

Centralized log collection, real-time streaming, and powerful search capabilities for monitoring your Supascale infrastructure and Supabase instances.

Supascale's centralized logging system provides comprehensive visibility into your infrastructure operations, instance performance, and system events. This guide covers log collection, analysis, and monitoring capabilities.
## Overview
The Supascale logging system offers:
- Centralized Collection: Logs from all agents and instances in one place
- Real-time Streaming: Live log tailing and monitoring
- Powerful Search: Filter and search across all log data
- Log Retention: Configurable retention policies
- Alert Integration: Log-based alerting and notifications
- Export Capabilities: Download and integrate with external tools
## Log Sources

### Agent Logs
The Supascale agent generates logs for all operations:
Agent Operations:
- Agent startup and shutdown
- API communication and polling
- Command execution and results
- Resource monitoring activities
- Error conditions and recovery
Command Execution:
- Instance deployment progress
- Start/stop/restart operations
- Configuration changes
- Backup operations
- System maintenance tasks
Example Agent Log Entry:
{ "timestamp": "2025-08-02T10:30:00Z", "level": "INFO", "source": "agent", "server_id": "srv_abc123", "message": "Command executed successfully", "details": { "command_id": "cmd_123", "command_type": "deploy_instance", "duration_ms": 45000, "instance_id": "inst_xyz789" } }
### Instance Logs
Logs from all Supabase services within instances:
Database Logs (PostgreSQL):
- Connection events
- Query execution
- Error conditions
- Performance warnings
- Security events
API Logs (PostgREST):
- HTTP requests and responses
- Authentication events
- Query execution times
- Error responses
- Rate limiting events
Authentication Logs (GoTrue):
- User login/logout events
- Registration attempts
- Password resets
- Token generation
- Security violations
Storage Logs (Storage API):
- File upload/download operations
- Access control events
- Storage quota warnings
- Error conditions
Realtime Logs (Realtime Server):
- WebSocket connections
- Subscription events
- Broadcasting operations
- Connection errors
Example Instance Log Entry:
{ "timestamp": "2025-08-02T10:30:15Z", "level": "WARN", "source": "database", "instance_id": "inst_xyz789", "service": "postgresql", "message": "Slow query detected", "details": { "query_duration_ms": 5000, "query": "SELECT * FROM large_table WHERE...", "client_ip": "192.168.1.100", "user": "authenticated_user" } }
### System Logs

Infrastructure and system-level events (a sample entry follows the lists below):
Server Events:
- System resource alerts
- Docker service events
- Network connectivity issues
- Disk space warnings
- Security events
Supascale Platform:
- User actions from dashboard
- API requests and responses
- Billing events
- System maintenance
- Security audits
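For illustration, a hypothetical system-level entry in the same format as the agent and instance examples above (field values are placeholders, not captured output):

```json
{
  "timestamp": "2025-08-02T10:31:00Z",
  "level": "WARN",
  "source": "system",
  "server_id": "srv_abc123",
  "message": "Disk space warning",
  "details": {
    "mount_point": "/var/lib/docker",
    "used_percent": 87
  }
}
```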
## Accessing Logs

### Dashboard Interface
1. **Navigate to Logs**
   - Go to Dashboard → Logs
   - View the real-time log stream
   - Use filters to narrow results

2. **Log View Options**

   Real-time Stream:
   - Live updating log entries
   - Auto-scroll to newest entries
   - Pause/resume streaming
   - Customizable refresh intervals

   Historical Search:
   - Search through archived logs
   - Date range filtering
   - Advanced search capabilities
   - Export filtered results

3. **Log Entry Details**
   - Click any log entry for full details
   - View structured data and metadata
   - Copy log entries or specific fields
   - Link to related events (see the example query below)
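One way to follow a thread of related events is to search on a shared identifier. For example, using the `details.command_id` field from the agent log example above (`cmd_123` is a placeholder value):

```
details.command_id:cmd_123
```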
### Filtering and Search
Quick Filters:
```
# Filter by log level
level:ERROR

# Filter by source
source:database

# Filter by instance
instance_id:inst_xyz789

# Filter by server
server_id:srv_abc123

# Filter by time range
timestamp:[2025-08-02T10:00:00Z TO 2025-08-02T11:00:00Z]
```
Advanced Search Queries:
```
# Combine multiple filters
level:ERROR AND source:database AND instance_id:inst_xyz789

# Text search in messages
message:"connection failed"

# Search in structured data
details.query_duration_ms:>5000

# Wildcard searches
message:*timeout* OR message:*connection*

# Regular expressions
message:/error.*connection.*database/i
```
Saved Searches:
```yaml
saved_searches:
  critical_errors:
    query: "level:ERROR OR level:CRITICAL"
    description: "All critical errors across infrastructure"
  slow_queries:
    query: "source:database AND details.query_duration_ms:>1000"
    description: "Database queries taking longer than 1 second"
  authentication_issues:
    query: "source:auth AND (level:ERROR OR level:WARN)"
    description: "Authentication service issues"
```
## Log Analysis

### Performance Analysis
Query Performance Tracking:
```
# Find slow database queries
source:database AND details.query_duration_ms:>5000

# API response time analysis
source:api AND details.response_time_ms:>2000

# Authentication performance
source:auth AND details.duration_ms:>1000
```
Resource Usage Patterns:
```
# Memory warnings
message:*memory* AND level:WARN

# Disk space issues
message:*disk* AND (level:WARN OR level:ERROR)

# Connection pool exhaustion
message:*connection* AND message:*pool*
```
### Error Analysis
Error Pattern Detection:
```
# Database connection errors
source:database AND message:*connection* AND level:ERROR

# API errors by endpoint
source:api AND level:ERROR AND details.endpoint:"/auth/v1/token"

# Instance deployment failures
source:agent AND details.command_type:"deploy_instance" AND level:ERROR
```
Error Correlation:
```yaml
error_correlation:
  # Group related errors
  - timespan: "5m"
    conditions:
      - "source:database AND level:ERROR"
      - "source:api AND level:ERROR"
    description: "Database errors affecting API"

  # Cascade failure detection
  - timespan: "10m"
    conditions:
      - "source:agent AND message:*docker*"
      - "instance_id:* AND level:ERROR"
    description: "Docker issues causing instance failures"
```
### Security Analysis
Security Event Monitoring:
```
# Failed authentication attempts
source:auth AND message:*failed* AND details.attempt_count:>3

# Suspicious API access
source:api AND (details.status_code:401 OR details.status_code:403)

# Unusual database access patterns
source:database AND message:*unauthorized*
```
Audit Trail:
```
# User management actions
source:auth AND (message:*user_created* OR message:*user_deleted*)

# Configuration changes
source:agent AND details.command_type:*config*

# Data access patterns
source:database AND (details.query:*DELETE* OR details.query:*UPDATE*)
```
## Log-based Alerting

### Alert Configuration
Create alerts based on log patterns:
Error Rate Alerts:
```yaml
alerts:
  high_error_rate:
    query: "level:ERROR"
    condition:
      count: ">10"
      timespan: "5m"
    severity: "warning"
    message: "High error rate detected: {{count}} errors in 5 minutes"

  critical_database_errors:
    query: "source:database AND level:ERROR"
    condition:
      count: ">5"
      timespan: "1m"
    severity: "critical"
    message: "Critical database errors detected"
```
Performance Alerts:
```yaml
alerts:
  slow_query_alert:
    query: "source:database AND details.query_duration_ms:>10000"
    condition:
      count: ">3"
      timespan: "10m"
    severity: "warning"
    message: "Multiple slow queries detected"

  api_response_time:
    query: "source:api AND details.response_time_ms:>5000"
    condition:
      count: ">10"
      timespan: "5m"
    severity: "warning"
    message: "API response time degradation"
```
Security Alerts:
```yaml
alerts:
  brute_force_detection:
    query: "source:auth AND message:*failed_login*"
    condition:
      count: ">20"
      timespan: "10m"
      group_by: "details.client_ip"
    severity: "critical"
    message: "Potential brute force attack from {{details.client_ip}}"

  unusual_database_activity:
    query: "source:database AND (details.query:*DROP* OR details.query:*TRUNCATE*)"
    condition:
      count: ">1"
      timespan: "1m"
    severity: "critical"
    message: "Potentially dangerous database operations detected"
```
### Alert Notifications
Notification Channels:
```yaml
notifications:
  email:
    enabled: true
    recipients:
      - "ops-team@company.com"
      - "security@company.com"
    alert_types:
      - "critical"
      - "warning"
    template: |
      Alert: {{alert_name}}
      Severity: {{severity}}
      Query: {{query}}
      Count: {{count}} events in {{timespan}}
      Recent events:
      {{#each recent_events}}
      - {{timestamp}}: {{message}}
      {{/each}}

  slack:
    enabled: true
    webhook_url: "https://hooks.slack.com/services/..."
    channel: "#alerts"
    message_format: |
      🚨 *{{alert_name}}*
      Severity: {{severity}}
      {{count}} events matching: `{{query}}`
      <{{dashboard_url}}|View in Dashboard>

  webhook:
    enabled: true
    url: "https://your-api.com/alerts"
    headers:
      Authorization: "Bearer token"
    payload: |
      {
        "alert": "{{alert_name}}",
        "severity": "{{severity}}",
        "query": "{{query}}",
        "count": {{count}},
        "timespan": "{{timespan}}",
        "events": {{recent_events}}
      }
```
## Log Retention and Management

### Retention Policies
Configure how long logs are stored:
Retention Configuration:
```yaml
retention:
  # Agent logs
  agent:
    duration: "30d"
    compression: true
    archive_location: "s3://logs-archive/agent/"

  # Instance logs by service
  instance:
    database: "90d"   # Keep DB logs longer for analysis
    api: "30d"        # API logs for debugging
    auth: "180d"      # Auth logs for security compliance
    storage: "30d"    # Storage operation logs
    realtime: "7d"    # Realtime logs (high volume)

  # System logs
  system:
    duration: "60d"
    critical_events: "1y"  # Keep critical events longer
```
Automatic Cleanup:
```yaml
cleanup:
  enabled: true
  schedule: "0 2 * * *"  # Daily at 2 AM
  policies:
    # Compress old logs
    - age: "7d"
      action: "compress"
      compression: "gzip"

    # Archive to cold storage
    - age: "30d"
      action: "archive"
      destination: "s3://logs-archive/"

    # Delete very old logs
    - age: "365d"
      action: "delete"
      confirm: true
```
### Log Export
Bulk Export:
```bash
# Export logs for a specific time range
supascale logs export \
  --start "2025-08-01T00:00:00Z" \
  --end "2025-08-02T00:00:00Z" \
  --format "json" \
  --output "/tmp/logs-export.json.gz"

# Export specific log types
supascale logs export \
  --source "database" \
  --level "ERROR" \
  --instance "inst_xyz789" \
  --format "csv"
```
Streaming Export:
```bash
# Stream logs to an external system
supascale logs stream \
  --query "level:ERROR" \
  --destination "syslog://logs.company.com:514"

# Real-time export to a file
supascale logs stream \
  --follow \
  --output "/var/log/supascale.log"
```
## Integration with External Tools

### Log Shipping
Elasticsearch Integration:
```yaml
integrations:
  elasticsearch:
    enabled: true
    hosts:
      - "https://elasticsearch.company.com:9200"
    authentication:
      type: "api_key"
      api_key: "base64-encoded-key"
    index_template: "supascale-logs-{date}"
    mapping:
      timestamp: "@timestamp"
      level: "log.level"
      source: "service.name"
      message: "message"
```
Splunk Integration:
```yaml
integrations:
  splunk:
    enabled: true
    hec_endpoint: "https://splunk.company.com:8088/services/collector"
    hec_token: "your-hec-token"
    source_type: "supascale"
    index: "infrastructure"
    metadata:
      environment: "production"
      datacenter: "us-west-2"
```
Datadog Integration:
```yaml
integrations:
  datadog:
    enabled: true
    api_key: "your-datadog-api-key"
    site: "datadoghq.com"
    tags:
      - "environment:production"
      - "service:supascale"
    log_processing:
      multiline: true
      pipeline: "supascale-logs"
```
### Syslog Integration
Syslog Configuration:
```yaml
syslog:
  enabled: true
  facility: "local0"
  severity_mapping:
    DEBUG: "debug"
    INFO: "info"
    WARN: "warning"
    ERROR: "err"
    CRITICAL: "crit"
  destinations:
    - protocol: "tcp"
      host: "syslog.company.com"
      port: 514
      format: "rfc5424"
    - protocol: "udp"
      host: "backup-syslog.company.com"
      port: 514
      format: "rfc3164"
```
## Troubleshooting with Logs

### Common Debugging Scenarios
Instance Deployment Issues:
```
# Find deployment errors
source:agent AND details.command_type:"deploy_instance" AND level:ERROR

# Check a specific instance deployment
source:agent AND details.instance_id:"inst_xyz789" AND details.command_type:"deploy_instance"

# Docker-related deployment issues
source:agent AND message:*docker* AND level:ERROR
```
Performance Issues:
```
# Database performance problems
source:database AND (details.query_duration_ms:>5000 OR message:*slow*)

# Memory pressure indicators
message:*memory* AND (level:WARN OR level:ERROR)

# Connection pool issues
message:*connection* AND message:*pool* AND level:ERROR
```
Authentication Problems:
```
# Login failures
source:auth AND message:*login* AND level:ERROR

# Token validation issues
source:auth AND message:*token* AND (level:WARN OR level:ERROR)

# JWT-related problems
source:auth AND message:*jwt* AND level:ERROR
```
### Log Analysis Best Practices
Structured Logging:
- Use consistent log formats across services
- Include relevant context in log entries
- Use appropriate log levels for different events
- Include correlation IDs for request tracing (see the sketch below)
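These practices combine in entries like the hypothetical one below, which follows the same structured format as the examples above and carries a correlation ID (`req_4f2a9c` is a placeholder) so a single request can be traced across services:

```json
{
  "timestamp": "2025-08-02T10:32:00Z",
  "level": "INFO",
  "source": "api",
  "instance_id": "inst_xyz789",
  "message": "Request completed",
  "details": {
    "correlation_id": "req_4f2a9c",
    "endpoint": "/auth/v1/token",
    "response_time_ms": 120
  }
}
```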
Search Optimization:
- Use specific queries instead of broad searches
- Combine multiple filters for precise results
- Use time ranges to limit search scope
- Save frequently used search patterns
Performance Monitoring:
- Monitor log ingestion rates
- Track search performance
- Set up alerts for log volume spikes (example after this list)
- Regular cleanup of old logs
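As a sketch, a volume-spike alert can reuse the alert schema shown earlier. The match-all query and the threshold below are assumptions to tune for your environment:

```yaml
alerts:
  log_volume_spike:
    query: "*"              # assumed match-all; adjust to your query syntax
    condition:
      count: ">10000"       # placeholder threshold; tune to your baseline volume
      timespan: "5m"
    severity: "warning"
    message: "Log volume spike: {{count}} entries in {{timespan}}"
```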
## API Access

### Logs API
Programmatic access to log data:
Search Logs:
`GET /api/v1/logs/search`

```json
{
  "query": "level:ERROR AND source:database",
  "start_time": "2025-08-02T10:00:00Z",
  "end_time": "2025-08-02T11:00:00Z",
  "limit": 100,
  "sort": "-timestamp"
}
```
Stream Logs:
`GET /api/v1/logs/stream`

```json
{
  "query": "instance_id:inst_xyz789",
  "follow": true,
  "buffer_size": 1000
}
```
Export Logs:
`POST /api/v1/logs/export`

```json
{
  "query": "source:agent",
  "start_time": "2025-08-01T00:00:00Z",
  "end_time": "2025-08-02T00:00:00Z",
  "format": "json",
  "compression": "gzip"
}
```
Next: Explore Advanced Use Cases for complex deployment scenarios.