傅洋

Retire-Cluster REST API

The Retire-Cluster REST API exposes HTTP endpoints for managing and monitoring a distributed computing cluster: cluster status, devices, tasks, and configuration.

Quick Start

Installation

Install with API support:

pip install "retire-cluster[api]"

Starting the API Server

# Start with default settings
retire-cluster-api

# Start with custom configuration
retire-cluster-api --host 0.0.0.0 --port 8081 --auth --api-key your-secret-key

# Connect to specific cluster node
retire-cluster-api --cluster-host 192.168.1.100 --cluster-port 8080

Basic Usage

# Check API health
curl http://localhost:8081/health

# Get cluster status
curl http://localhost:8081/api/v1/cluster/status

# List devices
curl http://localhost:8081/api/v1/devices

# Submit a task
curl -X POST http://localhost:8081/api/v1/tasks \
  -H "Content-Type: application/json" \
  -d '{"task_type": "echo", "payload": {"message": "Hello World"}}'

API Overview

Base URL

http://localhost:8081/api/v1

Authentication

The API supports optional API key authentication. Send the key in either of two headers:

# Include API key in header
curl -H "X-API-Key: your-secret-key" http://localhost:8081/api/v1/cluster/config

# Or use Authorization header
curl -H "Authorization: Bearer your-secret-key" http://localhost:8081/api/v1/cluster/config
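The two header forms above can be produced by one small helper. This is a minimal sketch: the function name `build_auth_headers` and its `scheme` argument are hypothetical and not part of Retire-Cluster; only the header names come from the curl examples.

```python
def build_auth_headers(api_key=None, scheme="x-api-key"):
    """Return request headers, adding the API key when one is given.

    scheme is "x-api-key" for the X-API-Key header or "bearer" for
    the Authorization: Bearer header (hypothetical argument name).
    """
    headers = {"Content-Type": "application/json"}
    if api_key is None:
        return headers  # authentication is optional
    if scheme == "x-api-key":
        headers["X-API-Key"] = api_key
    elif scheme == "bearer":
        headers["Authorization"] = f"Bearer {api_key}"
    else:
        raise ValueError(f"unknown scheme: {scheme!r}")
    return headers


print(build_auth_headers("your-secret-key", scheme="bearer"))
```

Keeping header construction in one place makes it easy to switch between the two forms without touching each request.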

Response Format

API responses share a common envelope (the /health endpoint and paginated list responses, both shown later, vary from it slightly):

{
  "status": "success|error",
  "data": {},
  "message": "Optional message",
  "timestamp": "2023-01-01T12:00:00Z",
  "request_id": "uuid"
}

Cluster Management

Get Cluster Status

GET /api/v1/cluster/status

Returns comprehensive cluster statistics including device count, health percentage, and resource totals.

Response:

{
  "status": "success",
  "data": {
    "cluster_stats": {
      "total_devices": 5,
      "online_devices": 4,
      "offline_devices": 1,
      "health_percentage": 80.0,
      "total_resources": {
        "cpu_cores": 32,
        "memory_gb": 128,
        "storage_gb": 2000
      },
      "by_role": {
        "compute": 2,
        "mobile": 2,
        "storage": 1
      },
      "by_platform": {
        "linux": 2,
        "android": 2,
        "windows": 1
      }
    },
    "server_info": {
      "version": "1.0.0",
      "uptime": "2d 4h 30m",
      "host": "0.0.0.0",
      "port": 8080
    }
  }
}
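In the sample response, `health_percentage` equals the share of devices that are online (4 of 5 is 80.0). The sketch below recomputes it on the client side under that assumption; the server may derive the figure differently, and the function name is hypothetical.

```python
def health_percentage(cluster_stats):
    """Share of registered devices that are online, as a percentage.

    Assumes health_percentage = online_devices / total_devices * 100,
    which matches the sample response but is not confirmed by the API.
    """
    total = cluster_stats["total_devices"]
    if total == 0:
        return 0.0  # avoid division by zero for an empty cluster
    return round(100.0 * cluster_stats["online_devices"] / total, 1)


stats = {"total_devices": 5, "online_devices": 4, "offline_devices": 1}
print(health_percentage(stats))  # 80.0
```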

Health Check

GET /health

Simple health check endpoint for monitoring.

Response:

{
  "status": "healthy",
  "timestamp": "2023-01-01T12:00:00Z",
  "components": {
    "api": "healthy",
    "cluster_server": "healthy",
    "task_scheduler": "healthy"
  }
}
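A monitoring script usually needs to know which component is failing, not only that something is. The following sketch lists every component whose state is not "healthy". The function name is hypothetical, and "degraded" in the usage line is an illustrative value, not one the API is known to return.

```python
def unhealthy_components(health):
    """Return the sorted names of components whose state is not "healthy"."""
    components = health.get("components", {})
    return sorted(
        name for name, state in components.items() if state != "healthy"
    )


sample = {
    "status": "healthy",
    "components": {
        "api": "healthy",
        "cluster_server": "healthy",
        "task_scheduler": "degraded",  # illustrative value only
    },
}
print(unhealthy_components(sample))  # ['task_scheduler']
```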

Get Cluster Metrics

GET /api/v1/cluster/metrics

Returns detailed performance and utilization metrics.

Get Configuration

GET /api/v1/cluster/config

Requires Authentication

Returns cluster configuration settings.

Device Management

List Devices

GET /api/v1/devices?page=1&page_size=20&status=online&role=compute

Query Parameters: page, page_size, status, role (as in the example request above).

Response:

{
  "status": "success",
  "data": [
    {
      "device_id": "laptop-001",
      "role": "compute",
      "platform": "linux",
      "status": "online",
      "ip_address": "192.168.1.101",
      "last_heartbeat": "2023-01-01T12:00:00Z",
      "uptime": "2h 30m",
      "capabilities": {
        "cpu_count": 8,
        "memory_total_gb": 16,
        "storage_total_gb": 500,
        "has_gpu": true
      },
      "tags": ["development", "gpu-capable"]
    }
  ],
  "pagination": {
    "page": 1,
    "page_size": 20,
    "total_items": 5,
    "total_pages": 1,
    "has_next": false,
    "has_previous": false
  }
}
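The `pagination` object makes it possible to walk all pages without knowing the total in advance. Below is a minimal sketch that follows `has_next`. The `fetch_page` callable is a hypothetical, caller-supplied function that returns the parsed JSON body of GET /api/v1/devices for a given page, which keeps the loop independent of any HTTP library.

```python
def iter_devices(fetch_page, page_size=20):
    """Yield every device across all pages, following pagination.has_next.

    fetch_page(page, page_size) must return the parsed response body,
    i.e. a dict with "data" (a list) and "pagination" (a dict).
    """
    page = 1
    while True:
        body = fetch_page(page, page_size)
        yield from body["data"]
        if not body["pagination"]["has_next"]:
            break
        page += 1


def fake_fetch(page, page_size):
    """Stand-in for a real HTTP call, serving five devices in pages."""
    devices = [{"device_id": f"device-{i}"} for i in range(5)]
    start = (page - 1) * page_size
    return {
        "data": devices[start:start + page_size],
        "pagination": {"has_next": start + page_size < len(devices)},
    }


for device in iter_devices(fake_fetch, page_size=2):
    print(device["device_id"])
```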

Get Device Details

GET /api/v1/devices/{device_id}

Returns detailed information about a specific device.

Get Device Status

GET /api/v1/devices/{device_id}/status

Get current status and health of a device.

Ping Device

POST /api/v1/devices/{device_id}/ping

Requires Authentication

Test connectivity to a specific device.

Remove Device

DELETE /api/v1/devices/{device_id}

Requires Authentication

Remove a device from the cluster.

Device Summary

GET /api/v1/devices/summary

Get summary statistics of all devices.

Task Management

Submit Task

POST /api/v1/tasks
Content-Type: application/json

Request Body:

{
  "task_type": "echo",
  "payload": {
    "message": "Hello World"
  },
  "priority": "normal",
  "requirements": {
    "min_cpu_cores": 2,
    "min_memory_gb": 4,
    "required_platform": "linux",
    "timeout_seconds": 300
  },
  "metadata": {
    "created_by": "api_user"
  }
}

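Task Types: the examples in this document use "echo"; GET /api/v1/tasks/types returns the types the cluster supports.

Priority Levels: the example request uses "normal".

Requirements (fields shown in the example request): min_cpu_cores, min_memory_gb, required_platform, timeout_seconds.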

Response:

{
  "status": "success",
  "data": {
    "task_id": "550e8400-e29b-41d4-a716-446655440000",
    "task": {
      "task_id": "550e8400-e29b-41d4-a716-446655440000",
      "task_type": "echo",
      "status": "queued",
      "priority": "normal",
      "created_at": "2023-01-01T12:00:00Z"
    }
  }
}
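The request body can be assembled programmatically before it is posted. The sketch below assumes that `priority`, `requirements`, and `metadata` may be omitted, since the Quick Start example sends only `task_type` and `payload`. The function name `build_task_request` is hypothetical.

```python
import json


def build_task_request(task_type, payload, priority=None,
                       requirements=None, metadata=None):
    """Build the JSON body for POST /api/v1/tasks.

    Optional fields are left out when not provided (assumed to be
    allowed, as in the Quick Start curl example).
    """
    if not task_type:
        raise ValueError("task_type is required")
    body = {"task_type": task_type, "payload": payload}
    optional = {
        "priority": priority,
        "requirements": requirements,
        "metadata": metadata,
    }
    body.update({key: value for key, value in optional.items()
                 if value is not None})
    return body


body = build_task_request(
    "echo",
    {"message": "Hello World"},
    requirements={"min_cpu_cores": 2, "timeout_seconds": 300},
)
print(json.dumps(body, indent=2))
```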

List Tasks

GET /api/v1/tasks?status=running&page=1&page_size=20

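Query Parameters: status, page, page_size (as in the example request above).

Task Statuses: the examples in this document show "queued", "running", and "success".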

Get Task Details

GET /api/v1/tasks/{task_id}

Returns complete task information including payload, requirements, and results.

Get Task Status

GET /api/v1/tasks/{task_id}/status

Get current status and execution progress.

Get Task Result

GET /api/v1/tasks/{task_id}/result

Get execution result for completed tasks.

Response:

{
  "status": "success",
  "data": {
    "task_id": "550e8400-e29b-41d4-a716-446655440000",
    "status": "success",
    "result_data": {
      "echo": {"message": "Hello World"}
    },
    "execution_time_seconds": 0.05,
    "worker_device_id": "laptop-001",
    "started_at": "2023-01-01T12:00:00Z",
    "completed_at": "2023-01-01T12:00:00Z"
  }
}
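A result is only available once a task has finished, so clients typically poll the status endpoint first. The sketch below assumes that "queued" and "running" are the only non-terminal statuses, which this document does not confirm. `get_status` is a hypothetical caller-supplied function that returns the status string from GET /api/v1/tasks/{task_id}/status.

```python
import time

# Assumption: every status other than these two is terminal.
PENDING_STATUSES = {"queued", "running"}


def wait_for_task(get_status, task_id, interval=1.0, timeout=60.0,
                  sleep=time.sleep, clock=time.monotonic):
    """Poll get_status(task_id) until the task leaves a pending status.

    Returns the final status string; raises TimeoutError if the task is
    still pending once `timeout` seconds have elapsed.
    """
    deadline = clock() + timeout
    while True:
        status = get_status(task_id)
        if status not in PENDING_STATUSES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"task {task_id} still {status}")
        sleep(interval)


# Usage with a canned status sequence instead of real HTTP calls
statuses = iter(["queued", "running", "success"])
final = wait_for_task(lambda task_id: next(statuses), "demo-task",
                      sleep=lambda seconds: None)
print(final)  # success
```

Injecting `sleep` and `clock` keeps the function testable without real delays.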

Cancel Task

POST /api/v1/tasks/{task_id}/cancel

Requires Authentication

Cancel a running or queued task.

Retry Task

POST /api/v1/tasks/{task_id}/retry

Requires Authentication

Retry a failed task.

Task Statistics

GET /api/v1/tasks/statistics

Get task execution statistics and performance metrics.

Supported Task Types

GET /api/v1/tasks/types

Get list of supported task types across all devices.

Configuration Management

Get Configuration

GET /api/v1/config

Requires Authentication

Get complete system configuration.

Get Server Config

GET /api/v1/config/server

Requires Authentication

Update Server Config

PUT /api/v1/config/server
Content-Type: application/json

Requires Authentication

{
  "max_connections": 100
}

Get Heartbeat Config

GET /api/v1/config/heartbeat

Requires Authentication

Update Heartbeat Config

PUT /api/v1/config/heartbeat
Content-Type: application/json

Requires Authentication

{
  "interval": 60,
  "timeout": 300,
  "max_missed": 3
}

Reset Configuration

POST /api/v1/config/reset

Requires Authentication

Reset all configuration to defaults.

Error Handling

Error Response Format

{
  "status": "error",
  "message": "Error description",
  "error_code": "ERROR_CODE",
  "error_details": {
    "field": "field_name",
    "reason": "validation_error"
  },
  "timestamp": "2023-01-01T12:00:00Z",
  "request_id": "uuid"
}
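On the client side, the error format maps naturally onto an exception. The sketch below turns a parsed response body into either its `data` or a raised error. The names `RetireClusterError` and `unwrap` are hypothetical, and "VALIDATION_ERROR" is an illustrative code, not one confirmed by this document.

```python
class RetireClusterError(Exception):
    """Raised when a response body has status "error"."""

    def __init__(self, message, error_code=None, details=None,
                 request_id=None):
        super().__init__(message)
        self.error_code = error_code
        self.details = details or {}
        self.request_id = request_id


def unwrap(body):
    """Return body["data"] on success; raise RetireClusterError on error."""
    if body.get("status") == "error":
        raise RetireClusterError(
            body.get("message", "unknown error"),
            error_code=body.get("error_code"),
            details=body.get("error_details"),
            request_id=body.get("request_id"),
        )
    return body.get("data")


try:
    unwrap({
        "status": "error",
        "message": "Error description",
        "error_code": "VALIDATION_ERROR",  # illustrative code
        "error_details": {"field": "task_type"},
        "request_id": "uuid",
    })
except RetireClusterError as exc:
    print(exc.error_code, exc.details, exc.request_id)
```

Carrying `request_id` on the exception makes it easier to match a client failure to the corresponding server log entry.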

Common Error Codes

HTTP Status Codes

Rate Limiting

Default rate limits:

Rate limit headers:

X-RateLimit-Remaining: 59
X-RateLimit-Reset: 1640995200
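A client can use these headers to decide how long to back off. The sketch below assumes `X-RateLimit-Reset` is a Unix timestamp in seconds, which the sample value suggests but this document does not state. The function name is hypothetical.

```python
import time


def seconds_until_reset(headers, now=None):
    """Return how long to wait before retrying, or 0.0 if requests remain.

    Assumes X-RateLimit-Reset holds a Unix timestamp in seconds.
    """
    remaining = int(headers.get("X-RateLimit-Remaining", 1))
    if remaining > 0:
        return 0.0
    reset_at = float(headers.get("X-RateLimit-Reset", 0))
    current = time.time() if now is None else now
    return max(0.0, reset_at - current)


exhausted = {"X-RateLimit-Remaining": "0", "X-RateLimit-Reset": "1640995200"}
print(seconds_until_reset(exhausted, now=1640995190))  # 10.0
```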

WebSocket Support (Future)

Real-time updates will be available via WebSocket:

const ws = new WebSocket('ws://localhost:8081/ws');
ws.onmessage = (event) => {
  const update = JSON.parse(event.data);
  console.log('Cluster update:', update);
};

SDK Examples

Python SDK

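import requests

class RetireClusterAPI:
    """Minimal client for the Retire-Cluster REST API."""

    def __init__(self, base_url, api_key=None, timeout=10):
        self.base_url = base_url.rstrip('/')
        # requests applies no timeout by default, so set one explicitly
        self.timeout = timeout
        self.headers = {'Content-Type': 'application/json'}
        if api_key:
            self.headers['X-API-Key'] = api_key

    def get_cluster_status(self):
        # GET /cluster/status; returns the parsed JSON response body
        response = requests.get(
            f"{self.base_url}/cluster/status",
            headers=self.headers,
            timeout=self.timeout
        )
        return response.json()

    def submit_task(self, task_type, payload, **kwargs):
        # Extra keyword arguments (e.g. priority, requirements, metadata)
        # become top-level fields of the request body
        data = {
            'task_type': task_type,
            'payload': payload,
            **kwargs
        }
        response = requests.post(
            f"{self.base_url}/tasks",
            json=data,
            headers=self.headers,
            timeout=self.timeout
        )
        return response.json()

# Usage
api = RetireClusterAPI('http://localhost:8081/api/v1')
status = api.get_cluster_status()
task = api.submit_task('echo', {'message': 'Hello'})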

JavaScript SDK

class RetireClusterAPI {
  constructor(baseUrl, apiKey) {
    this.baseUrl = baseUrl;
    this.headers = {'Content-Type': 'application/json'};
    if (apiKey) {
      this.headers['X-API-Key'] = apiKey;
    }
  }
  
  async getClusterStatus() {
    const response = await fetch(`${this.baseUrl}/cluster/status`, {
      headers: this.headers
    });
    return response.json();
  }
  
  async submitTask(taskType, payload, options = {}) {
    const data = {
      task_type: taskType,
      payload: payload,
      ...options
    };
    const response = await fetch(`${this.baseUrl}/tasks`, {
      method: 'POST',
      headers: this.headers,
      body: JSON.stringify(data)
    });
    return response.json();
  }
}

// Usage
const api = new RetireClusterAPI('http://localhost:8081/api/v1');
const status = await api.getClusterStatus();
const task = await api.submitTask('echo', {message: 'Hello'});

Production Deployment

Using Gunicorn

pip install gunicorn
gunicorn -w 4 -b 0.0.0.0:8081 retire_cluster.api.wsgi:app

Using Docker

FROM python:3.11-slim

COPY . /app
WORKDIR /app

RUN pip install retire-cluster[api]
EXPOSE 8081

CMD ["retire-cluster-api", "--host", "0.0.0.0", "--port", "8081"]

Environment Variables

export RETIRE_CLUSTER_API_HOST=0.0.0.0
export RETIRE_CLUSTER_API_PORT=8081
export RETIRE_CLUSTER_API_KEY=your-secret-key
export RETIRE_CLUSTER_CLUSTER_HOST=localhost
export RETIRE_CLUSTER_CLUSTER_PORT=8080
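A wrapper script or client can read the same variables. The sketch below is only an illustration: the `APISettings` class and `load_settings` function are hypothetical, and the fallback values are copied from the example above rather than from the server's documented defaults.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass
class APISettings:
    host: str
    port: int
    api_key: Optional[str]
    cluster_host: str
    cluster_port: int


def load_settings(env=None):
    """Read the RETIRE_CLUSTER_* variables from env (default: os.environ).

    Fallbacks mirror the example values and are assumptions, not the
    server's confirmed defaults.
    """
    env = os.environ if env is None else env
    return APISettings(
        host=env.get("RETIRE_CLUSTER_API_HOST", "0.0.0.0"),
        port=int(env.get("RETIRE_CLUSTER_API_PORT", "8081")),
        api_key=env.get("RETIRE_CLUSTER_API_KEY"),  # None when unset
        cluster_host=env.get("RETIRE_CLUSTER_CLUSTER_HOST", "localhost"),
        cluster_port=int(env.get("RETIRE_CLUSTER_CLUSTER_PORT", "8080")),
    )


print(load_settings({"RETIRE_CLUSTER_API_PORT": "9000"}))
```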

Monitoring and Logging

Prometheus Metrics (Future)

# HELP retire_cluster_devices_total Total number of registered devices
# TYPE retire_cluster_devices_total gauge
retire_cluster_devices_total{status="online"} 4

# HELP retire_cluster_tasks_total Total number of tasks
# TYPE retire_cluster_tasks_total counter
retire_cluster_tasks_total{status="success"} 1250

Log Format

{
  "timestamp": "2023-01-01T12:00:00Z",
  "level": "INFO",
  "logger": "api.requests",
  "message": "Request processed",
  "request_id": "uuid",
  "method": "GET",
  "path": "/api/v1/cluster/status",
  "status_code": 200,
  "duration_ms": 45.2
}
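Structured logs like this are straightforward to filter. The sketch below picks out slow requests, assuming the log is written as one JSON object per line, which the format above suggests but does not state. The function name and the 100 ms threshold are hypothetical.

```python
import json


def slow_requests(lines, threshold_ms=100.0):
    """Yield (path, duration_ms) for entries slower than threshold_ms.

    Blank lines, non-JSON lines, and entries without a numeric
    duration_ms are skipped.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(entry, dict):
            continue
        duration = entry.get("duration_ms")
        if isinstance(duration, (int, float)) and duration > threshold_ms:
            yield entry.get("path"), duration


log_lines = [
    '{"path": "/api/v1/cluster/status", "duration_ms": 45.2}',
    '{"path": "/api/v1/tasks", "duration_ms": 250.0}',
    'not json',
]
print(list(slow_requests(log_lines)))  # [('/api/v1/tasks', 250.0)]
```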

Security Best Practices

  1. Use HTTPS in production
  2. Enable API key authentication
  3. Configure proper CORS settings
  4. Implement rate limiting
  5. Monitor API access logs
  6. Keep API keys secure
  7. Follow the principle of least privilege
  8. Apply security updates regularly

Troubleshooting

Common Issues

API server won’t start

Authentication errors

Task submission fails

Rate limit exceeded

Debug Mode

retire-cluster-api --debug

Enables detailed logging and error traces.