Open Bedrock Server

A unified, provider-agnostic chat completions API server supporting OpenAI and AWS Bedrock

View the Project on GitHub teabranch/open-bedrock-server

Architecture Guide

This document describes the architecture of Open Bedrock Server, which implements a unified, provider-agnostic approach to LLM integration.

Table of contents

  1. Core Principles
  2. Unified Architecture
  3. Core Components
    1. Unified Endpoint Layer
    2. Format Detection Layer
    3. Model-Based Routing Layer
    4. Service Layer
    5. Adapter Layer
    6. API Client Layer
  4. Data Flow
    1. Request Processing Flow
    2. Streaming Flow
    3. Error Handling Flow
  5. Format Support Matrix
    1. Input Formats
    2. Output Formats
    3. Model Routing Patterns
  6. Deployment Architecture
    1. Development Environment
    2. Production Environment
    3. Security Architecture
  7. Performance Considerations
    1. Caching Strategy
    2. Concurrency Management
    3. Monitoring Points
  8. Extensibility
    1. Adding New Providers
    2. Adding New Formats
    3. Configuration Management

Core Principles

Unified Architecture

graph TD
    A[Client Request] --> B[/v1/chat/completions]
    B --> C{Format Detection}
    
    C --> D1[OpenAI Format]
    C --> D2[Bedrock Claude Format]
    C --> D3[Bedrock Titan Format]
    
    D1 --> E{Model-Based Routing}
    D2 --> F[Format Conversion] --> E
    D3 --> F
    
    E --> G1[OpenAI Service]
    E --> G2[Bedrock Service]
    
    G1 --> H1[OpenAI API]
    G2 --> H2[AWS Bedrock API]
    
    H1 --> I[Response Processing]
    H2 --> I
    
    I --> J{Target Format?}
    J --> K1[OpenAI Response]
    J --> K2[Bedrock Claude Response]
    J --> K3[Bedrock Titan Response]
    
    K1 --> L[Client Response]
    K2 --> L
    K3 --> L

    classDef endpoint fill:#E6E6FA,stroke:#B0C4DE,stroke-width:2px,color:#333;
    classDef service fill:#F0F8FF,stroke:#87CEEB,stroke-width:2px,color:#333;
    classDef api fill:#FFF8DC,stroke:#DAA520,stroke-width:2px,color:#333;
    
    class B endpoint;
    class G1,G2 service;
    class H1,H2 api;

Core Components

1. Unified Endpoint Layer

Single Entry Point: all requests are served through POST /v1/chat/completions, regardless of payload format or target provider.

Key Features:

2. Format Detection Layer

RequestFormatDetector:

class RequestFormatDetector:
    @staticmethod
    def detect_format(request_data: Dict[str, Any]) -> RequestFormat:
        # Priority-based detection
        if "anthropic_version" in request_data:
            return RequestFormat.BEDROCK_CLAUDE
        elif "inputText" in request_data:
            return RequestFormat.BEDROCK_TITAN
        elif "model" in request_data and "messages" in request_data:
            return RequestFormat.OPENAI
        else:
            return RequestFormat.UNKNOWN
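
The detection priority above can be exercised as a self-contained sketch; the `RequestFormat` enum here is a minimal stand-in for the project's own type:

```python
from enum import Enum
from typing import Any, Dict


class RequestFormat(Enum):
    OPENAI = "openai"
    BEDROCK_CLAUDE = "bedrock_claude"
    BEDROCK_TITAN = "bedrock_titan"
    UNKNOWN = "unknown"


def detect_format(request_data: Dict[str, Any]) -> RequestFormat:
    # Priority order matters: the Bedrock-specific markers are checked first,
    # because the generic model + messages shape could otherwise shadow them.
    if "anthropic_version" in request_data:
        return RequestFormat.BEDROCK_CLAUDE
    if "inputText" in request_data:
        return RequestFormat.BEDROCK_TITAN
    if "model" in request_data and "messages" in request_data:
        return RequestFormat.OPENAI
    return RequestFormat.UNKNOWN
```

Because detection is purely structural, clients never need to declare a format explicitly; the payload's own keys identify it.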

3. Model-Based Routing Layer

LLMServiceFactory:

class LLMServiceFactory:
    @staticmethod
    def get_service_for_model(model_id: str) -> AbstractLLMService:
        # OpenAI models: gpt-*, text-*, dall-e-*
        if model_id.startswith(("gpt-", "text-", "dall-e-")):
            return OpenAIService()
        
        # Bedrock models: anthropic.*, amazon.*, ai21.*, etc.
        elif any(model_id.startswith(prefix) for prefix in 
                ["anthropic.", "amazon.", "ai21.", "cohere.", "meta."]):
            return BedrockService()
        
        # Regional Bedrock: us.anthropic.*, eu.anthropic.*, etc.
        elif len(model_id.split(".")) > 2:
            return BedrockService()
        
        else:
            raise ModelNotFoundError(f"Unsupported model: {model_id}")
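
The factory's routing rules reduce to a small pure function, sketched here with provider names as return values instead of service instances:

```python
def resolve_provider(model_id: str) -> str:
    """Return the provider a model ID routes to (sketch of the factory's rules)."""
    # OpenAI models: gpt-*, text-*, dall-e-*
    if model_id.startswith(("gpt-", "text-", "dall-e-")):
        return "openai"
    # Bedrock models carry a vendor prefix before the first dot.
    if model_id.startswith(("anthropic.", "amazon.", "ai21.", "cohere.", "meta.")):
        return "bedrock"
    # Regional Bedrock IDs such as us.anthropic.claude-... add a region segment,
    # so they contain more than two dot-separated parts.
    if len(model_id.split(".")) > 2:
        return "bedrock"
    raise ValueError(f"Unsupported model: {model_id}")
```

Keeping routing a function of the model ID alone means no extra configuration is needed when a client switches providers: changing the model string is enough.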

4. Service Layer

Abstract Service Interface:

class AbstractLLMService(ABC):
    @abstractmethod
    async def chat_completion(
        self,
        request: ChatCompletionRequest
    ) -> Union[ChatCompletionResponse, AsyncGenerator[ChatCompletionChunk, None]]:
        pass

Concrete Services:

5. Adapter Layer

Format Conversion:

class BedrockToOpenAIAdapter:
    def convert_bedrock_to_openai_request(self, bedrock_request) -> ChatCompletionRequest:
        # Convert Bedrock format to OpenAI format
        pass
    
    def convert_openai_to_bedrock_response(self, openai_response, target_format) -> Dict:
        # Convert OpenAI response to Bedrock format
        pass
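
As a concrete sketch of the request-direction conversion, the following maps a Bedrock Claude messages payload onto the OpenAI shape. The field names follow the public Bedrock Claude messages API; the real adapter handles many more fields (tools, images, stop sequences, and so on):

```python
from typing import Any, Dict


def claude_request_to_openai(bedrock_request: Dict[str, Any],
                             model_id: str) -> Dict[str, Any]:
    """Minimal sketch: map a Bedrock Claude request onto the OpenAI request shape."""
    openai_request: Dict[str, Any] = {
        "model": model_id,
        "messages": list(bedrock_request.get("messages", [])),
    }
    # Claude carries the system prompt as a top-level field; OpenAI expects it
    # as the first message in the conversation.
    if "system" in bedrock_request:
        openai_request["messages"].insert(
            0, {"role": "system", "content": bedrock_request["system"]}
        )
    # Sampling parameters map across with the same names.
    for key in ("max_tokens", "temperature", "top_p"):
        if key in bedrock_request:
            openai_request[key] = bedrock_request[key]
    return openai_request
```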

Strategy Pattern for Bedrock:

6. API Client Layer

Unified API Client:

class APIClient:
    @retry(stop=stop_after_attempt(3))
    async def make_openai_request(self, payload: Dict) -> Any:
        # OpenAI API calls with retry logic
        pass
    
    @retry(stop=stop_after_attempt(3))
    async def make_bedrock_request(self, model_id: str, payload: Dict) -> Any:
        # Bedrock API calls with retry logic
        pass
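
The retry behaviour the @retry decorator provides can be sketched without the tenacity library: re-invoke the call with exponential backoff and re-raise the last error once the attempt budget is exhausted (the helper name is illustrative, not part of the project):

```python
import asyncio
from typing import Any, Awaitable, Callable


async def with_retries(call: Callable[[], Awaitable[Any]],
                       attempts: int = 3,
                       base_delay: float = 0.5) -> Any:
    """Retry an async call up to `attempts` times with exponential backoff."""
    for attempt in range(attempts):
        try:
            return await call()
        except Exception:
            if attempt == attempts - 1:
                # Budget exhausted: surface the provider error unchanged.
                raise
            await asyncio.sleep(base_delay * (2 ** attempt))
```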

Data Flow

1. Request Processing Flow

Client Request
    ↓
Format Detection (OpenAI/Bedrock Claude/Bedrock Titan)
    ↓
Model Extraction (from request)
    ↓
Service Routing (based on model ID patterns)
    ↓
Format Conversion (if needed)
    ↓
Provider API Call (OpenAI/Bedrock)
    ↓
Response Processing
    ↓
Format Conversion (if target_format specified)
    ↓
Client Response

2. Streaming Flow

Streaming Request (stream=true)
    ↓
Format Detection & Routing (same as above)
    ↓
Streaming API Call
    ↓
Chunk Processing & Format Conversion
    ↓
Real-time Client Response (Server-Sent Events)
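
The final step, framing converted chunks as Server-Sent Events, can be sketched as a small generator. The OpenAI-style [DONE] sentinel terminates the stream; the function name is illustrative:

```python
import json
from typing import Any, Dict, Iterable, Iterator


def to_sse(chunks: Iterable[Dict[str, Any]]) -> Iterator[str]:
    """Frame already-converted chunks as Server-Sent Events."""
    for chunk in chunks:
        # Each SSE event is a `data:` line followed by a blank line.
        yield f"data: {json.dumps(chunk)}\n\n"
    # Signal end-of-stream the way the OpenAI API does.
    yield "data: [DONE]\n\n"
```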

3. Error Handling Flow

API Error
    ↓
Error Classification (Auth/Rate Limit/Service/etc.)
    ↓
Error Mapping (Provider-specific → Standard format)
    ↓
Retry Logic (if applicable)
    ↓
Standardized Error Response
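
The classification and mapping steps can be sketched as a lookup from status code to a standard, OpenAI-style error envelope. The specific type strings and the cases covered here are illustrative; the real classifier distinguishes more conditions:

```python
from typing import Any, Dict

# Illustrative mapping from HTTP status codes to standardized error types.
_ERROR_TYPES = {
    401: "authentication_error",
    429: "rate_limit_error",
    503: "service_unavailable_error",
}


def to_error_response(status_code: int, message: str) -> Dict[str, Any]:
    """Map any provider error onto one standard error body."""
    return {
        "error": {
            "type": _ERROR_TYPES.get(status_code, "api_error"),
            "message": message,
            "code": status_code,
        }
    }
```

Collapsing provider-specific failures into one envelope means clients handle errors identically whether the upstream call went to OpenAI or Bedrock.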

Format Support Matrix

Input Formats

| Format | Detection Key | Example |
|---|---|---|
| OpenAI | model + messages | {"model": "gpt-4o-mini", "messages": [...]} |
| Bedrock Claude | anthropic_version | {"anthropic_version": "bedrock-2023-05-31", ...} |
| Bedrock Titan | inputText | {"inputText": "User: Hello", ...} |

Output Formats

| Format | Query Parameter | Response Structure |
|---|---|---|
| OpenAI | target_format=openai | Standard OpenAI Chat Completions format |
| Bedrock Claude | target_format=bedrock_claude | Anthropic Claude message format |
| Bedrock Titan | target_format=bedrock_titan | Amazon Titan text generation format |

Model Routing Patterns

| Pattern | Provider | Examples |
|---|---|---|
| gpt-* | OpenAI | gpt-4o-mini, gpt-3.5-turbo |
| text-* | OpenAI | text-davinci-003 |
| anthropic.* | Bedrock | anthropic.claude-3-haiku-20240307-v1:0 |
| amazon.* | Bedrock | amazon.titan-text-express-v1 |
| ai21.* | Bedrock | ai21.j2-ultra-v1 |
| us.anthropic.* | Bedrock | us.anthropic.claude-3-haiku-20240307-v1:0 |

Deployment Architecture

Development Environment

Developer Machine
├── Python Application
├── Local Configuration (.env)
├── Direct API Access
│   ├── OpenAI API
│   └── AWS Bedrock (via credentials)
└── Local Testing

Production Environment

Load Balancer
    ↓
Container Orchestration (ECS/Kubernetes)
    ↓
Application Containers
├── Environment Variables
├── IAM Roles (for AWS)
├── Health Checks
└── Logging/Monitoring
    ↓
External APIs
├── OpenAI API
└── AWS Bedrock

Security Architecture

Client Request
    ↓
API Gateway (optional)
    ↓
Authentication Layer (API Key)
    ↓
Rate Limiting
    ↓
Application Layer
    ↓
Provider Authentication
├── OpenAI API Key
└── AWS IAM Roles/Credentials
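
The API-key check in the authentication layer can be sketched as follows, assuming the common `Authorization: Bearer <key>` header convention (the helper name is illustrative); the comparison is constant-time to avoid timing side channels:

```python
import hmac
from typing import Optional


def is_authorized(authorization_header: Optional[str], expected_key: str) -> bool:
    """Accept `Authorization: Bearer <key>` headers carrying the expected key."""
    if not authorization_header or not authorization_header.startswith("Bearer "):
        return False
    presented = authorization_header[len("Bearer "):]
    # Constant-time comparison prevents attackers from timing the match.
    return hmac.compare_digest(presented, expected_key)
```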

Performance Considerations

Caching Strategy

Concurrency Management

Monitoring Points

Extensibility

Adding New Providers

  1. Create Service Class: Implement AbstractLLMService
  2. Update Factory: Add routing logic for new model patterns
  3. Add Adapters: Implement format conversion if needed
  4. Update Tests: Add comprehensive test coverage

Adding New Formats

  1. Update Detection: Add format detection logic
  2. Create Adapters: Implement conversion to/from standard format
  3. Update Routing: Ensure proper service selection
  4. Document Format: Add to API documentation

Configuration Management


This architecture provides a solid foundation for a unified LLM integration server while maintaining flexibility for future enhancements and provider additions.