Deployment Guidelines

This guide describes how to deploy Hayhooks in production environments.

Since Hayhooks is a FastAPI application, you can deploy it using any standard ASGI server deployment strategy. For comprehensive deployment concepts, see the FastAPI deployment documentation.

This guide focuses on Hayhooks-specific considerations for production deployments.

Quick Recommendations

  • Use HAYHOOKS_PIPELINES_DIR to deploy pipelines in production environments
  • Start with a single worker for I/O-bound pipelines; use multiple workers for CPU-bound workloads
  • Implement async methods (run_api_async) for better I/O performance
  • Configure health checks for container orchestration
  • Set appropriate resource limits and environment variables
  • Review security settings (CORS, tracebacks, logging levels)

Configuration Resources

Review Configuration and Environment Variables Reference before deploying.

Pipeline Deployment Strategy

For production deployments, use HAYHOOKS_PIPELINES_DIR to deploy pipelines at startup.

Using HAYHOOKS_PIPELINES_DIR

Set the environment variable to point to a directory containing your pipeline definitions:

export HAYHOOKS_PIPELINES_DIR=/app/pipelines
hayhooks run

When Hayhooks starts, it automatically loads all pipelines from this directory.

Benefits:

  • Pipelines are available immediately on startup
  • Consistent across all workers/instances
  • No runtime deployment API calls needed
  • Simple to version control and deploy

Directory structure:

pipelines/
├── my_pipeline/
│   ├── pipeline_wrapper.py
│   └── pipeline.yml
└── another_pipeline/
    ├── pipeline_wrapper.py
    └── pipeline.yml

See YAML Pipeline Deployment and PipelineWrapper for file structure details.

Development vs Production

For local development, you can use CLI commands (hayhooks pipeline deploy-files) or API endpoints (POST /deploy-files). For production, always use HAYHOOKS_PIPELINES_DIR.
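
For example, a development deployment of a single pipeline might look like this (pipeline name and path are illustrative; check hayhooks pipeline deploy-files --help for the options available in your version):

hayhooks pipeline deploy-files -n my_pipeline ./pipelines/my_pipeline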

Performance Tuning

Single Worker vs Multiple Workers

Single Worker Environment:

hayhooks run

Best for:

  • Development and testing
  • I/O-bound pipelines (HTTP requests, file operations, database queries)
  • Low to moderate concurrent requests
  • Simpler deployment and debugging

Multiple Workers Environment:

hayhooks run --workers 4

Best for:

  • CPU-bound pipelines (embedding generation, heavy computation)
  • High concurrent request volumes
  • Production environments with available CPU cores

Worker Count Formula

A common starting point: workers = (2 x CPU_cores) + 1. On a 4-core machine, for example, that gives (2 x 4) + 1 = 9 workers. Adjust based on your workload: for I/O-bound pipelines, more workers can help; for CPU-bound pipelines, match the worker count to the number of CPU cores to avoid context-switching overhead.
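
A minimal shell sketch of this starting point on Linux (assumes nproc is available; tune the result for your workload):

# Derive a starting worker count from the CPU core count
WORKERS=$(( 2 * $(nproc) + 1 ))
hayhooks run --workers "$WORKERS"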

Concurrency Behavior

Pipeline run() methods execute synchronously but are wrapped in run_in_threadpool to avoid blocking the async event loop.

I/O-bound pipelines (HTTP requests, file operations, database queries):

  • Can handle concurrent requests effectively in a single worker
  • Worker switches between tasks during I/O waits
  • Consider implementing async methods for even better performance

CPU-bound pipelines (embedding generation, heavy computation):

  • Limited by Python's Global Interpreter Lock (GIL)
  • Requests are queued and processed sequentially in a single worker
  • Use multiple workers or horizontal scaling to improve throughput

Async Pipelines

Implement async methods for better I/O-bound performance:

from pathlib import Path

from haystack import AsyncPipeline
from hayhooks import BasePipelineWrapper

class PipelineWrapper(BasePipelineWrapper):
    def setup(self) -> None:
        # Load the pipeline definition that ships alongside this wrapper
        self.pipeline = AsyncPipeline.loads(
            (Path(__file__).parent / "pipeline.yml").read_text()
        )

    async def run_api_async(self, query: str) -> str:
        # run_async awaits I/O-bound components without blocking the event loop
        result = await self.pipeline.run_async({"prompt": {"query": query}})
        return result["llm"]["replies"][0]

Streaming

Use streaming for chat endpoints to reduce perceived latency:

from hayhooks import async_streaming_generator, get_last_user_message

# Inside your BasePipelineWrapper subclass:
async def run_chat_completion_async(self, model: str, messages: list[dict], body: dict):
    question = get_last_user_message(messages)
    return async_streaming_generator(
        pipeline=self.pipeline,
        pipeline_run_args={"prompt": {"query": question}},
    )

See OpenAI Compatibility for more details on streaming.

Horizontal Scaling

Deploy multiple instances behind a load balancer for increased throughput.

Key considerations:

  • Use HAYHOOKS_PIPELINES_DIR to ensure all instances have the same pipelines
  • Configure session affinity if using stateful components
  • Distribute traffic evenly across instances
  • Monitor individual instance health

Example setup (Docker Swarm, Kubernetes, or cloud load balancers):

# Each instance should use the same pipeline directory
export HAYHOOKS_PIPELINES_DIR=/app/pipelines
hayhooks run
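
With Docker Compose, for instance, you can run multiple identical replicas (a minimal sketch; it assumes a reverse proxy or load balancer in front, since a fixed host port mapping cannot be shared across replicas):

# Run three replicas of the hayhooks service
docker compose up -d --scale hayhooks=3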

GIL Limitations

Even with multiple workers, each worker process is still constrained by Python's GIL. CPU-bound pipelines therefore benefit more from horizontal scaling (multiple instances) than from vertical scaling (multiple workers per instance).

Docker Deployment

Single Container

docker run -d \
  -p 1416:1416 \
  -e HAYHOOKS_HOST=0.0.0.0 \
  -e HAYHOOKS_PIPELINES_DIR=/app/pipelines \
  -v "$PWD/pipelines:/app/pipelines:ro" \
  deepset/hayhooks:latest

Docker Compose

services:
  hayhooks:
    image: deepset/hayhooks:latest
    ports:
      - "1416:1416"
    environment:
      HAYHOOKS_HOST: 0.0.0.0
      HAYHOOKS_PIPELINES_DIR: /app/pipelines
      LOG: INFO
    volumes:
      - ./pipelines:/app/pipelines:ro
    restart: unless-stopped

See Quick Start with Docker Compose for a complete example with Open WebUI integration.

Health Checks

Add health checks to monitor container health:

services:
  hayhooks:
    image: deepset/hayhooks:latest
    ports:
      - "1416:1416"
    environment:
      HAYHOOKS_HOST: 0.0.0.0
      HAYHOOKS_PIPELINES_DIR: /app/pipelines
    volumes:
      - ./pipelines:/app/pipelines:ro
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:1416/status"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
    restart: unless-stopped

The /status endpoint returns the server status and can be used for health monitoring.
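
You can also probe it manually (adjust host and port to your deployment):

curl -f http://localhost:1416/status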

Production Deployment Options

Docker

Deploy Hayhooks using Docker containers for consistent, portable deployments across environments. Docker provides isolation, easy versioning, and simplified dependency management. See the Docker documentation for container deployment best practices.

Kubernetes

Deploy Hayhooks on Kubernetes for automated scaling, self-healing, and advanced orchestration capabilities. Use Deployments, Services, and ConfigMaps to manage pipeline definitions and configuration. See the Kubernetes documentation for deployment strategies.
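
As a rough sketch (resource names, replica count, and the ConfigMap holding pipeline definitions are illustrative; adapt them to your cluster), a minimal Deployment might look like:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: hayhooks
spec:
  replicas: 3
  selector:
    matchLabels:
      app: hayhooks
  template:
    metadata:
      labels:
        app: hayhooks
    spec:
      containers:
        - name: hayhooks
          image: deepset/hayhooks:latest
          ports:
            - containerPort: 1416
          env:
            - name: HAYHOOKS_HOST
              value: "0.0.0.0"
            - name: HAYHOOKS_PIPELINES_DIR
              value: /app/pipelines
          readinessProbe:
            httpGet:
              path: /status
              port: 1416
          volumeMounts:
            - name: pipelines
              mountPath: /app/pipelines
              readOnly: true
      volumes:
        - name: pipelines
          configMap:
            name: hayhooks-pipelines  # hypothetical ConfigMap with your pipeline files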

Server/VPS Deployment

Deploy Hayhooks directly on a server or VPS using systemd or process managers like supervisord for production reliability. This approach offers full control over the environment and is suitable for dedicated workloads. See the FastAPI deployment documentation for manual deployment guidance.
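
A minimal systemd unit sketch (the user, install path, and virtual environment location are illustrative):

[Unit]
Description=Hayhooks pipeline server
After=network.target

[Service]
User=hayhooks
WorkingDirectory=/opt/hayhooks
Environment=HAYHOOKS_PIPELINES_DIR=/opt/hayhooks/pipelines
Environment=LOG=INFO
ExecStart=/opt/hayhooks/.venv/bin/hayhooks run
Restart=on-failure

[Install]
WantedBy=multi-user.target

Save the unit to /etc/systemd/system/hayhooks.service, then enable it with systemctl enable --now hayhooks.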

AWS ECS

Deploy Hayhooks on AWS Elastic Container Service for managed container orchestration in the AWS ecosystem. ECS handles container scheduling, load balancing, and integrates seamlessly with other AWS services. See the AWS ECS documentation for deployment details.

Production Best Practices

Environment Variables

Store sensitive configuration in environment variables or secrets:

# Use a .env file
HAYHOOKS_PIPELINES_DIR=/app/pipelines
LOG=INFO
HAYHOOKS_SHOW_TRACEBACKS=false

See Environment Variables Reference for all options.

Logging

Configure appropriate log levels for production:

# Production: INFO or WARNING
export LOG=INFO

# Development: DEBUG
export LOG=DEBUG

See Logging for details.

CORS Configuration

Configure CORS for production environments:

# Restrict to specific origins
export HAYHOOKS_CORS_ALLOW_ORIGINS='["https://yourdomain.com"]'
export HAYHOOKS_CORS_ALLOW_CREDENTIALS=true

Troubleshooting

Pipeline Not Available

If pipelines aren't available after startup:

  1. Check HAYHOOKS_PIPELINES_DIR is correctly set
  2. Verify pipeline files exist in the directory
  3. Check logs for deployment errors: docker logs <container_id>
  4. Verify pipeline wrapper syntax and imports

High Memory Usage

For memory-intensive pipelines:

  1. Increase container memory limits in Docker Compose
  2. Profile pipeline components for memory leaks
  3. Optimize component initialization and caching
  4. Consider using smaller models or batch sizes

Slow Response Times

For performance issues:

  1. Check component initialization in setup() vs run_api()
  2. Verify pipeline directory is mounted correctly
  3. Review logs for errors or warnings
  4. Consider implementing async methods or adding workers (see Performance Tuning above)

Next Steps