Production Best Practices¶
This page provides opinionated, Hayhooks-specific recommendations for running a secure and reliable production deployment. For infrastructure-level guidance (Docker, Kubernetes, scaling), see Deployment Guidelines.
Lock Down CORS¶
Hayhooks defaults to allowing all origins (["*"]). In production you should restrict CORS to only the domains that need access:
export HAYHOOKS_CORS_ALLOW_ORIGINS='["https://app.example.com", "https://admin.example.com"]'
export HAYHOOKS_CORS_ALLOW_METHODS='["GET", "POST"]'
export HAYHOOKS_CORS_ALLOW_HEADERS='["Content-Type", "Authorization"]'
export HAYHOOKS_CORS_ALLOW_CREDENTIALS=true
Warning
Leaving HAYHOOKS_CORS_ALLOW_ORIGINS=["*"] in production means any website can make requests to your Hayhooks server from a browser. Always restrict origins to your own domains.
If your frontend is served from multiple subdomains, use a regex pattern instead of listing every origin:
export HAYHOOKS_CORS_ALLOW_ORIGIN_REGEX='https://.*\.example\.com'
See Environment Variables Reference for all CORS settings.
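Before deploying an origin pattern, it can help to test candidate origins against it locally. The sketch below assumes the pattern must match the entire Origin value (full-match semantics, which is how Starlette's CORS middleware applies an origin regex); verify this against your Hayhooks version. `origin_allowed` is a hypothetical helper, not part of Hayhooks.

```python
import re

# Pattern from the example above; adjust it to your own domain.
ORIGIN_PATTERN = re.compile(r"https://.*\.example\.com")


def origin_allowed(origin: str) -> bool:
    """Return True only if the whole origin string matches the pattern."""
    return ORIGIN_PATTERN.fullmatch(origin) is not None


if __name__ == "__main__":
    for candidate in (
        "https://app.example.com",
        "https://example.com",
        "http://app.example.com",
        "https://app.example.com.evil.net",
    ):
        print(candidate, origin_allowed(candidate))
```

Note that the bare domain https://example.com does not match, because the pattern requires a dot before "example". If the bare domain needs access, it has to be allowed separately.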
Disable Tracebacks¶
Stack traces in error responses can leak internal paths, library versions, and code structure. Always disable them in production:
export HAYHOOKS_SHOW_TRACEBACKS=false
This is the default, but it is worth verifying explicitly -- especially if you copied a development .env file.
Configure Logging¶
Use INFO as the baseline log level. Switch to WARNING if log volume becomes a concern:
export LOG=INFO
Tip
Avoid DEBUG in production. It logs every request body, pipeline step, and internal event, which can significantly increase log volume and may expose sensitive data.
See Logging for log format and observability details.
Deploy Pipelines via HAYHOOKS_PIPELINES_DIR¶
In production, pipelines should be deployed at startup from a directory -- not via CLI commands or the HTTP API at runtime:
export HAYHOOKS_PIPELINES_DIR=/app/pipelines
Why this matters:
- All workers and instances start with the same set of pipelines
- No manual deploy step is needed after a restart or scale-out event
- Pipeline definitions can be version-controlled alongside your deployment configuration
Info
Runtime deploy/undeploy via the API is useful during development but introduces consistency risks in multi-worker or multi-instance setups. See Deployment Guidelines for details.
Tune Startup Deploy Performance¶
When deploying many pipelines from HAYHOOKS_PIPELINES_DIR, use parallel startup with a higher worker count to reduce boot time:
export HAYHOOKS_STARTUP_DEPLOY_STRATEGY=parallel
export HAYHOOKS_STARTUP_DEPLOY_WORKERS=8
The default strategy is already parallel with 4 workers. Increase the worker count if you have many pipelines and available CPU cores.
For runtime deploy/undeploy operations, keep the default serialized mode unless you have a specific reason to change it:
export HAYHOOKS_DEPLOY_CONCURRENCY=serialized
See Environment Variables Reference for all deploy tuning options.
Optimize for Your Workload¶
Most Hayhooks pipelines fall into one of two categories. The right tuning strategy depends on which one yours is.
I/O-Bound Pipelines¶
Pipelines that spend most of their time waiting on external services -- LLM API calls, database queries, HTTP requests, file downloads -- are I/O-bound.
Recommendations:
- Use async methods. Implement run_api_async() (or run_chat_completion_async()) and build your pipeline with AsyncPipeline. This lets a single worker handle many concurrent requests because it can switch to another request while waiting on I/O.
- A single worker is usually enough. Adding more workers does not help much when the bottleneck is network latency, not CPU.
- Scale horizontally if needed. If you need more throughput, add replicas (Kubernetes pods, Docker containers) rather than workers.
from haystack import AsyncPipeline
from hayhooks import BasePipelineWrapper

class PipelineWrapper(BasePipelineWrapper):
    def setup(self) -> None:
        self.pipeline = AsyncPipeline()
        # ... build pipeline ...

    async def run_api_async(self, query: str) -> str:
        result = await self.pipeline.run_async({"prompt": {"query": query}})
        return result["llm"]["replies"][0]
See Deployment Guidelines for more async examples.
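To see why a single async worker can serve many I/O-bound requests at once, the standalone sketch below simulates several requests that each wait on a slow external call. It uses only asyncio; `fake_llm_call` and `serve_concurrently` are hypothetical stand-ins, not Hayhooks APIs.

```python
import asyncio
import time


async def fake_llm_call(request_id: int, delay: float) -> int:
    # Stand-in for awaiting an external service (LLM API, database, HTTP call).
    await asyncio.sleep(delay)
    return request_id


async def serve_concurrently(n_requests: int, delay: float) -> tuple[list[int], float]:
    # A single event loop (one worker) overlaps the waits of all requests.
    start = time.perf_counter()
    results = await asyncio.gather(
        *(fake_llm_call(i, delay) for i in range(n_requests))
    )
    return list(results), time.perf_counter() - start


if __name__ == "__main__":
    results, elapsed = asyncio.run(serve_concurrently(10, 0.2))
    print(f"{len(results)} requests finished in {elapsed:.2f}s")
```

Because the waits overlap, the total time should be close to the duration of one wait rather than the sum of all ten. A synchronous handler would instead hold the worker for the full duration of each call.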
CPU-Bound Pipelines¶
Pipelines that perform heavy computation locally -- embedding generation, document processing, on-device model inference -- are CPU-bound.
Recommendations:
- Add workers on multi-core machines. Python's GIL limits a single worker to one CPU core for pure Python work. Use --workers to run multiple processes:
    hayhooks run --workers 4
  A common starting point is (2 x CPU_cores) + 1. Monitor actual CPU usage and adjust.
- On Kubernetes, keep one worker per pod and scale via replicas. This gives the orchestrator full control over scheduling, resource limits, and rolling updates.
- Move heavy initialization into setup(). Loading models or building indexes in setup() runs once at startup. Doing it inside run_api() would repeat the cost on every request.
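The setup() recommendation above can be illustrated without Hayhooks installed. The sketch below uses a plain class that only mimics the setup()/run_api() method names; `load_model` is a hypothetical stand-in for any expensive initialization, and its call counter exists only to show how often the cost is paid.

```python
def load_model() -> dict:
    """Hypothetical expensive initialization (model load, index build)."""
    load_model.calls += 1
    return {"name": "demo-model"}


# Counter attached to the function so the demo can report how often it ran.
load_model.calls = 0


class WrapperSketch:
    """Mimics the setup()/run_api() shape of a pipeline wrapper."""

    def setup(self) -> None:
        # Runs once at startup: pay the loading cost here.
        self.model = load_model()

    def run_api(self, text: str) -> str:
        # Runs on every request: reuse the already-loaded model.
        return f"{self.model['name']}: {text.upper()}"


if __name__ == "__main__":
    wrapper = WrapperSketch()
    wrapper.setup()
    for query in ("a", "b", "c"):
        wrapper.run_api(query)
    print("load_model calls:", load_model.calls)
```

Had `load_model()` been called inside `run_api()`, the counter would grow with every request instead of staying at one per wrapper.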
Info
Even with multiple workers, each worker process has its own GIL and is still limited to one core for pure Python work. For truly CPU-intensive workloads, horizontal scaling (more pods/containers) is more effective than vertical scaling (more workers on one machine). See Deployment Guidelines for details.
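The (2 x CPU_cores) + 1 starting point mentioned earlier can be computed from the machine's core count. This is a minimal sketch; `suggested_workers` is a hypothetical helper, and its result is only a first guess to refine by monitoring.

```python
import os
from typing import Optional


def suggested_workers(cpu_cores: Optional[int] = None) -> int:
    """Apply the (2 x CPU_cores) + 1 rule of thumb."""
    # os.cpu_count() can return None, so fall back to a single core.
    cores = cpu_cores if cpu_cores is not None else (os.cpu_count() or 1)
    return 2 * cores + 1


if __name__ == "__main__":
    print(f"hayhooks run --workers {suggested_workers()}")
```

Inside a container with a CPU limit, `os.cpu_count()` may report the host's cores rather than the limit, so it is safer to pass the limit explicitly in that case.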
Add Authentication¶
Hayhooks does not include built-in authentication. For production, you should add one of:
- Middleware-based API key auth -- see the API Key Auth example for a complete implementation
- Reverse proxy auth -- use Nginx, Traefik, or a cloud load balancer to handle authentication before requests reach Hayhooks
- Custom middleware -- add your own FastAPI middleware via programmatic customization
Warning
Without authentication, anyone who can reach your Hayhooks server can invoke pipelines and (if runtime deploy is enabled) deploy or undeploy them. At minimum, restrict network access to trusted sources.
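Whichever option you choose, the core of an API key scheme is a constant-time comparison against a secret loaded from the environment. The sketch below is framework-independent and is not the implementation from the API Key Auth example. The variable name `MY_SERVICE_API_KEY`, the Bearer header format, and the helper `is_authorized` are assumptions for illustration.

```python
import hmac
import os
from typing import Mapping, Optional

# Hypothetical variable name; Hayhooks itself does not read it.
API_KEY_ENV_VAR = "MY_SERVICE_API_KEY"


def is_authorized(headers: Mapping[str, str], expected_key: Optional[str]) -> bool:
    """Check a 'Bearer <key>' Authorization header against the expected key."""
    if not expected_key:
        # Fail closed: with no configured key, nobody is authorized.
        return False
    # Assumes header names are already normalized to this exact spelling.
    scheme, _, presented = headers.get("Authorization", "").partition(" ")
    if scheme != "Bearer" or not presented:
        return False
    # compare_digest avoids leaking key contents through timing differences.
    return hmac.compare_digest(presented.encode(), expected_key.encode())


if __name__ == "__main__":
    expected = os.environ.get(API_KEY_ENV_VAR)
    print(is_authorized({"Authorization": "Bearer example"}, expected))
```

In a middleware, a False result would typically translate into a 401 response before the request reaches any pipeline.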
Set Up Health Checks¶
The /status endpoint returns the server status and list of deployed pipelines. Use it as a health check for container orchestrators:
# Docker Compose
healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost:1416/status"]
  interval: 30s
  timeout: 10s
  retries: 3
  start_period: 40s
For Kubernetes, use a liveness probe on the same endpoint:
livenessProbe:
  httpGet:
    path: /status
    port: 1416
  initialDelaySeconds: 40
  periodSeconds: 30
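The same endpoint can also be polled from a script, for example to wait for startup in CI. This minimal sketch uses only the standard library. It assumes the port 1416 shown above and treats any HTTP 200 response as healthy without inspecting the response body; `is_healthy` is a hypothetical helper.

```python
import urllib.error
import urllib.request


def is_healthy(url: str = "http://localhost:1416/status", timeout: float = 5.0) -> bool:
    """Return True if the endpoint answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False
```

A wrapper script can call `is_healthy()` in a retry loop and exit with a non-zero status if the server never becomes ready.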
Docker and Container Tips¶
Follow these practices when running Hayhooks in containers:
- Mount pipelines read-only -- use :ro to prevent accidental writes:
    -v "$PWD/pipelines:/app/pipelines:ro"
- Bind to 0.0.0.0 -- the default localhost is not reachable from outside the container:
    -e HAYHOOKS_HOST=0.0.0.0
- Set resource limits -- prevent a single pipeline from consuming all host resources:
    deploy:
      resources:
        limits:
          cpus: "2.0"
          memory: 4G
- Use a non-root user -- if building a custom image, add a non-root user for defense in depth:
    RUN useradd -m hayhooks
    USER hayhooks
- Pin image tags -- avoid latest or main in production. Use a specific release tag or SHA for reproducibility.
Manage Environment Variables¶
- Never hardcode secrets (API keys, database passwords) in pipeline code or Docker Compose files. Use environment variables, mounted secret files, or your platform's secret manager.
- Use .env files for local testing only. In production, inject variables through your orchestrator (Docker secrets, Kubernetes ConfigMaps/Secrets, cloud provider secret stores).
- Audit your configuration before deploying. A quick check:
    # Verify no dev settings leaked into production
    env | grep HAYHOOKS_
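The audit can also be scripted so that it runs in CI or at container start. The sketch below checks the three settings this page warns about. `audit_env` is a hypothetical helper, and both the accepted truthy spellings and the JSON-list parsing of the CORS value are assumptions about how the settings are interpreted, so adjust them to match your Hayhooks version.

```python
import json
import os
from typing import Mapping


def audit_env(env: Mapping[str, str]) -> list[str]:
    """Return warnings for development settings found in the environment."""
    warnings = []

    # Assumed truthy spellings; this page states the default is false.
    tracebacks = env.get("HAYHOOKS_SHOW_TRACEBACKS", "false").strip().lower()
    if tracebacks in {"1", "true", "yes", "on"}:
        warnings.append("HAYHOOKS_SHOW_TRACEBACKS is enabled")

    # This page states the default is ["*"], so an unset value is also a finding.
    raw_origins = env.get("HAYHOOKS_CORS_ALLOW_ORIGINS", '["*"]')
    try:
        origins = json.loads(raw_origins)
    except json.JSONDecodeError:
        origins = [raw_origins]
    if not isinstance(origins, list):
        origins = [origins]
    if "*" in origins:
        warnings.append("HAYHOOKS_CORS_ALLOW_ORIGINS allows all origins")

    if env.get("LOG", "").strip().upper() == "DEBUG":
        warnings.append("LOG is set to DEBUG")

    return warnings


if __name__ == "__main__":
    for warning in audit_env(os.environ):
        print("WARNING:", warning)
```

Passing the environment in as a mapping keeps the check easy to test with plain dictionaries before wiring it into a deployment step.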
Recommended Production .env¶
For reference, a minimal production configuration:
HAYHOOKS_HOST=0.0.0.0
HAYHOOKS_PORT=1416
HAYHOOKS_PIPELINES_DIR=/app/pipelines
HAYHOOKS_SHOW_TRACEBACKS=false
HAYHOOKS_CORS_ALLOW_ORIGINS=["https://app.example.com"]
HAYHOOKS_CORS_ALLOW_CREDENTIALS=true
HAYHOOKS_STARTUP_DEPLOY_STRATEGY=parallel
HAYHOOKS_STARTUP_DEPLOY_WORKERS=4
LOG=INFO
Next Steps¶
- Deployment Guidelines -- Infrastructure, Docker, Kubernetes, and scaling
- Advanced Configuration -- Custom routes, middleware, and programmatic customization
- Environment Variables Reference -- Complete configuration reference
- Development Best Practices -- Tips for local development workflow