A modular, scalable AI system for deploying full-featured language models, transcription, embeddings, and reranking, with support for Docker and Kubernetes orchestration.
Central gateway with OpenAI-compatible endpoints
ZMQ-based LLM inference template
Speech-to-text service
Vector embedding service
Pre-configured dev containers • Extensive documentation • Developers that enjoy communicating