Language Model Orchestration System

A modular, scalable AI system for deploying full-featured language models, transcription, embeddings, and reranking, with support for Docker and Kubernetes orchestration.

The LMOS Ecosystem - One endpoint, Full Language Stack.

LMOS-Router

Central gateway with OpenAI-compatible endpoints

LLM-Runner

ZMQ-based LLM inference template

STT-Runner

Speech-to-text service

Embedding-Runner

Vector embedding service

Key Features

  • Built on OpenAI API Spec
  • Baked in Auth and Rate Limiting
  • Designed for Containerization
  • Modular and Ready for Scale

Built for developers, enabling faster feature delivery to users.

Pre-configured dev containers • Extensive documentation • Developers that enjoy communicating

Coming Soon to Apache 2.0