The service supports several generative models for text and audio:

| Model Name | Type | Description | Cost |
| --- | --- | --- | --- |
| style-diff-500 | TTS | Speech synthesis model with style diffusion and adversarial training | $0.005/1k characters |
| vits | TTS | VITS is a popular end-to-end (one-stage) TTS model. Our hosted VITS TTS API features ultra-fast inference, the largest offering of languages (88) and the lowest price across all vendors. | $0.001/1k characters |
| ar-diff-50k | TTS | Tortoise-style AR+diffusion model | $0.03/1k characters |
| Neets-7B | LLM | Mistral-7B fork | $0.55/million LLM tokens |
| mistralai/Mixtral-8X7B-Instruct-v0.1 | LLM | Pretrained generative Sparse Mixture of Experts | $0.55/million LLM tokens |
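
To compare models on cost, the listed prices can be turned into a rough estimate. The sketch below is illustrative only: the function and dictionary names are hypothetical and not part of any API, and it assumes usage is billed pro rata at the listed rate, with no minimum charge or rounding to billing increments.

```python
from decimal import Decimal

# Prices copied from the list above, in USD.
# TTS models are priced per 1,000 characters of input text.
TTS_PRICE_PER_1K_CHARS = {
    "style-diff-500": Decimal("0.005"),
    "vits": Decimal("0.001"),
    "ar-diff-50k": Decimal("0.03"),
}

# LLM models are priced per 1,000,000 tokens.
LLM_PRICE_PER_1M_TOKENS = {
    "Neets-7B": Decimal("0.55"),
    "mistralai/Mixtral-8X7B-Instruct-v0.1": Decimal("0.55"),
}


def estimate_tts_cost(model: str, num_chars: int) -> Decimal:
    """Estimated USD cost of synthesizing num_chars characters (assumes pro-rata billing)."""
    return TTS_PRICE_PER_1K_CHARS[model] * num_chars / 1000


def estimate_llm_cost(model: str, num_tokens: int) -> Decimal:
    """Estimated USD cost of processing num_tokens tokens (assumes pro-rata billing)."""
    return LLM_PRICE_PER_1M_TOKENS[model] * num_tokens / 1_000_000


if __name__ == "__main__":
    # Compare the three TTS models on the same 250,000-character workload.
    for name in TTS_PRICE_PER_1K_CHARS:
        print(name, estimate_tts_cost(name, 250_000))
```

Under these assumptions, a 250,000-character workload comes to about $0.25 with vits, $1.25 with style-diff-500, and $7.50 with ar-diff-50k, and 2 million LLM tokens come to about $1.10 on either LLM.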

We are always improving our services and working on new models. Follow us on X or join our Discord to be the first to know when we update or add new models.