
Run any AI model in your cloud or ours

With Mystic, you can deploy ML in your own Azure/AWS/GCP account or in our shared GPU cluster.

Recommended

Pay-as-you-go API

Mystic in our shared cloud

Our shared cluster of GPUs is used by hundreds of users simultaneously. Low cost, but performance will vary depending on real-time GPU availability.

Created and used by experts at

  • SensusFuturis
  • Seelab
  • Vellum
  • AWS
  • Cambridge University
  • Google
  • Bath University
  • Monzo
  • Renovate AI
  • Charisma AI
  • Hypotenuse AI

Bring your generative AI product to market faster

Good AI products need good models and infrastructure;
we solve the infrastructure part.

Cost optimizations

  • Run on spot and parallelized GPUs
  • Run in AWS/GCP/Azure and use your cloud credits

Fast inference

  • Use vLLM, TensorRT, TGI or any other inference engine
  • Low cold starts with our fast registry

Simpler developer experience

  • A fully managed Kubernetes platform that runs in your own cloud
  • Open-source Python library and API to simplify your entire AI workflow

With our managed platform designed for AI

You get a high-performance platform to serve your AI models. Mystic will automatically scale up and down GPUs depending on the number of API calls your models receive. You can easily view, edit and monitor your infrastructure from your Mystic Dashboard, CLI and APIs.

Cost optimizations

What we’ve done to make sure your infrastructure bill is as low as possible.

Pay for GPUs at cloud cost

Serverless providers charge you a premium on compute that quickly becomes very expensive. With Mystic running in your cloud, there is no added fee on compute.

A chart comparing costs from other serverless providers vs Mystic, showing Mystic adds no overheads

Run inference on spot instances

Mystic allows you to run your AI models on spot instances and automatically requests new GPUs when instances are preempted.

A graphic showing 10x cheaper than other on demand GPUs (example: A100-80GB on Azure)

Run in parallel, in the same GPU

Mystic supports GPU fractionalization. With zero code changes, you can run multiple models on the same A30, A100, H100 or H200 GPU and maximise GPU utilization.

A graphic depicting GPU fractionalization: a rectangle broken into 24 smaller squares, some running one model and some running another

Automatically scale down to 0 GPUs

If your models in production stop receiving requests, our auto-scaler will automatically release the GPUs back to the cloud provider. You can easily customize these warmup and cooldown periods with our API.

A chart showing how Mystic scales down to 0 GPUs when not in use, and scales up when needed
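To make the warmup and cooldown settings concrete, here is a minimal sketch of adjusting them over HTTP with Python's requests library. The endpoint path and the field names (min_gpus, warmup_seconds, cooldown_seconds) are hypothetical placeholders invented for illustration, not the documented Mystic API schema; check the Mystic API reference for the real fields.

import requests

API_TOKEN = "YOUR_MYSTIC_TOKEN"  # placeholder token

# Hypothetical scaling payload: field names are illustrative, not the real schema
scaling_config = {
    "min_gpus": 0,           # allow the deployment to release all GPUs when idle
    "warmup_seconds": 30,    # keep newly provisioned GPUs warm before routing traffic
    "cooldown_seconds": 300, # wait this long with no requests before scaling down
}

response = requests.patch(
    "https://www.mystic.ai/v4/pipelines/user/pipeline_streaming:v1/scaling",  # illustrative URL
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    json=scaling_config,
    timeout=30,
)
response.raise_for_status()
print(response.json())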

Cloud credits and commitments

If you are a company with cloud credits or existing cloud spend agreements, you can use them to pay for your cloud bill while using Mystic.

A graphic showing the 3 cloud providers we support: AWS, GCP, Azure

Performance optimizations

What we’ve done to make sure your models run extremely fast and have minimal cold start.

Bring your inference engines

Within a few milliseconds, our scheduler decides the optimal queuing, routing and scaling strategy.

vLLM
TGI
TensorRT
Deepspeed
Exllamav2
...
Bring your own
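As a sketch of what "bring your own" can look like, the snippet below wraps the same startup step with Hugging Face Transformers instead of a dedicated inference engine. The entity name, model and generation settings are illustrative and not taken from this page; only the entity and pipe decorators come from the open-source Pipeline library shown further down.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

from pipeline import entity, pipe


@entity
class TransformersPipeline:
    @pipe(on_startup=True, run_once=True)
    def load_model(self) -> None:
        # Load the weights once at startup with the engine of your choice
        self.tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
        self.model = AutoModelForCausalLM.from_pretrained(
            "meta-llama/Llama-2-7b-chat-hf",
            torch_dtype=torch.bfloat16,
            device_map="auto",
        )

    @pipe
    def generate(self, prompt: str) -> str:
        # Illustrative generation step: tokenize, generate, decode
        inputs = self.tokenizer(prompt, return_tensors="pt").to(self.model.device)
        output_ids = self.model.generate(**inputs, max_new_tokens=256)
        return self.tokenizer.decode(output_ids[0], skip_special_tokens=True)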

High-performance model loader built in Rust
Coming soon

Thanks to our custom container registry, written in Rust, you get much lower cold starts than anywhere else on the market and your containers load extremely fast.


A simple and beautiful developer experience

We believe data scientists and AI engineers should be able to safely deploy their ML without having to be experts in infrastructure.

No Kubernetes or DevOps experience required

Our managed platform removes all the complexity of building and maintaining a custom ML platform. We've done the engineering so you don't have to.


APIs, CLI and Python SDK to deploy and run your ML

Extremely simple APIs, a CLI tool and an open-source Python library give you the freedom and confidence to serve high-performance ML models.


A beautiful dashboard to view and manage all your ML deployments

A unified dashboard to view all your runs, ML pipelines, versions, GPU clusters, API tokens and much more.


Get started with Mystic

Run your AI models in your cloud or ours

With Mystic, you can deploy ML in your own Azure/AWS/GCP account or in our shared GPU cluster.


How to deploy AI models with Mystic

From 0 to fast API endpoint

From your custom SDXL to your fine-tuned LLM, whether it's a LoRA or a complex pipeline, our open-source tool lets you package your ML pipeline.

Wrap your pipeline with our open source library

Pipeline AI is our open-source Python library to wrap AI pipelines.

It works whether you have a standard PyTorch model, a HuggingFace model, a fine-tuned model, or a combination of multiple models using your favourite inference engine.

You get it: it's flexible, and you can package anything.

View docs
from huggingface_hub import snapshot_download
from vllm import LLM, SamplingParams

from pipeline import entity, pipe

@entity
class LlamaPipeline:
    @pipe(on_startup=True, run_once=True)
    def load_model(self) -> None:
        # Download the weights once at startup, then load them into vLLM
        model_dir = "/tmp/llama2-7b-cache/"
        snapshot_download(
            "meta-llama/Llama-2-7b-chat-hf",
            local_dir=model_dir,
            token="YOUR_HUGGINGFACE_TOKEN",
        )
        self.llm = LLM(
            model_dir,
            dtype="bfloat16",
        )
        self.tokenizer = self.llm.get_tokenizer()
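For completeness, here is a minimal sketch of how such an entity can be assembled into a runnable pipeline. The generate pipe, the sampling parameters, and the Pipeline, Variable, output and get_pipeline builder calls are not shown on this page; they follow the open-source library's documented builder pattern, so treat the exact names and signatures as assumptions.

from huggingface_hub import snapshot_download
from vllm import LLM, SamplingParams

from pipeline import Pipeline, Variable, entity, pipe


@entity
class LlamaPipeline:
    @pipe(on_startup=True, run_once=True)
    def load_model(self) -> None:
        # Same startup step as above: download the weights, then load them into vLLM
        model_dir = "/tmp/llama2-7b-cache/"
        snapshot_download(
            "meta-llama/Llama-2-7b-chat-hf",
            local_dir=model_dir,
            token="YOUR_HUGGINGFACE_TOKEN",
        )
        self.llm = LLM(model_dir, dtype="bfloat16")

    @pipe
    def generate(self, prompt: str) -> str:
        # Illustrative inference step (not shown on this page): sample one completion
        params = SamplingParams(temperature=0.7, max_tokens=256)
        outputs = self.llm.generate([prompt], params)
        return outputs[0].outputs[0].text


# Assemble the entity into a pipeline with the library's builder objects
with Pipeline() as builder:
    prompt = Variable(str)

    model = LlamaPipeline()
    model.load_model()

    output = model.generate(prompt)
    builder.output(output)

llama_pipeline = builder.get_pipeline()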

Deploy to AWS, GCP, Azure with Mystic

With a single command, a new version of your pipeline is deployed on your own cloud.

Upload your pipeline
pipeline container push

Run your AI model as an API

Get an instant API endpoint to run your model after upload. Mystic automatically scales up and down GPUs depending on the usage of your deployed model. Use our APIs, CLI or Dashboard to view and manage your models and infrastructure.

RESTful APIs to call your model
curl -X POST 'https://www.mystic.ai/v4/runs/stream' \
  --header 'Authorization: Bearer YOUR_TOKEN' \
  --header 'Content-Type: application/json' \
  --data '{
    "pipeline": "user/pipeline_streaming:v1",
    "inputs": [{"type":"string","value":"A lone tree in the desert"}]
  }' \
  -N
Streaming animation showing an image of a tree materializing from white to full image over 1 second
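The same streaming run, sketched from Python with the requests library. The endpoint, payload and streaming behaviour mirror the curl example above; the token is a placeholder and the line-by-line printing is illustrative.

import requests

API_TOKEN = "YOUR_TOKEN"  # placeholder, as in the curl example above

payload = {
    "pipeline": "user/pipeline_streaming:v1",
    "inputs": [{"type": "string", "value": "A lone tree in the desert"}],
}

# stream=True mirrors curl's -N flag (no response buffering)
with requests.post(
    "https://www.mystic.ai/v4/runs/stream",
    headers={
        "Authorization": f"Bearer {API_TOKEN}",
        "Content-Type": "application/json",
    },
    json=payload,
    stream=True,
    timeout=600,
) as response:
    response.raise_for_status()
    for line in response.iter_lines():
        if line:
            print(line.decode("utf-8"))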

Community

See what our public community uploads, and deploy it in your own cloud with 1-click deploy.