Model Overview
🚧 Cortex.cpp is currently under development. Our documentation outlines the intended behavior of Cortex, which may not yet be fully implemented in the codebase.
When Cortex.cpp starts, it automatically launches an API server (a design inspired by the Docker CLI). This server manages various model endpoints, which facilitate the following:
- Model Operations: Run and stop models.
- Model Management: Manage your local models.
Models are loaded and unloaded by the API server automatically when requests hit the /chat/completions endpoint, so no explicit load or unload step is required; see the sketch below.
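For illustration, here is a minimal request to that endpoint, assuming a local Cortex server on port 39281 and a locally pulled model named llama3 (both the port and the model id are assumptions; adjust them to your installation). The request and response shapes follow the OpenAI-compatible schema:

```python
import json
import urllib.request

# Assumed local endpoint; adjust host/port to your Cortex installation.
URL = "http://127.0.0.1:39281/v1/chat/completions"

payload = {
    "model": "llama3",  # Illustrative id; use a model you have pulled locally.
    "messages": [
        {"role": "user", "content": "Hello! What can you do?"},
    ],
}

request = urllib.request.Request(
    URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# Cortex loads the model on demand when it receives this request,
# so no explicit "load" call is needed beforehand.
with urllib.request.urlopen(request) as response:
    reply = json.loads(response.read())

print(reply["choices"][0]["message"]["content"])
```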
Model Formats
Cortex.cpp supports three model formats:
- GGUF
- ONNX
- TensorRT-LLM
For details on each format, see the Model Formats page.
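As a quick illustration of how these formats differ on disk, a GGUF file starts with the four-byte ASCII magic GGUF, which a tool can use to sanity-check a download before handing it to an engine. This is a property of the GGUF format itself, not a Cortex API:

```python
def is_gguf_file(path: str) -> bool:
    """Return True if the file starts with the GGUF magic bytes.

    GGUF files begin with the ASCII bytes b"GGUF"; ONNX and
    TensorRT-LLM artifacts use different container formats.
    """
    with open(path, "rb") as f:
        return f.read(4) == b"GGUF"

# Example (hypothetical local path):
# print(is_gguf_file("models/llama3/model.gguf"))
```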
Built-in Models
Cortex.cpp offers a range of built-in models that include popular open-source options. These models, hosted on HuggingFace as Cortex Model Repositories, are pre-compiled for different engines, enabling each model to have multiple branches in various formats.
Built-in Model Variants
Built-in models are made available across the following variants:
- By format: `gguf`, `onnx`, and `tensorrt-llm`
- By size: `7b`, `13b`, and more
- By quantization: `q4`, `q8`, and more
You can see our full list of Built-in Models here.