Customize Engine Settings
In this guide, we will show you how to customize the engine settings.
- Navigate to the `~/jan/engine` folder. You can find this folder by going to **App Settings** > **Advanced** > **Open App Directory**.
  - macOS: `cd ~/jan/engine`
  - Windows: `C:/Users/<your_user_name>/jan/engine`
  - Linux: `cd ~/jan/engine`
- Modify the `nitro.json` file based on your needs. The default settings are shown below.
`~/jan/engines/nitro.json`

```json
{
  "ctx_len": 2048,
  "ngl": 100,
  "cpu_threads": 1,
  "cont_batching": false,
  "embedding": false
}
```
The table below describes the parameters in the `nitro.json` file.
| Parameter | Type | Description |
|---|---|---|
| `ctx_len` | Integer | The context length for model operations. |
| `ngl` | Integer | The number of GPU layers to offload. |
| `cpu_threads` | Integer | The number of threads to use for inference (CPU mode only). |
| `cont_batching` | Boolean | Whether to use continuous batching. |
| `embedding` | Boolean | Whether to enable embeddings in the model. |
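If you prefer to change these settings from a script rather than by hand, a minimal sketch in Python might look like the following. The temp-directory path is a stand-in for illustration only; your real file lives in the app directory described above.

```python
import json
import tempfile
from pathlib import Path

# Stand-in for the Jan engine folder; replace with your actual
# app directory (App Settings > Advanced > Open App Directory).
engine_dir = Path(tempfile.mkdtemp())
config_path = engine_dir / "nitro.json"

# Start from the documented defaults.
defaults = {
    "ctx_len": 2048,
    "ngl": 100,
    "cpu_threads": 1,
    "cont_batching": False,
    "embedding": False,
}
config_path.write_text(json.dumps(defaults, indent=2))

# Load the config, offload roughly half the GPU layers,
# enable embeddings, and write the file back.
config = json.loads(config_path.read_text())
config["ngl"] = 15
config["embedding"] = True
config_path.write_text(json.dumps(config, indent=2))

print(config["ngl"])  # prints 15
```

Restart the app (or the engine) after editing so the new values take effect.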
Tip:
- By default, `ngl` is set to 100, which offloads all layers to the GPU. To offload roughly half, set `ngl` to 15, since most Mistral and Llama models have around 30 layers.
- To use the embedding feature, include the JSON parameter `"embedding": true`. This enables Nitro to process inference requests with embedding capabilities. For a more detailed explanation, refer to Embedding in the Nitro documentation.
- To use continuous batching, which boosts throughput and minimizes latency in large language model (LLM) inference, refer to Continuous Batching in the Nitro documentation.
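Putting the tips above together, a customized `nitro.json` that offloads about half the layers and enables embeddings might look like this (the `cpu_threads` and `cont_batching` values here are illustrative, not recommendations):

```json
{
  "ctx_len": 2048,
  "ngl": 15,
  "cpu_threads": 4,
  "cont_batching": true,
  "embedding": true
}
```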