Customize Engine Settings

In this guide, we will show you how to customize the engine settings.

  1. Navigate to the ~/jan/engines folder. You can find this folder by going to App Settings > Advanced > Open App Directory.
cd ~/jan/engines
  2. Modify the nitro.json file based on your needs. The default settings are shown below.
~/jan/engines/nitro.json
{
  "ctx_len": 2048,
  "ngl": 100,
  "cpu_threads": 1,
  "cont_batching": false,
  "embedding": false
}
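
If you prefer to script the change instead of editing the file by hand, the step above can be sketched in Python. This is a minimal sketch, not part of Jan or Nitro: the helper name update_engine_settings is hypothetical, the defaults are copied from the listing above, and the file path is whatever you pass in.

```python
import json
from pathlib import Path

# Default values copied from the nitro.json listing in this guide.
DEFAULT_SETTINGS = {
    "ctx_len": 2048,
    "ngl": 100,
    "cpu_threads": 1,
    "cont_batching": False,
    "embedding": False,
}


def update_engine_settings(path, **overrides):
    """Merge overrides into the JSON settings file at `path` and write it back.

    Hypothetical helper for illustration only; it is not a Jan or Nitro API.
    """
    path = Path(path)

    # Reject names that are not among the five documented parameters.
    unknown = set(overrides) - set(DEFAULT_SETTINGS)
    if unknown:
        raise KeyError(f"unknown setting(s): {sorted(unknown)}")

    # Start from the defaults, keep whatever is already in the file,
    # then apply the requested overrides on top.
    current = json.loads(path.read_text()) if path.exists() else {}
    settings = {**DEFAULT_SETTINGS, **current, **overrides}

    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(settings, indent=2) + "\n")
    return settings


# Example call (assumes the file location shown in this guide):
# update_engine_settings(Path.home() / "jan" / "engines" / "nitro.json", ctx_len=4096)
```

Whether a running engine picks up the new values without a restart is not covered here, so treat that as something to verify in your own setup.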

The table below describes the parameters in the nitro.json file.

Parameter     | Type    | Description
--------------|---------|------------------------------------------------------------
ctx_len       | Integer | The context length for the model operations.
ngl           | Integer | The number of GPU layers to use.
cpu_threads   | Integer | The number of threads to use for inferencing (CPU mode only).
cont_batching | Boolean | Whether to use continuous batching.
embedding     | Boolean | Whether to use embedding in the model.
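
The types in the table can be checked before you save an edited file. The sketch below is illustrative only: validate_settings is a hypothetical helper, and because this guide does not say whether every parameter is required, missing keys are skipped rather than reported.

```python
# Expected Python types for each parameter, taken from the table above.
EXPECTED_TYPES = {
    "ctx_len": int,
    "ngl": int,
    "cpu_threads": int,
    "cont_batching": bool,
    "embedding": bool,
}


def validate_settings(settings):
    """Return a list of type problems found in `settings` (empty if none).

    Hypothetical helper for illustration; keys absent from `settings`
    are skipped, and keys not listed in the table are ignored.
    """
    problems = []
    for name, expected in EXPECTED_TYPES.items():
        if name not in settings:
            continue
        value = settings[name]
        # In Python, bool is a subclass of int, so a JSON true/false would
        # otherwise pass as an Integer. Reject it explicitly for int fields.
        is_bool_for_int = expected is int and isinstance(value, bool)
        if is_bool_for_int or not isinstance(value, expected):
            problems.append(
                f"{name}: expected {expected.__name__}, got {type(value).__name__}"
            )
    return problems
```

For example, validate_settings({"ctx_len": "2048"}) reports one problem, because the value is a string and the table lists ctx_len as an Integer.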
Tip
  • By default, ngl is set to 100, which offloads all of the model's layers to the GPU. To offload only about 50% of the layers, set ngl to 15, because most Mistral and Llama models have around 30 layers.
  • To use the embedding feature, include the JSON parameter "embedding": true. This enables Nitro to process inferences with embedding capabilities. For a more detailed explanation, refer to the Embedding section of the Nitro documentation.
  • To use the continuous batching feature, which boosts throughput and minimizes latency in large language model (LLM) inference, refer to the Continuous Batching section of the Nitro documentation.
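
The ngl arithmetic in the first tip can be written out as a small calculation. This is a rough sketch under stated assumptions: ngl_for_fraction is a hypothetical helper, and you need to supply the model's actual layer count yourself, since the figure of about 30 layers is only an approximation.

```python
def ngl_for_fraction(total_layers, fraction):
    """Return an ngl value that offloads roughly `fraction` of the layers.

    Hypothetical helper for illustration. `total_layers` is the number of
    layers in the model and `fraction` is a value between 0.0 and 1.0.
    """
    if total_layers < 0:
        raise ValueError("total_layers must not be negative")
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0.0 and 1.0")
    # Round to the nearest whole layer, since ngl is an Integer.
    return round(total_layers * fraction)


# A model with 30 layers, offloading half of them:
print(ngl_for_fraction(30, 0.5))  # prints 15
```

The result can then be written to the ngl field in nitro.json.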