Customize Engine Settings
In this guide, we will show you how to customize the engine settings.
- Navigate to the `~/jan/engine` folder. You can find this folder by going to **App Settings** > **Advanced** > **Open App Directory**.
  - macOS: `cd ~/jan/engine`
  - Windows: `C:/Users/<your_user_name>/jan/engine`
  - Linux: `cd ~/jan/engine`
- Modify the `nitro.json` file based on your needs. The default settings are shown below.
`~/jan/engines/nitro.json`

```json
{
  "ctx_len": 2048,
  "ngl": 100,
  "cpu_threads": 1,
  "cont_batching": false,
  "embedding": false
}
```
The table below describes the parameters in the `nitro.json` file.
| Parameter | Type | Description |
|---|---|---|
| `ctx_len` | Integer | The context length for model operations. |
| `ngl` | Integer | The number of GPU layers to offload. |
| `cpu_threads` | Integer | The number of threads to use for inference (CPU mode only). |
| `cont_batching` | Boolean | Whether to use continuous batching. |
| `embedding` | Boolean | Whether to enable embeddings in the model. |
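If you prefer to change these settings from a script rather than by hand, a minimal sketch in Python might look like the following. The temp-directory path is a stand-in for illustration only; your real file lives in the app directory described above.

```python
import json
import tempfile
from pathlib import Path

# Stand-in for the Jan engine folder; replace with your actual
# app directory (App Settings > Advanced > Open App Directory).
engine_dir = Path(tempfile.mkdtemp())
config_path = engine_dir / "nitro.json"

# Start from the documented defaults.
defaults = {
    "ctx_len": 2048,
    "ngl": 100,
    "cpu_threads": 1,
    "cont_batching": False,
    "embedding": False,
}
config_path.write_text(json.dumps(defaults, indent=2))

# Load the config, offload roughly half the GPU layers,
# enable embeddings, and write the file back.
config = json.loads(config_path.read_text())
config["ngl"] = 15
config["embedding"] = True
config_path.write_text(json.dumps(config, indent=2))

print(config["ngl"])  # prints 15
```

Restart the app (or the engine) after editing so the new values take effect.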
Tip:
- By default, `ngl` is set to 100, which offloads all layers to the GPU. To offload roughly half, set `ngl` to 15, since most Mistral and Llama models have around 30 layers.
- To use the embedding feature, include the JSON parameter `"embedding": true`. This enables Nitro to process inference requests with embedding capabilities. For a more detailed explanation, refer to Embedding in the Nitro documentation.
- To use continuous batching, which boosts throughput and minimizes latency in large language model (LLM) inference, refer to Continuous Batching in the Nitro documentation.
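Putting the tips above together, a customized `nitro.json` that offloads about half the layers and enables embeddings might look like this (the `cpu_threads` and `cont_batching` values here are illustrative, not recommendations):

```json
{
  "ctx_len": 2048,
  "ngl": 15,
  "cpu_threads": 4,
  "cont_batching": true,
  "embedding": true
}
```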