modelPath
Path to the model on the filesystem. This is the only required parameter.

Optional batch
Prompt processing batch size.

Optional context
Text context size.

Optional embedding
Embedding mode only.

Optional f16
Use fp16 for the KV cache.

Optional gbnf
GBNF string used to constrain the format of the output. Also known as a grammar.

Optional gpu
Number of layers to store in VRAM.

Optional json
JSON schema used to constrain the format of the output. Also known as a grammar.

Optional max

Optional logits
The llama_eval() call computes all logits, not just the last one.

Optional prepend
Add the beginning-of-sentence (BOS) token.

Optional seed
If null, a random seed will be used.

Optional temperature
Controls the randomness of the responses, e.g. 0.1 is near-deterministic, 0.8 balanced, 1.5 creative; 0 disables sampling.

Optional threads
Number of threads used to evaluate tokens.

Optional topK
Consider only the n most likely tokens, where n ranges from 1 to the vocabulary size; 0 disables (uses the full vocabulary). Only applies when temperature > 0.

Optional topP
Selects the smallest set of tokens whose cumulative probability exceeds P, where P is between 0 and 1; 1 disables. Only applies when temperature > 0.

Optional trim
Trim whitespace from the end of the generated text. Disabled by default.

Optional useMlock
Force the system to keep the model in RAM.

Optional useMmap
Use mmap if possible.

Optional vocab
Only load the vocabulary, no weights.
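Taken together, the options above could be collected into a single config object along the following lines. This is a sketch: the interface name and the exact option names are assumptions based on this page, so check the package's exported types before relying on them.

```typescript
// Hypothetical shape of the options documented above; verify the real
// names against the library's exported types.
interface LlamaOptions {
  modelPath: string;      // required: path to the model on the filesystem
  batch?: number;         // prompt processing batch size
  context?: number;       // text context size
  embedding?: boolean;    // embedding mode only
  f16?: boolean;          // use fp16 for the KV cache
  gbnf?: string;          // GBNF grammar string to constrain output
  json?: object;          // JSON schema to constrain output
  gpu?: number;           // number of layers to store in VRAM
  logits?: boolean;       // compute all logits, not just the last one
  prepend?: boolean;      // add the beginning-of-sentence token
  seed?: number | null;   // null -> a random seed is used
  temperature?: number;   // 0 disables sampling; ~0.8 is balanced
  threads?: number;       // threads used to evaluate tokens
  topK?: number;          // 0 disables (uses the full vocabulary)
  topP?: number;          // 1 disables
  trim?: boolean;         // trim trailing whitespace (off by default)
  useMlock?: boolean;     // force the system to keep the model in RAM
  useMmap?: boolean;      // use mmap if possible
  vocab?: boolean;        // only load the vocabulary, no weights
}

// Example: a conservative configuration that leaves most options at
// their defaults. The model path is illustrative.
const options: LlamaOptions = {
  modelPath: "/models/model.gguf",
  context: 2048,
  threads: 4,
  temperature: 0.8,
  topK: 40,
  topP: 0.9,
  seed: null, // pick a random seed per run
};
```

Note that topK and topP only take effect when temperature is greater than 0, since a temperature of 0 disables sampling entirely.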
Note that modelPath is the only required parameter. For testing, you can set it through the LLAMA_PATH environment variable.
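For example, in a POSIX shell you could point the library at a local model before running your tests. The model path below is illustrative.

```shell
# Set the model path for testing; adjust to wherever your model lives.
export LLAMA_PATH="/models/model.gguf"
echo "$LLAMA_PATH"
```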