# Options

This content needs to be reviewed and updated.

## Completion and Chat Options

These are the options you can use with the Completion and Chat methods.

| Option | Description | Ollama | Anthropic | Mistral | OpenAI | Gemini |
|--------|-------------|--------|-----------|---------|--------|--------|
| `llm.WithTemperature(float64)` | What sampling temperature to use, between 0.0 and 1.0. Higher values like 0.7 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. | Yes | Yes | Yes | Yes | Yes |
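
For illustration, a minimal sketch of passing one of these options to the `Completion` method described further below. The import path is a placeholder and the way the `llm.Model` value is constructed is provider-specific and not shown here:

```go
package example

import (
	"context"
	"fmt"

	// Placeholder import path; substitute the module's actual path.
	llm "github.com/example/llm"
)

// askFocused requests a more deterministic completion by lowering the
// sampling temperature, per the table above.
func askFocused(ctx context.Context, model llm.Model, prompt string) error {
	completion, err := model.Completion(ctx, prompt, llm.WithTemperature(0.2))
	if err != nil {
		return err
	}
	// Print the completion in whatever string form the library renders it.
	fmt.Println(completion)
	return nil
}
```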

## Embedding Options

These are the options you can include for the Embedding method.

| Option | Description | Ollama | Anthropic | Mistral | OpenAI | Gemini |
|--------|-------------|--------|-----------|---------|--------|--------|
| `ollama.WithKeepAlive(time.Duration)` | Controls how long the model will stay loaded in memory following the request. | Yes | No | No | No | No |
| `ollama.WithTruncate()` | Does not truncate the end of each input to fit within the context length; returns an error if the context length is exceeded. | Yes | No | No | No | No |
| `ollama.WithOption(string, any)` | Set a model-specific option value. | Yes | No | No | No | No |
| `openai.WithDimensions(uint64)` | The number of dimensions the resulting output embeddings should have. Only supported in text-embedding-3 and later models. | No | No | No | Yes | No |
| `llm.WithFormat(string)` | The format to return the embeddings in. | No | No | `'float'` | `'float'` or `'base64'` | No |
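
A hedged sketch of using one of these options with the `Embedding` method from the interface shown below; the import path is a placeholder:

```go
package example

import (
	"context"

	// Placeholder import path; substitute the module's actual path.
	llm "github.com/example/llm"
)

// embedText generates an embedding vector, explicitly requesting float
// output. Per the table above, the format option is only honoured by the
// Mistral and OpenAI providers.
func embedText(ctx context.Context, model llm.Model, text string) ([]float64, error) {
	return model.Embedding(ctx, text, llm.WithFormat("float"))
}
```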

## Older Content

You can add options to sessions or to individual prompts. Different providers and models support different options.

```go
package llm

import "context"

type Model interface {
	// Set session-wide options
	Context(...Opt) Context

	// Create a completion from a text prompt
	Completion(context.Context, string, ...Opt) (Completion, error)

	// Embedding vector generation
	Embedding(context.Context, string, ...Opt) ([]float64, error)
}

type Context interface {
	// Generate a response from a user prompt (with attachments and
	// other options)
	FromUser(context.Context, string, ...Opt) error
}
```
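
As a sketch of the difference between session-wide and per-prompt options, assuming only the interfaces above and a placeholder import path (the attachment option appears in the table that follows):

```go
package example

import (
	"context"
	"io"

	// Placeholder import path; substitute the module's actual path.
	llm "github.com/example/llm"
)

// chat sets one option for the whole session and another for a single
// prompt within that session.
func chat(ctx context.Context, model llm.Model, doc io.Reader) error {
	// Session-wide option: every prompt in this session uses temperature 0.2.
	session := model.Context(llm.WithTemperature(0.2))

	// Per-prompt option: attach a file to this user message only.
	// The caller remains responsible for closing the reader.
	return session.FromUser(ctx, "Summarise the attached document", llm.WithAttachment(doc))
}
```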

The options are as follows:

| Option | Ollama | Anthropic | Mistral | OpenAI | Description |
|--------|--------|-----------|---------|--------|-------------|
| `llm.WithTemperature(float64)` | Yes | Yes | Yes | Yes | What sampling temperature to use, between 0.0 and 1.0. Higher values like 0.7 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. |
| `llm.WithTopP(float64)` | Yes | Yes | Yes | Yes | Nucleus sampling, where the model considers the results of the tokens with top_p probability mass, so 0.1 means only the tokens comprising the top 10% probability mass are considered. |
| `llm.WithTopK(uint64)` | Yes | Yes | No | No | Reduces the probability of generating nonsense. A higher value (e.g. 100) will give more diverse answers, while a lower value (e.g. 10) will be more conservative. |
| `llm.WithMaxTokens(uint64)` | No | Yes | Yes | Yes | The maximum number of tokens to generate in the response. |
| `llm.WithStream(func(llm.Completion))` | Can be enabled when tools are not used | Yes | Yes | Yes | Stream the response to a function. |
| `llm.WithToolChoice(string, string, ...)` | No | Use `auto`, `any` or a function name. Only the first argument is used. | Use `auto`, `any`, `none`, `required` or a function name. Only the first argument is used. | Use `auto`, `none`, `required` or a function name. Only the first argument is used. | The tool to use for the model. |
| `llm.WithToolKit(llm.ToolKit)` | Cannot be combined with streaming | Yes | Yes | Yes | The set of tools to use. |
| `llm.WithStopSequence(string, string, ...)` | Yes | Yes | Yes | Yes | Stop generation if one of these tokens is detected. |
| `llm.WithSystemPrompt(string)` | No | Yes | Yes | Yes | Set the system prompt for the model. |
| `llm.WithSeed(uint64)` | Yes | No | Yes | Yes | The seed to use for random sampling. If set, different calls will generate deterministic results. |
| `llm.WithFormat(string)` | Use `json` | No | Use `json_format` or `text` | Use `json_format` or `text` | The format of the response. For Mistral, you must also instruct the model to produce JSON yourself with a system or a user message. |
| `llm.WithPresencePenalty(float64)` | Yes | No | Yes | Yes | Determines how much the model penalizes the repetition of words or phrases. A higher presence penalty encourages the model to use a wider variety of words and phrases, making the output more diverse and creative. |
| `llm.WithFequencyPenalty(float64)` | Yes | No | Yes | Yes | Penalizes the repetition of words based on their frequency in the generated text. A higher frequency penalty discourages the model from repeating words that have already appeared frequently in the output, promoting diversity and reducing repetition. |
| `llm.WithPrediction(string)` | No | No | Yes | Yes | Enables users to specify expected results, optimizing response times by leveraging known or predictable content. This approach is especially effective for updating text documents or code files with minimal changes, reducing latency while maintaining high-quality results. |
| `llm.WithSafePrompt()` | No | No | Yes | No | Whether to inject a safety prompt before all conversations. |
| `llm.WithNumCompletions(uint64)` | No | No | Yes | Yes | Number of completions to return for each request. |
| `llm.WithAttachment(io.Reader)` | Yes | Yes | Yes | - | Attach a file to a user prompt. It is the responsibility of the caller to close the reader. |
| `llm.WithUser(string)` | No | Yes | No | Yes | A unique identifier representing your end-user. |
| `antropic.WithEphemeral()` | No | Yes | No | - | Attachments should be cached server-side. |
| `antropic.WithCitations()` | No | Yes | No | - | Attachments should be used in citations. |
| `openai.WithStore(bool)` | No | No | No | Yes | Whether or not to store the output of this chat completion request. |
| `openai.WithDimensions(uint64)` | No | No | No | Yes | The number of dimensions the resulting output embeddings should have. Only supported in text-embedding-3 and later models. |
| `openai.WithReasoningEffort(string)` | No | No | No | Yes | The level of effort the model should put into reasoning. |
| `openai.WithMetadata(string, string)` | No | No | No | Yes | Metadata to be logged with the completion. |
| `openai.WithLogitBias(uint64, int64)` | No | No | No | Yes | A token and its logit bias value. Call multiple times to add additional tokens. |
| `openai.WithLogProbs()` | No | No | No | Yes | Include the log probabilities on the completion. |
| `openai.WithTopLogProbs(uint64)` | No | No | No | Yes | An integer between 0 and 20 specifying the number of most likely tokens to return at each token position. |
| `openai.WithAudio(string, string)` | No | No | No | Yes | Output audio (voice, format) for the completion. Can be used with certain models. |
| `openai.WithServiceTier(string)` | No | No | No | Yes | Specifies the latency tier to use for processing the request. |
| `openai.WithStreamOptions(func(llm.Completion), bool)` | No | No | No | Yes | Include usage information in the stream response. |
| `openai.WithDisableParallelToolCalls()` | No | No | No | Yes | Call tools in serial, rather than in parallel. |
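
To illustrate how several of these options combine, a hedged sketch with a placeholder import path and provider setup not shown. Note that, per the table, `llm.WithSystemPrompt` is not supported by Ollama and streaming with Ollama requires that no tools are used:

```go
package example

import (
	"context"
	"fmt"

	// Placeholder import path; substitute the module's actual path.
	llm "github.com/example/llm"
)

// completeWithOptions combines a system prompt, a token limit and a
// streaming callback in a single completion request.
func completeWithOptions(ctx context.Context, model llm.Model, prompt string) error {
	_, err := model.Completion(ctx, prompt,
		llm.WithSystemPrompt("You are a concise technical assistant."),
		llm.WithMaxTokens(512),
		llm.WithStream(func(c llm.Completion) {
			// Called for each streamed fragment as it arrives.
			fmt.Println(c)
		}),
	)
	return err
}
```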