Generate output from large language models
[generatedText,completeOutput,httpResponse] = generate(model,userPrompt)
[generatedText,completeOutput,httpResponse] = generate(model,messageHistory)
___ = generate(___,Name=Value)
[generatedText,completeOutput,httpResponse] = generate(model,userPrompt)
generates output from a large language model given a single user prompt.
[generatedText,completeOutput,httpResponse] = generate(model,messageHistory)
instead uses the entire chat history to generate output. This can include example inputs and outputs for few-shot prompting. The message history can also include images that models with vision capabilities, such as GPT-4o, can use to generate text.
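For example, the following sketch uses few-shot prompting by adding example inputs and outputs to the message history before generating. (This assumes a configured OpenAI API key; the system prompt and examples are illustrative.)
model = openAIChat("You convert country names to capital cities.");
messages = messageHistory;
% Add one example input and the desired response for few-shot prompting.
messages = addSystemMessage(messages,"example_user","France");
messages = addSystemMessage(messages,"example_assistant","Paris");
% Add the actual prompt and generate.
messages = addUserMessage(messages,"Japan");
generatedText = generate(model,messages)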
___ = generate(___,Name=Value)
specifies additional options using one or more name-value arguments.
First, specify the OpenAI® API key as an environment variable and save it to a file called ".env". Next, load the environment file using the loadenv function.
loadenv(".env")
Connect to the OpenAI API. Generate text based on a user prompt.
model = openAIChat;
[generatedText,completeOutput,httpResponse] = generate(model,"Why is a raven like a writing desk?",MaxNumTokens=50)
generatedText = "The phrase "Why is a raven like a writing desk?" is famously posed by the Mad Hatter in Lewis Carroll's "Alice's Adventures in Wonderland." Initially, it is presented as a nonsensical riddle without a definitive answer, highlighting the"
completeOutput = struct with fields:
role: 'assistant'
content: 'The phrase "Why is a raven like a writing desk?" is famously posed by the Mad Hatter in Lewis Carroll's "Alice's Adventures in Wonderland." Initially, it is presented as a nonsensical riddle without a definitive answer, highlighting the'
refusal: []
httpResponse =
ResponseMessage with properties:
StatusLine: 'HTTP/1.1 200 OK'
StatusCode: OK
Header: [1x24 matlab.net.http.HeaderField]
Body: [1x1 matlab.net.http.MessageBody]
Completed: 0
openAIChat object | ollamaChat object | azureChat object
Chat completion API to use to generate text, specified as an openAIChat, ollamaChat, or azureChat object.
character vector | string scalar
Natural language prompt instructing the model what to do, specified as a character vector or string scalar.
Example: "Please list three MATLAB functions beginning with m."
messageHistory object
Chat history, specified as a messageHistory object.
The supported name-value arguments depend on the chat completion API.
| Name-Value Argument | openAIChat | azureChat | ollamaChat |
|---|---|---|---|
| MaxNumTokens | Supported | Supported | Supported |
| Seed | Supported | Supported | Supported |
| Temperature | Supported | Supported | Supported |
| TopP | Supported | Supported | Supported |
| StopSequences | Supported | Supported | Supported |
| TimeOut | Supported | Supported | Supported |
| StreamFun | Supported | Supported | Supported |
| ResponseFormat | Supported | Supported | Supported |
| ModelName | Supported | | Supported |
| PresencePenalty | Supported | Supported | |
| FrequencyPenalty | Supported | Supported | |
| NumCompletions | Supported | Supported | |
| ToolChoice | Supported | Supported | |
| MinP | | | Supported |
| TopK | | | Supported |
| TailFreeSamplingZ | | | Supported |
Inf (default) | positive integer
Specify the maximum number of tokens to generate.
[] (default) | integer
Specify a random seed to ensure deterministic outputs.
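For example, a minimal sketch of reproducible generation. (Determinism is best effort and depends on the model backend.)
txt1 = generate(model,"Tell me a joke.",Seed=42);
txt2 = generate(model,"Tell me a joke.",Seed=42);
% With the same seed and model, the two outputs should match.
isequal(txt1,txt2)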
model.Temperature (default) | numeric scalar between 0 and 2
Temperature value for controlling the randomness of the output. Higher temperature increases the randomness of the output.
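For example, to make the output as predictable as the model allows, set the temperature to 0. (A sketch, reusing the model from the example above.)
% Temperature 0 makes repeated calls return very similar output.
generatedText = generate(model,"Complete this sentence: The sky is",Temperature=0)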
model.TopP (default) | numeric scalar between 0 and 1
Top probability mass for controlling the diversity of the generated output. Higher top probability mass corresponds to higher diversity.
model.StopSequences (default) | string array with between 0 and 4 elements
Sequences that stop generation of tokens.
Example: ["The end.","And that is all she wrote."]
model.TimeOut (default) | nonnegative numeric scalar
Connection timeout in seconds. If the server does not respond within the timeout, then the function throws an error.
model.StreamFun (default) | function handle
Specify a custom streaming function to process the generated output as it is being generated, rather than having to wait for the end of the generation. For example, you can use this function to print the output as it is generated.
For an example, see Process Generated Text in Real Time by Using ChatGPT™ in Streaming Mode.
Example: @(token) fprintf("%s",token)
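For example, a minimal sketch that prints each token to the command window as it arrives:
% The streaming function receives one token at a time.
generate(model,"Tell me a story about a robot.", ...
    StreamFun=@(token) fprintf("%s",token));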
model.ResponseFormat (default) | "text" | "json" | string scalar | structure array
Format of the generatedText output argument of the generate function. You can request unformatted output, JSON mode, or structured output.
If you set the response format to "text", then the generated output is an unformatted string.
If you set the response format to "json", then the generated output is a formatted string containing JSON encoded data.
To configure the format of the generated JSON output, describe the format using natural language and provide it to the model either in the system prompt or as a user message. The prompt or message describing the format must contain the word "json" or "JSON".
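For example, a minimal sketch of JSON mode. (Note that the prompt mentions JSON, as required.)
% Request JSON output and decode it into a MATLAB structure.
txt = generate(model,"List three primary colors in JSON with a ""colors"" field.", ...
    ResponseFormat="json");
data = jsondecode(txt)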
The JSON response format is not supported for these models:
"gpt-4"
"gpt-4-0613"
"o1-preview"
"o1-mini"
To ensure that the model follows the required format, use structured output. To do this, set ResponseFormat to:
- A string scalar containing a valid JSON Schema.
- A structure array containing an example that adheres to the required format, for example:
ResponseFormat=struct("Name","Rudolph","NoseColor",[255 0 0])
Structured output is only supported for models "gpt-4o-mini", "gpt-4o-mini-2024-07-18", "gpt-4o-2024-08-06", and later.
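For example, a minimal sketch of structured output using a structure array as the format specification. (This assumes a model with structured output support; the field names are illustrative.)
% The generated output is constrained to match the fields of the example structure.
out = generate(model,"Invent a reindeer and describe it.", ...
    ResponseFormat=struct("Name","Rudolph","NoseColor",[255 0 0]))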
model.ModelName (default) | "gpt-4o-mini" | "gpt-4" | "gpt-3.5-turbo" | ...
Name of the OpenAI or Ollama model to use for text generation.
To use an Ollama model, first install it following the instructions at https://ollama.com/library.
This option is only supported for openAIChat and ollamaChat objects.
model.PresencePenalty (default) | numeric scalar between -2 and 2
Penalty value for using a token that has already been used at least once in the generated output. Higher values reduce the repetition of tokens. Negative values increase the repetition of tokens.
The presence penalty is independent of the number of incidents of a token, so long as it has been used at least once. To increase the penalty for every additional time a token is generated, use the FrequencyPenalty name-value argument.
This option is only supported for these chat completion APIs:
- openAIChat objects
- azureChat objects
model.FrequencyPenalty (default) | numeric scalar between -2 and 2
Penalty value for repeatedly using the same token in the generated output. Higher values reduce the repetition of tokens. Negative values increase the repetition of tokens.
The frequency penalty increases with every instance of a token in the generated output. To use a constant penalty for a repeated token, independent of the number of instances that token is generated, use the PresencePenalty name-value argument.
This option is only supported for these chat completion APIs:
- openAIChat objects
- azureChat objects
model.NumCompletions (default) | positive integer
Specify the number of outputs to generate.
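For example, a minimal sketch requesting several alternative outputs in one call:
% Generate three alternative completions. The complete output contains
% all generated choices.
[generatedText,completeOutput] = generate(model, ...
    "Suggest a name for a cat.",NumCompletions=3);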
This option is only supported for these chat completion APIs:
- openAIChat objects
- azureChat objects
model.ToolChoice (default) | "auto" | "none" | openAIFunction object | array of openAIFunction objects
OpenAI functions to call during output generation. For more information on OpenAI function calling, see openAIFunction.
If the tool choice is "auto", then any function calls specified in model are executed during generation. To see whether any function calls are specified, check the FunctionNames property of model.
If the tool choice is "none", then no function call is executed during generation.
You can also specify one or more openAIFunction objects directly, as shown in the sketch after this list.
This option is only supported for these chat completion APIs:
- openAIChat objects
- azureChat objects
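For example, a minimal sketch that passes an openAIFunction object directly. (The function name and parameter are illustrative.)
% Define a function that the model may call.
f = openAIFunction("sind","Sine of an angle in degrees.");
f = addParameter(f,"deg",type="number",description="Angle in degrees.");
% If the model chooses to call the function, completeOutput contains the
% requested call and its arguments.
[generatedText,completeOutput] = generate(model, ...
    "What is the sine of 30 degrees?",ToolChoice=f);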
model.MinP (default) | numeric scalar between 0 and 1
Tune the frequency of improbable tokens in generated output using min-p sampling. Higher minimum probability ratio corresponds to lower diversity.
This option is only supported for ollamaChat objects.
model.TopK (default) | positive numeric scalar
Sample only from the TopK most likely next tokens for each token during generation. Higher top-k sampling corresponds to higher diversity.
This option is only supported for ollamaChat objects.
model.TailFreeSamplingZ (default) | numeric scalar
Tune the frequency of improbable tokens in generated output. Higher tail free sampling corresponds to lower diversity. If TailFreeSamplingZ is set to 1, then the model does not use this sampling technique.
This option is only supported for ollamaChat objects.
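For example, a minimal sketch combining the Ollama-only sampling controls. (This assumes a local Ollama server with the "mistral" model installed.)
model = ollamaChat("mistral");
% Sample from the 40 most likely tokens, discard tokens below 5% of the
% top token's probability, and apply tail free sampling.
generatedText = generate(model,"Write a haiku about rain.", ...
    TopK=40,MinP=0.05,TailFreeSamplingZ=0.95)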
string scalar
Text that the model generates, returned as a string scalar.
structure array
Complete output that the model generates, returned as a structure array.
The type and name of the fields in the structure depend on the API, the model, whether you use function calls, and whether you stream the output.
matlab.net.http.ResponseMessage object
Response message from the server, returned as a matlab.net.http.ResponseMessage object.
openAIChat | ollamaChat | azureChat | messageHistory
- Create Simple Chat Bot
- Create Simple Ollama Chat Bot
- Analyze Scientific Papers Using Function Calls
- Retrieval Augmented Generation Using Ollama and MATLAB
Copyright 2024 The MathWorks, Inc.