# Support handoff/reentrant functions #6098
We've had a need in some other cases for functions that only describe themselves and needn't be invoked, e.g., functions originating in one process and flowing through to another process that makes the LLM call, or https://github.com/microsoft/semantic-kernel/blob/28e9d11a98456a105165a00e2b246b77dbd022ee/dotnet/src/SemanticKernel.Abstractions/AI/ChatCompletion/AIFunctionKernelFunction.cs#L22, where it's just a marker that's handled manually. What if, instead of modeling this as a separate kind of tool, we model it as a hierarchy and describe the additional capability the current function type has: invocability? So you'd have `InvocableAIFunction : AIFunction : AITool`, where `AIFunction` adds the schema and `InvocableAIFunction` adds `InvokeAsync`. Leaf clients generally shouldn't care about the distinction, as they only want to describe the function to the LLM. Most consumers shouldn't care either, as it's rarer to manually invoke these. `AIFunctionFactory.Create` would return a strongly-typed `InvocableAIFunction`, and `FunctionInvokingChatClient` (FICC) would distinguish between the two.
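To make the proposed hierarchy concrete, here is a rough sketch of the shape being suggested. This is not the shipped Microsoft.Extensions.AI API (today's real `AIFunction` carries both the schema and `InvokeAsync`); the member signatures below are illustrative assumptions.

```csharp
using System.Collections.Generic;
using System.Text.Json;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.AI; // for AITool

// Sketch of the proposed split (hypothetical): this AIFunction would replace
// the existing one, keeping only the describe-to-the-LLM surface.
public abstract class AIFunction : AITool
{
    // JSON schema describing the function's parameters, for the LLM's benefit.
    public abstract JsonElement JsonSchema { get; }
}

// Invocability lives only in the subclass; AIFunctionFactory.Create would
// return instances of this type.
public abstract class InvocableAIFunction : AIFunction
{
    public abstract ValueTask<object?> InvokeAsync(
        IReadOnlyDictionary<string, object?>? arguments = null,
        CancellationToken cancellationToken = default);
}
```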
That sounds good from the "creating AIFunctions" side, though it might be trickier from the "resolving FCCs" side. In the case where an LLM returns a mixture of invokable and non-invokable FCCs (e.g., handoff calls), we'd expect application code to provide FRCs for the non-invokable FCCs but to leave the invokable ones alone. So the application code would need logic like:

```csharp
foreach (var fcc in response.Messages.SelectMany(m => m.Contents).OfType<FunctionCallContent>())
{
    var tool = chatOptions.Tools.FirstOrDefault(t => t.Name == fcc.Name);
    if (tool is AIFunction && tool is not InvocableAIFunction)
    {
        // ... Use app-specific logic to resolve this and produce an FRC for a subsequent call
    }
}
```

I agree it's technically possible either way; just trying to contemplate the ergonomics of usage. And this all depends on how we handle the mixed-invokable-and-non-invokable-FCCs case anyway. Another way is a helper to identify the unresolved FCCs that applications are expected to deal with, e.g.:

```csharp
foreach (var fcc in response.GetUnresolvedFunctionCalls())
{
    // ... Use app-specific logic to resolve this and produce an FRC for a subsequent call
}
```
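As a sketch of what such a helper might look like, assuming the current `ChatResponse`, `FunctionCallContent`, and `FunctionResultContent` shapes (`GetUnresolvedFunctionCalls` itself is hypothetical, not an existing API):

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.Extensions.AI;

public static class ChatResponseExtensions
{
    // Hypothetical helper: returns FunctionCallContent items in the response
    // that have no matching FunctionResultContent, i.e., the calls the
    // application is expected to resolve itself.
    public static IEnumerable<FunctionCallContent> GetUnresolvedFunctionCalls(this ChatResponse response)
    {
        var contents = response.Messages.SelectMany(m => m.Contents).ToList();

        // Call IDs that already have a result somewhere in the response.
        var resolvedIds = contents.OfType<FunctionResultContent>()
                                  .Select(frc => frc.CallId)
                                  .ToHashSet();

        return contents.OfType<FunctionCallContent>()
                       .Where(fcc => !resolvedIds.Contains(fcc.CallId));
    }
}
```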
This may be related to #5999 as well. I believe the goal of …
After looking into some agent SDKs and the approaches people are using to combine M.E.AI with durable orchestration, I think there's a scenario we're missing with `AIFunction`.

Today, `FunctionInvokingChatClient` assumes that every `FunctionCallContent` corresponds to an `AIFunction` that it can invoke using `InvokeAsync`, and that the function calling loop should continue until all the `FunctionCallContent` items have been resolved with a `FunctionResultContent`. That is, `FunctionInvokingChatClient` remains in control of the execution flow until it has finished resolving all the function calls.

This doesn't work well in two scenarios:

1. **Handoff.** An agent may expose a "handoff" function whose purpose is to end the current conversation flow and transfer control elsewhere, not to compute a result. `FunctionInvokingChatClient` won't yield control flow back to application logic until it has finished resolving all the calls, and it's not meaningful to resolve a handoff function. To be clear, handoff is not the same as delegation (i.e., one LLM call recursively nested inside another, which could be modelled by `InvokeAsync`); it's explicitly an end to the current control flow so that a new one can take over instead.

   Handoff is easy if you don't use `FunctionInvokingChatClient`, because control naturally returns to app logic every time there's a function call. But `FunctionInvokingChatClient` gets in the way of that by insisting on resolving all function calls. One workaround would be an `AIFunction` subclass whose `InvokeAsync` throws a custom `HandoffException` that would get caught by application code. But that's very dangerous, because who knows what state things are in when the exception is caught?

2. **Long-running calls.** `AIFunction` calls may be very long-running; they might take hours or days. During this time, the host process does not want to keep running, but rather wants to terminate and get restarted when the `AIFunction` call result later becomes available. As such it's undesirable for us to `await function.InvokeAsync(...)`, because that implies the host process needs to stay alive the whole time it's waiting.

   In principle, a durable orchestrator could intercept this: when the `await function.InvokeAsync()` call happens, it can terminate the host process without waiting, scheduling a restart for when the call result is available. And then in theory you could use `UseCaching` to make all the `IChatClient.GetResponseAsync` calls replay the same results on the next host instance up to the point where the long-running call happens, at which time Azure Durable Functions should immediately supply the result and execution would resume. However, this is very complicated and quite Azure Durable Functions-specific. We could make it easier for application code to control this if we had a way to let it take charge of resolving a particular function call.

In both scenarios, the goal is not to resolve the call within `FunctionInvokingChatClient`, but rather to return control to application code that can track the state somewhere for later resumption.

### Strawman design
We could define some new tool type, e.g.,:
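The original code block here appears to have been lost; the following is a hedged reconstruction of one possible shape, using the `AIMetaFunction` name from this strawman. The members shown are assumptions, not a shipped API; only `AITool` is a real Microsoft.Extensions.AI type.

```csharp
using System.Text.Json;
using Microsoft.Extensions.AI;

// Strawman sketch only: a tool type carrying a name, description, and schema
// but deliberately no InvokeAsync, so FunctionInvokingChatClient has nothing
// it could invoke.
public class AIMetaFunction : AITool
{
    private readonly string _name;
    private readonly string _description;

    public AIMetaFunction(string name, string description, JsonElement parametersSchema)
    {
        _name = name;
        _description = description;
        ParametersSchema = parametersSchema;
    }

    public override string Name => _name;
    public override string Description => _description;

    // Schema describing the function's parameters, for the LLM's benefit only.
    public JsonElement ParametersSchema { get; }
}
```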
When `FunctionInvokingChatClient` maps an FCC to an `AIMetaFunction`, it doesn't attempt to call the function. It can't, since there's no `InvokeAsync`. Instead, it just terminates the function calling loop immediately, returning the response with the FCC to application code as if you didn't have `FunctionInvokingChatClient` in the pipeline at all.

Application code can then do whatever it wants to follow up, e.g., adding an FRC to the history and making a new `GetResponseAsync` call (either to the same `IChatClient`, possibly at an arbitrary point in the distant future, or to a different `IChatClient` representing a different agent or whatever).

### What about mixed results?
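For illustration, that follow-up might look roughly like this. It's a sketch: `messages`, `chatOptions`, and `otherAgentClient` are assumed application state, and the `FunctionResultContent` constructor shape is an assumption about the current M.E.AI API.

```csharp
// Hypothetical follow-up: the application receives the response containing an
// unresolved handoff FCC, resolves it itself, and continues the conversation
// with whichever IChatClient represents the next agent.
var fcc = response.Messages.SelectMany(m => m.Contents)
                           .OfType<FunctionCallContent>()
                           .First();

// Append the LLM's messages, then supply our own FRC for the handoff call.
messages.AddRange(response.Messages);
messages.Add(new ChatMessage(ChatRole.Tool,
    [new FunctionResultContent(fcc.CallId, "Handoff acknowledged")]));

// Possibly hours or days later, and possibly on a different host instance.
var nextResponse = await otherAgentClient.GetResponseAsync(messages, chatOptions);
```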
The rule above seems quite simple, but it's tricky if the LLM's response contains multiple FCCs, some of which are metafunctions and others are regular ones. Does `FunctionInvokingChatClient` resolve the regular ones before returning control?

I think it would probably have to yield control immediately if there are any metafunction calls, ignoring the regular FCCs. Because otherwise, what would be in the `ChatResponse`? It would be a combination of messages from the LLM and other messages from `FunctionInvokingChatClient`, which is extremely strange, as they represent different directions in the conversation.

If `FunctionInvokingChatClient` were just to exit the function calling loop immediately if there are any meta-FCCs, then application code could resolve the meta-FCCs itself, add the corresponding FRCs to the history, and make a follow-up call, leaving any regular FCCs untouched. And then we'd want to update `FunctionInvokingChatClient` so that, before making its initial call through to the inner client, it checks whether the conversation contains unresolved regular FCCs and, if so, resolves those in the normal way before passing through to the inner client. Effectively this is the same as the logic it already has, except that we'd enter the function calling loop before the initial call to the inner client, not just after it. It might be a relatively trivial tweak, though I haven't implemented it to check.
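The tweak described above can be sketched as follows. This is hypothetical, not the real `FunctionInvokingChatClient` implementation: `GetUnresolvedRegularFunctionCalls` and `InvokeRegularFunctionAsync` are invented helper names standing in for logic FICC would need.

```csharp
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.AI;

public partial class MetaFunctionAwareClient
{
    // Sketch: before the initial call to the inner client, resolve any regular
    // FCCs already sitting unresolved in the incoming conversation.
    public async Task<ChatResponse> GetResponseWithPreResolutionAsync(
        IChatClient innerClient,
        List<ChatMessage> history,
        ChatOptions options,
        CancellationToken cancellationToken = default)
    {
        foreach (var fcc in GetUnresolvedRegularFunctionCalls(history, options)) // hypothetical helper
        {
            // Resolve in the normal way and append the FRC before calling through.
            object? result = await InvokeRegularFunctionAsync(fcc, options, cancellationToken); // hypothetical
            history.Add(new ChatMessage(ChatRole.Tool,
                [new FunctionResultContent(fcc.CallId, result)]));
        }

        // Then proceed as today: initial call, followed by the function-calling
        // loop, exiting early if the response contains any meta-FCCs.
        return await innerClient.GetResponseAsync(history, options, cancellationToken);
    }
}
```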