
Support handoff/reentrant functions #6098

Open
SteveSandersonMS opened this issue Mar 13, 2025 · 3 comments
Labels: area-ai (Microsoft.Extensions.AI libraries), untriaged

@SteveSandersonMS (Member)

After looking into some agent SDKs and approaches people are using to combine M.E.AI with durable orchestration, I think there's a scenario we're missing with AIFunction.

Today, FunctionInvokingChatClient assumes that every FunctionCallContent corresponds to an AIFunction that it can invoke using InvokeAsync, and that the function calling loop should continue until all the FunctionCallContent items have been resolved with a FunctionResultContent. That is, FunctionInvokingChatClient remains in control of the execution flow until it's finished resolving all the function calls.
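
For context, a typical pipeline today looks something like the following sketch, where UseFunctionInvocation adds the FunctionInvokingChatClient layer (innerClient stands for any concrete IChatClient):

// FunctionInvokingChatClient wraps the inner client and, today, keeps looping
// until every FunctionCallContent has a matching FunctionResultContent.
IChatClient client = new ChatClientBuilder(innerClient)
    .UseFunctionInvocation()
    .Build();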

This doesn't work well in two scenarios:

  • Handoff: if we wanted an equivalent to OpenAI Agents SDK's handoff concept, then we'd need to allow certain function calls to signal "stop processing and use application logic to start a new call".
    • For example, the current LLM call might be to an agent that wants to transfer control to a different agent, and it signals that by returning a "handoff" function call that can include some additional context. Application logic is then expected to begin an entirely new LLM calling cycle, potentially to a different endpoint URL, transferring some of the context from the prior conversation into the new call. Right now, M.E.AI doesn't model that well because FunctionInvokingChatClient won't yield control flow back to application logic until it's finished resolving all the calls, and it's not meaningful to resolve a handoff function. To be clear, handoff is not the same as delegation (i.e., one LLM call recursively nested inside another, which could be modelled by InvokeAsync) - it's explicitly an end to the current control flow so that a new one can take over instead.
    • This works very easily if you don't have FunctionInvokingChatClient, because control over flow naturally returns to app logic every time there's a function call. But FunctionInvokingChatClient gets in the way of that by insisting on resolving all function calls.
    • The closest we have right now is something like making a custom AIFunction subclass whose InvokeAsync throws a custom HandoffException for application code to catch (sketched after this list). But that's very dangerous, because who knows what state things are in when the exception is caught?
  • Durable execution: Some AIFunction calls may be very long-running. They might take hours or days. During this time, the host process does not want to keep running; rather, it wants to terminate and get restarted when the AIFunction call result later becomes available. As such, it's undesirable for us to await function.InvokeAsync(...), because that implies the host process needs to stay alive the whole time it's waiting.
    • Instead, it would be nice if for certain functions, we returned control over function invocation to the application code, so it could schedule execution of the long-running process using mechanisms from its host environment (e.g., Azure Durable Functions). Later, a further incarnation of the host process will run and will have the function calling result, and it needs an easy way to resume the LLM call starting from having both the FCC and FRC in the call history.
      • To some extent, Azure Durable Functions enables this transparently anyway, in that when the await function.InvokeAsync() call happens, it can terminate the host process without waiting, scheduling a restart when the call result is available. Then, in theory, you could use UseCaching to make all the IChatClient.GetResponseAsync calls replay the same results on the next host instance up to the point where the long-running call happens, at which time Azure Durable Functions should immediately supply the result and execution would resume. However, this is very complicated and quite Azure Durable Functions-specific. We could make it easier for application code to control this if we had a way to let it take charge of resolving a particular function call.
    • This is really the same requirement as for "handoff", i.e., the ability for certain functions not to be resolved by FunctionInvokingChatClient, but rather to return control to application code that can track the state somewhere for later resumption.
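
For concreteness, the exception-based workaround mentioned in the handoff bullet might look roughly like this. It's only a sketch: HandoffException is an application-defined type rather than anything in M.E.AI, and it assumes the pipeline lets function invocation exceptions propagate to the caller instead of converting them into error results.

// Application-defined signal type (not part of M.E.AI).
public sealed class HandoffException(string targetAgent) : Exception
{
    public string TargetAgent { get; } = targetAgent;
}

// A regular AIFunction whose body throws to abort the function calling loop.
AIFunction handoff = AIFunctionFactory.Create(
    void (string targetAgent) => throw new HandoffException(targetAgent),
    name: "handoff_to_other_agent",
    description: "Transfer the conversation to the named agent.");

try
{
    var response = await client.GetResponseAsync(messages, new ChatOptions { Tools = [handoff] });
}
catch (HandoffException ex)
{
    // Begin a new LLM calling cycle for ex.TargetAgent. As noted above, the state
    // of the aborted function calling loop is unknowable at this point.
}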

Strawman design

We could define some new tool type, e.g.:

public class AIMetaFunction : AITool // Other name suggestions welcomed!
{
    public virtual JsonElement JsonSchema => AIJsonUtilities.DefaultJsonSchema;
}

var handoffTool = AIFunctionFactory.CreateMetaFunction(name: "handoff_to_other_agent", schema: ...);

When FunctionInvokingChatClient maps an FCC to an AIMetaFunction, it doesn't attempt to call the function; it can't, since there's no InvokeAsync. Instead, it terminates the function calling loop immediately, returning the response with the FCC to application code as if FunctionInvokingChatClient weren't in the pipeline at all.

Application code can then do whatever it wants to follow up, e.g., adding an FRC to the history and making a new GetResponseAsync call (either to the same IChatClient, possibly at an arbitrary point in the distant future, or to a different IChatClient representing a different agent or whatever).
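
For illustration, that follow-up could look something like the sketch below, where client, agentBClient, messages, and options are placeholder names and AIMetaFunction is the strawman type above:

ChatResponse response = await client.GetResponseAsync(messages, options);
messages.AddRange(response.Messages);

// Find the meta call that FunctionInvokingChatClient returned unresolved.
FunctionCallContent? metaCall = response.Messages
    .SelectMany(m => m.Contents)
    .OfType<FunctionCallContent>()
    .FirstOrDefault(fcc => options.Tools?.FirstOrDefault(t => t.Name == fcc.Name) is AIMetaFunction);

if (metaCall is not null)
{
    // Resolve it with app-specific logic, then continue the conversation,
    // here with a different IChatClient representing another agent.
    messages.Add(new ChatMessage(ChatRole.Tool,
        [new FunctionResultContent(metaCall.CallId, "Handed off to agent B.")]));
    response = await agentBClient.GetResponseAsync(messages, options);
}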

What about mixed results?

The rule above seems quite simple, but it's tricky if the LLM's response contains multiple FCCs, some of which are metafunctions and others are regular ones. Does FunctionInvokingChatClient resolve the regular ones before returning control?

I think it probably would have to yield control immediately if there are any metafunction calls, ignoring the regular FCCs, because otherwise what would be in the ChatResponse? It would be a combination of messages from the LLM and other messages from FunctionInvokingChatClient, which is extremely strange, as they represent different directions in the conversation.

If FunctionInvokingChatClient were just to exit the function calling loop immediately if there are any meta-FCCs, then application code could:

  • Add the response to history as normal
  • Then also start a new call containing FRCs for any metafunction calls, ignoring regular FCCs

... and then we'd want to update FunctionInvokingChatClient so that before making its initial call through to the inner client, it checks whether the conversation contains unresolved regular FCCs and, if so, resolves them in the normal way before passing through to the inner client. Effectively this is the same logic it already has, except that the function calling loop would also run before the initial call to the inner client, not just after it. It might be a relatively trivial tweak, though I haven't implemented it to check.
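
As application code, those two steps might look like this sketch (appSpecificResult and functionInvokingClient are placeholder names, and the pre-loop check described above is assumed):

// 1. Add the response to history as normal.
messages.AddRange(response.Messages);

// 2. Supply FRCs only for the metafunction calls; regular FCCs stay unresolved
//    here, to be resolved by FunctionInvokingChatClient on the next call before
//    it passes through to the inner client.
foreach (FunctionCallContent fcc in response.Messages
    .SelectMany(m => m.Contents).OfType<FunctionCallContent>())
{
    if (options.Tools?.FirstOrDefault(t => t.Name == fcc.Name) is AIMetaFunction)
    {
        messages.Add(new ChatMessage(ChatRole.Tool,
            [new FunctionResultContent(fcc.CallId, appSpecificResult)]));
    }
}

response = await functionInvokingClient.GetResponseAsync(messages, options);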

@github-actions bot added the area-ai (Microsoft.Extensions.AI libraries) label Mar 13, 2025
@stephentoub (Member)

We've had a need in some other cases for functions that only describe themselves and needn't be invoked, e.g., functions originating in one process and flowing through to another that does the LLM call, or like https://github.com/microsoft/semantic-kernel/blob/28e9d11a98456a105165a00e2b246b77dbd022ee/dotnet/src/SemanticKernel.Abstractions/AI/ChatCompletion/AIFunctionKernelFunction.cs#L22 where it's just a marker that's handled manually.

What if, instead of modeling this as a separate kind of tool, we model it as a hierarchy and describe the additional capability the current function type has: invocability. So you have InvocableAIFunction : AIFunction : AITool, where AIFunction adds the schema and InvocableAIFunction adds InvokeAsync. Leaf clients generally shouldn't care about the distinction, as they only want to describe the function to the LLM. Most consumers shouldn't care either, as it's rarer to manually invoke these. AIFunctionFactory.Create would return a strongly-typed InvocableAIFunction. FunctionInvokingChatClient would distinguish between the two.
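
For illustration, the shape of that hierarchy might be roughly as follows (a sketch only; the InvokeAsync signature is a guess, not a shipped API):

public abstract class AIFunction : AITool
{
    // Describes the function to the LLM; carries no ability to invoke it.
    public virtual JsonElement JsonSchema => AIJsonUtilities.DefaultJsonSchema;
}

public abstract class InvocableAIFunction : AIFunction
{
    // Adds the one capability FunctionInvokingChatClient needs: invocation.
    public abstract Task<object?> InvokeAsync(
        IEnumerable<KeyValuePair<string, object?>>? arguments = null,
        CancellationToken cancellationToken = default);
}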

@SteveSandersonMS (Member, Author) commented Mar 13, 2025

That sounds good from the "creating AIFunctions" side, though it might be trickier from the "resolving FCCs" side. In the case where an LLM returns a mixture of invocable and non-invocable FCCs (e.g., handoff calls), we'd expect application code to provide FRCs for the non-invocable FCCs but to leave the invocable ones alone.

So the application code would need logic like:

foreach (var fcc in response.Messages.SelectMany(m => m.Contents).OfType<FunctionCallContent>())
{
    var tool = chatOptions.Tools?.FirstOrDefault(t => t.Name == fcc.Name);
    if (tool is AIFunction and not InvocableAIFunction)
    {
        // ... Use app-specific logic to resolve this and produce an FRC for a subsequent call
    }
}

The is not InvocableAIFunction part of this is very non-obvious, and it will break if we later introduce some other AIFunction subclass with some other meaning. So there is potentially a case for having an AITool or AIFunction subclass that is explicitly meant to be resolved by application code, so apps can just do if (tool is ThatType) { /* resolve it */ }.

I agree it's technically possible either way; I'm just trying to contemplate the ergonomics of usage. And this all depends on how we handle the mixed invocable-and-non-invocable FCCs case anyway.

Another way is a helper to identify the unresolved FCCs that applications are expected to deal with, e.g.:

foreach (var fcc in response.GetUnresolvedFunctionCalls())
{
    // ... Use app-specific logic to resolve this and produce an FRC for a subsequent call
}
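
One possible shape for that helper, as a sketch: treat an FCC as unresolved when no FunctionResultContent in the response carries the same CallId.

public static IEnumerable<FunctionCallContent> GetUnresolvedFunctionCalls(this ChatResponse response)
{
    var resolvedCallIds = response.Messages
        .SelectMany(m => m.Contents)
        .OfType<FunctionResultContent>()
        .Select(frc => frc.CallId)
        .ToHashSet();

    return response.Messages
        .SelectMany(m => m.Contents)
        .OfType<FunctionCallContent>()
        .Where(fcc => !resolvedCallIds.Contains(fcc.CallId));
}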

@luisquintanilla (Contributor)

This may be related to #5999 as well.

I believe the goal of AIParserFunction in the code base below was to have a similar effect to the proposed var handoffTool = AIFunctionFactory.CreateMetaFunction(name: "handoff_to_other_agent", schema: ...);

https://github.com/SteveSandersonMS/dotnet-ai-workshop/blob/2ba92886a451faf568d5cc3a9f8eb84740e8d338/exercises/CorrectiveRetrievalAugmentedGeneration/End/StructuredPrediction/AIParserFunction.cs#L6
