Add support for APG (adaptive projected guidance) + unconditional SLG #593
Implements this paper: https://arxiv.org/abs/2410.02416
TLDR:
APG is a set of 3 modifications for CFG:

- Reverse momentum (`--apg-momentum`)
- Normalization of the guidance update to a norm threshold (`--apg-nt`)
- Projection: the update vector (`out_uncond - out_cond`) is orthogonally projected on the same "direction" as `out_cond`. The final update is linearly interpolated between the original update and the projected update with the parameter "eta" (`--apg-eta`); see the sketch below.

No extra forward pass is required, so the performance cost is negligible.
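To make the three steps concrete, here is a minimal sketch of one APG guidance step, roughly following Algorithm 1 of the paper rather than this PR's actual code; the names (`apg_step`, `running_avg`) are illustrative, and the sign convention follows the paper's `cond - uncond` update:

```cpp
// Minimal sketch of one APG step (arXiv:2410.02416), not the PR's exact code.
// Tensors are simplified to flat float vectors for illustration.
#include <algorithm>
#include <cmath>
#include <vector>

static float dot(const std::vector<float>& a, const std::vector<float>& b) {
    float s = 0.0f;
    for (size_t i = 0; i < a.size(); i++) s += a[i] * b[i];
    return s;
}

// `running_avg` persists across sampling steps and starts at zero.
std::vector<float> apg_step(const std::vector<float>& out_cond,
                            const std::vector<float>& out_uncond,
                            std::vector<float>& running_avg,
                            float cfg_scale,   // --cfg-scale
                            float eta,         // --apg-eta
                            float norm_thresh, // --apg-nt (<= 0 disables)
                            float momentum) {  // --apg-momentum (negative = reverse)
    size_t n = out_cond.size();
    std::vector<float> delta(n);
    for (size_t i = 0; i < n; i++) delta[i] = out_cond[i] - out_uncond[i];

    // 1. (Reverse) momentum: mix in a running average of past updates.
    for (size_t i = 0; i < n; i++) {
        running_avg[i] = momentum * running_avg[i] + delta[i];
        delta[i] = running_avg[i];
    }

    // 2. Normalization: rescale the update if its norm exceeds the threshold.
    if (norm_thresh > 0.0f) {
        float norm   = std::sqrt(dot(delta, delta));
        float factor = std::min(1.0f, norm_thresh / norm);
        for (size_t i = 0; i < n; i++) delta[i] *= factor;
    }

    // 3. Projection: split delta into components parallel and orthogonal to
    //    out_cond, keep the orthogonal part, scale the parallel part by eta.
    float coef = dot(delta, out_cond) / dot(out_cond, out_cond);
    std::vector<float> out(n);
    for (size_t i = 0; i < n; i++) {
        float parallel   = coef * out_cond[i];
        float orthogonal = delta[i] - parallel;
        out[i] = out_cond[i] + (cfg_scale - 1.0f) * (orthogonal + eta * parallel);
    }
    return out;
}
```

With `eta = 1`, `norm_thresh <= 0`, and `momentum = 0`, this reduces to plain CFG, which is why no extra forward pass is needed.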
Thanks mostly to the normalization, but also the projection, this allows taking advantage of very large CFG scales without getting deep-fried output images. I'm not sure how useful the reverse momentum really is, but it was in the paper so I added it too (I think it prevents the CFG from going too much "in the same direction" at every step?).
Usage
[your usual command with cfg here] --apg-eta 0 --apg-nt 5 --apg-momentum -0.5
Recommended values: the ones in the example command above. Feel free to play around with the settings; going outside of the recommended ranges can have interesting effects, especially with eta and momentum.
I also added an experimental smoothing parameter (`--apg-nt-smoothing`) for the normalization. In the paper they're using a "saturate" function (`min(1, threshold/norm)`), which has two potential issues: it has a kink (it's not continuously differentiable), and it is not invertible, as all input values outside of the $[0,1]$ range get mapped to $1$. This experimental feature replaces the $\min(1,x)$ function with $\frac{x}{\left(1+x^{\frac{1}{p}}\right)^{p}}$, which is smooth and invertible. It is equivalent to $f(x)=x$ for small values of $x$ (just like the min) and converges to the original $\min(1,x)$ as the value of $p$ goes to $0$.
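As a quick illustration, here's what that smooth saturate could look like under the assumptions $x \ge 0$ and $p > 0$ (the name `smooth_saturate` is mine, not the PR's):

```cpp
// Smooth, invertible replacement for the hard saturate min(1, x).
// f(x) = x / (1 + x^(1/p))^p: behaves like x for small x, tends to 1 for
// large x, and converges to min(1, x) as p -> 0. Assumes x >= 0 and p > 0.
#include <cmath>

float smooth_saturate(float x, float p) {
    return x / std::pow(1.0f + std::pow(x, 1.0f / p), p);
}

// It would slot in where the hard saturate sits in the normalization step:
//   float factor = smooth_saturate(norm_thresh / norm, p);
// instead of:
//   float factor = std::min(1.0f, norm_thresh / norm);
```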
Edit: I also added unconditional SLG (`--slg-uncond`), a simpler version of SLG (Skip Layer Guidance, introduced in #451) for DiT models. (I stole the idea from deepbeepmeep/Wan2GP#61.)
Default SLG requires a third forward pass of the network with some layers skipped. This increases the computing time by a bit under 50% for the SLG steps, which isn't ideal.
Unconditional SLG skips layers during the same unconditional pass used for CFG/APG. It seems to be about as effective as normal SLG, but it's even faster than plain CFG, thanks to the layers being skipped.
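Conceptually, the difference is just where the skipping happens. A hypothetical sketch (the names `DiTBlock`, `run_dit`, `skip_layers`, `is_uncond_pass` are illustrative, not the PR's actual identifiers):

```cpp
// Sketch of the unconditional-SLG idea: skip the configured DiT blocks during
// the single unconditional pass already needed for CFG/APG, instead of
// running a third pass like regular SLG. Names are illustrative only.
#include <set>
#include <vector>

using Tensor = std::vector<float>;

struct DiTBlock {
    Tensor forward(const Tensor& x) { return x; }  // stub for illustration
};

Tensor run_dit(std::vector<DiTBlock>& blocks, Tensor x,
               const std::set<int>& skip_layers, bool is_uncond_pass) {
    for (int i = 0; i < (int)blocks.size(); i++) {
        // Regular SLG skips these layers in a dedicated third forward pass;
        // unconditional SLG skips them right here in the uncond pass, which
        // also makes this pass cheaper than a full forward.
        if (is_uncond_pass && skip_layers.count(i)) continue;
        x = blocks[i].forward(x);
    }
    return x;
}
```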
Downside: it's less flexible; `--slg-scale` should be kept at 0, and `--cfg-scale` now controls both the CFG and the SLG. Upside: it's faster.
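For instance, following the usage pattern above and using only flags mentioned in this PR (with `--slg-scale` left at 0, as recommended):

[your usual command with cfg here] --slg-uncond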
Setting both `--slg-scale != 0` and `--slg-uncond` at the same time will most likely degrade image quality while using more compute. It's possible, but not recommended. (Maybe it would be worth investigating skipping different sets of layers for normal SLG and unconditional SLG, but that's getting too far out of scope for this PR.)