feat: introduce GGMLBlock and implement SVD(Broken) #159
Conversation
This is amazing news, thank you so much for your hard work, leejet. I can't wait for you guys to fix SVD so I can try it here. On a side note: the convert feature still seems broken for quantizing below fp16. I tried converting SDXL Turbo to q5_1 and it didn't generate images (I mentioned this in the safetensors issue). Could you please fix it? It would be very useful to convert the SVD model to, say, q4_1, which would be very fast. Quantizing on the fly works, but converting to gguf doesn't.
Batch inference will be reserved for when you have a lot of VRAM. I think we will now be able to perform a single UNet computation over a whole batch of conditionals.
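A rough sketch of what that batching could look like in ggml — my own illustration with hypothetical names, not code from this PR: the cond and uncond contexts of classifier-free guidance are packed along the batch dimension so a single UNet forward covers both.

```cpp
#include <string.h>

#include "ggml.h"

// Hypothetical helper, not this repo's API: pack two conditioning tensors
// into one batch-2 context so the UNet runs once instead of twice.
// Assumes contiguous f32 tensors allocated with backing data.
static ggml_tensor* pack_contexts(ggml_context* ctx,
                                  ggml_tensor* cond,      // [d, n_tok, 1]
                                  ggml_tensor* uncond) {  // [d, n_tok, 1]
    ggml_tensor* batch = ggml_new_tensor_3d(ctx, GGML_TYPE_F32,
                                            cond->ne[0], cond->ne[1], 2);
    // slot 0: cond, slot 1: uncond, laid out contiguously along ne2
    memcpy((char*)batch->data, cond->data, ggml_nbytes(cond));
    memcpy((char*)batch->data + ggml_nbytes(cond),
           uncond->data, ggml_nbytes(uncond));
    return batch;  // feed to one UNet call; split the output along ne2
}
```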
I was planning to refactor stable-diffusion.cpp to have an API similar to llama.cpp, and also to support offloading and computing SD and ControlNet on the GPU with low VRAM. However, after reviewing this refactoring, I think it's better to just extend what I need to make a web UI work.
@FSSRepo Basically this PR that I'm implementing: #157
It is understandable that problems will arise; ggml is a library mainly built to support llama.cpp. Still, for me, ggml has several advantages over PyTorch that cannot be ignored: it is small (PyTorch's CUDA dependency alone is about 1 GB), it supports quantization very well, and it supports ROCm on Windows.
GGMLBlock is very educational and this implementation is great.
For SVD, batch inference is a must; ne3 is actually batch_size * num_video_frames.
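For illustration, that layout could be allocated like this — a minimal sketch assuming f32 latents; ggml_new_tensor_4d is a real ggml function, while the wrapper name and its parameters are my own:

```cpp
#include "ggml.h"

// Illustrative wrapper: an SVD latent whose ne3 packs
// batch_size * num_video_frames together, as described above.
static ggml_tensor* new_svd_latent(ggml_context* ctx,
                                   int64_t w, int64_t h, int64_t channels,
                                   int64_t batch_size, int64_t num_frames) {
    // ggml dims are [ne0, ne1, ne2, ne3] = [w, h, c, batch * frames]
    return ggml_new_tensor_4d(ctx, GGML_TYPE_F32,
                              w, h, channels, batch_size * num_frames);
}
```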
Did you use the fp16-fix VAE?
I found a problem while using it just now, regarding seeds: if the seed is 42, the generated pictures are correct.
What shocks me is that for a 768x768 image (SDXL Turbo) on a 7900 XTX, a single sampling pass takes only 0.35s. It seems the performance bottleneck lies in the decoding operation.
I'm using taesdxl and it works great. With taesdxl I don't need the fp16-fix VAE, and the generated images look great with 1 step and the LCM sampler for SDXL Turbo. However, what I meant was converting the models to q4_1 gguf files with the -m convert command, to get smaller models; that gave me errors after generating an image (the same as in the "converting safetensors" issue, if I remember correctly).
@Cyberhan123 I got the same result in sd-webui using seed 297003140.
I will merge this PR even though the SVD support is broken, because it introduces GGMLBlock, which makes it convenient to implement neural networks with ggml. I have other changes that rely on GGMLBlock, such as adding support for Stable Cascade. I will try to fix the SVD issue later if I have time.
Over the past few weeks, I've been working on this PR in my free time. It introduces GGMLBlock, which makes it easier to implement neural networks; in most cases it is straightforward to translate an nn.Module into the corresponding GGMLBlock. I have implemented the majority of the building blocks for SVD and the SVD pipeline, except for VScalingWithEDMcNoise, which is also relatively simple to implement.
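To give a feel for the pattern — a minimal sketch in the spirit of GGMLBlock, not the PR's exact API; the class and member names here are my own assumptions — a block owns its parameters and sub-blocks and exposes a forward, so something like nn.Linear maps directly onto ggml ops:

```cpp
#include <cstdint>
#include <map>
#include <memory>
#include <string>

#include "ggml.h"

// Minimal block abstraction in the spirit of GGMLBlock (names are
// illustrative, not the PR's actual declarations).
struct ExampleBlock {
    std::map<std::string, ggml_tensor*> params;                   // own weights
    std::map<std::string, std::shared_ptr<ExampleBlock>> blocks;  // sub-modules

    virtual ~ExampleBlock() = default;
    virtual void init_params(ggml_context* ctx) = 0;
    virtual ggml_tensor* forward(ggml_context* ctx, ggml_tensor* x) = 0;
};

// Counterpart of torch.nn.Linear built from ggml primitives.
struct Linear : ExampleBlock {
    int64_t in_features;
    int64_t out_features;

    Linear(int64_t in, int64_t out) : in_features(in), out_features(out) {}

    void init_params(ggml_context* ctx) override {
        // ggml stores ne0 as the innermost dimension, so the weight is
        // created as [in_features, out_features]
        params["weight"] = ggml_new_tensor_2d(ctx, GGML_TYPE_F32,
                                              in_features, out_features);
        params["bias"]   = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, out_features);
    }

    ggml_tensor* forward(ggml_context* ctx, ggml_tensor* x) override {
        // y = x @ W^T + b; ggml_mul_mat treats its first argument as
        // transposed, and ggml_add broadcasts the bias over the batch
        x = ggml_mul_mat(ctx, params["weight"], x);
        return ggml_add(ctx, x, params["bias"]);
    }
};
```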
However, I've started to feel fatigued. ggml's batch inference implementation has issues for certain operators, and although I've addressed some of them in this branch, https://github.com/leejet/ggml/tree/batch-inference, it's not entirely resolved. Furthermore, some operators produce NaN in certain situations, and these issues also need to be fixed. If I have time in the future, I'll continue addressing them in ggml; for now, though, I'll be allocating my free time to other tasks, as I've already invested a considerable amount of effort in this PR over the past few weeks. Perhaps I'll merge this PR first, even though the SVD support is broken, since it introduces GGMLBlock, which makes it convenient to use ggml for implementing neural networks. The test results for batch inference are documented in the comments of the test functions in unet.hpp/vae.hpp; take a look if you're interested.