feat: introduce GGMLBlock and implement SVD(Broken) #159
Conversation
This is amazing news, thank you so much for your hard work, leejet. I can't wait for you guys to fix SVD so I can try it here. On a side note: the convert feature still seems broken for quantizing below fp16. I tried converting SDXL Turbo to q5_1 and it didn't generate images (I mentioned this in the safetensors issue). Could you please fix it? It would be very useful to convert the SVD model to, say, q4_1, which would be very fast. Quantizing on the fly works, but converting to gguf doesn't.
Batch inference will be reserved for when you have a lot of VRAM. I think we will now be able to perform a single UNet computation over a whole batch of conditionals.
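A rough sketch of what that batching could look like in ggml — my own illustration with hypothetical names, not code from this PR: the cond and uncond contexts of classifier-free guidance are packed along the batch dimension so a single UNet forward covers both.

```cpp
#include <string.h>

#include "ggml.h"

// Hypothetical helper, not this repo's API: pack two conditioning tensors
// into one batch-2 context so the UNet runs once instead of twice.
// Assumes contiguous f32 tensors allocated with backing data.
static ggml_tensor* pack_contexts(ggml_context* ctx,
                                  ggml_tensor* cond,      // [d, n_tok, 1]
                                  ggml_tensor* uncond) {  // [d, n_tok, 1]
    ggml_tensor* batch = ggml_new_tensor_3d(ctx, GGML_TYPE_F32,
                                            cond->ne[0], cond->ne[1], 2);
    // slot 0: cond, slot 1: uncond, laid out contiguously along ne2
    memcpy((char*)batch->data, cond->data, ggml_nbytes(cond));
    memcpy((char*)batch->data + ggml_nbytes(cond),
           uncond->data, ggml_nbytes(uncond));
    return batch;  // feed to one UNet call; split the output along ne2
}
```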
I was planning to refactor stable-diffusion.cpp to have an API similar to llama.cpp, and also to support offloading and computing SD and ControlNet on the GPU with low VRAM. However, after reviewing this refactoring, I think it's better to just extend what I need to make a web UI work.
@FSSRepo Basically this PR that I'm implementing: #157
It is understandable that problems will arise; ggml is a library mainly built to support llama.cpp. Still, for me, ggml has several advantages over PyTorch that cannot be ignored: it is small (PyTorch's CUDA dependency alone is about 1 GB), it supports quantization very well, and it supports ROCm on Windows.
GGMLBlock is very educational and this implementation is great.
For SVD, batch inference is a must; ne3 is actually batch_size * num_video_frames.
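For illustration, that layout could be allocated like this — a minimal sketch assuming f32 latents; ggml_new_tensor_4d is a real ggml function, while the wrapper name and its parameters are my own:

```cpp
#include "ggml.h"

// Illustrative wrapper: an SVD latent whose ne3 packs
// batch_size * num_video_frames together, as described above.
static ggml_tensor* new_svd_latent(ggml_context* ctx,
                                   int64_t w, int64_t h, int64_t channels,
                                   int64_t batch_size, int64_t num_frames) {
    // ggml dims are [ne0, ne1, ne2, ne3] = [w, h, c, batch * frames]
    return ggml_new_tensor_4d(ctx, GGML_TYPE_F32,
                              w, h, channels, batch_size * num_frames);
}
```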
Did you use the fp16-fix VAE?
I found a problem while using it just now, regarding seeds: if the seed is 42, the generated pictures are correct.
What shocks me is that for a 768x768 image (SDXL Turbo) on a 7900 XTX, a single sampling pass takes only 0.35s. It seems the performance bottleneck lies in the decoding operation.
I'm using taesdxl and it works great. With taesdxl I don't need the fp16-fix VAE, and the generated images look great with 1 step and the LCM sampler for SDXL Turbo. However, what I meant was converting the models to q4_1 gguf files with the -m convert command, to get smaller models; that gave me errors after generating an image (the same as in the "converting safetensors" issue, if I remember correctly).
@Cyberhan123 I got the same result in sd-webui using seed 297003140.
I will merge this PR even though the SVD support is broken, because it introduces GGMLBlock, which makes it convenient to implement neural networks with ggml. I have other changes that rely on GGMLBlock, such as adding support for Stable Cascade. I will try to fix the SVD issue later if I have time.
Over the past few weeks, I've been working on this PR in my free time. It introduces GGMLBlock, which makes it easier to implement neural networks; in most cases it is straightforward to translate an nn.Module into the corresponding GGMLBlock. I have implemented the majority of the building blocks for SVD and the SVD pipeline, except for VScalingWithEDMcNoise, which is also relatively simple to implement.
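To give a feel for the pattern — a minimal sketch in the spirit of GGMLBlock, not the PR's exact API; the class and member names here are my own assumptions — a block owns its parameters and sub-blocks and exposes a forward, so something like nn.Linear maps directly onto ggml ops:

```cpp
#include <cstdint>
#include <map>
#include <memory>
#include <string>

#include "ggml.h"

// Minimal block abstraction in the spirit of GGMLBlock (names are
// illustrative, not the PR's actual declarations).
struct ExampleBlock {
    std::map<std::string, ggml_tensor*> params;                   // own weights
    std::map<std::string, std::shared_ptr<ExampleBlock>> blocks;  // sub-modules

    virtual ~ExampleBlock() = default;
    virtual void init_params(ggml_context* ctx) = 0;
    virtual ggml_tensor* forward(ggml_context* ctx, ggml_tensor* x) = 0;
};

// Counterpart of torch.nn.Linear built from ggml primitives.
struct Linear : ExampleBlock {
    int64_t in_features;
    int64_t out_features;

    Linear(int64_t in, int64_t out) : in_features(in), out_features(out) {}

    void init_params(ggml_context* ctx) override {
        // ggml stores ne0 as the innermost dimension, so the weight is
        // created as [in_features, out_features]
        params["weight"] = ggml_new_tensor_2d(ctx, GGML_TYPE_F32,
                                              in_features, out_features);
        params["bias"]   = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, out_features);
    }

    ggml_tensor* forward(ggml_context* ctx, ggml_tensor* x) override {
        // y = x @ W^T + b; ggml_mul_mat treats its first argument as
        // transposed, and ggml_add broadcasts the bias over the batch
        x = ggml_mul_mat(ctx, params["weight"], x);
        return ggml_add(ctx, x, params["bias"]);
    }
};
```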
However, I've started to feel fatigued. ggml's batch inference implementation has issues for certain operators, and although I've addressed some of them in this branch, https://github.com/leejet/ggml/tree/batch-inference, it's not entirely resolved. Furthermore, some operators produce NaN in certain situations, and these issues also need to be fixed. If I have time in the future, I'll continue addressing them in ggml; for now, though, I'll be allocating my free time to other tasks, as I've already invested a considerable amount of effort in this PR over the past few weeks. Perhaps I'll merge this PR first, even though the SVD support is broken, since it introduces GGMLBlock, which makes it convenient to use ggml for implementing neural networks. The test results for batch inference are documented in the comments of the test functions in unet.hpp/vae.hpp; take a look if you're interested.