photoMaker issue with two or more generated images / SDXL sample steps #207

Closed
Jonathhhan opened this issue Mar 19, 2024 · 12 comments

@Jonathhhan

Jonathhhan commented Mar 19, 2024

Very nice feature, thanks. It seems that only the first txt2img call after loading the sd_ctx produces a correct image (generating multiple images in one call with batch size > 1 works).

I used the Newton images from the example and the prompt: "man img, man with futuristic clothes".

This is the first image:
[Image: ofxStableDiffusion-2024-03-19-03-33-18]

And this is the second image:
[Image: ofxStableDiffusion-2024-03-19-03-34-24]

Also, with SDXL-Turbo, PhotoMaker seems to need fewer than the fixed 50 sampling steps...
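
For reference, the logs further down in this thread print "sampling steps increases from 15 to 50 for PHOTOMAKER" and "PHOTOMAKER: start_merge_step: 10". A minimal sketch of what such an override might look like follows. The function names, the `min_steps` parameter, and the 20% merge ratio are assumptions made for illustration (the ratio is chosen only because it reproduces 10 out of 50 steps); none of this is taken from the stable-diffusion.cpp source.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper, not the stable-diffusion.cpp API.
// Raises the requested step count to a minimum when PhotoMaker is active.
// Assumption: the override only raises low values, as the log message
// "increases from 15 to 50" suggests.
int effective_sample_steps(int requested_steps, bool photomaker_enabled,
                           int min_steps = 50) {
    assert(requested_steps > 0);
    if (!photomaker_enabled) {
        return requested_steps;
    }
    return std::max(requested_steps, min_steps);
}

// Hypothetical helper: the step at which ID embeddings start being merged,
// expressed as a percentage of the total steps. The 20% default is an
// assumption that happens to match "start_merge_step: 10" for 50 steps.
int start_merge_step(int sample_steps, int merge_percent = 20) {
    assert(merge_percent >= 0 && merge_percent <= 100);
    return sample_steps * merge_percent / 100;
}
```

Making the minimum a parameter rather than a constant is one way a few-step model such as SDXL-Turbo could opt out, for example `effective_sample_steps(4, true, 4)`.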

@Jonathhhan Jonathhhan changed the title photoMaker photoMaker issues with batch size and multiple images Mar 19, 2024
@Jonathhhan Jonathhhan changed the title photoMaker issues with batch size and multiple images photoMaker issue with two or more generated images / SDXL sample steps Mar 19, 2024
@Green-Sky
Contributor

It looks like the cfg scale was too high for the first image.

@Jonathhhan
Author

Jonathhhan commented Mar 19, 2024

@Green-Sky yes, I used 7 and not the recommended 5.
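
For context, the cfg scale controls how far each prediction is pushed from the unconditional result toward the prompt-conditioned one. Below is a minimal sketch of the standard classifier-free guidance combination. The function and variable names are illustrative and are not taken from stable-diffusion.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative only: combines a conditional and an unconditional noise
// prediction with the standard classifier-free guidance formula
//   guided = uncond + cfg_scale * (cond - uncond)
std::vector<float> apply_cfg(const std::vector<float>& cond,
                             const std::vector<float>& uncond,
                             float cfg_scale) {
    assert(cond.size() == uncond.size());
    std::vector<float> guided(cond.size());
    for (std::size_t i = 0; i < cond.size(); ++i) {
        guided[i] = uncond[i] + cfg_scale * (cond[i] - uncond[i]);
    }
    return guided;
}
```

With cond = 2 and uncond = 1, a scale of 5 gives 6 and a scale of 7 gives 8, so a higher scale exaggerates whatever the prompt contributes.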

@bssrdf
Contributor

bssrdf commented Mar 19, 2024

@Jonathhhan, could you provide the full command line with SDXL and Photomaker model files? In particular, did you use the file from https://huggingface.co/bssrdf/PhotoMaker?

Here is what I can generate using the Newton example images and your prompt with batch size 2.

bin/sd -m ../models/RealVisXL_V3.0.safetensors  --stacked-id-embd-dir ../models/photomaker-v1.safetensors --input-id-images-dir examples/newton_man -p "man img, man with futuristic clothes"  --cfg-scale 7 --sampling-method euler -H 1024 -W 1024  -b 2 -o newton_issu01.png
ggml_init_cublas: GGML_CUDA_FORCE_MMQ:   no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 4090, compute capability 8.9, VMM: yes
[INFO ] stable-diffusion.cpp:165  - loading model from '../models/RealVisXL_V3.0.safetensors'
[INFO ] model.cpp:705  - load ../models/RealVisXL_V3.0.safetensors using safetensors format
[INFO ] stable-diffusion.cpp:188  - Stable Diffusion XL
[INFO ] stable-diffusion.cpp:194  - Stable Diffusion weight type: f16
[WARN ] stable-diffusion.cpp:200  - !!!It looks like you are using SDXL model. If you find that the generated images are completely black, try specifying SDXL VAE FP16 Fix with the --vae parameter. You can find it here: https://huggingface.co/madebyollin/sdxl-vae-fp16-fix/blob/main/sdxl_vae.safetensors
[INFO ] model.cpp:705  - load ../models/photomaker-v1.safetensors using safetensors format
[INFO ] lora.hpp:38   - loading LoRA from '../models/photomaker-v1.safetensors'
[INFO ] stable-diffusion.cpp:275  - loading stacked ID embedding (PHOTOMAKER) model file from '../models/photomaker-v1.safetensors'
[INFO ] model.cpp:705  - load ../models/photomaker-v1.safetensors using safetensors format
[INFO ] stable-diffusion.cpp:400  - total params memory size = 7182.38MB (VRAM 7182.38MB, RAM 0.00MB): clip 1564.36MB(VRAM), unet 4900.07MB(VRAM), vae 94.47MB(VRAM), controlnet 0.00MB(VRAM), pmid 623.48MB(VRAM)
[INFO ] stable-diffusion.cpp:419  - loading model from '../models/RealVisXL_V3.0.safetensors' completed, taking 88.15s
[INFO ] stable-diffusion.cpp:436  - running in eps-prediction mode
[INFO ] stable-diffusion.cpp:1572 - PhotoMaker loaded image from 'examples/newton_man/newton_0.jpg'
[INFO ] stable-diffusion.cpp:1572 - PhotoMaker loaded image from 'examples/newton_man/newton_1.jpg'
[INFO ] stable-diffusion.cpp:1572 - PhotoMaker loaded image from 'examples/newton_man/newton_2.png'
[INFO ] stable-diffusion.cpp:1572 - PhotoMaker loaded image from 'examples/newton_man/newton_3.jpg'
[INFO ] stable-diffusion.cpp:1602 - apply_loras completed, taking 0.00s
[INFO ] stable-diffusion.cpp:1608 - pmid_lora apply completed, taking 0.09s
[INFO ] stable-diffusion.cpp:1672 - Photomaker ID Stacking, taking 548 ms
[INFO ] stable-diffusion.cpp:1681 - sampling steps increases from 20 to 50 for PHOTOMAKER
[INFO ] stable-diffusion.cpp:1712 - get_learned_condition completed, taking 157 ms
[INFO ] stable-diffusion.cpp:1728 - sampling using Euler method
[INFO ] stable-diffusion.cpp:1732 - generating image: 1/2 - seed 42
[INFO ] stable-diffusion.cpp:1745 - PHOTOMAKER: start_merge_step: 10
  |==================================================| 50/50 - 1.84it/s
[INFO ] stable-diffusion.cpp:1769 - sampling completed, taking 27.58s
[INFO ] stable-diffusion.cpp:1732 - generating image: 2/2 - seed 43
[INFO ] stable-diffusion.cpp:1745 - PHOTOMAKER: start_merge_step: 10
  |==================================================| 50/50 - 1.79it/s
[INFO ] stable-diffusion.cpp:1769 - sampling completed, taking 27.52s
[INFO ] stable-diffusion.cpp:1777 - generating 2 latent images completed, taking 55.12s
[INFO ] stable-diffusion.cpp:1779 - decoding 2 latents
[INFO ] stable-diffusion.cpp:1789 - latent 1 decoded, taking 1.15s
[INFO ] stable-diffusion.cpp:1789 - latent 2 decoded, taking 1.17s
[INFO ] stable-diffusion.cpp:1793 - decode_first_stage completed, taking 2.31s
[INFO ] stable-diffusion.cpp:1810 - txt2img completed in 57.60s
save result image to 'newton_issu01.png'
save result image to 'newton_issu01_2.png'
double free or corruption (fasttop)
Aborted

[Image: newton_issu01]
[Image: newton_issu01_2]

They look fine.

@Jonathhhan
Author

Jonathhhan commented Mar 19, 2024

@bssrdf batch processing works fine. The issue appears when I run txt2img a second time without reloading the sd_ctx. The console output looks exactly the same for both runs:

System Info:
    BLAS = 1
    SSE3 = 1
    AVX = 1
    AVX2 = 1
    AVX512 = 0
    AVX512_VBMI = 0
    AVX512_VNNI = 0
    FMA = 1
    NEON = 0
    ARM_FMA = 0
    F16C = 1
    FP16_VA = 0
    WASM_SIMD = 0
    VSX = 0
New BaseEngine 00000202288E6220
New GLFWEngine 00000202288E6220
[DEBUG] stable-diffusion.cpp:145  - Using CUDA backend
[notice ] EngineGLFW::setup(): Replaced the openFrameworks' GLFW event listeners by the imgui_impl_glfw ones. You will not have multi-window nor multi-context support. This can be enabled by defining OFXIMGUI_GLFW_FIX_MULTICONTEXT_PRIMARY_VP=1.
ggml_init_cublas: GGML_CUDA_FORCE_MMQ:   no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
[INFO ] stable-diffusion.cpp:165  - loading model from 'data/models/sd_xl_base_1.0.safetensors'
[INFO ] model.cpp:705  - load data/models/sd_xl_base_1.0.safetensors using safetensors format
[DEBUG] model.cpp:771  - init from 'data/models/sd_xl_base_1.0.safetensors'
[INFO ] stable-diffusion.cpp:176  - loading vae from 'data/models/vae/vae.safetensors'
[INFO ] model.cpp:705  - load data/models/vae/vae.safetensors using safetensors format
[DEBUG] model.cpp:771  - init from 'data/models/vae/vae.safetensors'
[INFO ] stable-diffusion.cpp:188  - Stable Diffusion XL
[INFO ] stable-diffusion.cpp:194  - Stable Diffusion weight type: f16
[DEBUG] stable-diffusion.cpp:195  - ggml tensor size = 432 bytes
[DEBUG] ggml_extend.hpp:884  - clip params backend buffer size =  1564.36 MB(VRAM) (713 tensors)
[DEBUG] ggml_extend.hpp:884  - unet params backend buffer size =  4900.07 MB(VRAM) (1680 tensors)
[DEBUG] ggml_extend.hpp:884  - vae params backend buffer size =  159.68 MB(VRAM) (248 tensors)
[INFO ] model.cpp:705  - load data/models/photomaker/photomaker-v1.safetensors using safetensors format
[DEBUG] model.cpp:771  - init from 'data/models/photomaker/photomaker-v1.safetensors'
[INFO ] lora.hpp:38   - loading LoRA from 'data/models/photomaker/photomaker-v1.safetensors'
[DEBUG] model.cpp:1343 - loading tensors from data/models/photomaker/photomaker-v1.safetensors
[DEBUG] ggml_extend.hpp:884  - lora params backend buffer size =  354.38 MB(VRAM) (10240 tensors)
[DEBUG] model.cpp:1343 - loading tensors from data/models/photomaker/photomaker-v1.safetensors
[DEBUG] lora.hpp:74   - finished loaded lora
[INFO ] stable-diffusion.cpp:275  - loading stacked ID embedding (PHOTOMAKER) model file from 'data/models/photomaker/photomaker-v1.safetensors'
[INFO ] model.cpp:705  - load data/models/photomaker/photomaker-v1.safetensors using safetensors format
[DEBUG] model.cpp:771  - init from 'data/models/photomaker/photomaker-v1.safetensors'
[DEBUG] ggml_extend.hpp:884  - pmid params backend buffer size =  623.48 MB(VRAM) (407 tensors)
[DEBUG] stable-diffusion.cpp:296  - loading vocab
[DEBUG] clip.hpp:164  - vocab size: 49408
[DEBUG] clip.hpp:175  -  trigger word img already in vocab
[DEBUG] stable-diffusion.cpp:316  - loading weights
[DEBUG] model.cpp:1343 - loading tensors from data/models/sd_xl_base_1.0.safetensors
[DEBUG] model.cpp:1343 - loading tensors from data/models/vae/vae.safetensors
[DEBUG] model.cpp:1343 - loading tensors from data/models/photomaker/photomaker-v1.safetensors
[INFO ] stable-diffusion.cpp:415  - total params memory size = 7247.59MB (VRAM 7247.59MB, RAM 0.00MB): clip 1564.36MB(VRAM), unet 4900.07MB(VRAM), vae 159.68MB(VRAM), controlnet 0.00MB(VRAM), pmid 623.48MB(VRAM)
[INFO ] stable-diffusion.cpp:419  - loading model from 'data/models/sd_xl_base_1.0.safetensors' completed, taking 4.77s
[INFO ] stable-diffusion.cpp:436  - running in eps-prediction mode
[DEBUG] stable-diffusion.cpp:464  - finished loaded file
[DEBUG] upscaler.cpp:19   - Using CUDA backend
[INFO ] upscaler.cpp:32   - Upscaler weight type: f16
[INFO ] esrgan.hpp:164  - loading esrgan from 'data/models/esrgan/RealESRGAN_x4plus_anime_6B.pth'
[DEBUG] ggml_extend.hpp:884  - esrgan params backend buffer size =   8.53 MB(VRAM) (192 tensors)
[INFO ] model.cpp:708  - load data/models/esrgan/RealESRGAN_x4plus_anime_6B.pth using checkpoint format
[DEBUG] model.cpp:1221 - init from 'data/models/esrgan/RealESRGAN_x4plus_anime_6B.pth'
[DEBUG] model.cpp:1343 - loading tensors from data/models/esrgan/RealESRGAN_x4plus_anime_6B.pth
[INFO ] esrgan.hpp:183  - esrgan model loaded


[DEBUG] stable-diffusion.cpp:1551 - txt2img 1024x1024
[INFO ] stable-diffusion.cpp:1572 - PhotoMaker loaded image from 'C:\Users\Jonat\Desktop\of_v20240306_vs_release\addons\ofxStableDiffusion\ofxStableDiffusionExample\bin\data/photomaker_images/newton_man\newton_0.jpg'
[INFO ] stable-diffusion.cpp:1572 - PhotoMaker loaded image from 'C:\Users\Jonat\Desktop\of_v20240306_vs_release\addons\ofxStableDiffusion\ofxStableDiffusionExample\bin\data/photomaker_images/newton_man\newton_1.jpg'
[INFO ] stable-diffusion.cpp:1572 - PhotoMaker loaded image from 'C:\Users\Jonat\Desktop\of_v20240306_vs_release\addons\ofxStableDiffusion\ofxStableDiffusionExample\bin\data/photomaker_images/newton_man\newton_2.png'
[INFO ] stable-diffusion.cpp:1572 - PhotoMaker loaded image from 'C:\Users\Jonat\Desktop\of_v20240306_vs_release\addons\ofxStableDiffusion\ofxStableDiffusionExample\bin\data/photomaker_images/newton_man\newton_3.jpg'
[DEBUG] stable-diffusion.cpp:1597 - prompt after extract and remove lora: "man img, man with futuristic clothes"
[INFO ] stable-diffusion.cpp:1602 - apply_loras completed, taking 0.00s
[DEBUG] ggml_extend.hpp:835  - lora compute buffer size: 20.50 MB(VRAM)
[INFO ] stable-diffusion.cpp:1608 - pmid_lora apply completed, taking 0.28s
[DEBUG] clip.hpp:1222 - parse 'man img, man with futuristic clothes' to [['man img, man with futuristic clothes', 1], ]
[DEBUG] clip.hpp:1168 - token length: 77
[DEBUG] ggml_extend.hpp:835  - clip compute buffer size: 2.56 MB(VRAM)
[DEBUG] ggml_extend.hpp:835  - clip compute buffer size: 8.58 MB(VRAM)
[DEBUG] stable-diffusion.cpp:673  - computing condition graph completed, taking 86 ms
[DEBUG] ggml_extend.hpp:835  - pmid compute buffer size: 40.31 MB(VRAM)
[INFO ] stable-diffusion.cpp:1672 - Photomaker ID Stacking, taking 161 ms
[DEBUG] clip.hpp:1328 - parse 'man img, man with futuristic clothes' to [['man img, man with futuristic clothes', 1], ]
[INFO ] stable-diffusion.cpp:1681 - sampling steps increases from 15 to 50 for PHOTOMAKER
[DEBUG] clip.hpp:1328 - parse 'man , man with futuristic clothes' to [['man , man with futuristic clothes', 1], ]
[DEBUG] clip.hpp:1168 - token length: 77
[DEBUG] ggml_extend.hpp:835  - clip compute buffer size: 2.56 MB(VRAM)
[DEBUG] ggml_extend.hpp:835  - clip compute buffer size: 8.58 MB(VRAM)
[DEBUG] stable-diffusion.cpp:673  - computing condition graph completed, taking 61 ms
[DEBUG] clip.hpp:1328 - parse '' to [['', 1], ]
[DEBUG] clip.hpp:1168 - token length: 77
[DEBUG] ggml_extend.hpp:835  - clip compute buffer size: 2.56 MB(VRAM)
[DEBUG] ggml_extend.hpp:835  - clip compute buffer size: 8.58 MB(VRAM)
[DEBUG] stable-diffusion.cpp:673  - computing condition graph completed, taking 54 ms
[INFO ] stable-diffusion.cpp:1712 - get_learned_condition completed, taking 117 ms
[INFO ] stable-diffusion.cpp:1728 - sampling using Euler A method
[INFO ] stable-diffusion.cpp:1732 - generating image: 1/1 - seed 2058
[INFO ] stable-diffusion.cpp:1745 - PHOTOMAKER: start_merge_step: 10
[DEBUG] ggml_extend.hpp:835  - unet compute buffer size: 830.86 MB(VRAM)
  |==================================================| 50/50 - 1.28it/s
[INFO ] stable-diffusion.cpp:1769 - sampling completed, taking 41.23s
[INFO ] stable-diffusion.cpp:1777 - generating 1 latent images completed, taking 41.23s
[INFO ] stable-diffusion.cpp:1779 - decoding 1 latents
[DEBUG] ggml_extend.hpp:835  - vae compute buffer size: 6656.00 MB(VRAM)
[DEBUG] stable-diffusion.cpp:1447 - computing vae [mode: DECODE] graph completed, taking 1.22s
[INFO ] stable-diffusion.cpp:1789 - latent 1 decoded, taking 1.22s
[INFO ] stable-diffusion.cpp:1793 - decode_first_stage completed, taking 1.22s
[INFO ] stable-diffusion.cpp:1812 - txt2img completed in 42.56s


[DEBUG] stable-diffusion.cpp:1551 - txt2img 1024x1024
[INFO ] stable-diffusion.cpp:1572 - PhotoMaker loaded image from 'C:\Users\Jonat\Desktop\of_v20240306_vs_release\addons\ofxStableDiffusion\ofxStableDiffusionExample\bin\data/photomaker_images/newton_man\newton_0.jpg'
[INFO ] stable-diffusion.cpp:1572 - PhotoMaker loaded image from 'C:\Users\Jonat\Desktop\of_v20240306_vs_release\addons\ofxStableDiffusion\ofxStableDiffusionExample\bin\data/photomaker_images/newton_man\newton_1.jpg'
[INFO ] stable-diffusion.cpp:1572 - PhotoMaker loaded image from 'C:\Users\Jonat\Desktop\of_v20240306_vs_release\addons\ofxStableDiffusion\ofxStableDiffusionExample\bin\data/photomaker_images/newton_man\newton_2.png'
[INFO ] stable-diffusion.cpp:1572 - PhotoMaker loaded image from 'C:\Users\Jonat\Desktop\of_v20240306_vs_release\addons\ofxStableDiffusion\ofxStableDiffusionExample\bin\data/photomaker_images/newton_man\newton_3.jpg'
[DEBUG] stable-diffusion.cpp:1597 - prompt after extract and remove lora: "man img, man with futuristic clothes"
[INFO ] stable-diffusion.cpp:1602 - apply_loras completed, taking 0.00s
[DEBUG] ggml_extend.hpp:835  - lora compute buffer size: 20.50 MB(VRAM)
[INFO ] stable-diffusion.cpp:1608 - pmid_lora apply completed, taking 0.26s
[DEBUG] clip.hpp:1222 - parse 'man img, man with futuristic clothes' to [['man img, man with futuristic clothes', 1], ]
[DEBUG] clip.hpp:1168 - token length: 77
[DEBUG] ggml_extend.hpp:835  - clip compute buffer size: 2.56 MB(VRAM)
[DEBUG] ggml_extend.hpp:835  - clip compute buffer size: 8.58 MB(VRAM)
[DEBUG] stable-diffusion.cpp:673  - computing condition graph completed, taking 53 ms
[DEBUG] ggml_extend.hpp:835  - pmid compute buffer size: 40.31 MB(VRAM)
[INFO ] stable-diffusion.cpp:1672 - Photomaker ID Stacking, taking 127 ms
[DEBUG] clip.hpp:1328 - parse 'man img, man with futuristic clothes' to [['man img, man with futuristic clothes', 1], ]
[INFO ] stable-diffusion.cpp:1681 - sampling steps increases from 15 to 50 for PHOTOMAKER
[DEBUG] clip.hpp:1328 - parse 'man , man with futuristic clothes' to [['man , man with futuristic clothes', 1], ]
[DEBUG] clip.hpp:1168 - token length: 77
[DEBUG] ggml_extend.hpp:835  - clip compute buffer size: 2.56 MB(VRAM)
[DEBUG] ggml_extend.hpp:835  - clip compute buffer size: 8.58 MB(VRAM)
[DEBUG] stable-diffusion.cpp:673  - computing condition graph completed, taking 55 ms
[DEBUG] clip.hpp:1328 - parse '' to [['', 1], ]
[DEBUG] clip.hpp:1168 - token length: 77
[DEBUG] ggml_extend.hpp:835  - clip compute buffer size: 2.56 MB(VRAM)
[DEBUG] ggml_extend.hpp:835  - clip compute buffer size: 8.58 MB(VRAM)
[DEBUG] stable-diffusion.cpp:673  - computing condition graph completed, taking 53 ms
[INFO ] stable-diffusion.cpp:1712 - get_learned_condition completed, taking 111 ms
[INFO ] stable-diffusion.cpp:1728 - sampling using Euler A method
[INFO ] stable-diffusion.cpp:1732 - generating image: 1/1 - seed 2215
[INFO ] stable-diffusion.cpp:1745 - PHOTOMAKER: start_merge_step: 10
[DEBUG] ggml_extend.hpp:835  - unet compute buffer size: 830.86 MB(VRAM)
  |==================================================| 50/50 - 1.28it/s
[INFO ] stable-diffusion.cpp:1769 - sampling completed, taking 40.68s
[INFO ] stable-diffusion.cpp:1777 - generating 1 latent images completed, taking 40.68s
[INFO ] stable-diffusion.cpp:1779 - decoding 1 latents
[DEBUG] ggml_extend.hpp:835  - vae compute buffer size: 6656.00 MB(VRAM)
[DEBUG] stable-diffusion.cpp:1447 - computing vae [mode: DECODE] graph completed, taking 1.22s
[INFO ] stable-diffusion.cpp:1789 - latent 1 decoded, taking 1.22s
[INFO ] stable-diffusion.cpp:1793 - decode_first_stage completed, taking 1.22s
[INFO ] stable-diffusion.cpp:1812 - txt2img completed in 42.02s
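
The two runs above use different seeds (2058 and 2215), so the logs alone cannot show whether the second call differs from what a fresh context would produce. One way to check is to compare the second call on a reused context against the first call on a fresh context, using the same seed. The sketch below uses a toy stand-in for the context: `ToyContext`, its `leak_state` flag, and `second_call_matches_fresh` are all hypothetical, and the simulated leak is not a claim about the actual cause of this bug.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using Image = std::vector<std::uint8_t>;

// Hypothetical stand-in for a generation context; NOT the sd_ctx API.
struct ToyContext {
    bool leak_state;               // true simulates state leaking between calls
    std::uint32_t calls_made = 0;  // per-context counter that should not affect output

    explicit ToyContext(bool leak) : leak_state(leak) {}

    Image generate(std::uint32_t seed) {
        // With the simulated leak, the output depends on earlier calls.
        const std::uint32_t extra = leak_state ? calls_made : 0u;
        ++calls_made;
        Image img(4);
        for (std::size_t i = 0; i < img.size(); ++i) {
            img[i] = static_cast<std::uint8_t>((seed * 31u + i + extra) & 0xFFu);
        }
        return img;
    }
};

// Returns true if the second call on a reused context produces the same
// image as the first call on a fresh context with the same seed.
bool second_call_matches_fresh(bool leak_state, std::uint32_t seed) {
    ToyContext reused(leak_state);
    reused.generate(seed + 1u);            // first call, result discarded
    const Image second = reused.generate(seed);

    ToyContext fresh(leak_state);
    const Image reference = fresh.generate(seed);
    return second == reference;
}
```

In a real test the toy context would be replaced by the actual context and txt2img call, with a fixed seed and otherwise identical parameters for both runs.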

@bssrdf
Contributor

bssrdf commented Mar 19, 2024

Sorry, I misread your first message 😊
Can you try running more than one txt2img call without PhotoMaker? Just to isolate whether this is a PhotoMaker-specific issue.

@Jonathhhan
Author

Jonathhhan commented Mar 20, 2024

@bssrdf good point. Yes, it works without PhotoMaker (if the path to the PhotoMaker model is empty). It crashes if the model is loaded and I leave "man (something) img, " out of the prompt (which is an unrelated issue, but the presence of the trigger word could be a nice way to switch PhotoMaker on).
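
As a rough illustration of that idea, a caller could check the prompt for the trigger word before enabling PhotoMaker. The log above reports "trigger word img already in vocab", so "img" is used as the default here. The helper name is hypothetical, and splitting on whitespace is a simplification of what a real tokenizer does.

```cpp
#include <cctype>
#include <sstream>
#include <string>

// Hypothetical helper, not part of stable-diffusion.cpp.
// Returns true if the prompt contains the trigger word as a whole
// whitespace-delimited word, ignoring trailing punctuation such as ",".
bool has_trigger_word(const std::string& prompt,
                      const std::string& trigger = "img") {
    std::istringstream words(prompt);
    std::string word;
    while (words >> word) {
        // Strip trailing punctuation so "img," still matches "img".
        while (!word.empty() &&
               std::ispunct(static_cast<unsigned char>(word.back()))) {
            word.pop_back();
        }
        if (word == trigger) {
            return true;
        }
    }
    return false;
}
```

A wrapper could then skip the PhotoMaker path, or report an error instead of crashing, whenever `has_trigger_word(prompt)` is false.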

@bssrdf
Contributor

bssrdf commented Mar 20, 2024

@Jonathhhan, can you provide details about how to run 2 txt2img without reloading sd_ctx? Did you change the code in main.cpp?

@Jonathhhan
Author

@bssrdf of course. I made an addon for openFrameworks and do not use main.cpp at all (which complicates things a little): https://github.com/Jonathhhan/ofxStableDiffusion
Most of the relevant code is in this file: https://github.com/Jonathhhan/ofxStableDiffusion/blob/main/ofxStableDiffusionExample/src/stableDiffusionThread.cpp

@fszontagh
Contributor

@Jonathhhan did you set the "isFreeParamsImmediatly" to false?

@Jonathhhan
Author

Jonathhhan commented Mar 20, 2024

@fszontagh Yes.

@bssrdf
Contributor

bssrdf commented Mar 21, 2024

@Jonathhhan, I have reproduced the issue and implemented a fix. Please wait for the merged PR or you can try the branch. Thanks for reporting the bug.

@Jonathhhan
Author

Jonathhhan commented Mar 21, 2024

@bssrdf thanks (I can confirm that it works now).
