comfyanonymous
0ec513d877
Add a --force-channels-last to inference models in channel last mode.
2024-06-15 01:08:12 -04:00
Simon Lui
5eb98f0092
Exempt IPEX from non_blocking previews fixing segmentation faults. ( #3708 )
2024-06-13 18:51:14 -04:00
comfyanonymous
0e49211a11
Load the SD3 T5xxl model in the same dtype stored in the checkpoint.
2024-06-11 17:03:26 -04:00
comfyanonymous
104fcea0c8
Add function to get the list of currently loaded models.
2024-06-05 23:25:16 -04:00
comfyanonymous
b1fd26fe9e
pytorch xpu should be flash or mem efficient attention?
2024-06-04 17:44:14 -04:00
comfyanonymous
b249862080
Add an annoying print to a function I want to remove.
2024-06-01 12:47:31 -04:00
comfyanonymous
bf3e334d46
Disable non_blocking when --deterministic or directml.
2024-05-30 11:07:38 -04:00
comfyanonymous
0920e0e5fe
Remove some unused imports.
2024-05-27 19:08:27 -04:00
comfyanonymous
6c23854f54
Fix OSX latent2rgb previews.
2024-05-22 13:56:28 -04:00
comfyanonymous
8508df2569
Work around black image bug on Mac 14.5 by forcing attention upcasting.
2024-05-21 16:56:33 -04:00
comfyanonymous
09e069ae6c
Log the pytorch version.
2024-05-20 06:22:29 -04:00
comfyanonymous
19300655dd
Don't automatically switch to lowvram mode on GPUs with low memory.
2024-05-17 00:31:32 -04:00
Simon Lui
f509c6fe21
Fix Intel GPU memory allocation accuracy and documentation update. ( #3459 )
...
* Change calculation of memory total to be more accurate, allocated is actually smaller than reserved.
* Update README.md install documentation for Intel GPUs.
2024-05-12 06:36:30 -04:00
comfyanonymous
fa6dd7e5bb
Fix lowvram issue with saving checkpoints.
...
The previous fix didn't cover the case where the model was loaded in
lowvram mode right before.
2024-05-12 06:13:45 -04:00
comfyanonymous
49c20cdc70
No longer necessary.
2024-05-12 05:34:43 -04:00
comfyanonymous
e1489ad257
Fix issue with lowvram mode breaking model saving.
2024-05-11 21:55:20 -04:00
Simon Lui
a56d02efc7
Change torch.xpu to ipex.optimize, xpu device initialization and remove workaround for text node issue from older IPEX. ( #3388 )
2024-05-02 03:26:50 -04:00
comfyanonymous
258dbc06c3
Fix some memory related issues.
2024-04-14 12:08:58 -04:00
comfyanonymous
0a03009808
Fix issue with controlnet models getting loaded multiple times.
2024-04-06 18:38:39 -04:00
comfyanonymous
5d8898c056
Fix some performance issues with weight loading and unloading.
...
Lower peak memory usage when changing model.
Fix case where model weights would be unloaded and reloaded.
2024-03-28 18:04:42 -04:00
comfyanonymous
c6de09b02e
Optimize memory unload strategy for more optimized performance.
2024-03-24 02:36:30 -04:00
comfyanonymous
4b9005e949
Fix regression with model merging.
2024-03-20 13:56:12 -04:00
comfyanonymous
c18a203a8a
Don't unload model weights for non weight patches.
2024-03-20 02:27:58 -04:00
comfyanonymous
db8b59ecff
Lower memory usage for loras in lowvram mode at the cost of perf.
2024-03-13 20:07:27 -04:00
comfyanonymous
0ed72befe1
Change log levels.
...
Logging level now defaults to info. --verbose sets it to debug.
2024-03-11 13:54:56 -04:00
comfyanonymous
65397ce601
Replace prints with logging and add --verbose argument.
2024-03-10 12:14:23 -04:00
comfyanonymous
dce3555339
Add some tesla pascal GPUs to the fp16 working but slower list.
2024-03-02 17:16:31 -05:00
comfyanonymous
88f300401c
Enable fp16 by default on mps.
2024-02-19 12:00:48 -05:00
comfyanonymous
929e266f3e
Manual cast for bf16 on older GPUs.
2024-02-17 09:01:17 -05:00
comfyanonymous
0b3c50480c
Make --force-fp32 disable loading models in bf16.
2024-02-16 23:01:54 -05:00
comfyanonymous
f83109f09b
Stable Cascade Stage C.
2024-02-16 10:55:08 -05:00
comfyanonymous
aeaeca10bd
Small refactor of is_device_* functions.
2024-02-15 21:10:10 -05:00
comfyanonymous
66e28ef45c
Don't use is_bf16_supported to check for fp16 support.
2024-02-04 20:53:35 -05:00
comfyanonymous
24129d78e6
Speed up SDXL on 16xx series with fp16 weights and manual cast.
2024-02-04 13:23:43 -05:00
comfyanonymous
4b0239066d
Always use fp16 for the text encoders.
2024-02-02 10:02:49 -05:00
comfyanonymous
f9e55d8463
Only auto enable bf16 VAE on nvidia GPUs that actually support it.
2024-01-15 03:10:22 -05:00
comfyanonymous
1b103e0cb2
Add argument to run the VAE on the CPU.
2023-12-30 05:49:07 -05:00
comfyanonymous
e1e322cf69
Load weights that can't be lowvramed to target device.
2023-12-28 21:41:10 -05:00
comfyanonymous
a252963f95
--disable-smart-memory now unloads everything like it did originally.
2023-12-23 04:25:06 -05:00
comfyanonymous
36a7953142
Greatly improve lowvram sampling speed by getting rid of accelerate.
...
Let me know if this breaks anything.
2023-12-22 14:38:45 -05:00
comfyanonymous
2f9d6a97ec
Add --deterministic option to make pytorch use deterministic algorithms.
2023-12-17 16:59:21 -05:00
comfyanonymous
b0aab1e4ea
Add an option --fp16-unet to force using fp16 for the unet.
2023-12-11 18:36:29 -05:00
comfyanonymous
ba07cb748e
Use faster manual cast for fp8 in unet.
2023-12-11 18:24:44 -05:00
comfyanonymous
57926635e8
Switch text encoder to manual cast.
...
Use fp16 text encoder weights for CPU inference to lower memory usage.
2023-12-10 23:00:54 -05:00
comfyanonymous
340177e6e8
Disable non blocking on mps.
2023-12-10 01:30:35 -05:00
comfyanonymous
9ac0b487ac
Make --gpu-only put intermediate values in GPU memory instead of cpu.
2023-12-08 02:35:45 -05:00
comfyanonymous
2db86b4676
Slightly faster lora applying.
2023-12-06 05:13:14 -05:00
comfyanonymous
ca82ade765
Use .itemsize to get dtype size for fp8.
2023-12-04 11:52:06 -05:00
comfyanonymous
31b0f6f3d8
UNET weights can now be stored in fp8.
...
--fp8_e4m3fn-unet and --fp8_e5m2-unet are the two different formats
supported by pytorch.
2023-12-04 11:10:00 -05:00
comfyanonymous
0cf4e86939
Add some command line arguments to store text encoder weights in fp8.
...
Pytorch supports two variants of fp8:
--fp8_e4m3fn-text-enc (the one that seems to give better results)
--fp8_e5m2-text-enc
2023-11-17 02:56:59 -05:00
comfyanonymous
7339479b10
Disable xformers when it can't load properly.
2023-11-13 12:31:10 -05:00
comfyanonymous
dd4ba68b6e
Allow different models to estimate memory usage differently.
2023-11-12 04:03:52 -05:00
comfyanonymous
8594c8be4d
Empty the cache when torch cache is more than 25% free mem.
2023-10-22 13:58:12 -04:00
comfyanonymous
c8013f73e5
Add some Quadro cards to the list of cards with broken fp16.
2023-10-16 16:48:46 -04:00
comfyanonymous
fd4c5f07e7
Add a --bf16-unet to test running the unet in bf16.
2023-10-13 14:51:10 -04:00
comfyanonymous
9a55dadb4c
Refactor code so model can be a dtype other than fp32 or fp16.
2023-10-13 14:41:17 -04:00
comfyanonymous
88733c997f
pytorch_attention_enabled can now return True when xformers is enabled.
2023-10-11 21:30:57 -04:00
comfyanonymous
20d3852aa1
Pull some small changes from the other repo.
2023-10-11 20:38:48 -04:00
Simon Lui
eec449ca8e
Allow Intel GPUs to LoRA cast on GPU since it supports BF16 natively.
2023-09-22 21:11:27 -07:00
comfyanonymous
1cdfb3dba4
Only do the cast on the device if the device supports it.
2023-09-20 17:52:41 -04:00
comfyanonymous
321c5fa295
Enable pytorch attention by default on xpu.
2023-09-17 04:09:19 -04:00
comfyanonymous
0966d3ce82
Don't run text encoders on xpu because there are issues.
2023-09-14 12:16:07 -04:00
comfyanonymous
1938f5c5fe
Add a force argument to soft_empty_cache to force a cache empty.
2023-09-04 00:58:18 -04:00
Simon Lui
4a0c4ce4ef
Some fixes to generalize CUDA specific functionality to Intel or other GPUs.
2023-09-02 18:22:10 -07:00
comfyanonymous
b8c7c770d3
Enable bf16-vae by default on ampere and up.
2023-08-27 23:06:19 -04:00
comfyanonymous
a57b0c797b
Fix lowvram model merging.
2023-08-26 11:52:07 -04:00
comfyanonymous
f72780a7e3
The new smart memory management makes this unnecessary.
2023-08-25 18:02:15 -04:00
comfyanonymous
30eb92c3cb
Code cleanups.
2023-08-24 19:39:18 -04:00
comfyanonymous
51dde87e97
Try to free enough vram for control lora inference.
2023-08-24 17:20:54 -04:00
comfyanonymous
cc44ade79e
Always shift text encoder to GPU when the device supports fp16.
2023-08-23 21:45:00 -04:00
comfyanonymous
a6ef08a46a
Even with forced fp16 the cpu device should never use it.
2023-08-23 21:38:28 -04:00
comfyanonymous
f081017c1a
Save memory by storing text encoder weights in fp16 in most situations.
...
Do inference in fp32 to make sure quality stays the exact same.
2023-08-23 01:08:51 -04:00
comfyanonymous
0d7b0a4dc7
Small cleanups.
2023-08-20 14:56:47 -04:00
Simon Lui
9225465975
Further tuning and fix mem_free_total.
2023-08-20 14:19:53 -04:00
Simon Lui
2c096e4260
Add ipex optimize and other enhancements for Intel GPUs based on recent memory changes.
2023-08-20 14:19:51 -04:00
comfyanonymous
e9469e732d
--disable-smart-memory now disables loading model directly to vram.
2023-08-20 04:00:53 -04:00
comfyanonymous
3aee33b54e
Add --disable-smart-memory for those that want the old behaviour.
2023-08-17 03:12:37 -04:00
comfyanonymous
2be2742711
Fix issue with regular torch version.
2023-08-17 01:58:54 -04:00
comfyanonymous
89a0767abf
Smarter memory management.
...
Try to keep models on the vram when possible.
Better lowvram mode for controlnets.
2023-08-17 01:06:34 -04:00
comfyanonymous
1ce0d8ad68
Add CMP 30HX card to the nvidia_16_series list.
2023-08-04 12:08:45 -04:00
comfyanonymous
4a77fcd6ab
Only shift text encoder to vram when CPU cores are under 8.
2023-07-31 00:08:54 -04:00
comfyanonymous
3cd31d0e24
Lower CPU thread check for running the text encoder on the CPU vs GPU.
2023-07-30 17:18:24 -04:00
comfyanonymous
22f29d66ca
Try to fix memory issue with lora.
2023-07-22 21:38:56 -04:00
comfyanonymous
4760c29380
Merge branch 'fix-AttributeError-module-'torch'-has-no-attribute-'mps'' of https://github.com/KarryCharon/ComfyUI
2023-07-20 00:34:54 -04:00
comfyanonymous
18885f803a
Add MX450 and MX550 to list of cards with broken fp16.
2023-07-19 03:08:30 -04:00
comfyanonymous
ff6b047a74
Fix device print on old torch version.
2023-07-17 15:18:58 -04:00
comfyanonymous
1679abd86d
Add a command line argument to enable backend:cudaMallocAsync
2023-07-17 11:00:14 -04:00
comfyanonymous
5f57362613
Lower lora ram usage when in normal vram mode.
2023-07-16 02:59:04 -04:00
comfyanonymous
490771b7f4
Speed up lora loading a bit.
2023-07-15 13:25:22 -04:00
KarryCharon
3e2309f149
fix mps miss import
2023-07-12 10:06:34 +08:00
comfyanonymous
0ae81c03bb
Empty cache after model unloading for normal vram and lower.
2023-07-09 09:56:03 -04:00
comfyanonymous
e7bee85df8
Add arguments to run the VAE in fp16 or bf16 for testing.
2023-07-06 23:23:46 -04:00
comfyanonymous
ddc6f12ad5
Disable autocast in unet for increased speed.
2023-07-05 21:58:29 -04:00
comfyanonymous
8d694cc450
Fix issue with OSX.
2023-07-04 02:09:02 -04:00
comfyanonymous
dc9d1f31c8
Improvements for OSX.
2023-07-03 00:08:30 -04:00
comfyanonymous
2c4e0b49b7
Switch to fp16 on some cards when the model is too big.
2023-07-02 10:00:57 -04:00
comfyanonymous
6f3d9f52db
Add a --force-fp16 argument to force fp16 for testing.
2023-07-01 22:42:35 -04:00
comfyanonymous
1c1b0e7299
--gpu-only now keeps the VAE on the device.
2023-07-01 15:22:40 -04:00
comfyanonymous
3b6fe51c1d
Leave text_encoder on the CPU when it can handle it.
2023-07-01 14:38:51 -04:00
comfyanonymous
b6a60fa696
Try to keep text encoders loaded and patched to increase speed.
...
load_model_gpu() is now used with the text encoder models instead of just
the unet.
2023-07-01 13:28:07 -04:00