Lora Model Training Records
Lora-Script
Abstract
This post primarily records issues encountered during model fine-tuning and some parameter configurations. The graphics cards used were rented from a cloud provider.
- Multi-card model training mainly uses a modified kohya_ss framework and DeepSpeed (ZeRO stage 3) for multi-GPU training.
- Single-card training uses the aki lora-scripts toolkit.
Bash Config
```
[model]
```
Common Issues/Troubleshooting
Mirror Site Address
export HF_ENDPOINT=https://hf-mirror.com
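If the download is done from Python instead of the shell, the endpoint has to be set before huggingface_hub is imported; a minimal sketch (the repo id and target directory are only examples, not the exact paths used in this setup):

```python
# Sketch: route Hugging Face downloads through the mirror.
# The repo id and local_dir below are illustrative examples only.
import os
os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"  # must be set before importing huggingface_hub

from huggingface_hub import snapshot_download

snapshot_download(repo_id="google/t5-v1_1-xxl", local_dir="train/sd-models/t5-xxl")
```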
Missing Modules
ModuleNotFoundError: No module named 'bitsandbytes'
- Solution (quote the specifier so the shell does not treat > as a redirect):
pip install "bitsandbytes>=0.43.0" -i https://pypi.tuna.tsinghua.edu.cn/simple/
Pip Permission Issues
WARNING: Running pip as the 'root' user can result in broken permissions...
- Solution: Using a virtual environment is recommended, but for quick fixes locally:
pip install "bitsandbytes>=0.43.0" -i https://pypi.tuna.tsinghua.edu.cn/simple/
Folder Permission Issues
WARNING: Ignoring invalid distribution -orch ...
- Solution: Delete the ~orch folder (a leftover from an interrupted install, inside site-packages) or other similarly named temporary folders.
xformers Issues
No module named 'xformers' (with CUDA 12.8):
- Solution:
pip3 install -U xformers --index-url https://download.pytorch.org/whl/cu128
torchvision Issues
Errors about missing libpng/libjpeg, or a message that torchvision must be built from source:
- Solution:
pip3 install torchvision --index-url https://download.pytorch.org/whl/cu128
CUDA 12.8 Installation
- Download:
wget https://developer.download.nvidia.com/compute/cuda/12.8.0/local_installers/cuda_12.8.0_570.86.10_linux.run
- Silent install:
sh cuda_12.8.0_570.86.10_linux.run --toolkit --toolkitpath=/root/autodl-tmp/cuda-12.8 --silent
- Modify environment variables:
echo 'export PATH=/root/autodl-tmp/cuda-12.8/bin:$PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/root/autodl-tmp/cuda-12.8/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
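After appending those lines, a quick sanity check that the new toolkit is actually the one being picked up (the expected output assumes the install prefix used above):

```bash
# Reload the shell config, then confirm CUDA 12.8 is the toolkit on PATH
source ~/.bashrc
which nvcc        # should print /root/autodl-tmp/cuda-12.8/bin/nvcc
nvcc --version    # should report release 12.8
```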
xformers Error: No Kernel Image
- Error:
CUDA error ... no kernel image is available for execution on the device
- Fix: First check your CUDA version:
nvcc --version, then compare it with the CUDA version reported by nvidia-smi.
- If they are inconsistent, use the steps above to install the correct CUDA version.
- Check versions:
conda list. Ensure the PyTorch version matches the xformers version; for example, if you installed torch 2.7.1 but xformers only supports up to 2.7.0, downgrade first:
pip3 install torch==2.7.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
- If it still errors, it is indeed an xformers issue:
- Run python -m xformers.info
- Check the build.env entries. The prebuilt package may only ship kernels for compute capabilities up to 9.0, e.g.:
build.env.TORCH_CUDA_ARCH_LIST: 6.0+PTX 7.0 7.5 8.0+PTX 9.0a
- Confirm current GPU compute capability:
nvidia-smi --query-gpu=compute_cap --format=csv (e.g., 12.0)
- Set the compute-capability environment variable (current session only):
export TORCH_CUDA_ARCH_LIST="12.0"
- Download the source and compile (use a mirror for acceleration):
pip install -v --no-build-isolation -U git+https://ghfast.top/https://github.com/facebookresearch/xformers.git@main#egg=xformers
- After installation, run python -m xformers.info again and check the xformers build info; if the arch list now matches your compute capability, it is correct (a quick check script follows below).
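A quick way to see from Python what the installed PyTorch build supports versus what the GPU actually reports (the commented outputs are examples):

```python
# Compare the archs the installed PyTorch build ships kernels for
# against the compute capability of the GPU that is actually present.
import torch

print(torch.__version__)                    # e.g. 2.7.0+cu128
print(torch.version.cuda)                   # CUDA version PyTorch was built against
print(torch.cuda.get_device_capability(0))  # e.g. (12, 0) -> needs "12.0" in the arch list
print(torch.cuda.get_arch_list())           # archs supported by this build
```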
xformers Error: Incompatibility & Slow Download
- If install-cn.ps1 fails to set up the virtual environment, try switching to install.ps1. Network issues mainly affect the Torch installation (domestic mirrors may not help much).
- Manual installation (if the network is too slow, reinstall Torch manually):
  - Open install.ps1 and verify the commands it runs.
  - python.exe -m venv venv (create the venv)
  - .\venv\Scripts\activate (activate the venv)
  - Use nvidia-smi to find the CUDA version, then download the corresponding .whl file from the PyTorch website.
  - Go to the xformers repo/site and find the installation command for your CUDA version, e.g.:
  pip3 install -U xformers --index-url https://download.pytorch.org/whl/cu128
  - Place the two downloaded packages in the lora-scripts folder.
  - Manually install:
  pip install .\torch-2.7.0+cu128-cp310-cp310-win_amd64.whl
  pip install .\xformers-0.0.30-cp310-cp310-win_amd64.whl
- Finally, update the environment in the PS script.
Flux Training: google/t5-xxl Download Error
Multi-GPU Training Issues Summary
Complete Training Parameters
train_flux.sh
ds_config.json
dreambooth_config.toml
```toml
ae = "train/sd-models/flux-ae.safetensors"
```
1. Gradio Port Error
Traceback (most recent call last): OSError: Cannot find empty port in range: 28001-28001…
- Solution:
netstat -ano | findstr :28001
taskkill /PID <PID> /F
2. CUDA Out of Memory
torch.OutOfMemoryError: CUDA out of memory.
- Note: Sometimes this is reported as a VRAM error but is actually caused by system RAM overflow. Reduce the batch_size.
3. use_libuv = 0
- Reference: Introduction to Libuv TCPStore Backend
- Issue: If use_libuv = 0 is set in the environment variables but the code explicitly sets it to True (route 3), the code value takes precedence. I set use_libuv to False in all the relevant files.
4. DistributedDataParallel Error
ValueError: DistributedDataParallel device_ids and output_device arguments only work with single-device/multiple-device GPU modules…
- Solution: Add parameter to cancel CPU offloading/swapping:
--blocks_to_swap=0
5. NotImplementedError: Cannot copy out of meta tensor; no data!
- Cause: Model tensors not initialized properly.
- Fix: Modify /root/autodl-tmp/kohya_ss/sd-scripts/library/flux_utils.py (a sketch of the kind of change follows below).
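The general pattern behind this class of fix, shown on a toy module rather than the actual flux_utils.py code: when a model is built on the meta device, its checkpoint must be loaded with assign=True so the meta tensors are replaced instead of being copied into.

```python
# Sketch only: the pattern behind fixing "Cannot copy out of meta tensor",
# demonstrated on a toy module (not the real flux_utils.py code).
import torch
import torch.nn as nn

with torch.device("meta"):
    model = nn.Linear(4, 4)            # parameters are meta tensors: no data yet

# Moving or copying the model while its weights are still meta tensors is what
# raises "NotImplementedError: Cannot copy out of meta tensor; no data!".
state_dict = {"weight": torch.randn(4, 4), "bias": torch.zeros(4)}

# assign=True replaces the meta parameters with the loaded tensors, so the
# model holds real data before any .to(device) happens.
model.load_state_dict(state_dict, assign=True)
print(model.weight.is_meta)            # False -> the weights now hold real data
```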
6. DeepSpeed OOM
Cause: DeepSpeed loads the Flux model into both VRAM and RAM, causing OOM.
Model Training Config:
- Full Fine-tuning Flux1-dev
- Environment: PyTorch 2.7.0, Python 3.12 (Ubuntu 22.04), CUDA 12.8
- Hardware: RTX 5090 (32GB) * 4, 360GB RAM
- Storage: 50GB System, 440GB Data
Solution: Offload optimizer and params to NVMe using DeepSpeed to move data pressure to disk.
(See ds_config.json configuration above, specifically offload_optimizer and offload_param set to nvme).
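The relevant fragment of the DeepSpeed config looks roughly like this (a sketch, not the full ds_config.json; the nvme_path here is an assumption and should point at the large data disk):

```json
{
  "zero_optimization": {
    "stage": 3,
    "offload_optimizer": {
      "device": "nvme",
      "nvme_path": "/root/autodl-tmp/offload"
    },
    "offload_param": {
      "device": "nvme",
      "nvme_path": "/root/autodl-tmp/offload"
    }
  }
}
```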
7. TypeError: adam_update(): incompatible function arguments.
DeepSpeed ZeRO-3 passes an eps value of an incompatible type, which causes this error.
Fix Code:
```python
beta1, beta2 = group['betas']
```
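The idea of the fix is to cast the hyper-parameters to plain Python floats before they reach the C++ adam_update binding; a hedged sketch (variable names follow DeepSpeed's CPU Adam step(), but the exact surrounding code may differ by version):

```python
# Sketch only: cast the values DeepSpeed hands over before calling the fused
# C++ kernel; under ZeRO-3, eps (and friends) can arrive with a type that the
# adam_update binding rejects as "incompatible function arguments".
beta1, beta2 = group['betas']
lr = float(group['lr'])
eps = float(group['eps'])
weight_decay = float(group['weight_decay'])
# ...then pass lr, eps and weight_decay (instead of the raw group[...] values)
# into the existing self.ds_opt_adam.adam_update(...) call.
```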
Problem 8: No Data! (Meta Tensor Issue)
- File: /root/miniconda3/envs/kohyass/lib/python3.11/site-packages/transformers/modeling_utils.py
- Line 2031: Add the enabled parameter:
init_contexts = [deepspeed.zero.Init(config_dict_or_path=deepspeed_config(), enabled=False), set_zero3_state()]
- Reason: When loading models locally, transformers uses meta tensors; importing the checkpoint then hits empty meta tensors and errors out, so disable DeepSpeed's default initialization here.
- Reference: https://github.com/zai-org/ChatGLM-6B/issues/530
Problem 9: mat1 and mat2 not equal (Dtype mismatch)
- File: kohya_ss/sd-scripts/library/flux_models.py
- Line 1068: Add a dtype conversion for the txt and img tensors in forward().
- Reason: The CLIP and T5 outputs are float32 while the rest of the model runs in bfloat16, which causes the mismatch.
```python
def forward(
```
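The failure mode and the cast can be illustrated standalone; a minimal sketch (not the actual flux_models.py code):

```python
# Sketch: why "mat1 and mat2 must have the same dtype" appears, and the kind
# of cast that fixes it. Not the actual flux_models.py forward() code.
import torch
import torch.nn as nn

layer = nn.Linear(8, 8).to(torch.bfloat16)        # model weights in bf16
txt = torch.randn(2, 8, dtype=torch.float32)      # CLIP/T5 output stays fp32

try:
    layer(txt)                                    # dtype mismatch
except RuntimeError as err:
    print(err)                                    # mat1 and mat2 must have the same dtype ...

out = layer(txt.to(layer.weight.dtype))           # cast the input to the module dtype
print(out.dtype)                                  # torch.bfloat16
```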
DeepSpeed AttributeError: ‘DeepSpeedZeRoOffload’ object has no attribute ‘backward’
- Reason: DeepSpeed not initialized.
- Debug: Check /root/autodl-tmp/kohya_ss/sd-scripts/library/deepspeed_utils.py, line 87.
- Note: Line 64 in deepspeed_utils.py might return None if DeepSpeed is not set up.
NCCL enqueue.cc:1556 NCCL WARN Cuda failure 700 ‘an illegal memory access was encountered’
pip install "nvidia-nccl-cu12>2.26.2"
- This error may occur on the RTX 5090 but does not affect training.

