Comment
Author: Admin | 2025-04-28
I use an image_input size of 256x256. The code runs for one iteration, and in that first iteration the losses are computed normally. But before the next step, I get "RuntimeError: CUDA error: an illegal memory access was encountered". I guess this is caused by an out-of-bounds array access. Do you have any idea? The training log and traceback are below.

train step 0; loss = 0.105516; scale_loss = 0.000002; rgb_loss = 0.105514; depth_normal_loss = 0.000000; supervise_normal_loss = 0.000000; non_local_loss = 0.000000; context = [[96, 121], [122, 147]]; bound = [0.5; 100.0]; scene = ['0f93fdb52c6933cf', 'a3a5e373d876db0e'];
......
  File "/group/40034/ozhengchen/scene_gen/LGM/Gaussian_final/src/model/encoder/mamba2_model/mamba2/ssd_combined.py", line 565, in backward
    dx, ddt, dA, dB, dC, dD, dz, ddt_bias, dinitial_states = _mamba_chunk_scan_combined_bwd(dout, x, dt, A, B, C, out, ctx.chunk_size, D=D, z=z, dt_bias=dt_bias, initial_states=initial_states, dfinal_states=dfinal_states, seq_idx=seq_idx, dt_softplus=ctx.dt_softplus, dt_limit=ctx.dt_limit)
  File "/group/40034/ozhengchen/scene_gen/LGM/Gaussian_final/src/model/encoder/mamba2_model/mamba2/ssd_combined.py", line 458, in _mamba_chunk_scan_combined_bwd
    ddA = _chunk_scan_bwd_ddAcs_stable(x, dt, dA_cumsum, dout, CB)
  File "/group/40034/ozhengchen/scene_gen/LGM/Gaussian_final/src/model/encoder/mamba2_model/mamba2/ssd_chunk_scan.py", line 1655, in _chunk_scan_bwd_ddAcs_stable
    _chunk_scan_bwd_ddAcs_stable_kernel[grid_ddtcs](
  File "/group/40034/ozhengchen/anaconda3/envs/dg/lib/python3.10/site-packages/triton/runtime/autotuner.py", line 100, in run
    timings = {config: self._bench(*args, config=config, **kwargs)
  File "/group/40034/ozhengchen/anaconda3/envs/dg/lib/python3.10/site-packages/triton/runtime/autotuner.py", line 100, in <dictcomp>
    timings = {config: self._bench(*args, config=config, **kwargs)
  File "/group/40034/ozhengchen/anaconda3/envs/dg/lib/python3.10/site-packages/triton/runtime/autotuner.py", line 83, in _bench
    return do_bench(kernel_call, warmup=self.warmup, rep=self.rep, quantiles=(0.5, 0.2, 0.8))
  File "/group/40034/ozhengchen/anaconda3/envs/dg/lib/python3.10/site-packages/triton/testing.py", line 105, in do_bench
    torch.cuda.synchronize()
  File "/group/40034/ozhengchen/anaconda3/envs/dg/lib/python3.10/site-packages/torch/cuda/__init__.py", line 783, in synchronize
    return torch._C._cuda_synchronize()
RuntimeError: CUDA error: an illegal memory access was encountered