DXGI_ERROR_DEVICE_HUNG resulting from concurrency::copy in C++ AMP
I have created C++ AMP code that performs background gradient removal on astronomical images. The images come in as 16-bit unsigned integer RGB. All of my application's processing and output happens in single-precision floating point, so I convert the input data, run the C++ AMP code, and then copy the results back to the CPU. (In reality the image would go through many of these C++ AMP filters on the GPU before being copied back, but for this test code I have isolated a single such filter.)
Everything goes fine until I initiate the concurrency::copy operation that copies the data from the GPU array back to the CPU. That operation throws an exception indicating that a TDR has been triggered because of DXGI_ERROR_DEVICE_HUNG. The full error is:
D3D11 ERROR: ID3D11Device::RemoveDevice: Device removal has been triggered for the following reason (DXGI_ERROR_DEVICE_HUNG: The Device took an unreasonable amount of time to execute its commands, or the hardware crashed/hung. As a result, the TDR (Timeout Detection and Recovery) mechanism has been triggered. The current Device Context was executing commands when the hang occurred. The application may want to respawn and fallback to less aggressive use of the display hardware). [ EXECUTION ERROR #378: DEVICE_REMOVAL_PROCESS_AT_FAULT]
Below is the code in question. I've omitted the filter's code, since execution makes it through the filters fine (I stepped through it in the debugger) and only throws the exception when it copies back to the CPU. The problem line is concurrency::copy(frame, begin(cpu_frame)); in the code below:
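
    #include <amp.h>
    #include <amp_graphics.h>   // float_3
    #include <iostream>
    #include <memory>
    #include <vector>

    using namespace concurrency;
    using namespace concurrency::graphics;

    // Converts interleaved 16-bit RGB input into normalized float_3 pixels
    // (maxuint16 is assumed to be 65535) and uploads them to an n x m GPU array.
    array<float_3, 2> convert_input(std::vector<float_3> &output, unsigned short *input, int n, int m)
    {
        int o = 0;
        for (int i = 0; i < n * m * 3; i += 3)
        {
            output[o] = float_3((float)input[i] / (float)maxuint16,
                                (float)input[i + 1] / (float)maxuint16,
                                (float)input[i + 2] / (float)maxuint16);
            o++;
        }
        return array<float_3, 2>(n, m, begin(output));
    }

    void _stdcall remove_gradient(unsigned short *input, float *output, int n, int m)
    {
        std::vector<float_3> cpu_frame(n * m);
        array<float_3, 2> frame = convert_input(cpu_frame, input, n, m);
        // unique_ptr so the filter is freed even if an exception escapes.
        std::unique_ptr<gradientremovalfilter> filter(new gradientremovalfilter());
        try
        {
            filter->filterframe(frame);
            concurrency::copy(frame, begin(cpu_frame));   // the exception is thrown here
        }
        catch (accelerator_view_removed &ex)
        {
            std::cout << ex.what() << std::endl;
            std::cout << std::hex << ex.get_view_removed_reason() << std::endl;   // HRESULT
        }
        // Write the pixels back out as interleaved RGB floats.
        for (int i = 0; i < n * m; i++)
        {
            output[(i * 3)] = cpu_frame[i].r;
            output[(i * 3) + 1] = cpu_frame[i].g;
            output[(i * 3) + 2] = cpu_frame[i].b;
        }
    }

Any idea what's going wrong and how to prevent it? My test images are 10,000 total pixels, which is small, and smaller than what I'm working with in reality. I don't see why the copy would take long enough to cause the TDR to kick in when the complicated processing and the copy to the GPU are being accomplished fine.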
The error output above tells you what happened: the shader took so long that the driver figured the GPU had hung.
The recommendations here are:
- Break the processing into smaller chunks, or simplify the compute operation.
- Use the D3D11_CREATE_DEVICE_DISABLE_GPU_TIMEOUT device-creation flag on DirectX 11.1+ (see this post).
- Or edit the registry to extend the TDR timeout (the TdrDelay value), which is useful during development.
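Edit: The problem is in filter->filterframe. The C++ AMP code there becomes a DirectCompute shader, and that shader is what causes the TDR. The fact that the error is reported a little later is not surprising, given CPU/GPU synchronization and timing differences: a parallel_for_each call queues the kernel and returns without waiting for it to finish, so a hang is typically reported by the next call that has to wait for the GPU, which here is the copy back to the CPU.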