gpgpu - DXGI_ERROR_DEVICE_HUNG resulting from concurrency::copy on C++ AMP


I have created C++ AMP code that performs background gradient removal on astronomical images. The images come in as 16-bit unsigned integer RGB. Since the rest of the application's processing and output occurs in single-precision floating point, I convert the input data, run the C++ AMP code, and then copy the results back to the CPU. (In reality the image will go through many of these C++ AMP filters on the GPU before being copied back, but for the test code I have isolated a single such filter.)

Everything goes well until I initiate the concurrency::copy operation to copy the data from the GPU array back to the CPU. The operation throws an exception indicating that a TDR has been triggered because of DXGI_ERROR_DEVICE_HUNG. The full error is:

D3D11 ERROR: ID3D11Device::RemoveDevice: Device removal has been triggered for the following reason (DXGI_ERROR_DEVICE_HUNG: The Device took an unreasonable amount of time to execute its commands, or the hardware crashed/hung. As a result, the TDR (Timeout Detection and Recovery) mechanism has been triggered. The current Device Context was executing commands when the hang occurred. The application may want to respawn and fallback to less aggressive use of the display hardware). [ EXECUTION ERROR #378: DEVICE_REMOVAL_PROCESS_AT_FAULT]

Below is the code in question. I've omitted the filter's code, since execution makes it through the filter fine (I stepped through it in the debugger) and only throws the exception when it copies back to the CPU. The problem line is concurrency::copy(frame, begin(cpu_frame)); in the code below:

    // Convert interleaved 16-bit RGB input to normalized float_3 pixels and
    // upload them to a GPU array of extent n x m.
    array<float_3, 2> convert_input(std::vector<float_3> &output, unsigned short *input, int n, int m) {
        int o = 0;

        for (int i = 0; i < n * m * 3; i += 3) {
            output[o] = float_3((float)input[i] / (float)MAXUINT16,
                                (float)input[i + 1] / (float)MAXUINT16,
                                (float)input[i + 2] / (float)MAXUINT16);
            o++;
        }

        return array<float_3, 2>(n, m, begin(output));
    }

    void _stdcall remove_gradient(unsigned short *input, float *output, int n, int m) {
        std::vector<float_3> cpu_frame(n * m);

        array<float_3, 2> frame = convert_input(cpu_frame, input, n, m);

        gradientremovalfilter *filter = new gradientremovalfilter();

        try {
            filter->filterframe(frame);

            // The exception is thrown here, when copying back to the CPU.
            concurrency::copy(frame, begin(cpu_frame));
        }
        catch (accelerator_view_removed &ex) {
            std::cout << ex.what() << std::endl;
            std::cout << ex.get_view_removed_reason() << std::endl;
        }

        // Unpack the float_3 pixels into the interleaved float output buffer.
        for (int i = 0; i < n * m; i++) {
            output[(i * 3)] = cpu_frame[i].r;
            output[(i * 3) + 1] = cpu_frame[i].g;
            output[(i * 3) + 2] = cpu_frame[i].b;
        }
    }

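Any idea what's going wrong and how to prevent it? My test images are 10,000 total pixels, which is small, and much smaller than what I will be working with in reality, so I don't see why the copy is taking long enough to cause the TDR to kick in, especially when the more complicated processing and the copy to the GPU are being accomplished just fine.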

The error output above told you what happened: the shader took so long that the driver figured the GPU had hung.

The recommendations here are:

  • Break the processing into smaller chunks, or simplify the compute operation.
  • Use D3D11_CREATE_DEVICE_DISABLE_GPU_TIMEOUT with DirectX 11.1+ (see this post).
  • Or edit the registry to extend the timeout (the TdrDelay value under the GraphicsDrivers key), which is useful for development.
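The first recommendation can be sketched in plain C++ as follows. This is only a minimal illustration of the chunking arithmetic, not code from the original post: RowChunk, split_rows and max_rows are hypothetical names, and the right chunk size depends on the filter and the hardware. In C++ AMP each chunk would presumably be processed by its own parallel_for_each (for example over a section of the array), so that no single dispatch runs long enough to hit the TDR limit.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Half-open range of image rows [begin, end) handled by one GPU dispatch.
// Hypothetical helper type, not part of C++ AMP.
struct RowChunk {
    int begin;
    int end;
};

// Split `rows` image rows into consecutive chunks of at most `max_rows` rows.
// `max_rows` is an assumed tuning knob; pick it so one dispatch stays well
// under the driver's timeout.
std::vector<RowChunk> split_rows(int rows, int max_rows) {
    assert(max_rows > 0);
    std::vector<RowChunk> chunks;
    for (int begin = 0; begin < rows; begin += max_rows) {
        // The last chunk is clipped so it never runs past the image.
        chunks.push_back({begin, std::min(begin + max_rows, rows)});
    }
    return chunks;
}
```

Calling split_rows(100, 32) yields the ranges [0, 32), [32, 64), [64, 96) and [96, 100); the filter would then be dispatched once per range instead of once for the whole frame.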

Edit: The problem is in filter->filterframe: that C++ AMP code becomes a DirectCompute shader, and that shader is what is causing the TDR. The fact that the error is returned a little later is not surprising, due to CPU/GPU synchronization/timing differences.
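The delayed error can be illustrated with a toy model in plain C++. This is not the C++ AMP or Direct3D API: DeferredQueue and where_failure_surfaces are hypothetical names, and the sketch only assumes that GPU work is queued when it is submitted and actually executed when the host synchronizes, which is what a copy back to the CPU forces.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a GPU command queue (hypothetical, for illustration only):
// work is recorded by submit() and executed only when the host calls flush().
class DeferredQueue {
public:
    // Returns immediately; the command has not run yet.
    void submit(std::function<void()> work) {
        pending_.push_back(std::move(work));
    }

    // Runs everything recorded so far. A failure in an earlier command
    // becomes visible here, at the synchronization point.
    void flush() {
        std::vector<std::function<void()>> batch;
        batch.swap(pending_);
        for (auto &work : batch) work();
    }

    std::size_t pending() const { return pending_.size(); }

private:
    std::vector<std::function<void()>> pending_;
};

// Reports at which host-side step the failure becomes visible.
std::string where_failure_surfaces() {
    DeferredQueue queue;
    try {
        // The "filter": its work is faulty, but it is only queued here.
        queue.submit([] { throw std::runtime_error("device hung"); });
    } catch (const std::runtime_error &) {
        return "filter";
    }
    try {
        // The "copy": forces the queued work to execute.
        queue.flush();
    } catch (const std::runtime_error &) {
        return "copy";
    }
    return "none";
}
```

In this model the faulty work is submitted by the filter step, yet the exception is only caught around the copy step, which matches what the question observed. In C++ AMP, waiting on the accelerator_view right after the filter call (its wait() member) should move the exception closer to the dispatch that caused it, which can help when narrowing down which filter is too slow.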

