The overhead of offloading work to an accelerator can be problematic, especially for short-running device kernels. Fusing several small kernels into one can solve this problem, but implementing fused kernels by hand is tedious, because the work has to be repeated for every combination of kernels that might be fused. Codeplay have therefore developed an extension to the SYCL standard for user-driven, automatic kernel fusion. If you want to learn how to instruct the SYCL runtime to perform kernel fusion for you, read on: this blog post explains the extension and demonstrates its use on a simple example.