- What is SYCL
- Enqueueing a Kernel
- Managing Data
- Handling Errors
- Device Discovery
- Data Parallelism
- Introduction to USM
- Using USM
- Asynchronous Execution
- Data and Dependencies
- In Order Queue
- Advanced Data Flow
- Multiple Devices
- ND Range Kernels
- Image Convolution
- Coalesced Global Memory
- Vectors
- Local Memory Tiling
- Further Optimisations
- Matrix Transpose
- More SYCL Features
- Functors
More SYCL Features
This section shows the use of some SYCL features in order to perform a vector reduce operation.
reduce_naive.cpp
shows a reduction performed without using atomics or
any other SYCL feature. It uses two separate kernels to achieve device
synchronization between reductions step 1 and 2.
SYCL Atomics
reduce_atomic.cpp
shows a reduction operation using local memory and
global memory atomics. Workgroups perform a local reduction using local atomics
and the result is added together using a single global memory
sycl::atomic_ref
.
SYCL Group Algorithms
reduce_group_algorithm.cpp
replaces the local memory atomic add with a call
to sycl::reduce_over_group
, which can be expected to perform a similar operation
to the device function workgroup_reduce_local_mem
as seen in
reduce_naive.cpp
.
SYCL Reductions
reduce_sycl_reduction.cpp
replaces the entire logic of the reduction with a
SYCL reducer object. The reduction is a parameter of the parallel_for
invocation, and there is another reducer object used as a parameter to the
kernel lambda.