Matrix Transpose

This section shows the use of some SYCL features in order to perform a vector reduce operation.

reduce_naive.cpp shows a reduction performed without using atomics or any other SYCL feature. It uses two separate kernels to achieve device synchronization between reductions step 1 and 2.

SYCL Atomics

reduce_atomic.cpp shows a reduction operation using local memory and global memory atomics. Workgroups perform a local reduction using local atomics and the result is added together using a single global memory sycl::atomic_ref.

SYCL Group Algorithms

reduce_group_algorithm.cpp replaces the local memory atomic add with a call to sycl::reduce_over_group, which can be expected to perform a similar operation to the device function workgroup_reduce_local_mem as seen in reduce_naive.cpp.

SYCL Reductions

reduce_sycl_reduction.cpp replaces the entire logic of the reduction with a SYCL reducer object. The reduction is a parameter of the parallel_for invocation, and there is another reducer object used as a parameter to the kernel lambda.

Your Privacy

We genuinely value your privacy and only store data that you are comfortable with.

You can view and read our storage and privacy policies below. If you have any questions, please feel free to reach out to us via the contact details on the privacy policy.

Reject