Local Memory Tiling
In this exercise you will learn how to cache global memory in local memory, one tile per work-group, and compare the performance with and without the cache.
1.) Use local memory
Allocate local memory for the kernel function by creating a local accessor. The local accessor is created by specifying a range, which is the number of elements to allocate.
Note that local memory is allocated per work-group and each work-group can only access its own local memory.
2.) Cache global memory access in local memory
Cache the global memory required for each work-group in local memory by reading from the global accessor and then writing to the local accessor. Remember to use the local id to know where in local memory to store the data; this can be queried by calling the nd_item member function get_local_id.
Compare the performance with local memory and without local memory.
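One possible way to make this comparison is to time each version on the host. The helper below is a minimal sketch using only the C++ standard library; the name best_of_ms is an assumption for this example and is not part of SYCL or the exercise code. Because kernel submission is asynchronous, the callable passed in should also wait for the work to finish (for example by calling wait on the queue), otherwise only the submission is measured.

```cpp
#include <algorithm>
#include <chrono>
#include <limits>

// Hypothetical helper: calls `work` `runs` times and returns the fastest
// wall-clock time in milliseconds. Taking the minimum reduces the influence
// of one-off costs in early runs, such as kernel compilation.
// Returns infinity if runs <= 0.
template <typename Work>
double best_of_ms(int runs, Work work) {
  double best = std::numeric_limits<double>::infinity();
  for (int i = 0; i < runs; ++i) {
    const auto start = std::chrono::steady_clock::now();
    work();  // Should submit the kernel and block until it has completed.
    const auto stop = std::chrono::steady_clock::now();
    const double ms =
        std::chrono::duration<double, std::milli>(stop - start).count();
    best = std::min(best, ms);
  }
  return best;
}
```

Calling it once with a callable that runs the version without local memory and once with the version that uses it gives two numbers to compare. Host timing like this includes submission overhead, so for small inputs the difference between the two kernels may be hard to see.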
Build and execution hints
For DevCloud via JupyterLab follow these instructions.
For DPC++: instructions.
For AdaptiveCpp: instructions.