Further Optimisations
In this exercise you will run a kernel with different work-group sizes and compare the resulting performance.
1.) Try different work-group sizes
To optimize for occupancy, try different work-group sizes by specifying a different local range for the nd_range.
Note that if the work-group size exceeds the maximum work-group size that the target device supports, the kernel will fail to execute.
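One way to stay within the limit described in the note above is to generate the candidate sizes up front. The sketch below is plain C++ and is not part of the exercise source: `candidate_work_group_sizes` is a hypothetical helper name, and restricting the candidates to powers of two is an assumption made for simplicity. It also skips sizes that do not evenly divide the global size, since SYCL 2020 requires the global range to be divisible by the local range.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper (not a SYCL API): returns the power-of-two work-group
// sizes that evenly divide `global_size` and do not exceed `max_wg_size`.
std::vector<std::size_t> candidate_work_group_sizes(std::size_t global_size,
                                                    std::size_t max_wg_size) {
  std::vector<std::size_t> sizes;
  if (global_size == 0) return sizes;
  for (std::size_t wg = 1; wg <= max_wg_size && wg <= global_size; wg *= 2) {
    if (global_size % wg == 0) sizes.push_back(wg);
    // Stop before doubling would overflow std::size_t.
    if (wg > std::numeric_limits<std::size_t>::max() / 2) break;
  }
  return sizes;
}
```

In a SYCL program, `max_wg_size` would typically come from `dev.get_info<sycl::info::device::max_work_group_size>()`, and each candidate `wg` would be used as the local range, for example `sycl::nd_range<1>{sycl::range<1>{global_size}, sycl::range<1>{wg}}`. A particular kernel may support a smaller maximum than the device-wide value, so treat the list as a starting point.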
Compare the performance of the various work-group sizes you try.
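To make the comparison repeatable, it may help to time each configuration several times and keep the fastest run. The following is a minimal sketch in standard C++ using host-side wall-clock timing. `best_time_ms` is a hypothetical helper, not part of the exercise source or of SYCL, and the default repeat count is an arbitrary assumption.

```cpp
#include <chrono>
#include <limits>

// Hypothetical helper: calls `run_once` `repeats` times and returns the
// shortest wall-clock duration observed, in milliseconds.
template <typename Fn>
double best_time_ms(Fn&& run_once, int repeats = 5) {
  double best = std::numeric_limits<double>::infinity();
  for (int i = 0; i < repeats; ++i) {
    auto start = std::chrono::steady_clock::now();
    run_once();
    auto stop = std::chrono::steady_clock::now();
    std::chrono::duration<double, std::milli> elapsed = stop - start;
    if (elapsed.count() < best) best = elapsed.count();
  }
  return best;
}
```

For a SYCL kernel, the callable passed as `run_once` would submit the kernel with the chosen nd_range and then call `wait()` on the queue, so that the measurement covers kernel execution and not only the submission. Taking the minimum over several runs reduces the influence of one-off costs, such as any just-in-time compilation on the first launch.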
Build and execution hints
For DevCloud via JupyterLab, follow these instructions.
For DPC++: instructions.
For AdaptiveCpp: instructions.