Research Papers & Benchmarks
If you are considering SYCL as a programming model for heterogeneous and parallel software development, there is a wide variety of published, independent research papers to draw on. Below we link to details of some of these papers.
Research Papers
Fast Merge Tree Computation via SYCL
Authors: Arnur Nigmetov, Dmitriy Morozov
A merge tree is a topological descriptor of a real-valued function. Merge trees are used in visualization and topological data analysis, either directly or as a means to another end: computing a 0-dimensional persistence diagram, identifying connected components, performing topological simplification, etc. Scientific computing relies more and more on GPUs to achieve fast, scalable computation. For efficiency, data analysis should...
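The abstract above notes that merge trees are used to compute 0-dimensional persistence diagrams. To illustrate what that computation involves, the sketch below computes 0-dimensional sublevel-set persistence for a function sampled on a 1D array, using union-find and the "elder rule": when two components merge, the one born at the higher value dies. This is a minimal sequential sketch that assumes a 1D domain. It is not the authors' parallel SYCL algorithm, and it records only the persistence pairs rather than the full merge tree. The names `persistence0d`, `UnionFind` and `PersistencePair` are hypothetical.

```cpp
#include <algorithm>
#include <initializer_list>
#include <limits>
#include <numeric>
#include <vector>

// One point (birth, death) of a 0-dimensional persistence diagram.
struct PersistencePair {
    double birth;
    double death;
};

// Minimal union-find with path halving. In persistence0d below, every root is
// kept equal to the lowest vertex (the "elder") of its component.
struct UnionFind {
    std::vector<int> parent;
    explicit UnionFind(int n) : parent(n) {
        std::iota(parent.begin(), parent.end(), 0);
    }
    int find(int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];  // path halving
            x = parent[x];
        }
        return x;
    }
};

// Sublevel-set 0-dimensional persistence of f sampled on a path graph (a 1D
// array; the neighbours of i are i-1 and i+1). Vertices are added in
// increasing order of value; when a vertex joins two components, the younger
// one (higher minimum) dies at that vertex's value (the elder rule).
// Zero-persistence pairs are skipped; the global minimum never dies and is
// reported with death = +infinity.
std::vector<PersistencePair> persistence0d(const std::vector<double>& f) {
    const int n = static_cast<int>(f.size());
    // Total order on vertices: by value, ties broken by index.
    auto lower = [&](int i, int j) { return f[i] < f[j] || (f[i] == f[j] && i < j); };

    std::vector<int> order(n);
    std::iota(order.begin(), order.end(), 0);
    std::sort(order.begin(), order.end(), lower);

    UnionFind uf(n);
    std::vector<bool> active(n, false);
    std::vector<PersistencePair> pairs;
    for (int v : order) {
        active[v] = true;
        for (int u : {v - 1, v + 1}) {
            if (u < 0 || u >= n || !active[u]) continue;
            int ru = uf.find(u);
            int rv = uf.find(v);
            if (ru == rv) continue;
            int elder = lower(ru, rv) ? ru : rv;
            int younger = (elder == ru) ? rv : ru;
            if (f[younger] < f[v]) pairs.push_back({f[younger], f[v]});
            uf.parent[younger] = elder;  // the elder stays the root
        }
    }
    if (n > 0) {
        pairs.push_back({f[order[0]], std::numeric_limits<double>::infinity()});
    }
    return pairs;
}
```

For example, `persistence0d({0.0, 2.0, 1.0, 3.0, -1.0})` yields the pairs (1, 2), (0, 3) and (-1, +∞). Each vertex at which two existing components join corresponds to a merge (saddle) node of the merge tree.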
Comparing SYCL™ Data Transfer Strategies for Tracking Use Cases
Authors: S. Joube, H. Grasland, D. Chamont, and E. Brunet
The aim of this work is to compare the performance and ease of programming of the various data transfer strategies provided by SYCL 2020: buffers/accessors on one hand and the different storage types exposed by Unified Shared Memory (USM) on the other hand. We measured the relative performance of USM exclusively located either on the host (USM host) or on...
Evaluation of Intel's DPC++ Compatibility Tool in heterogeneous computing
Authors: German Castano, Youssef Faqir-Rhazoui, Carlos Garcia, Manuel Prieto-Matias
• DPCT greatly streamlines the migration process from CUDA to oneAPI. Twenty out of the twenty-three benchmarks were successfully migrated without major developer interventions.
• Memory operations (device memory management operations and data transfers between host and device memories) take roughly the same time in the migrated and native codes.
• While some migrated applications achieved similar performance to the original CUDA...
SYCL Code Generation for Multigrid Methods
Authors: Stefan Groth, Christian Schmitt, Jürgen Teich, and Frank Hannig
Multigrid methods are fast and scalable numerical solvers for partial differential equations (PDEs) that possess a large design space for implementing their algorithmic components. Code generation approaches allow formulating multigrid methods on a higher level of abstraction that can then be used to define a problem and hardware-specific solution. Since these problems have considerable implementation variability, it is crucial to define...
Performance Portability of Multi-Material Kernels
Authors: Istvan Z. Reguly
Trying to improve performance, portability, and productivity of an application presents non-trivial trade-offs, which are often difficult to quantify. Recent work has developed metrics for performance portability, as well as some aspects of productivity. In this case study, we present a set of challenging computational kernels and their implementations from...
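The performance-portability metrics mentioned in the abstract above are not named in this excerpt. A widely used one, which we assume is among them, is the metric of Pennycook et al.: the harmonic mean of an application's efficiency over a set of platforms H, that is |H| divided by the sum of 1/e_i over the platforms i in H, defined to be 0 if the application does not run on every platform in H. The sketch below computes it; the function name `performance_portability` is hypothetical, and treating an efficiency of 0 or less as "unsupported" is an assumption of this sketch.

```cpp
#include <cstddef>
#include <vector>

// Performance portability as the harmonic mean of per-platform efficiencies
// (each a fraction in (0, 1]). Returns 0 if the platform set is empty or if
// any platform is unsupported, which this sketch encodes as efficiency <= 0.
double performance_portability(const std::vector<double>& efficiencies) {
    if (efficiencies.empty()) return 0.0;
    double sum_of_reciprocals = 0.0;
    for (double e : efficiencies) {
        if (e <= 0.0) return 0.0;  // unsupported on some platform: score is 0
        sum_of_reciprocals += 1.0 / e;
    }
    return static_cast<double>(efficiencies.size()) / sum_of_reciprocals;
}
```

For instance, efficiencies of 0.5 and 1.0 on two platforms give 2 / (2 + 1), roughly 0.67, and a single unsupported platform drives the score to 0. Because it is a harmonic mean, one poorly performing platform pulls the score down sharply.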
Performance portability of a Wilson Dslash Stencil Operator Mini-App using Kokkos and SYCL
Authors: Balint Joo, Thorsten Kurth, M. A. Clark, Jeongnim Kim, Christian R. Trott, Dan Ibanez, Dan Sunderland, Jack Deslippe
We describe our experiences in creating mini-apps for the Wilson-Dslash stencil operator for Lattice Quantum Chromodynamics using the Kokkos and SYCL programming models. In particular, we comment on the performance achieved on a variety of hardware architectures, limitations we have reached in both programming models and how these have been resolved by us, or may be resolved by the...
Innovative language extensions for accelerator cards using the example of SYCL, HC, HIP and CUDA: research on usability and performance
Authors: Jan Stephan, Dr. Wolfgang E. Nagel
Translated from German: “The purpose of this work is a comparative analysis of the CUDA, SYCL and ROCm (or rather HC and HIP) programming models on NVIDIA and AMD GPUs. On the one hand, the capabilities and concepts underlying the respective models are to be compared; on the other hand, the concretely achievable performance is to be determined...
Celerity: High-level C++ for Accelerator Clusters
Authors: Peter Thoman, Philip Salzmann, Biagio Cosenza, and Thomas Fahringer
In the face of ever-slowing single-thread performance growth for CPUs, the scientific and engineering communities increasingly turn to accelerator parallelization to tackle growing application workloads. Existing means of targeting distributed memory accelerator clusters impose severe programmability barriers and maintenance burdens. The Celerity programming environment seeks to enable developers to scale C++ applications to accelerator clusters with relative ease, while leveraging and extending the SYCL domain-specific...
Improving the Performance of Medical Imaging Applications using SYCL
Authors: Zheming Jin
In this report, we are interested in applying the SYCL programming model to medical imaging applications for a study on performance portability and programming productivity. The SYCL standard specifies a cross-platform abstraction layer that enables programming of heterogeneous computing systems using standard C++. As opposed to the Open Computing Language (OpenCL) programming model, in which host and device code are...
Benchmarks
RSBench
RSBench is a mini-app representing a key computational kernel of the Monte Carlo neutron transport algorithm.
ParResKernels
The Parallel Research Kernels suite contains a number of kernel operations, plus a simple build system intended for a Linux-compatible environment. Most of the code relies on open-standard programming models, including SYCL, and can therefore be executed on many computing systems.
BabelStream
BabelStream is a benchmark used to measure memory transfer rates to/from capacity memory. Unlike other memory bandwidth benchmarks, it does not include any PCIe transfer time for attached devices.
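To show what a "memory transfer rate" measures in BabelStream, the sketch below implements a STREAM-style triad kernel (a[i] = b[i] + scalar * c[i]), one of the kernels BabelStream times, and turns a measured run time into bandwidth by counting the bytes each pass moves: two arrays read and one written. This is plain sequential C++, not BabelStream's SYCL implementation; the function names are hypothetical, and BabelStream itself runs each kernel many times and reports statistics over those runs rather than timing a single pass.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// STREAM-style triad: a[i] = b[i] + scalar * c[i].
// In a SYCL version, this loop body would run as a device kernel.
void triad(std::vector<double>& a, const std::vector<double>& b,
           const std::vector<double>& c, double scalar) {
    for (std::size_t i = 0; i < a.size(); ++i) {
        a[i] = b[i] + scalar * c[i];
    }
}

// Bytes moved by one triad pass over n elements: b and c read, a written.
std::size_t triad_bytes(std::size_t n) {
    return 3 * n * sizeof(double);
}

// Time a single triad pass and return the achieved bandwidth in GB/s
// (1 GB = 1e9 bytes).
double triad_bandwidth_gbs(std::vector<double>& a, const std::vector<double>& b,
                           const std::vector<double>& c, double scalar) {
    auto start = std::chrono::steady_clock::now();
    triad(a, b, c, scalar);
    auto stop = std::chrono::steady_clock::now();
    double seconds = std::chrono::duration<double>(stop - start).count();
    return static_cast<double>(triad_bytes(a.size())) / seconds / 1e9;
}
```

For n = 1,000,000 doubles, `triad_bytes` gives 24,000,000 bytes per pass; dividing by the elapsed seconds yields the bandwidth.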