Nov 16, 2025
Time: 8:30am - 12pm CST (half-day tutorial)
Location: America’s Center Convention Complex, Room 121
High-performance computing and machine learning applications increasingly rely on mixed-precision arithmetic on CPUs and GPUs for superior performance. However, this shift introduces challenging numerical issues, such as increased round-off error and INF/NaN exceptions, that can render the computed solutions useless.
At present, diagnosing these problems manually places a heavy burden on developers and interrupts their work. This tutorial presents three tools, each targeting specific issues that lead to floating-point bugs.
First, we present FPChecker, which not only detects and reports INF/NaN exceptions in parallel and distributed CPU codes, but also reports the exponent value ranges that avoid exceptions while minimizing rounding error.
Second, we present GPU-FPX, which detects floating-point exceptions generated by NVIDIA GPUs, including those arising in Tensor Cores via its “nixnan” extension.
Third, we present FloatGuard, a unique tool that detects exceptions in AMD GPUs. The tutorial is aimed at helping programmers avoid exception bugs; for this, we will demonstrate our tools on simple examples with seeded bugs. Attendees may optionally install and run our tools.
The tutorial also includes question-and-answer time to address real situations faced by the attendees.
An overview video of the tutorial is at https://youtu.be/1Ka8g_06Nxg?si=EpYCeuADEVk2qT4u.