This adds a maintained version of the (since 2012)
stalled original project.
https://github.com/ComputationalRadiationPhysics/cuda_memtest
Nvidia's NVML (via the GPU deployment kit) could also be
added, providing serial number output of failing GPUs
for multi-GPU nodes.