While visiting Swinburne in Melbourne, I have spent some time examining to what extent GPUs could help us speed up the scattered-light fitting of ideal moon images to the observed images.

Ben Barsdell (Swinburne) had some C++ code to convolve two FITS files using an NVIDIA-compatible GPU (GTX480) running CUDA. The FFT library is FFTW, the same as I have been using for CPU-based modeling of the scattered light.

Upper left: ideal lunar image (intensity is log scale).

Upper right: convolved with our best PSF using CPU based FFT.

Lower left: convolved with PSF using GPU.

Lower right: ratio of the two methods (seen in more detail below).

On a desktop my Fortran code calling FFTW does the three FFTs needed — forward transforms of the ideal moon image and the PSF image, their multiplication in the Fourier domain, and the inverse FFT for the final result — in about 1100 milliseconds (ms). (This excludes the time needed to embed the 512×512 images in 1536×1536 arrays to sufficiently reduce edge-wrapping effects, which brings the total runtime to about 3000 ms.) So the FFTs take about 300 ms each.
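The padding and three-FFT sequence described above can be sketched in NumPy (an illustrative sketch, not the actual Fortran/FFTW code; the function name `convolve_padded` and its arguments are hypothetical):

```python
import numpy as np

def convolve_padded(image, psf, pad_size=1536):
    """Convolve two equal-sized images via FFT, embedded in a
    pad_size x pad_size array to reduce edge-wrapping effects."""
    n = image.shape[0]
    big_img = np.zeros((pad_size, pad_size))
    big_psf = np.zeros((pad_size, pad_size))
    big_img[:n, :n] = image
    big_psf[:n, :n] = psf
    # Three FFTs: forward transform of each input, multiplication in
    # the Fourier domain, then one inverse FFT for the final result.
    result = np.fft.ifft2(np.fft.fft2(big_img) * np.fft.fft2(big_psf)).real
    return result[:n, :n]  # crop back to the original frame size
```

The padding to three times the image side length keeps the wrap-around contribution of the circular convolution out of the cropped frame.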

Using the GPU, we attained FFT times of about 40 ms each, a speedup of a factor of 10 or so. This is typical of what Ben expects for such applications (he is doing a careful study of the types of astronomical problems GPUs can be profitably applied to, and where the bottlenecks typically lie).

Ostensibly this means we can speed up our light-modeling code by about 10 times — possibly quite a bit more, because the overheads per modeled image can be reduced quite a lot by careful programming: since we want to explore a large parameter space, we don't need to repeat the same overheads on each run.

VERY IMPORTANT: the CPU code did the FFTs in double complex precision, whereas the GPU did them in single complex precision.

We compared the output images of both methods — for the case of scattered light from an ideal moon with a power-law fall-off PSF with a slope of about r^-2.8.
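A PSF of this general form can be sketched as follows (illustrative only; the core-softening radius `eps` is an assumption I added to avoid the singularity at r = 0, and is not from the original analysis):

```python
import numpy as np

def power_law_psf(n=512, slope=-2.8, eps=1.0):
    """Radially symmetric power-law PSF, I(r) ~ r**slope,
    normalized to unit sum. eps softens the r = 0 singularity
    (an assumed choice, not the project's actual PSF core)."""
    y, x = np.indices((n, n))
    r = np.hypot(x - n // 2, y - n // 2)
    psf = (r + eps) ** slope
    return psf / psf.sum()
```

Note that a PSF built this way is strictly positive everywhere on the frame, which matters for the numerical-zero question discussed below.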

VERY IMPORTANT: there is significant structure left if we divide the CPU output by the GPU output, as shown in the image above.

The scatter about the mean of 1.0000 is 2.028E-4, with a frame minimum of 0.9936 and a frame maximum of 1.003, so the deviation from unity in the ratio of the two methods is no worse than ~0.7% anywhere on the frame. There is certainly structure, and it looks like it might be too much for our purposes! (We would like this to be better than 0.1% at the very worst.) We are checking whether this is due to the single precision used.
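One quick way to probe the single-precision hypothesis on the CPU is to repeat the convolution with intermediates rounded to single precision and form the same ratio statistics. This is a rough sketch with a caveat: NumPy's FFT always computes internally in double precision, so single precision is only emulated here by rounding each intermediate to complex64 — a stand-in for a true single-precision FFT such as the GPU's, and the test images are hypothetical, not the real moon/PSF frames:

```python
import numpy as np

def ratio_stats(image, psf):
    """Return (std, min, max) of the ratio between a double-precision
    FFT convolution and one with intermediates rounded to complex64
    (emulated single precision; NumPy's FFT itself is double)."""
    out64 = np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(psf)).real
    a = np.fft.fft2(image).astype(np.complex64)
    b = np.fft.fft2(psf).astype(np.complex64)
    out32 = np.fft.ifft2((a * b).astype(np.complex64)).real
    ratio = out64 / out32
    return float(ratio.std()), float(ratio.min()), float(ratio.max())
```

If the frame-to-frame structure in the ratio image reproduces under this emulation, single precision is the likely culprit; if not, the difference lies elsewhere (e.g. in the GPU FFT implementation itself).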

Unless this problem can be solved, it is not clear that the speed gain with the GPUs is worth having!

Are the 'shock wave' effects caused by numerical zeros in the power-law PSF?

No — we checked for that — the PSF seems to be non-zero everywhere and the numerical values are all OK at single precision; there is no dropout…