I have been looking at the use of GPUs versus CPUs for our scattered light analysis. We need to be able to convolve artificial Lunar images (outside the atmosphere) with the instrument PSF. GPUs offer a considerable speed advantage.

First look (CPU versus GPU):

Upper left panel: artificial lunar image outside the atmosphere.

This artificial image is then convolved with a 2-D Gaussian-like PSF
which has fat (power-law) tails, and which closely reproduces what we see
in real data.
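
For readers who want something concrete to experiment with, here is a minimal Python sketch of a PSF of this general kind: a Gaussian core plus a power-law wing, normalised to unit sum. The functional form, the name make_psf and all parameter values (sigma, r0, alpha, wing_frac) are assumptions chosen for illustration. They are not the PSF actually fitted to our data.

```python
import math

def make_psf(n, sigma=1.5, r0=2.0, alpha=2.5, wing_frac=0.01):
    """Return an n x n PSF (list of rows) with a Gaussian core and power-law wings.

    All parameters are hypothetical: sigma is the core width in pixels,
    r0 and alpha set the scale and slope of the wing, and wing_frac is the
    relative weight of the wing term before normalisation.
    """
    c = n // 2                      # centre pixel
    psf = [[0.0] * n for _ in range(n)]
    total = 0.0
    for y in range(n):
        for x in range(n):
            r = math.hypot(x - c, y - c)
            core = math.exp(-0.5 * (r / sigma) ** 2)         # Gaussian core
            wing = (1.0 + (r / r0) ** 2) ** (-alpha / 2.0)   # falls off as r**(-alpha)
            value = (1.0 - wing_frac) * core + wing_frac * wing
            psf[y][x] = value
            total += value
    # Normalise so the PSF sums to 1 and the convolution conserves flux.
    return [[v / total for v in row] for row in psf]

psf = make_psf(33)
print("peak:", psf[16][16], "edge:", psf[16][32])
```

Far from the centre the Gaussian term is negligible, so the wing term alone sets the level of the tails.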

Upper right panel: convolution using 2-D FFT code running on a CPU

Lower left panel: convolution using 2-D FFT code running on a GPU

Lower right panel: the ratio of the two methods, i.e. the ratio of the two previous panels

The ratio shows a lot of structure, mainly images of the lunar
crescent turning up in different places, at a level of about 0.1% of
the intensity.

IMPORTANT: the CPU code was written in double precision, whereas the GPU code was in single precision.

(The above reproduces with more explanation an earlier post)

Notes: The CPU code calls the FFTW3 libraries from Fortran (Intel’s ifort compiler is used), just using the standard Fortran-to-C wrappers provided with FFTW3. The GPU code is written in CUDA.

Second look (CPU only, single versus double precision):

The plot above shows the ratio of the single precision CPU result to the
double precision CPU result (i.e. no GPU results are shown on this plot).

There is similar structure in the ratio, and at about the
same level as the GPU tests gave, i.e. discrepancies at the level of a
few x 0.1% of the intensity.
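
The single versus double comparison can be tried on a toy problem without FFTW3 or a GPU. The sketch below is a pure-Python, 1-D stand-in and not the Fortran code used for the plots. It runs the same radix-2 FFT convolution twice, once in ordinary double precision and once with every intermediate result rounded to single precision, and then forms the ratio. The signal, the kernel and all function names are hypothetical. Rounding after each complex operation only approximates true single precision arithmetic.

```python
import cmath
import struct

def to_f32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def round_c(z, single):
    """Round both parts of a complex number to single precision if requested."""
    if not single:
        return z
    return complex(to_f32(z.real), to_f32(z.imag))

def fft(a, single=False, inverse=False):
    """Recursive radix-2 FFT (unnormalised); len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return [round_c(a[0], single)]
    even = fft(a[0::2], single, inverse)
    odd = fft(a[1::2], single, inverse)
    sign = 1.0 if inverse else -1.0
    out = [0j] * n
    for k in range(n // 2):
        w = round_c(cmath.exp(sign * 2j * cmath.pi * k / n), single)
        t = round_c(w * odd[k], single)
        out[k] = round_c(even[k] + t, single)
        out[k + n // 2] = round_c(even[k] - t, single)
    return out

def fft_convolve(signal, kernel, single=False):
    """Circular convolution via the convolution theorem."""
    n = len(signal)
    fs = fft([complex(v) for v in signal], single)
    fk = fft([complex(v) for v in kernel], single)
    prod = [round_c(x * y, single) for x, y in zip(fs, fk)]
    res = fft(prod, single, inverse=True)
    return [round_c(z / n, single).real for z in res]

n = 256
# Toy "lunar" signal: a bright block on a dark sky (hypothetical values).
signal = [1.0 if 100 <= i < 131 else 0.0 for i in range(n)]
# Toy heavy-tailed kernel centred on index 0 with wrap-around (hypothetical form).
raw = [(1.0 + (min(i, n - i) / 2.0) ** 2) ** -1.25 for i in range(n)]
norm = sum(raw)
kernel = [v / norm for v in raw]

conv64 = fft_convolve(signal, kernel, single=False)
conv32 = fft_convolve(signal, kernel, single=True)
ratio = [s / d for s, d in zip(conv32, conv64)]
max_dev = max(abs(r - 1.0) for r in ratio)
print("largest |single/double - 1|:", max_dev)
```

The quantity printed at the end is the 1-D analogue of the ratio panels: the largest fractional departure of the single precision result from the double precision one.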

Third look (CPU in double precision, renormalisation):


In this plot we compare CPU double precision convolutions of the ideal lunar image,
with and without “min/max renormalisation”. (Min/max renormalisation means scaling
the input image so that the smallest value in the frame is 0.0 and the largest value is 1.0.)
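
As a minimal sketch of the renormalisation step just described (the function name minmax_renormalise and the tiny test image are hypothetical, and the real code is in Fortran):

```python
def minmax_renormalise(image):
    """Scale a 2-D image (list of rows) so its minimum maps to 0.0 and its maximum to 1.0.

    Also returns the original minimum and maximum so the scaling can be undone.
    """
    lo = min(min(row) for row in image)
    hi = max(max(row) for row in image)
    if hi == lo:
        raise ValueError("image is constant; min/max renormalisation is undefined")
    scaled = [[(v - lo) / (hi - lo) for v in row] for row in image]
    return scaled, lo, hi

image = [[2.0, 5.0], [11.0, 8.0]]   # hypothetical pixel values
scaled, lo, hi = minmax_renormalise(image)
print(scaled)
```

Because convolution is linear, and provided the PSF sums to 1, the scaling can in exact arithmetic be undone after the convolution with result * (hi - lo) + lo. Any difference between the two routes is therefore down to finite precision.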

The ratio panel of the two convolutions (bottom right) shows noise only, and at a
very low level: 1 part in 1E7. Highly acceptable!

Fourth look (CPU, single precision, renormalisation):


This plot shows the same comparison as the previous one, but in single precision rather than double. The artefacts are back, at the same old level of a few x 0.1%!

Thoughts:

We might already be able to conclude from the above that a double precision FFT on the CPU is robust (negligible
artefacts), but that single precision, whether on the CPU or on the GPU,
produces similar-sized (few x 0.1%), and thus slightly worrying, artefacts.

But I need access to a double precision GPU to test this. Hope to do so next week!
The acid test will be comparing double precision on a CPU with double precision on a GPU.