For a given stack of 100 images, how large is the uncertainty on derived albedo?

We investigate this using one of the ‘good’ 100-image stacks for the V band. We call the stack ‘good’ because repeated fits to it, with slightly different settings of the fitting algorithm, seem to give similar results.

We split the stack into 33 3-image averages, 16 6-image averages, 8 12-image averages, 4 25-image averages, and 2 50-image averages, and also fit the full 100-image average several times. We fit each of these multi-image averages and plot the results:
Top left: log-log plot of how the RMSE depends on the number of frames in each average; the RMSE is calculated in the areas used for fitting the model to the observed image. The fitted line has slope −1/3.
Top right: log-log plot of how the estimated uncertainty of the albedo fit (given by my routine LMFIT) depends on the number of frames averaged. The fitted line has slope −1/3.
Middle left: log-log plot of how the RMSE and the albedo uncertainty depend on each other. The line has slope near 0.9.
Middle right: fitted albedo against number of frames. The error bars are the formal estimates of the albedo-fit uncertainty (given by LMFIT).
Bottom left: how the standard deviation of the albedo determinations (in the middle right plot) depends on the number of frames averaged. To avoid ‘overplotting’ we have offset N slightly by adding small random numbers, while maintaining a clustering around the actual N value.
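As a sketch of the sequential splitting described above – a minimal numpy version on synthetic stand-in frames; the function name is ours, not the actual pipeline:

```python
import numpy as np

def split_into_averages(stack, n):
    """Split a (nframes, ny, nx) stack into non-overlapping n-frame
    averages, taken sequentially (frames 1..n, n+1..2n, ...).
    Leftover frames at the end are dropped."""
    ngroups = stack.shape[0] // n
    trimmed = stack[:ngroups * n]
    # reshape to (ngroups, n, ny, nx) and average within each group
    return trimmed.reshape(ngroups, n, *stack.shape[1:]).mean(axis=1)

# 100 synthetic 8x8 "frames" standing in for the real stack
rng = np.random.default_rng(0)
stack = rng.normal(100.0, 1.0, size=(100, 8, 8))

for n in (3, 6, 12, 25, 50):
    print(n, split_into_averages(stack, n).shape[0])
# -> 33, 16, 8, 4 and 2 groups, as in the text
```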

We see some interesting things in this evaluation of ‘intra-stack variability’. The uncertainty (both the RMSE and the albedo uncertainty) ought to drop as one over the square root of the number of frames averaged, if the noise causing the scatter were normally distributed and independent and there were no biases. As it is, the errors drop with a −1/3 slope, indicating that something is holding back the error-averaging. We also see a hint of this in the middle right panel: the albedo seems to depend on how many frames we average. Note that the frames were averaged sequentially (3-averages formed from frames 1,2,3, then 4,5,6, and so on). We can test whether the order of averaging is important by doing some bootstrapping and randomizing (TBD), but there is also the possibility that increased averaging allows some subtle biasing factor to emerge. Finally, we see that the intrinsic error (i.e. that not due to the choice of image stack but due to factors like noise within a given stack) can be brought below 0.2% by averaging just 20 frames. We also see that the formal estimates of the albedo uncertainty are not that far from the scatter of the albedo determinations themselves (though they might be off by a factor of 2 or so).
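The slope comparison is a quick log-log fit; here on synthetic scatter values made to follow a −1/3 law (the numbers are illustrative, not our measurements):

```python
import numpy as np

# Hypothetical per-group scatter for each averaging depth N.
# Independent Gaussian noise would give slope -1/2; the stacks
# discussed in this post show roughly -1/3 instead.
N = np.array([3, 6, 12, 25, 50])
scatter = 0.01 * N ** (-1.0 / 3.0)   # synthetic data with an exact -1/3 law

# straight-line fit in log-log space; the slope is the power-law index
slope, intercept = np.polyfit(np.log10(N), np.log10(scatter), 1)
print(f"fitted log-log slope: {slope:.2f}")   # -> -0.33
```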

So, in summary, we have found that the scatter can be brought down to the few-tenths-of-a-percent level by averaging, using our fitting method and observed images. [Wahey! That is actually a major point for the Big Paper! Rejoice!] We also find some sign of a bias causing a monotonic evolution of the fitted albedo with increased averaging, as well as a decrease in scatter that does not drop as expected for averaged independent noise.

We can test whether the order of averaging is important by bootstrapping, and this will be done. It could help rule out effects due to some time-dependence of the properties of the images in the stack.

CONTINUED LATER:

We have now tested some more ideas: the effect of alignment, as well as the effect of the order in which the images are selected for averaging inside the 100-image stack.

Upper left: the red line and black points show the old result – sequentially selected frames are averaged, without alignment. The green points and line show images selected non-sequentially, as in ‘bootstrap with replacement’, also without alignment. We see that the slopes of the lines are the same – about −1/3 – but that the bootstrapped results have about 25% higher error than the sequentially selected ones.
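The ‘bootstrap with replacement’ selection can be sketched as follows – pure numpy, on synthetic stand-in frames; the function name is ours:

```python
import numpy as np

rng = np.random.default_rng(1)
stack = rng.normal(100.0, 1.0, size=(100, 8, 8))  # synthetic stand-in frames

def bootstrap_averages(stack, n, ngroups, rng):
    """Form ngroups n-frame averages, drawing frame indices at random
    with replacement from anywhere in the stack (non-sequential access)."""
    idx = rng.integers(0, stack.shape[0], size=(ngroups, n))
    return stack[idx].mean(axis=1)

boot = bootstrap_averages(stack, 12, 8, rng)
print(boot.shape)  # (8, 8, 8): eight averages of 12 randomly chosen frames
```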

This gave us the idea that ‘something changes’ along the stack, which results in the increased error. It could be that the Moon drifts slowly across the sky, so that mis-alignment is more evident in sums of frames selected far apart than in sums of frames selected in sequence. There could also be other time-dependent things going on – such as various CCD-related effects.

Upper right: to test the drift hypothesis above, we next aligned the selected frames when bootstrapping. As we see, there is no effect of doing this! So drift is not the problem – or is it? Alignment consists of sub-pixel shifts that may induce variance changes, due to non-conservative interpolation in the image frame. We have considered this on this blog (see discussion and further blog references HERE). Since the green data look the same in the upper left and upper right panels, we conclude that alignment (i.e. possible flux non-conservation) cannot be the issue.
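A pure-numpy illustration of how sub-pixel interpolation alters noise statistics (a synthetic 1-D example, not our pipeline): a half-pixel linear-interpolation shift of white noise lowers its standard deviation by a factor of about √0.5, because each output pixel is a weighted mean of two inputs.

```python
import numpy as np

def subpixel_shift_1d(x, f):
    """Shift a 1-D signal by a fractional amount f (0 <= f < 1)
    using linear interpolation; zero fill at the boundary."""
    y = np.empty_like(x)
    y[1:] = (1.0 - f) * x[1:] + f * x[:-1]
    y[0] = (1.0 - f) * x[0]
    return y

rng = np.random.default_rng(2)
noise = rng.normal(0.0, 1.0, size=100_000)

shifted = subpixel_shift_1d(noise, 0.5)
# each shifted pixel is 0.5*x[i] + 0.5*x[i-1], so its variance is
# 0.25 + 0.25 = 0.5 of the original: std drops to ~0.71
print(noise.std(), shifted.std())
```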

Bottom right: this shows that the scatter of the bootstrapped albedo determinations is now larger than for sequential access, and that levels between 0.2 and 0.3% can be reached, at best.

So why is it ‘bad’ to pick the images in random order, rather than sequentially?

Speculation department: if it turns out that alignment simply isn’t the issue, then what? We have the shutter and we have the bias. In the above we assume that the shutter works fine and that the bias is the same in all frames. Here we test the shutter by looking at the total flux of each image in the 100-image stack:
The top panel shows the percentage deviation of the total flux in each bias-subtracted frame relative to the average of the whole stack. The standard deviation is 0.46%. Flux is not constant from frame to frame. Why? The second panel shows the maximum flux in each frame after smoothing. The two graphs are very similar. The last panel shows the mean of a dark corner in each image. This graph does not look like the other two – so it is unlikely that strong sky variations or changes in bias caused the changes in total flux.
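The per-frame flux check can be sketched like this – on synthetic frames with an artificial ~0.5% gain wobble standing in for the real stack:

```python
import numpy as np

rng = np.random.default_rng(3)
# synthetic bias-subtracted frames with ~0.5% frame-to-frame flux wobble
gains = 1.0 + rng.normal(0.0, 0.005, size=100)
stack = gains[:, None, None] * np.full((100, 16, 16), 50.0)

# total flux per frame, as a percentage deviation from the stack mean
totals = stack.sum(axis=(1, 2))
deviation_pct = 100.0 * (totals / totals.mean() - 1.0)
print(f"flux scatter: {deviation_pct.std():.2f}%")
```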

We guess that the changes in flux are due either to changes in atmospheric transmission or to shutter variability.

But how could flux non-constancy cause the problems we see with increased errors when bootstrapping? Apart from a few big dips in the total flux, most points seem to be randomly scattered with respect to their neighbours. In that case, sequential access, as opposed to random access, to the image sequence would not matter – would it?
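One way to quantify ‘randomly scattered with respect to the neighbours’ is the lag-1 autocorrelation of the per-frame total fluxes: near zero means the selection order should not matter, while a value near one means neighbouring frames are alike and sequential averaging differs from random selection. A sketch on synthetic sequences:

```python
import numpy as np

def lag1_autocorr(x):
    """Lag-1 autocorrelation of a 1-D sequence (biased estimator)."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return np.dot(x[:-1], x[1:]) / np.dot(x, x)

rng = np.random.default_rng(4)
white = rng.normal(size=1000)             # independent frame fluxes
drift = np.cumsum(rng.normal(size=1000))  # slowly varying 'wobble'

print(lag1_autocorr(white))  # near 0: order of selection should not matter
print(lag1_autocorr(drift))  # near 1: neighbours alike, order matters
```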

Hmmm.

There is the ‘wobble’ of the image, as seen in a film we have on this blog – waves of ‘something’ wash across the lunar disc, causing small transmission changes. Sequential images may then be more alike than images selected far apart, and alignment, being based on whole-image shifts, would not repair the problem due to ‘wobble’. A test suggests itself: look at the fits to single images in a stack and the scatter these have; then consider whether that is more or less than the scatter seen between the n-frame averages of the two types.

CONTINUED EVEN LATER:

We tested whether the albedo-fit scatter increased or decreased when integer-pixel shifts and sub-pixel shifts were used: it turns out that sub-pixel shifts are better than integer-pixel shifts in most cases, but that a straight sum (i.e. un-aligned) is best! NB: this is the case for the present choice of 100-image stacks. YMMD.