This section of kernel code in TrackThroughGGEMSSolidBox.cl (v1.3) looks like it could be made more efficient. Specifically, if scatter is stored separately, it doesn't make sense to increment both histogram arrays (histogram and scatter_histogram) when a scattered detection occurs:
atomic_add(&histogram[voxel_id.x + voxel_id.y * virtual_element_number.x], 1);
// Storing scatter
if (scatter_histogram) {
  if (primary_particle->scatter_[global_id] == TRUE) atomic_add(&scatter_histogram[voxel_id.x + voxel_id.y * virtual_element_number.x], 1);
}
Instead, one can increment just one of the histogram counters, e.g.,
const unsigned int idx =
  (unsigned int)voxel_id.x +
  (unsigned int)voxel_id.y * (unsigned int)virtual_element_number.x;
if (scatter_histogram && primary_particle->scatter_[global_id]) {
  atomic_add(&scatter_histogram[idx], 1u);
} else {
  atomic_add(&histogram[idx], 1u);
}
This would eliminate one atomic add for every scattered detection. One might still want the final histogram to be a combined tally of both primary and scattered events, but that can be done trivially by adding the two histograms together element by element at the end of the simulation.
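As a rough sketch of that final step, assuming both buffers have been read back to the host as flat arrays of unsigned 32-bit counts (the actual buffer types in GGEMS may differ, and combine_histograms is a hypothetical helper, not a GGEMS function), the combined tally could be rebuilt like this:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical host-side helper, not part of GGEMS.
 * Rebuilds the combined (primary + scatter) projection from two
 * separately tallied histograms of n pixels each.
 * Assumption: counts are uint32_t; scatter may be NULL when scatter
 * storage is disabled, in which case total is a copy of primary. */
void combine_histograms(const uint32_t *primary,
                        const uint32_t *scatter,
                        uint32_t *total,
                        size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        total[i] = primary[i] + (scatter ? scatter[i] : 0u);
    }
}
```

This is one pass over the detector pixels, run once per simulation, so its cost should be negligible next to the per-event atomics it replaces.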
Thanks for the suggestion. You are correct that the code in v1.3 performs two atomic additions when a scattered detection occurs, and that this can be reduced to a single atomic operation. However, in GGEMS v1.3 the two histogram buffers have different semantics:
- histogram stores all detected events (primary + scatter)
- scatter_histogram stores only scattered events
Therefore, even when an event is scattered, the total histogram must still increment, because it represents the full projection image as detected by the system. The two buffers are not redundant; they encode different physical quantities. The alternative implementation you propose (only incrementing one buffer and reconstructing the total histogram afterwards) is technically possible, but it would move part of the tally logic outside the kernel and slightly change the meaning of the intermediate results produced during a run. For the design of v1.3, we kept the histogram semantics explicit inside the kernel, even at the cost of an additional atomic operation.
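To make the difference in meaning concrete, here is a small serial model of the two tally schemes. It is only an illustration: tally_v13 and tally_split are hypothetical names, plain increments stand in for atomic_add, and the counts are assumed to be unsigned 32-bit integers.

```c
#include <stddef.h>
#include <stdint.h>

/* Serial stand-in for the v1.3 behaviour: histogram counts every
 * detection, scatter_histogram (if present) counts scattered ones. */
void tally_v13(uint32_t *histogram, uint32_t *scatter_histogram,
               size_t idx, int is_scatter)
{
    histogram[idx] += 1u;
    if (scatter_histogram && is_scatter) {
        scatter_histogram[idx] += 1u;
    }
}

/* Serial stand-in for the proposed behaviour: each detection goes to
 * exactly one buffer. With scatter storage enabled, histogram holds
 * primaries only; with it disabled (NULL), histogram holds everything. */
void tally_split(uint32_t *histogram, uint32_t *scatter_histogram,
                 size_t idx, int is_scatter)
{
    if (scatter_histogram && is_scatter) {
        scatter_histogram[idx] += 1u;
    } else {
        histogram[idx] += 1u;
    }
}
```

In this model, one primary and two scattered detections in the same pixel give histogram = 3 and scatter_histogram = 2 under tally_v13, but histogram = 1 and scatter_histogram = 2 under tally_split. Under the split scheme the content of histogram also depends on whether a scatter buffer was supplied, which is the change in intermediate semantics mentioned above.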