Properties specific to broadcasting tensors.
#include <BroadcastingTensor.hpp>
| Type | Static Public Attribute | Description |
|---|---|---|
| static constexpr bool | BroadcastsLastDimension = BLD | If true, the last dimension is broadcast. |
static constexpr bool BroadcastsLastDimension = BLD
If true, the last dimension is broadcast.
Other dimensions may also be broadcast in either case.
In general, a broadcasting tensor works by forcing the underlying tensor to store all of its strides explicitly and then setting some of them to zero. For tensors that are at least partially packed, Tensor usually doesn't store the stride of the last dimension, since it is implicitly one. The compiler's loop unroller can use this information to generate loads with constant offsets when unrolling inner loops over such tensors. A broadcasting tensor would break the compiler's ability to do this, because it would force it to multiply by the stored stride. It is therefore very valuable to generate a special case when we know that the last stride is statically zero, since the compiler can then use offset addressing just as it would in the non-broadcasting case (with offsets of zero in the last dimension).
The benefit of specialising for every possible combination of broadcast dimensions would be far smaller. Once you are manipulating dimensions other than the last one, you are already forced to do the address calculation. The saving from such specialisations would be to optimise out one integer FMA per broadcast dimension, effectively reducing the dimensionality of the address calculation by the number of broadcast dimensions. This would come at the cost of multiplying the binary size by the product of the number of supported input dimensions and the number of possible combinations of broadcast dimensions for each of those. That's a lot. :D