Notes on Speed

Under tests/, the profile_xfms script benchmarks several depths of the DTCWT on a moderately sized input \(X \in \mathbb{R}^{10 \times 10 \times 128 \times 128}\). As a reference, an 11 by 11 convolution on a tensor of this size takes 2.53ms.
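The exact measurement harness in profile_xfms is not shown here; the figures below follow the usual per-call timing pattern, which can be sketched with the stdlib alone (the helper name time_op and the stand-in workload are my own, and GPU ops would additionally need a device synchronize before reading the clock):

```python
import time

def time_op(fn, *args, reps=50, warmup=5):
    """Return the mean wall-clock time per call of fn(*args), in microseconds."""
    for _ in range(warmup):            # warm up caches before timing
        fn(*args)
    t0 = time.perf_counter()
    for _ in range(reps):
        fn(*args)
    return (time.perf_counter() - t0) / reps * 1e6

# Time a stand-in operation; the DTCWT layer calls would be timed the same way.
per_call_us = time_op(sorted, list(range(10000)))
```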

A single layer DTCWT using the ‘near_sym_a’ filters (lengths 5 and 7) makes 6 convolutional calls. I timed these at 238us each, for a total of 1.43ms. Unfortunately, there is also some overhead in calculating the DTCWT, as the non-convolutional operations are not free. In addition to the 6 convolutions, there were:

  • 6 move ops @ 119us = 714us
  • 10 pointwise add ops @ 46.5us = 465us
  • 12 copy ops @ 32us = 381us
  • 6 different add ops @ 38us = 232us
  • 6 subtraction ops @ 37us = 220us
  • 3 constant division ops @ 57us = 173us
  • 6 more move ops @ 28us = 171us

This makes the overhead 2.3ms, and the total time 3.7ms.
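The per-op totals above can be sanity-checked with quick arithmetic (figures copied from the list; the small mismatch with the quoted 3.7ms is just rounding of the two terms):

```python
# Totals from the overhead list above, in microseconds
overheads = [714, 465, 381, 232, 220, 173, 171]
conv_us = 1430                       # the 6 convolutions, ~238us each

overhead_us = sum(overheads)         # 2356us, quoted as 2.3ms
total_us = conv_us + overhead_us     # ~3.8ms; 1.43 + 2.3 rounds to the quoted 3.7ms
```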

For a two layer DTCWT, there are now 12 convolutional ops. The second layer kernels are slightly larger (10 taps each), so although they act on 1/4 of the samples, they add an extra 1.1ms (2.5ms total for the 12 convolutions). The overhead for non-convolutional operations is 4.4ms, making 6.9ms in total, roughly 3 times as long as an 11 by 11 convolution.

There is an option to skip calculating the highpass coefficients for the first scale, as these often carry limited useful information (see the skip_hps option). For a two scale transform, this brings the convolution run time down to 1.13ms and the overhead down to 2.49ms, totaling 3.6ms, roughly the same as the single layer transform.

A single layer inverse transform takes 1.43ms (conv) + 2.7ms (overhead) = 4.1ms, slightly longer than the 3.7ms for the forward transform.

A two layer inverse transform takes 2.24ms (conv) + 5.9ms (overhead) = 8.1ms, again slightly longer than the 6.9ms for the forward transform.

A single layer end to end transform takes 2.86ms (conv) + 5.8ms (overhead) = 8.6ms \(\approx\) 3.7ms (forward) + 4.1ms (inverse).

Similarly, a two layer end to end transform takes 4.4ms (conv) + 10.4ms (overhead) = 14.8ms \(\approx\) 6.9ms (forward) + 8.1ms (inverse).
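As a quick consistency check on these figures, the separate forward and inverse times should roughly sum to the end-to-end times (numbers copied from above; the 15% tolerance is my own choice):

```python
# (forward_ms, inverse_ms, end_to_end_ms) for the 1- and 2-layer transforms
timings = {1: (3.7, 4.1, 8.6), 2: (6.9, 8.1, 14.8)}

for layers, (fwd, inv, e2e) in timings.items():
    # separate passes should land within ~15% of the end-to-end measurement
    assert abs((fwd + inv) - e2e) / e2e < 0.15
```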

If we use the ‘near_sym_b’ filters for layer 1 (13 and 19 taps), the overhead doesn’t increase, but the time taken for each convolution unsurprisingly roughly triples to 600us (from about 200us with near_sym_a).
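That slowdown is about what a back-of-envelope tap count predicts, assuming convolution cost scales roughly linearly with filter length:

```python
# Total taps across the level-1 filter pair for each filter set
near_sym_a_taps = 5 + 7      # 12
near_sym_b_taps = 13 + 19    # 32

# ~2.7x more taps, consistent with the observed ~3x slower convolutions
ratio = near_sym_b_taps / near_sym_a_taps
```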