I did this measurement a long time ago and got 0.3 us for the interrupt latency, which corresponds to roughly 14 clock cycles. However, I took it with a stwio instruction to a GPIO, written directly in assembly as the first instruction under the "alt_irq_entry:" label, so it was about the earliest point you can measure. Depending on how far into the interrupt handling code you inserted your latency measuring code, it's quite possible that 41 cycles is what you measure. It's in the right ballpark.
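If you want to compare numbers between the two setups, a small helper for converting between time and clock cycles may be handy. This is only a sketch: the function names are made up for illustration, and the 50 MHz clock used in the comments is an assumption, not something from my original measurement, so plug in your own system clock.

```c
#include <assert.h>

/* Convert a latency in microseconds to CPU clock cycles.
 * clk_hz is the CPU clock frequency in Hz (e.g. 50e6 for an
 * assumed 50 MHz system; use your actual clock). */
double latency_us_to_cycles(double latency_us, double clk_hz)
{
    assert(clk_hz > 0.0);
    return latency_us * 1e-6 * clk_hz;
}

/* Convert a latency in CPU clock cycles back to microseconds. */
double latency_cycles_to_us(double cycles, double clk_hz)
{
    assert(clk_hz > 0.0);
    return cycles / clk_hz * 1e6;
}
```

With the assumed 50 MHz clock, latency_us_to_cycles(0.3, 50e6) comes out at about 15 cycles and latency_cycles_to_us(41.0, 50e6) at about 0.82 us, so the difference between the two measurements is roughly half a microsecond of extra code executed before the measurement point.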