Merge Request #16

Created by Nyan

NEON fixes/tweaks

This merge request fixes several issues in the NEON code and adds a few tweaks:

  • The SPLIT(16,4) ALTMAP implementation was broken: it processed only half of the data. The fixed implementation is therefore significantly slower than the old code, which is to be expected. Fixes #2
  • The SPLIT(16,4) implementations now share a single code path for ARMv8 and older architectures, similar to SPLIT(32,4). This fixes the ALTMAP variant and also gives the non-ALTMAP version consistent sizing
  • Removed an unnecessary VTRN from non-ALTMAP SPLIT(16,4), since NEON can (de)interleave during load/store; for the same reason, ALTMAP is not very useful on NEON
    • This can also be done for SPLIT(32,4), but I have not implemented it
  • I also hoisted the if(xor) conditional out of the loop in non-ALTMAP SPLIT(16,4). This seems to improve performance slightly on my Cortex-A7
    • The same change should probably be applied everywhere else, but I have not done this
  • CARRY_FREE was incorrectly enabled for all values of w, although it is only available for w=4 and w=8
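
To illustrate the if(xor) change, here is a minimal sketch in plain C rather than NEON intrinsics. It is not the code from this merge request: the function names, the 256-entry lookup table and the do_xor parameter are hypothetical stand-ins for the real region kernel. The first function tests the flag on every iteration; the second tests it once and runs one of two specialised loops, which is the shape the non-ALTMAP SPLIT(16,4) loop now has.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Before: the xor flag is tested for every element inside the loop. */
void region_op_branch_inside(const uint8_t *src, uint8_t *dst, size_t n,
                             const uint8_t table[256], int do_xor)
{
    assert(src != NULL && dst != NULL && table != NULL);
    for (size_t i = 0; i < n; i++) {
        uint8_t v = table[src[i]];   /* hypothetical per-byte transform */
        if (do_xor)
            dst[i] ^= v;             /* accumulate into the destination */
        else
            dst[i] = v;              /* overwrite the destination */
    }
}

/* After: the flag is tested once, and each loop body is branch-free. */
void region_op_hoisted(const uint8_t *src, uint8_t *dst, size_t n,
                       const uint8_t table[256], int do_xor)
{
    assert(src != NULL && dst != NULL && table != NULL);
    if (do_xor) {
        for (size_t i = 0; i < n; i++)
            dst[i] ^= table[src[i]];
    } else {
        for (size_t i = 0; i < n; i++)
            dst[i] = table[src[i]];
    }
}
```

Both functions produce the same output for the same inputs; only the placement of the branch differs. Whether the hoisted form is actually faster depends on the compiler and the core, so it is worth measuring on the target hardware.
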
Assignee: None
Milestone: None

Merged by Loic Dachary

Commits (5)
2 participants