Zigzag Decoding with AVX-512

(zeux.io)

62 points | by luu 3 days ago

2 comments

MaskRay 58 minutes ago
While we could utilize zigzag encoding (i>>31) ^ (i<<1) to convert SLEB128-encoded type/addend to use ULEB128 instead, the generate code is inferior to or on par with SLEB128 for one-byte encodings on x86, AArch64, and RISC-V. Haven't tried wider values - but zigzag encoding is likely slower as well
// One-byte case for SLEB128 int64_t from_signext(uint64_t v) { return v < 64 ? v - 128 : v; }
// One-byte case for ULEB128 with zig-zag encoding int64_t from_zigzag(uint64_t z) { return (z >> 1) ^ -(z & 1); }
londons_explore 52 minutes ago
This sort of analysis is great.
Now why can't compilers do this sort of thing automatically?
Almost any problem seems to be possible to speed up 1000x in AVX512+days of thought compared to the naive version written in a python loop. If we could automate that whole process for big codebases the performance gains could be huge.
[-]
- cmovq 38 minutes ago
  Compilers can’t really, in a meaningful way, change the layout of your data in memory. And you do need to think about your memory layout to get any benefit from SIMD. You’ll notice a lot of compiler auto vectorization insert many instructions just to shuffle data around to get to a usable layout, which negates much of the benefit.
- diamondlovesyou 43 minutes ago
  > Now why can't compilers do this sort of thing automatically?
  They do - they just can't assume GFNI instructions are present unless you explicitly say so: https://godbolt.org/z/eYasbKsse