* (VBMI2) PSH[LR]D(V?) - double-wide shifts. Yet another one that replaces what would usually be 3 instructions (two shifts and an OR).
Notice a theme here? There's just a ton of efficiency stuff in there that patches holes that have been in the ISA forever, replacing 3-insn sequences with a single instruction. Often that's a 3-insn sequence with 2 uops on a contended port going down to 1.
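To make the double-wide shift concrete, here's a minimal scalar sketch of what one 32-bit lane of the left-shift form computes, next to the shift/shift/OR sequence it replaces. This is a model for illustration only, not actual AVX-512 code; the function names (`shld32_wide`, `shld32_three_op`, `shld32_selfcheck`) are made up for this example, and the operand order (first operand is the upper half) is my reading of the instruction semantics.

```c
#include <assert.h>
#include <stdint.h>

/* Reference model: concatenate a (upper half) and b (lower half) into a
   64-bit value, shift left by n mod 32, keep the upper 32 bits. */
static uint32_t shld32_wide(uint32_t a, uint32_t b, unsigned n) {
    uint64_t wide = ((uint64_t)a << 32) | b;
    n &= 31;
    return (uint32_t)((wide << n) >> 32);
}

/* The usual 3-op sequence: shift, shift, OR.
   The n == 0 case is special-cased because b >> 32 is undefined in C. */
static uint32_t shld32_three_op(uint32_t a, uint32_t b, unsigned n) {
    n &= 31;
    if (n == 0)
        return a;
    return (a << n) | (b >> (32 - n));
}

/* Check that both formulations agree for every shift count. */
static void shld32_selfcheck(void) {
    const uint32_t a = 0x12345678u, b = 0x9ABCDEF0u;
    for (unsigned n = 0; n < 64; n++)
        assert(shld32_wide(a, b, n) == shld32_three_op(a, b, n));
}
```

Passing the same value for both inputs turns this into a rotate, and the `V` forms take the per-lane shift count from a vector instead of an immediate. If I have the name right, the immediate form for 32-bit lanes is exposed as the `_mm512_shldi_epi32(a, b, imm)` intrinsic.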