MLIR n-D vector types are currently represented as (n-1)-D arrays of 1-D vectors when lowered to LLVM.

The implication of the physical HW constraints on the programming model is that one cannot index dynamically across hardware registers: a register file can generally not be indexed dynamically. This is because the register number is fixed and one either has to unroll explicitly to obtain fixed register numbers or go through memory. This is a constraint familiar to CUDA programmers: declaring a private `float a[4];` and subsequently indexing it with a dynamic value results in so-called local memory usage (i.e. roundtripping to memory).
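
As a hedged MLIR analogue of the same constraint (names, shapes and op spellings here are illustrative, not from the text), dynamically indexing into a vector value can be made explicit by going through a memory buffer:

```mlir
// Read one lane of an n-D vector at dynamic indices (%i, %j) by
// roundtripping through memory, mirroring the CUDA local-memory behavior.
func.func @dynamic_lane(%v: vector<4x8xf32>, %i: index, %j: index) -> f32 {
  %c0 = arith.constant 0 : index
  %buf = memref.alloca() : memref<4x8xf32>
  // Spill the vector value to the buffer.
  vector.transfer_write %v, %buf[%c0, %c0] : vector<4x8xf32>, memref<4x8xf32>
  // Dynamic indexing is now ordinary memory addressing.
  %elt = memref.load %buf[%i, %j] : memref<4x8xf32>
  return %elt : f32
}
```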

Implication on codegen ¶

This representation introduces the consequences on static vs. dynamic indexing discussed previously: `extractelement`, `insertelement` and `shufflevector` on n-D vectors in MLIR only support static indices. Dynamic indices are only supported on the most minor 1-D vector, but not on the outer (n-1)-D levels. For other cases, explicit load / stores are required.
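
For illustration, a hedged sketch of this support matrix (op spellings follow one MLIR version and may differ in others; `%v` and `%i` are assumed values):

```mlir
// Static positions on an n-D vector: supported.
%a = vector.extract %v[2, 3] : vector<4x8xf32>        // yields f32
// A dynamic index is only available on the most minor 1-D vector.
%row = vector.extract %v[2] : vector<4x8xf32>         // yields vector<8xf32>
%b = vector.extractelement %row[%i : index] : vector<8xf32>
// A dynamic index on the outer (n-1)-D levels, e.g. %v[%i, 3], cannot be
// expressed directly; it requires an explicit load / store through memory.
```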

The implications are that:

  1. Loops around vector values are indirect addressing of vector values; they must operate on explicit load / store operations over n-D vector types.
  2. Once an n-D vector type is loaded into an SSA value (that may or may not live in n registers, with or without spilling, when eventually lowered), it may be unrolled to smaller k-D vector types and operations that correspond to the HW (see the sketch after this list). This level of MLIR codegen is related to the register allocation and spilling that occur much later in the LLVM pipeline.
  3. HW may support >1-D vectors with intrinsics for indirect addressing within these vectors. These can be targeted thanks to explicit `vector_cast` operations from MLIR k-D vector types and operations to LLVM 1-D vectors + intrinsics.
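
As referenced in point 2., a hedged sketch of such unrolling, assuming HW that natively supports `vector<8xf32>` (shapes and op spellings are illustrative):

```mlir
// An addition over vector<2x8xf32> unrolled into two HW-sized vector<8xf32> adds.
%zero = arith.constant dense<0.0> : vector<2x8xf32>
%l0 = vector.extract %lhs[0] : vector<2x8xf32>
%r0 = vector.extract %rhs[0] : vector<2x8xf32>
%s0 = arith.addf %l0, %r0 : vector<8xf32>
%l1 = vector.extract %lhs[1] : vector<2x8xf32>
%r1 = vector.extract %rhs[1] : vector<2x8xf32>
%s1 = arith.addf %l1, %r1 : vector<8xf32>
%t   = vector.insert %s0, %zero[0] : vector<8xf32> into vector<2x8xf32>
%sum = vector.insert %s1, %t[1] : vector<8xf32> into vector<2x8xf32>
```

Whether the k-D pieces are `vector<8xf32>`, `vector<4xf32>`, etc. is precisely the HW-dependent tradeoff discussed next.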

Alternatively, we argue that directly lowering to a linearized abstraction hides away the codegen complexities related to memory accesses by giving a false impression of magical dynamic indexing across registers. Instead, we prefer to make those very explicit in MLIR and allow codegen to explore tradeoffs. Different HW will require different tradeoffs in the sizes involved in steps 1., 2. and 3.

Decisions made at the MLIR level will have implications at a much later stage in LLVM (after register allocation). We do not envision exposing concerns related to modeling of register allocation and spilling to MLIR explicitly. Instead, each target will expose a set of “good” target operations and n-D vector types, associated with costs that PatternRewriters at the MLIR level will be able to target. Such costs at the MLIR level will be abstract and used for ranking, not for accurate performance modeling. In the future, such costs will be learned.

Implication on Lowering to Accelerators ¶

To target accelerators that support higher dimensional vectors natively, we can start from either 1-D or n-D vectors in MLIR and use `vector.cast` to flatten the most minor dimensions to 1-D `vector<Kxf32>`, where K is an appropriate constant. Then, the existing lowering to LLVM-IR immediately applies, with extensions for accelerator-specific intrinsics.
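
A hedged example of such a flattening cast (written `vector.cast` as in the text; the shapes are illustrative):

```mlir
// Flatten the two most minor dimensions, K = 4 * 8 = 32.
%flat2d = vector.cast %v : vector<2x4x8xf32> to vector<2x32xf32>
// Flatten all dimensions down to a single 1-D vector, K = 2 * 4 * 8 = 64.
%flat1d = vector.cast %v : vector<2x4x8xf32> to vector<64xf32>
```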

It is the role of an Accelerator-specific vector dialect (see codegen flow in the figure above) to lower the `vector.cast`. Accelerator -> LLVM lowering would then consist of a bunch of Accelerator -> Accelerator rewrites to perform the casts, composed with Accelerator -> LLVM conversions + intrinsics that operate on 1-D `vector<Kxf32>`.

Some of those rewrites may need extra handling, especially if a reduction is involved. For example, `vector.cast %0: vector<K1x...xKnxf32> to vector<Kxf32>` when K != K1 * … * Kn, and some arbitrary irregular casts such as `vector.cast %0: vector<4x4x17xf32> to vector<Kxf32>`, may introduce masking and intra-vector shuffling that may not be worthwhile or even feasible, i.e. have infinite cost.

However, `vector.cast %0: vector<K1x...xKnxf32> to vector<Kxf32>` when K = K1 * … * Kn should be close to a noop.
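
Concretely, with illustrative shapes and an assumed K for the irregular case:

```mlir
// Close to a noop: K = 2 * 16 = 32, a pure reinterpretation of the data.
%a = vector.cast %0 : vector<2x16xf32> to vector<32xf32>
// Irregular: 4 * 4 * 17 = 272 lanes do not regroup evenly into pieces of an
// assumed vector<16xf32>; lowering would require masking and intra-vector
// shuffles, and may be deemed infeasible (infinite cost).
%b = vector.cast %1 : vector<4x4x17xf32> to vector<16xf32>
```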
