This repository has been archived by the owner on Dec 22, 2021. It is now read-only.

i64x2.all_true instructions #415

Merged (4 commits) on Jan 30, 2021

Conversation

@Maratyszcza (Contributor) commented Dec 29, 2020

Introduction

This is a proposal to add a 64-bit variant of the existing all_true instruction. It complements the set of double-precision (f64) and 64-bit integer (i64) SIMD operations and allows checking for special conditions. It appears that this instruction was accidentally omitted when the double-precision SIMD operations were added back to the spec in #101. Note that despite the i64x2 prefix in the instruction name, it is agnostic to the interpretation of SIMD lanes and works for f64x2 values as well.
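For reference, here is a minimal scalar sketch of the intended semantics; the function name is hypothetical and not part of the spec. The instruction returns 1 only if every 64-bit lane is nonzero.

#include <stdbool.h>
#include <stdint.h>

/* Hypothetical scalar reference for i64x2.all_true: the result is true only
   if both 64-bit lanes are nonzero. Because only "is the lane zero?" is
   observed, an f64x2 value can be reinterpreted and tested the same way. */
static bool i64x2_all_true_ref(const uint64_t lanes[2]) {
    return lanes[0] != 0 && lanes[1] != 0;
}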

Mapping to Common Instruction Sets

This section illustrates how the new WebAssembly instruction can be lowered on common instruction sets. However, these patterns are provided only for convenience; compliant WebAssembly implementations do not have to follow the same code generation patterns.

x86/x86-64 processors with AVX instruction set

  • i64x2.all_true
    • y = i64x2.all_true(x) is lowered to:
      • VPXOR xmm_tmp1, xmm_tmp1, xmm_tmp1
      • VPCMPEQD xmm_tmp2, xmm_tmp2, xmm_tmp2
      • VPCMPEQQ xmm_tmp1, xmm_tmp1, xmm_x
      • VPTEST xmm_tmp1, xmm_tmp2
      • SETNZ reg_y
      • MOVZXBL reg_y, reg_y

x86/x86-64 processors with SSE4.1 instruction set

  • i64x2.all_true
    • y = i64x2.all_true(x) is lowered to:
      • PXOR xmm_tmp1, xmm_tmp1
      • PCMPEQD xmm_tmp2, xmm_tmp2
      • PCMPEQQ xmm_tmp1, xmm_x
      • PTEST xmm_tmp1, xmm_tmp2
      • SETNZ reg_y
      • MOVZXBL reg_y, reg_y
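For illustration only, here is a rough C intrinsics sketch of an equivalent check on SSE4.1 targets (a compiler emits the VEX-encoded forms on AVX targets). It is not a literal transcription of the sequences above, and the function name is made up.

#include <smmintrin.h>  /* SSE4.1 */
#include <stdbool.h>

/* Sketch: mark each 64-bit lane of x that equals zero, then use PTEST
   (_mm_testz_si128) to check that the mask is all zeros, i.e. that no
   lane of x was zero. */
static bool i64x2_all_true_sse41(__m128i x) {
    __m128i zero_lanes = _mm_cmpeq_epi64(x, _mm_setzero_si128());
    return _mm_testz_si128(zero_lanes, zero_lanes) != 0;
}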

x86/x86-64 processors with SSE2 instruction set

  • i64x2.all_true
    • y = i64x2.all_true(x) is lowered to:
      • PXOR xmm_tmp1, xmm_tmp1
      • PCMPEQD xmm_tmp1, xmm_x
      • PSHUFD xmm_tmp2, xmm_tmp1, 0xB1
      • PAND xmm_tmp1, xmm_tmp2
      • PMOVMSKB reg_y, xmm_tmp1
      • CMP reg_y, 0xFFFF
      • SETE reg_y
      • MOVZXBL reg_y, reg_y
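Again for illustration, an SSE2-only intrinsics sketch of an equivalent check (assumed function name, not a literal transcription of the sequence above):

#include <emmintrin.h>  /* SSE2 */
#include <stdbool.h>

/* Sketch: mark 32-bit lanes of x that are zero, AND each half with its
   swapped neighbour (the PSHUFD 0xB1 pattern) so a marker survives only
   where the whole 64-bit lane is zero, then require an empty byte mask. */
static bool i64x2_all_true_sse2(__m128i x) {
    __m128i zero32  = _mm_cmpeq_epi32(x, _mm_setzero_si128());
    __m128i swapped = _mm_shuffle_epi32(zero32, _MM_SHUFFLE(2, 3, 0, 1));
    __m128i zero64  = _mm_and_si128(zero32, swapped);
    return _mm_movemask_epi8(zero64) == 0;
}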

ARM64 processors

  • i64x2.all_true
    • y = i64x2.all_true(x) is lowered to:
      • CMEQ Vtmp.2D, Vx.2D, 0
      • UMINV Stmp, Vtmp.4S
      • UMOV Ry, Vtmp.S[0]
      • NEG Ry, Ry
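A corresponding AArch64 NEON intrinsics sketch, illustrative only (see the corrections to the exact instruction selection discussed later in the thread):

#include <arm_neon.h>
#include <stdbool.h>

/* Sketch: vceqzq_u64 sets a lane to all-ones where the corresponding
   64-bit lane of x is zero; the horizontal unsigned maximum over the
   32-bit view is then nonzero exactly when some lane of x was zero. */
static bool i64x2_all_true_aarch64(uint64x2_t x) {
    uint64x2_t zero_lanes = vceqzq_u64(x);
    return vmaxvq_u32(vreinterpretq_u32_u64(zero_lanes)) == 0;
}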

ARMv7 processors with NEON instruction set

  • i64x2.all_true
    • y = i64x2.all_true(x) is lowered to:
      • VCEQ.I32 Qtmp, Qx, 0
      • VPMIN.U32 Dtmp_lo, Dtmp_lo, Dtmp_hi
      • VPMIN.U32 Dtmp_lo, Dtmp_lo, Dtmp_lo
      • VMOV Ry, Dtmp_lo[0]
      • NEG Ry, Ry
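And an ARMv7 NEON intrinsics sketch of an equivalent check (illustrative; it folds with pairwise max/min rather than mirroring the exact sequence above):

#include <arm_neon.h>
#include <stdbool.h>

/* Sketch: fold each 64-bit lane's two 32-bit halves with an unsigned
   pairwise max (nonzero iff the lane is nonzero), then require both
   folded values to be nonzero via an unsigned pairwise min. */
static bool i64x2_all_true_neon32(uint32x4_t x_as_u32x4) {
    uint32x2_t per_lane = vpmax_u32(vget_low_u32(x_as_u32x4), vget_high_u32(x_as_u32x4));
    uint32x2_t folded   = vpmin_u32(per_lane, per_lane);
    return vget_lane_u32(folded, 0) != 0;
}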

@ngzhian (Member) commented Jan 6, 2021

I guess with #416 (likely to happen), this PR can be simplified to just i64x2.all_true.

@Maratyszcza changed the title from "i64x2.any_true and i64x2.all_true instructions" to "i64x2.all_true instructions" on Jan 11, 2021
@Maratyszcza (Contributor, Author)

Removed i64x2.any_true

@abrown (Contributor) commented Jan 11, 2021

I think this PR is similar to #368: a reduction from many lanes to a scalar. Since the i32x4 version, i32x4.all_true, has a lowering as unfortunate on x86 as this one, why not consider both together? I would argue that the long sequences that *.all_true introduces are potholes in the spec--not giant cliffs, but still unexpected performance issues in a spec designed for near-native performance. But, doubting that we would agree to remove *.all_true altogether in favor of the *.bitmask instructions, this i64x2.all_true version makes sense.

@lars-t-hansen (Contributor)

@abrown, it's a reduction, but my expectation here would be that all_true is frequently used in a control context, i.e. as the argument to a branch, not captured for its value. In that case there's no reason for the final setnz / movzxbl pair, and the final code doesn't look all that bad IMO.

@abrown (Contributor) commented Jan 12, 2021

Yes, under that assumption it is not as bad (still not great, though). I guess you are also assuming that engines would be looking for and applying this type of optimization? I think I would as well, but I'm not sure about everyone else.

@lars-t-hansen (Contributor)

Yes, I would assume engines would optimize boolean evaluation for control even for SIMD (SpiderMonkey does, anyway).

Curious about the sse4.1 lowering actually. Why not

PXOR tmp, tmp
PCMPEQQ tmp, src
PMOVMSKB dest, tmp
TEST dest, dest
SETZ dest
MOVZXBL dest, dest

@Maratyszcza (Contributor, Author) commented Jan 12, 2021

@lars-t-hansen Both sequences use the same number of instructions, but the SSE4.1 sequence in the proposal has a shorter dependency chain. Also, TEST dest, dest should be CMP dest, 0xFFFF with a 4-byte immediate.

@penzn (Contributor) commented Jan 13, 2021

Ideally it would be great to test this, but that is just my opinion.

@dtig (Member) commented Jan 25, 2021

Approving, as this was voted on in #419. @tlively could you confirm that the assigned opcode looks good for the tools?

@tlively (Member) left a comment

Yes, the opcode looks good.

@ngzhian (Member) commented Jan 28, 2021

0xfdc3 currently collides with i16x8.extadd_pairwise_i8x16_u in V8; I guess I should change ext add (since I picked that opcode for prototyping).

@tlively (Member) commented Jan 28, 2021

Probably best to err on the side of caution and make the opcode TBD, then.

@ngzhian (Member) commented Jan 30, 2021

Think the last step of the ARMv7 lowering is incorrect:
NEG Ry, Ry should be ADD Ry, Ry, #1

IIUC Ry will either be -1 or 0, and the result we want is 0 and 1 respectively.

Also, the sequence doesn't seem to work for i64x2(0, -1); I think all the vpmin should be vpmax. Here's what I cooked up that seems to pass the tests:

vpmax.u32 tmp_low, src_low, src_high
vceq.i32 tmp_low, tmp_low, #0
vpmax.u32 tmp_low, tmp_low, tmp_low
vmov.32 dst, tmp_low[0]
add dst, dst, #1

Still 5 instructions, lmk if you figure out something better.

@tlively merged commit 3d8c870 into WebAssembly:master on Jan 30, 2021
@akirilov-arm

@Maratyszcza The ARM64 instruction sequence is wrong too - here's a corrected version:

CMEQ Vtmp.2d, Vx.2d, #0
UMAXV Stmp, Vtmp.4s
FMOV Wy, Stmp
ADD Wy, Wy, #1

However, the following is an improved sequence, since it avoids the reduction:

CMEQ Vtmp.2d, Vx.2d, #0
ADDP Dtmp, Vtmp.2d
FCMP Dtmp, Dtmp
CSET Xy, eq

Finally, possibly the best alternative, which is, unfortunately, an instruction longer:

FMOV Xy, Dx
FMOV Xtmp, Vx.d[1]
CMP Xy, #0
CCMP Xtmp, #0, #4, ne
CSET Xy, ne

ngzhian added a commit to ngzhian/simd that referenced this pull request Feb 3, 2021
This instruction was accepted into the proposal in WebAssembly#415.
@lars-t-hansen (Contributor)

In reference to #415 (comment)

Also, TEST dest, dest should be CMP dest, 0xFFFF with 4-byte immediate.

@Maratyszcza, I don't think that's right, if what you're getting at is correctness and not performance. My sequence compares the input to zero, meaning the resulting bitmask will be nonzero in any lane that is zero; hence the test succeeds if the bitmask is zero everywhere, and setz captures that.
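For what it's worth, here is a small intrinsics sketch of the check lars-t-hansen describes (assumed names, illustrative only): the byte mask of the "lane equals zero" comparison must be empty for all_true to hold.

#include <smmintrin.h>  /* SSE4.1, for _mm_cmpeq_epi64 */
#include <stdbool.h>

/* Sketch of the PCMPEQQ + PMOVMSKB + TEST/SETZ idea: the mask is nonzero in
   any byte belonging to a zero lane, so all_true holds iff the mask is 0. */
static bool i64x2_all_true_movemask(__m128i x) {
    __m128i zero_lanes = _mm_cmpeq_epi64(x, _mm_setzero_si128());
    return _mm_movemask_epi8(zero_lanes) == 0;
}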

ngzhian added a commit that referenced this pull request Feb 3, 2021
* [spectext] Add i64x2.all_true

This instruction was accepted into the proposal in #415.

* Simplify syntax/instruction bitmask
ngzhian added a commit to ngzhian/simd that referenced this pull request Feb 12, 2021
ngzhian added a commit that referenced this pull request Feb 17, 2021