proposal: cmd/compile: floating point to integer conversions should saturate on overflow

Proposal Details

What

Change (implementation-defined) overflow behavior for Go's conversions from float{32,64} to [u]int{8,16,32,64} to be saturating on all platforms. Put this under a GOEXPERIMENT for 1.26, if that goes well, make this the implementation-defined behavior in 1.27, and change the spec in 1.28. This is already directly supported for conversions to 32 and 64 bit integers on arm64, loong64, riscv64, s390x, ppc64, and wasm; the notable outlier is amd64.

int64(NaN) is an additional concern; hardware produces 3 different results:

0 - arm64, loong64, wasm
MININT - amd64, s390x, ppc64
MAXINT - riscv64

Mandating a single answer for NaN conversion will add some code to signed conversions for some languages; defining all result bits zero except for the sign bit only requires an additional change (beyond others already made) on Riscv64.

Why

Variations in overflow behavior lead to unportability and we don't have any particular tools for checking this. Removing this unnecessarily implementation-defined behavior would remove this source of portability bugs and also make the out-of-range results easier to describe and understand. We've had the same bug reported twice against x/time/rate because of this implementation defined behavior (#71154, #18762)

For conversions to 8 and 16-bit integers, the overflow results are not intuitive even when they agree across platforms. I am slightly concerned that there will be more of these in our glorious machine-learning future, and that their current behavior will cause bugs. Changing the behavior of these conversions will result in a simpler and more consistent specification.

Compatibility issues

Technically, this is implementation-defined behavior, and for conversions to 32 and 64 bit integers this is merely normalizing the implementation-defined behavior to be what is supported by (some) popular hardware. The "compatibility issues" are actually "portability issues".

A not-experiment-gated version of this was tried, and was found to break some external code (that was quickly modified to remove the break).

Within Google, testing generated 147 apparent failures. Bisection succeeded for 124 of these (that is, there are at least 124 real failures) and 92 of these were traced to an at-best-confused conversion in a testing library. For the other 32 bisected failures, neither the code nor the test showed any indication of intent in the case of overflow; apparently someone had observed a behavior and declared it golden.

The largest compatibility change is the conversions to 8 and 16-bit types. Because no hardware supports this, overflow behavior across platforms is somewhat more consistent, and the implementation cost increases somewhat across all platforms. If, however, the former behavior/performance is required/desired, it is easily obtained by first casting the floating point number to int64, then to the smaller integer type.

Implementation plans/costs

For conversions to 32 and 64 bit signed integers, most hardware already supports saturating conversions. Amd64 doesn't, and there's some variation in NaN conversion that we might want to eliminate. The details below show only 64-bit architectures; the pattern is similar for 32-bit, plus softfloat work.

Hardware support

Saturating conversions to 32 and 64-bit integers are already supported by many 64-bit hardware platforms. The exceptions are amd64 and some older mips64 chips (FCSRNAN2008=0). Starred platforms have non-zero NaN conversion.

	arm64	riscv64*	ppc64*	wasm	s390x*	sparcv9	mips64,loong64
f32->s32	FCVTSSW	FCVTWS	-	I32TruncSatF32S	CFEBRA	FSTOI	TRUNCFW
f32->u32	FCVTZUSW	FCVTWUS	-	I32TruncSatF32U	-	-	-
f32->s64	FCVTSS	FCVTLS	-	I64TruncSatF32S	CGEBRA	FSTOX	TRUNCFV
f32->u64	FCVTZUS	FCVTLUS	-	I64TruncSatF32U	-	-	-
f64->s32	FCVTSDW	FCVTWD	FCTIWZ	I32TruncSatF64S	CFDBRA	FDTOI	TRUNCDW
f64->u32	FCVTZUDW	FCVTWUD	FCTIWUZ	I32TruncSatF64U	-	-	-
f64->s64	FCVTSD	FCVTLD	FCTIDZ	I64TruncSatF64S	CGDBRA	FDTOX	TRUNCDV
f64->u64	FCVTZUD	FCVTLUD	FCTIDUZ	I64TruncSatF64U	-	-	-

Signed 64-bit conversion on Amd64

func Int64(x float64) (result int64) {
	result = int64(x)     // CVTTSD2SQ (-)1 << 63 if NaN, +Inf, or -Inf
	if result == -1<<63 { // error value
		// should be one comparison of x and 0, and two conditional moves (but isn't yet)
		if !(x < 0) { // x > 0 || x is NaN
			result = 0
		}
		if x > 0 {
			result = 0x7fff_ffff_ffff_ffffw
		} 
	}
	return
}

Normalizing NaN before hardware conversion to [u]int{32,64} if necessary (riscv64, ppc64, s390x)

func Int64(x float64) (result int64) {
	if x != x { // NaN // compiles to conditional move if there is one.
		x = 0
	}
	result = int64HW(x)
}

Unsigned 32/64 from signed conversion (amd64, s390x, sparcv9, loong64)

func Uint32(x float32) (result uint32) {
	const W = 32
	const cutoff = 1 << (W - 1)
	if !(x >= cutoff) { // likely; also NaN
		result = uint32(int32HW(x)) // first signed conversion
		if !(x >= 0) {            // unlikely; also handles NaN
			result = 0
		}
	} else {
		y := x - cutoff
		z := int32HW(y) // second signed conversion
		if amd64 && z < 0 { // amd64 positive overflow
			z-- // or, z = cutoff-1
		}
		result = uint32(z) | cutoff
	}
	return
}

8/16 unsigned from unsigned float conversion (arm64, riscv64, wasm)

func Uint8(x float64) (result uint8) {
	const cutoff = 1<<(intY_Size) // intY_Size is 16 or 8
	result = uintY(uint64HW(x))
	if riscv64 && !(x < cutoff) { // optional riscv64 NaN correction 
		result = 0
	}
	if x >= cutoff {
		result = cutoff - 1
	}
}

8/16 unsigned from signed float conversion (amd64, s390x, sparcv9, loong64)

func Uint8(x float64) (result uint8) {
	const Y = 8
	const cutoff = 1 << (Y - 1)
	if !(x >= cutoff) { // likely; also NaN
		result = uint8(int64HW(x)) // first signed conversion
		if !(x >= 0) {           // unlikely; also NaN
			result = 0
		}
	} else {
		result = cutoff - 1
	}
	return
}

8/16 signed from signed float conversion (all platforms)

func Int8(x float64) int8 {
	const Y = 8
	const W = 64
	const OVERFLOW = (1<<(Y-1) - 1)
	s := int64HW(x)
	signs := uint64(s >> (Y - 1)) // want all 1s or 0s
	if signs+1 < 2 {              // all 1s or 0s, +1, yields 0 or 1
		return int8(s) // includes 0 for NaN on arm64, loong64, wasm
	}
	// Some kind of overflow or amd64 && , ppc64, s390x, riscv64} nonzero NaN conversion
	s = (s >> (W - 1)) ^ OVERFLOW // +/- overflow 
	if (amd64 || ppc64 || s390x || riscv64) && x != x {
		s = 0 // NaN
	}
	If amd64 && x > 0 { // NaN has low bits zero, and NaN is not > 0.
		s = OVERFLOW
	}
	return int8(s)
}

Existing conversion overflow behavior

Here 87381=1.5p16, big32=1.5p32, big64=1.5p64, values chosen so that they overflow conversions into both signed and unisgned integer types but are not infinity. Saturated results (that is, the goal of this proposal) appear in bold italic.

conversion	386	amd64	arm	arm64	loong64	riscv64	s390x	ppc64	ppc64le	conversion
int8(+87381)	0x+55	0x+55	0x+55	0x+55	0x+55	0x+55	0x+55	0x+55	0x+55	int8(+87381)
int8(+big32)	0	0	-1	-1	-1	-1	-1	-1	-1	int8(+big32)	DIFFER
int8(+big64)	0	0	-1	-1	-1	-1	-1	-1	-1	int8(+big64)	DIFFER
int8(-87381)	0x-55	0x-55	0x-55	0x-55	0x-55	0x-55	0x-55	0x-55	0x-55	int8(-87381)
int8(-big32)	0	0	0	0	0	0	0	0	0	int8(-big32)
int8(-big64)	0	0	0	0	0	0	0	0	0	int8(-big64)

int16(+87381)	0x+5555	0x+5555	0x+5555	0x+5555	0x+5555	0x+5555	0x+5555	0x+5555	0x+5555	int16(+87381)
int16(+big32)	0	0	-1	-1	-1	-1	-1	-1	-1	int16(+big32)	DIFFER
int16(+big64)	0	0	-1	-1	-1	-1	-1	-1	-1	int16(+big64)	DIFFER
int16(-87381)	0x-5555	0x-5555	0x-5555	0x-5555	0x-5555	0x-5555	0x-5555	0x-5555	0x-5555	int16(-87381)
int16(-big32)	0	0	0	0	0	0	0	0	0	int16(-big32)
int16(-big64)	0	0	0	0	0	0	0	0	0	int16(-big64)

int32(+big32)	MININT32	MININT32	MAXINT32	MAXINT32	MAXINT32	MAXINT32	MAXINT32	MAXINT32	MAXINT32	int32(+big32)	DIFFER
int32(+big64)	MININT32	MININT32	MAXINT32	MAXINT32	MAXINT32	MAXINT32	MAXINT32	MAXINT32	MAXINT32	int32(+big64)	DIFFER
int32(-big32)	MININT32	MININT32	MININT32	MININT32	MININT32	MININT32	MININT32	MININT32	MININT32	int32(-big32)
int32(-big64)	MININT32	MININT32	MININT32	MININT32	MININT32	MININT32	MININT32	MININT32	MININT32	int32(-big64)

int64(+big64)	0	MININT64	0x-100000000	MAXINT64	MAXINT64	MAXINT64	MAXINT64	MAXINT64	MAXINT64	int64(+big64)	DIFFER
int64(-big64)	0	MININT64	0	MININT64	MININT64	MININT64	MININT64	MININT64	MININT64	int64(-big64)	DIFFER

uint8(+87381)	0x+55	0x+55	0x+55	0x+55	0x+55	0x+55	0x+55	0x+55	0x+55	uint8(+87381)
uint8(+big32)	0	0	0x+ff	0x+ff	0x+ff	0x+ff	0x+ff	0x+ff	0x+ff	uint8(+big32)	DIFFER
uint8(+big64)	0	0	0x+ff	0x+ff	0x+ff	0x+ff	0x+ff	0x+ff	0x+ff	uint8(+big64)	DIFFER
uint8(-87381)	0x+ab	0x+ab	0x+ab	0x+ab	0x+ab	0x+ab	0x+ab	0x+ab	0x+ab	uint8(-87381)
uint8(-big32)	0	0	0	0	0	0	0	0	0	uint8(-big32)
uint8(-big64)	0	0	0	0	0	0	0	0	0	uint8(-big64)

uint16(+87381)	0x+5555	0x+5555	0x+5555	0x+5555	0x+5555	0x+5555	0x+5555	0x+5555	0x+5555	uint16(+87381)
uint16(+big32)	0	0	0x+ffff	0x+ffff	0x+ffff	0x+ffff	0x+ffff	0x+ffff	0x+ffff	uint16(+big32)	DIFFER
uint16(+big64)	0	0	0x+ffff	0x+ffff	0x+ffff	0x+ffff	0x+ffff	0x+ffff	0x+ffff	uint16(+big64)	DIFFER
uint16(-87381)	0x+aaab	0x+aaab	0x+aaab	0x+aaab	0x+aaab	0x+aaab	0x+aaab	0x+aaab	0x+aaab	uint16(-87381)
uint16(-big32)	0	0	0	0	0	0	0	0	0	uint16(-big32)
uint16(-big64)	0	0	0	0	0	0	0	0	0	uint16(-big64)

uint32(+big32)	0x+55555555	0x+55555555	MAXUINT32	0x+55555555	0x+55555555	0x+55555555	0x+55555555	0x+55555555	0x+55555555	uint32(+big32)	DIFFER
uint32(+big64)	0	0	MAXUINT32	MAXUINT32	MAXUINT32	MAXUINT32	MAXUINT32	MAXUINT32	MAXUINT32	uint32(+big64)	DIFFER
uint32(-87381)	0x+fffeaaab	0x+fffeaaab	0	0x+fffeaaab	0x+fffeaaab	0x+fffeaaab	0x+fffeaaab	0x+fffeaaab	0x+fffeaaab	uint32(-87381)	DIFFER
uint32(-big32)	0x+aaaaaaab	0x+aaaaaaab	0	0x+aaaaaaab	0x+aaaaaaab	0x+aaaaaaab	0x+aaaaaaab	0x+aaaaaaab	0x+aaaaaaab	uint32(-big32)	DIFFER
uint32(-big64)	0	0	0	0	0	0	0	0	0	uint32(-big64)

uint64(+big64)	0	1<<63	0x+ffffffff00000000	MAXUINT64	MAXUINT64	MAXUINT64	MAXUINT64	MAXUINT64	MAXUINT64	uint64(+big64)	DIFFER
uint64(-87381)	0x+fffffffffffeaaab	0x+fffffffffffeaaab	0x+fffffffffffeaaab	0	0x+fffffffffffeaaab	0x+fffffffffffeaaab	0	0x+fffffffffffeaaab	0x+fffffffffffeaaab	uint64(-87381)	DIFFER
uint64(-big32)	0x+fffffffeaaaaaaab	0x+fffffffeaaaaaaab	0x+fffffffeaaaaaaab	0	0x+fffffffeaaaaaaab	0x+fffffffeaaaaaaab	0	0x+fffffffeaaaaaaab	0x+fffffffeaaaaaaab	uint64(-big32)	DIFFER
uint64(-big64)	0	1<<63	0	0	1<<63	1<<63	0	1<<63	1<<63	uint64(-big64)	DIFFER

conversions	386	amd64	arm	arm64	loong64	riscv64	s390x	ppc64	ppc64le	conversions
int64(+Inf)	0	MININT64	0x-100000000	MAXINT64	MAXINT64	MAXINT64	MAXINT64	MAXINT64	MAXINT64	int64(+Inf)	DIFFER
int64(-Inf)	0	MININT64	0	MININT64	MININT64	MININT64	MININT64	MININT64	MININT64	int64(-Inf)	DIFFER
int64(NaN)	0	MININT64	0	0	0	MAXINT64	MININT64	MININT64	MININT64	int64(NaN)	DIFFER
int8(+Inf)	0	0	-1	-1	-1	-1	-1	-1	-1	int8(+Inf)	DIFFER
int8(-Inf)	0	0	0	0	0	0	0	0	0	int8(-Inf)
int8(NaN)	0	0	0	0	0	-1	0	0	0	int8(NaN)	DIFFER
uint32(+Inf)	0	0	MAXUINT32	MAXUINT32	MAXUINT32	MAXUINT32	MAXUINT32	MAXUINT32	MAXUINT32	uint32(+Inf)	DIFFER
uint32(-Inf)	0	0	0	0	0	0	0	0	0	uint32(-Inf)
uint32(NaN)	0	0	0	0	0	MAXUINT32	0	0	0	uint32(NaN)	DIFFER
uint8(+Inf)	0	0	255	255	255	255	255	255	255	uint8(+Inf)	DIFFER
uint8(-Inf)	0	0	0	0	0	0	0	0	0	uint8(-Inf)
uint8(NaN)	0	0	0	0	0	255	0	0	0	uint8(NaN)	DIFFER

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

proposal: cmd/compile: floating point to integer conversions should saturate on overflow #76264

Proposal Details

What

Why

Compatibility issues

Implementation plans/costs

Hardware support

Signed 64-bit conversion on Amd64

Normalizing NaN before hardware conversion to [u]int{32,64} if necessary (riscv64, ppc64, s390x)

Unsigned 32/64 from signed conversion (amd64, s390x, sparcv9, loong64)

8/16 unsigned from unsigned float conversion (arm64, riscv64, wasm)

8/16 unsigned from signed float conversion (amd64, s390x, sparcv9, loong64)

8/16 signed from signed float conversion (all platforms)

Existing conversion overflow behavior

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

proposal: cmd/compile: floating point to integer conversions should saturate on overflow #76264

Description

Proposal Details

What

Why

Compatibility issues

Implementation plans/costs

Hardware support

Signed 64-bit conversion on Amd64

Normalizing NaN before hardware conversion to [u]int{32,64} if necessary (riscv64, ppc64, s390x)

Unsigned 32/64 from signed conversion (amd64, s390x, sparcv9, loong64)

8/16 unsigned from unsigned float conversion (arm64, riscv64, wasm)

8/16 unsigned from signed float conversion (amd64, s390x, sparcv9, loong64)

8/16 signed from signed float conversion (all platforms)

Existing conversion overflow behavior

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions