bfloat16

bfloat16 (brain floating point, BF16) is a format that represents floating-point numbers in 16 bits.

It was developed by Google for TensorFlow.

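BF16 is specified as "FP32 with the fraction simply truncated": it keeps the sign bit and the 8-bit exponent of FP32 and shortens the fraction from 23 bits to 7 bits. Compared with the general-purpose FP16, the advantage of BF16 is that it can be converted to and from FP32 quickly, so it is said to suit workloads in which conversions with FP32 occur frequently (mainly artificial intelligence).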
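Because BF16 is the upper 16 bits of FP32, the conversion can be sketched with a plain bit shift. The following is a minimal illustration, not a reference implementation; the function names are hypothetical, and the FP32-to-BF16 direction here simply truncates without any rounding.

```python
import struct

def fp32_to_bits(x: float) -> int:
    """Return the IEEE 754 single-precision bit pattern of x as an integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bits_to_fp32(bits: int) -> float:
    """Interpret a 32-bit integer as an IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def fp32_to_bf16_truncate(x: float) -> int:
    """Keep sign, exponent, and the top 7 fraction bits; drop the low 16 bits."""
    return fp32_to_bits(x) >> 16

def bf16_to_fp32(b: int) -> float:
    """Widen a 16-bit BF16 pattern to FP32 by appending 16 zero bits."""
    return bits_to_fp32((b & 0xFFFF) << 16)

print(hex(fp32_to_bf16_truncate(0.15625)))  # 0x3e20
print(bf16_to_fp32(0x3E20))                 # 0.15625
```

Widening BF16 to FP32 is always exact, since every BF16 value is also an FP32 value. FP16 uses a 5-bit exponent, so converting it to or from FP32 requires re-biasing the exponent and handling out-of-range values rather than a single shift.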

FP16
sign		exponent (5 bit)					fraction (10 bit)
┃		┌───────┐					┌─────────────────┐
	0	0	1	1	0	0	0	1	0	0	0	0	0	0	0	0
	15	14				10	9									0

bfloat16
sign		exponent (8 bit)								fraction (7 bit)
┃		┌─────────────┐								┌───────────┐
	0	0	1	1	1	1	1	0	0	0	1	0	0	0	0	0
	15	14							7	6						0

FP32
sign		exponent (8 bit)								fraction (23 bit)
┃		┌─────────────┐								┌───────────────────────────────────────────┐
	0	0	1	1	1	1	1	0	0	0	1	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
	31	30							23	22																						0
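The three bit patterns above can be decoded with one small helper to see what values they represent. This is a sketch that assumes the usual IEEE 754 style interpretation (implicit leading 1, exponent bias of 2^(k-1) - 1 for a k-bit exponent) and handles only normal numbers; the name `decode` is hypothetical.

```python
def decode(bits: int, exp_bits: int, frac_bits: int) -> float:
    """Decode a normal (non-zero, non-subnormal, finite) binary float pattern."""
    sign = (bits >> (exp_bits + frac_bits)) & 1
    exponent = (bits >> frac_bits) & ((1 << exp_bits) - 1)
    fraction = bits & ((1 << frac_bits) - 1)
    bias = (1 << (exp_bits - 1)) - 1          # 15 for FP16, 127 for BF16 and FP32
    value = (1 + fraction / (1 << frac_bits)) * 2.0 ** (exponent - bias)
    return -value if sign else value

# Bit patterns copied from the diagrams: sign _ exponent _ fraction
fp16 = decode(0b0_01100_0100000000, exp_bits=5, frac_bits=10)
bf16 = decode(0b0_01111100_0100000, exp_bits=8, frac_bits=7)
fp32 = decode(0b0_01111100_01000001000000000000000, exp_bits=8, frac_bits=23)

print(fp16, bf16, fp32)  # 0.15625 0.15625 0.15673828125
```

The FP16 and bfloat16 examples both encode 0.15625. The FP32 example has one more fraction bit set (bit 15), so it is slightly larger, and that extra bit is lost when only the upper 16 bits are kept.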

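When converting from FP32 to BF16, rounding comes into play under the following condition:

The 7th bit of the FP32 fraction is 0 and the 8th bit is 1 (counting from the most significant fraction bit, so bit 16 is 0 and bit 15 is 1, as in the FP32 example above).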
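The text does not name a rounding mode. As one possible reading, the sketch below assumes round-to-nearest with ties to even, a rule commonly used in software FP32-to-BF16 conversion; that choice and the function names are assumptions, not something the text specifies. Under this assumption, a value whose 7th fraction bit is 0 and whose 8th bit is 1 is rounded up only if some lower bit is also set; an exact tie keeps the 7th bit at 0.

```python
import struct

def fp32_bits(x: float) -> int:
    """Return the IEEE 754 single-precision bit pattern of x."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bf16_from_fp32_bits_rne(bits: int) -> int:
    """Round a 32-bit FP32 pattern to a 16-bit BF16 pattern (nearest, ties to even)."""
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent == 0xFF and fraction != 0:
        # NaN: return a quiet NaN with the same sign instead of rounding.
        return ((bits >> 16) & 0x8000) | 0x7FC0
    lsb = (bits >> 16) & 1   # last kept fraction bit (7th from the top)
    bits += 0x7FFF + lsb     # carries into the kept bits when rounding up
    return (bits >> 16) & 0xFFFF

def fp32_to_bf16_rne(x: float) -> int:
    """Convenience wrapper that takes a Python float."""
    return bf16_from_fp32_bits_rne(fp32_bits(x))

print(hex(bf16_from_fp32_bits_rne(0x3E208000)))  # 0x3e20 (tie, 7th bit 0: stays)
print(hex(bf16_from_fp32_bits_rne(0x3E208001)))  # 0x3e21 (above the tie: rounds up)
print(hex(bf16_from_fp32_bits_rne(0x3E218000)))  # 0x3e22 (tie, 7th bit 1: rounds up)
```

The first input, 0x3E208000, is the FP32 pattern from the diagram; with this rule it yields 0x3E20, which matches the bfloat16 pattern shown there.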

Major products that support BF16

Intel CPUs (some products with AVX-512 support)
NVIDIA GPUs (products based on the NVIDIA Ampere architecture)
