<!DOCTYPE html>
<html>
<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','https://www.google-analytics.com/analytics.js','ga');
ga('create', 'UA-83146017-1', 'auto');
ga('send', 'pageview');
</script>
<head>
<meta charset='utf-8'>
<meta http-equiv="X-UA-Compatible" content="chrome=1">
<meta name="description" content="ROCm, A New Era in Open GPU Computing : Platform for GPU Enabled HPC and UltraScale Computing ">
<link rel="stylesheet" type="text/css" media="screen" href="stylesheets/stylesheet.css">
<title>ROCm, A New Era in Open GPU Computing</title>
</head>
<body>
<!-- HEADER -->
<div id="header_wrap" class="outer">
<header class="inner">
<a id="forkme_banner" href="https://github.com/RadeonOpenCompute">View on GitHub</a>
<img class="wrap" src="images/ROCm_Logo_128.png" alt="ROCm_Logo" />
<h2 id="project_title">ROCm, A New Era in Open GPU Computing</h2>
<h3 id="project_tagline">Platform for GPU Enabled HPC and UltraScale Computing </h3>
</header>
</div>
<div id="nav">
<div id="nbar">
<ul>
<li><a href="index.html">Overview</a></li>
<li><a href="ROCmInstall.md">Install</a></li>
<li><a href="languages.html">Languages</a></li>
<li><a href="http://rocm-documentation.readthedocs.io/en/latest/index.html">Documentation</a></li>
<li><a href="packages.html">ROCm Solutions </a></li>
<li><a href="tutorials.html">Tutorials</a></li>
<li><a href="hardware.html">ROCm Hardware</a></li>
<li><a href="blog.html">ROCm BLOG</a></li>
</ul>
</div>
</div>
<!-- MAIN CONTENT -->
<div id="main_content_wrap" class="outer">
<section id="main_content" class="inner">
<h1>ROC ON, Float16 and Integer16 support in AMD GPUs</h1>
<p>It has been a secret for too long: AMD GPUs do support Float16 and Int16 instructions. On current GPUs these instructions execute at the same speed as Float32.</p>
<ul>
<li>Fiji family of hardware: Radeon R9 Nano, R9 Fury, R9 Fury X, FirePro S9300x2</li>
<li>Tonga family of hardware: R9 285, R9 380, R9 380x, FirePro S7150x2, S7150, W7100</li>
<li>Polaris family of hardware: RX480, RX470, RX460</li>
</ul>
<p>We will also expose our GCN 3 ISA through an assembler that is directly supported by the compiler. The new LLVM native GCN ISA compiler supports a disassembler, an assembler and, soon, inline assembly, so you will be able to tune your code even further.</p>
<p>ROCm compilers will bring the full richness of Float16 and Int16 via HCC, HIP and OpenCL.</p>
<p><a href="http://32ipi028l5q82yhj72224m8j.wpengine.netdna-cdn.com/wp-content/uploads/2016/08/AMD_GCN3_Instruction_Set_Architecture_rev1.1.pdf">You can find out more about Float16 and other instructions in the GCN version 3 ISA manual</a></p>
<p>Here are examples of some of the instructions supported:</p>
<ul>
<li>V_FREXP_EXP_I16_F16 Returns the exponent of a half-precision float input, such that the original half float = significand * (2 ** exponent).</li>
<li>V_CVT_F16_F32 Float32 to Float16.</li>
<li>V_ADD_F16 D.f16 = S0.f16 + S1.f16. Supports denormals, round mode, exception flags, saturation.</li>
<li>V_SUB_F16 D.f16 = S0.f16 - S1.f16. Supports denormals, round mode, exception flags, saturation. SQ translates to V_ADD_F16.</li>
<li>V_MAC_F16 16-bit floating-point multiply-accumulate.</li>
<li>V_FMA_F16 Fused half-precision multiply-add.</li>
<li>V_MAD_F16 Floating point multiply-add (MAD). Gives same result as ADD after MUL_IEEE. Uses IEEE rules for 0*anything. </li>
<li>V_MADAK_F16 16-bit floating-point multiply-add with constant add operand.</li>
<li>V_MADMK_F16 16-bit floating-point multiply-add with multiply operand immediate.</li>
<li>V_COS_F16 Cosine function.</li>
<li>V_SIN_F16 Sine function.</li>
<li>V_EXP_F16 Base-2 exponential function.</li>
<li>V_LOG_F16 Base-2 log function.</li>
<li>V_SQRT_F16 if(S0.f16 == 1.0f) D.f16 = 1.0f; else D.f16 = ApproximateSqrt(S0.f16).</li>
<li>V_FRACT_F16 Floating point ‘fractional’ part of S0.f. </li>
<li>V_RCP_F16 if (S0.f16 == 1.0f), D.f16 = 1.0f; else D.f16 = ApproximateRecip(S0.f16).</li>
<li>V_RSQ_F16 if(S0.f16 == 1.0f) D.f16 = 1.0f; else D.f16 = ApproximateRecipSqrt(S0.f16).</li>
<li>V_RNDNE_F16 Floating-point Round-to-Nearest-Even Integer.</li>
<li>V_TRUNC_F16 Floating point ‘integer’ part of S0.f. D.f16 = trunc(S0.f16). Round-to-zero semantics.</li>
<li>V_LDEXP_F16 D.f16 = S0.f16 * (2 ** S1.i16).</li>
<li>V_CEIL_F16 Floating point ceiling function.</li>
<li>V_FLOOR_F16 Floating-point floor function</li>
<li>V_MAX_F16 D.f16 = max(S0.f16, S1.f16). IEEE compliant. Supports denormals, round mode, exception flags, saturation.</li>
<li>V_MAX_I16 D.i[15:0] = max(S0.i[15:0], S1.i[15:0]).</li>
<li>V_MIN_F16 D.f16 = min(S0.f16, S1.f16). IEEE compliant. Supports denormals, round mode, exception flags, saturation. </li>
<li>V_CVT_PKRTZ_F16_F32 Convert two float 32 numbers into a single register holding two packed 16-bit floats.</li>
<li>V_DIV_FIXUP_F16 Given a numerator, denominator, and quotient from a divide, this opcode detects and applies special case numerics, modifies the quotient if necessary. This opcode also generates invalid, denorm, and divide by zero exceptions caused by the division.</li>
<li>V_SUBREV_F16 D.f16 = S1.f16 - S0.f16. Supports denormals, round mode, exception flags, saturation. SQ translates to V_ADD_F16.</li>
</ul>
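<p>The difference between the unfused V_MAD_F16 ("same result as ADD after MUL") and the fused V_FMA_F16 can be illustrated on the host. The following is only a minimal sketch in Python, not GPU code: it assumes round-to-nearest-even and uses the standard <code>struct</code> module's half-precision format to emulate Float16 rounding. The helper names <code>to_f16</code>, <code>mad_f16</code> and <code>fma_f16</code> are hypothetical and exist only for this illustration.</p>

```python
import struct


def to_f16(x):
    """Round a Python float to the nearest half-precision value.

    Assumption: round-to-nearest-even, which is what struct's "e"
    format applies. Values outside the half range raise OverflowError.
    """
    return struct.unpack("e", struct.pack("e", x))[0]


def mad_f16(a, b, c):
    """Unfused multiply-add: the product is rounded to half precision
    before the add, so two roundings take place."""
    return to_f16(to_f16(a * b) + c)


def fma_f16(a, b, c):
    """Fused multiply-add: the product is kept exact (it fits in a
    double for these inputs) and only the final sum is rounded."""
    return to_f16(a * b + c)


# Inputs that are exactly representable in half precision.
a = 1.0009765625   # 1 + 2**-10
b = 1.0029296875   # 1 + 3 * 2**-10
c = -1.0

print(to_f16(0.1))        # 0.1 is not representable in half precision
print(mad_f16(a, b, c))   # low bits of the product are lost before the add
print(fma_f16(a, b, c))   # low bits of the product survive until the end
```

<p>For these inputs the two results differ by one unit in the last place of the half-precision result, which is why a fused multiply-add is often preferred when accumulating small differences.</p>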
<p>The GCN 3 architecture also supports 32-bit, 24-bit, and 16-bit integer math.</p>
<ul>
<li>V_ADD_U16 D.u16 = S0.u16 + S1.u16. Supports saturation (unsigned 16-bit integer domain).</li>
<li>V_SUB_U16 D.u16 = S0.u16 - S1.u16. Supports saturation (unsigned 16-bit integer domain).</li>
<li>V_MAD_I16 Signed integer muladd.</li>
<li>V_MAD_U16 Unsigned integer muladd.</li>
<li>V_SAD_U16 Sum of absolute differences with accumulation.</li>
<li>V_MAX_I16 D.i[15:0] = max(S0.i[15:0], S1.i[15:0]).</li>
<li>V_MAX_U16 D.u[15:0] = max(S0.u[15:0], S1.u[15:0]).</li>
<li>V_MIN_I16 D.i[15:0] = min(S0.i[15:0], S1.i[15:0]).</li>
<li>V_MIN_U16 D.u[15:0] = min(S0.u[15:0], S1.u[15:0]).</li>
<li>V_MUL_LO_U16 D.u16 = S0.u16 * S1.u16. Supports saturation (unsigned 16-bit integer domain).</li>
<li>V_CVT_F16_U16 D.f16 = uint16_to_flt16(S.u16). Supports denormals, rounding, exception flags and saturation.</li>
<li>V_CVT_F16_I16 D.f16 = int16_to_flt16(S.i16). Supports denormals, rounding, exception flags and saturation.</li>
<li>V_SUBREV_U16 D.u16 = S1.u16 - S0.u16. Supports saturation (unsigned 16-bit integer domain). SQ translates this to V_SUB_U16 with reversed operands.</li>
</ul>
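<p>The 16-bit integer semantics listed above (wraparound versus saturation, reversed subtraction, low-half multiply, signed versus unsigned maximum) can be modelled on the host as follows. This is a rough Python sketch, not GPU code. The function names mirror the instruction names only for readability, and the <code>clamp</code> parameter is an assumption that stands in for however saturation is enabled on the real instruction.</p>

```python
WRAP = 0x10000      # 2**16, number of distinct 16-bit values
U16_MAX = 0xFFFF    # largest unsigned 16-bit value


def add_u16(s0, s1, clamp=False):
    """D.u16 = S0.u16 + S1.u16, wrapping or saturating at 0xFFFF."""
    total = (s0 % WRAP) + (s1 % WRAP)
    return min(total, U16_MAX) if clamp else total % WRAP


def sub_u16(s0, s1, clamp=False):
    """D.u16 = S0.u16 - S1.u16, wrapping or saturating at 0."""
    diff = (s0 % WRAP) - (s1 % WRAP)
    return max(diff, 0) if clamp else diff % WRAP


def subrev_u16(s0, s1, clamp=False):
    """D.u16 = S1.u16 - S0.u16: a subtract with reversed operands."""
    return sub_u16(s1, s0, clamp)


def mul_lo_u16(s0, s1):
    """Keep only the low 16 bits of the unsigned product."""
    return ((s0 % WRAP) * (s1 % WRAP)) % WRAP


def as_i16(bits):
    """Reinterpret a 16-bit pattern as a signed two's-complement value."""
    bits %= WRAP
    return bits - WRAP if bits >= 0x8000 else bits


def max_i16(s0, s1):
    """Signed maximum: compare the operands as two's-complement values."""
    return max(as_i16(s0), as_i16(s1))


def max_u16(s0, s1):
    """Unsigned maximum: compare the raw 16-bit patterns."""
    return max(s0 % WRAP, s1 % WRAP)


print(add_u16(0xFFFF, 1), add_u16(0xFFFF, 1, clamp=True))
print(sub_u16(0, 1), sub_u16(0, 1, clamp=True))
print(max_i16(0xFFFF, 1), max_u16(0xFFFF, 1))
```

<p>The same bit pattern 0xFFFF is the largest value for the unsigned maximum and the value -1 for the signed maximum, which is why separate I16 and U16 variants of the min and max instructions exist.</p>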
</section>
</div>
<!-- FOOTER -->
<div id="footer_wrap" class="outer">
<footer class="inner">
<p> © 2016 AMD Corporation <a href="legal.html">Disclaimer and Legal Information</a> </p>
</footer>
</div>
</body>
</html>