Changes made since 1.1.0 (2019/06/30).

Implemented a little-known SHA-2 Maj() optimization proposed by Wei Dai.
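The sketch below is not the project's code. It only checks the algebraic identity that, as far as I understand it, this kind of Maj() optimization rests on: Maj(x, y, z) can be written as y ^ ((x ^ y) & (y ^ z)). Because the SHA-2 working variables shift by one position each round, the (x ^ y) term of one round can serve as the (y ^ z) term of the next. The function names are hypothetical.

```c
#include <stdint.h>

/* Textbook SHA-2 majority function: each output bit is the majority of
 * the corresponding bits of x, y, and z. */
static uint32_t maj_textbook(uint32_t x, uint32_t y, uint32_t z)
{
    return (x & y) | (x & z) | (y & z);
}

/* Equivalent form: where x and y agree the result is y, otherwise it is z. */
static uint32_t maj_xor_form(uint32_t x, uint32_t y, uint32_t z)
{
    return y ^ ((x ^ y) & (y ^ z));
}

/* Returns 1 if both forms agree on pseudo-random sample inputs, else 0. */
static int maj_forms_agree(void)
{
    uint32_t s = 1;

    for (int i = 0; i < 1000; i++) {
        uint32_t v[3];
        for (int k = 0; k < 3; k++) {
            s = s * 1664525u + 1013904223u; /* small LCG, test data only */
            v[k] = s;
        }
        if (maj_textbook(v[0], v[1], v[2]) != maj_xor_form(v[0], v[1], v[2]))
            return 0;
    }
    return 1;
}
```

The saving would come from caching x ^ y across rounds, which this sketch does not attempt to show.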
Minor code cleanups and documentation updates.

Changes made between 1.0.3 (2018/06/13) and 1.1.0 (2019/06/30).

Merged yescrypt-opt.c and yescrypt-simd.c into one source file, which is
a closer match to -simd but is called -opt (and -simd is now gone).
With this change, performance of SIMD builds should be almost unchanged,
while scalar builds should be faster than before on register-rich 64-bit
architectures but may be slower than before on register-starved 32-bit
architectures (this shortcoming may be addressed later). This also
happens to make SSE prefetch available even in otherwise-scalar builds
and it paves the way for adding SIMD support on big-endian architectures
(previously, -simd assumed little-endian).
Added x32 ABI support (x86-64 with 32-bit pointers).
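As a rough illustration of what detecting x32 can look like at compile time, the sketch below relies on GCC and Clang defining both __x86_64__ and __ILP32__ for x32 targets. The helper names are hypothetical, and this is not necessarily how the project itself detects the ABI.

```c
#include <stddef.h>

/* Returns 1 when compiled for the x32 ABI (x86-64 instruction set with
 * 32-bit pointers), 0 otherwise. */
static int built_for_x32(void)
{
#if defined(__x86_64__) && defined(__ILP32__)
    return 1;
#else
    return 0;
#endif
}

/* Pointer width in bits for the current build. */
static size_t pointer_bits(void)
{
    return sizeof(void *) * 8;
}
```

On GCC, such a build is typically requested with -mx32; on an ordinary x86-64 build built_for_x32() returns 0.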

Changes made between 1.0.2 (2018/06/06) and 1.0.3 (2018/06/13).

In SMix1, optimized out the indexing of V for the sequential writes.
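One way to picture this kind of change, as a simplified sketch rather than the actual SMix1 code: when V is filled strictly in order, the destination can be carried as a running pointer instead of being recomputed from the loop index. toy_mix() is a made-up stand-in for the real block mixing step, and all names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy stand-in for the block mixing step; NOT the real BlockMix. */
static void toy_mix(uint32_t *block, size_t words)
{
    for (size_t k = 0; k < words; k++)
        block[k] = block[k] * 2654435761u + (uint32_t)k;
}

/* Indexed form: the address of V_i is recomputed from i on every pass. */
static void fill_indexed(uint32_t *V, uint32_t *X, size_t N, size_t words)
{
    for (size_t i = 0; i < N; i++) {
        memcpy(&V[i * words], X, words * sizeof(*X));
        toy_mix(X, words);
    }
}

/* Sequential form: the write position simply advances by one block. */
static void fill_sequential(uint32_t *V, uint32_t *X, size_t N, size_t words)
{
    uint32_t *dst = V;

    for (size_t i = 0; i < N; i++) {
        memcpy(dst, X, words * sizeof(*X));
        toy_mix(X, words);
        dst += words;
    }
}
```

Both functions leave identical contents in V and X; only the address arithmetic differs.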

Changes made between 1.0.1 (2018/04/22) and 1.0.2 (2018/06/06).

Don't use MAP_POPULATE anymore because new multi-threaded benchmarks on
RHEL6'ish and RHEL7'ish systems revealed that it sometimes has an adverse
effect far in excess of its occasional positive effect.
In the SIMD code, we now reuse the same buffer for BlockMix_pwxform's
input and output in SMix2. This might slightly improve cache hit rate
and thus performance.
Also in the SIMD code, a compiler memory barrier has been added between
sub-blocks to ensure that none of the writes into what was S2 during
processing of the previous sub-block are postponed until after a read
from S0 or S1 in the inline asm code for the current sub-block. This
potential problem has never been observed so far, thanks to other
constraints that we have, but strictly speaking those constraints were
insufficient to guarantee that it couldn't occur.
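A compiler memory barrier of the kind described is commonly written in GCC/Clang syntax as an empty asm statement with a "memory" clobber. The sketch below uses a hypothetical macro name and a toy function; it is not the project's actual code.

```c
#include <stdint.h>

/* Compiler-only barrier (GCC/Clang syntax): emits no instruction, but the
 * "memory" clobber keeps the compiler from moving memory accesses across
 * this point. */
#define COMPILER_BARRIER_SKETCH() __asm__ __volatile__("" : : : "memory")

static uint32_t write_then_read(uint32_t *s2_prev, const uint32_t *s0, uint32_t v)
{
    *s2_prev = v;              /* write left over from the previous sub-block */
    COMPILER_BARRIER_SKETCH(); /* the write may not be sunk below this line */
    return *s0;                /* read belonging to the current sub-block */
}
```

This constrains only the compiler; it is not a CPU-level fence.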

Changes made between 1.0.0 (2018/03/09) and 1.0.1 (2018/04/22).

The included documentation has been improved, most notably adding new
text files PARAMETERS (guidelines on parameter selection, and currently
recommended parameter sets by use case) and COMPARISON (comparison to
scrypt and Argon2).
Code cleanups have been made, including removal of AVX2 support, which
was deliberately temporarily preserved for the 1.0.0 release, but which
almost always hurt performance with currently recommended low-level
yescrypt parameters on Intel & AMD CPUs tested so far. (The low-level
parameters are chosen with consideration for relative performance of
defensive vs. offensive implementations on different hardware, and not
only for seemingly best performance on CPUs. It is possible to change
them such that AVX2 would be worthwhile, and this might happen in the
future, but currently this wouldn't be obviously beneficial overall.)

Changes made between 0.8.1 (2015/10/25) and 1.0.0 (2018/03/09).

Hash string encoding has been finalized under the "$y$" prefix for both
native yescrypt and classic scrypt hashes, using a new variable-length
and extremely compact encoding of (ye)scrypt's many parameters. (Also
still recognized under the "$7$" prefix is the previously used encoding
for classic scrypt hashes, which is fixed-length and not so compact.)
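As a small illustration of the two prefixes, a caller could tell the encodings apart as sketched below. The enum and function names are hypothetical, and the sketch inspects only the prefix, not the rest of the string.

```c
#include <string.h>

enum hash_encoding_sketch { ENC_UNKNOWN, ENC_Y, ENC_7 };

/* Classify a hash string by its prefix alone. */
static enum hash_encoding_sketch classify_prefix(const char *hash)
{
    if (!strncmp(hash, "$y$", 3))
        return ENC_Y; /* compact variable-length encoding */
    if (!strncmp(hash, "$7$", 3))
        return ENC_7; /* older fixed-length classic scrypt encoding */
    return ENC_UNKNOWN;
}
```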
Optional format-preserving salt and hash (re-)encryption has been added,
using the Luby-Rackoff construction with SHA-256 as the PRF.
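To show the general shape of a Luby-Rackoff (Feistel) construction, here is a toy sketch. It uses a made-up, insecure round function in place of SHA-256, and the round count, key type, and function names are all assumptions for illustration. It demonstrates only that the transformation is invertible and keeps the data length unchanged, which is what makes it format-preserving.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ROUNDS_SKETCH 4

/* Toy round function, NOT SHA-256 and NOT secure: derives out[0..n-1]
 * from the key, the round number, and the other half of the data. */
static void toy_prf(uint8_t *out, size_t n, const uint8_t *in,
    uint32_t key, unsigned int round)
{
    uint32_t h = 2166136261u ^ key ^ (round * 0x9e3779b9u);

    for (size_t i = 0; i < n; i++)
        h = (h ^ in[i]) * 16777619u;
    for (size_t i = 0; i < n; i++) {
        h = (h ^ (uint32_t)i) * 16777619u;
        out[i] = (uint8_t)(h >> 24);
    }
}

/* Feistel network over buf[0..2*half-1], in place. A nonzero decrypt
 * runs the same rounds in reverse order, undoing the encryption. */
static void feistel_sketch(uint8_t *buf, size_t half, uint32_t key, int decrypt)
{
    uint8_t pad[64];
    uint8_t *L = buf, *R = buf + half;

    assert(half <= sizeof(pad));
    for (unsigned int i = 0; i < ROUNDS_SKETCH; i++) {
        unsigned int round = decrypt ? ROUNDS_SKETCH - 1 - i : i;
        /* Even rounds update L from R, odd rounds update R from L. */
        uint8_t *target = (round & 1) ? R : L;
        const uint8_t *source = (round & 1) ? L : R;

        toy_prf(pad, half, source, key, round);
        for (size_t k = 0; k < half; k++)
            target[k] ^= pad[k];
    }
}
```

Each round XORs one half with a value derived from the other half, so repeating the rounds in reverse order cancels them regardless of what the round function computes.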
Support for hash upgrades has been temporarily excluded to allow for its
finalization at a later time and based on actual needs (e.g., will 3x
ROM size upgrades be in demand now that Intel went from 4 to 6 memory
channels in their server CPUs, bringing a factor of 3 into RAM sizes?)
ROM initialization has been sped up through a new simplified algorithm.
ROM tags (magic constant values) and digests (values that depend on the
entire computation of the ROM contents) have been added to the last
block of ROM. (The placement of these tags/digests is such that nested
ROMs are possible, to allow for ROM size upgrades later.)
The last block of ROM is now checked for the tag and is always used for
hash computation before a secret-dependent memory access is first made.
This ensures that hashes won't be computed with a partially initialized
ROM or with one initialized using different machine word endianness, and
that they will be consistently miscomputed if the ROM digest is other
than what the caller expected. This in turn helps early detection of
problems with ROM initialization even if the calling application fails
to check for them. This also helps mitigate cache-timing attacks when
the attacker doesn't know the contents of the last block of ROM.
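The tag check can be pictured roughly as follows. The constants, the position of the tag within the last block, and the function name are all hypothetical; the real values and layout are defined by the implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical magic values, for illustration only. */
#define ROM_TAG1_SKETCH 0x11111111u
#define ROM_TAG2_SKETCH 0x22222222u

/* rom points to nblocks blocks of block_words 32-bit words each.
 * Returns 1 if the last block starts with the expected tag, else 0. */
static int rom_tag_ok_sketch(const uint32_t *rom, size_t nblocks,
    size_t block_words)
{
    const uint32_t *last;

    if (!rom || nblocks == 0 || block_words < 2)
        return 0;
    last = rom + (nblocks - 1) * block_words;
    return last[0] == ROM_TAG1_SKETCH && last[1] == ROM_TAG2_SKETCH;
}
```

If the last block is the final thing an initializer writes, a partially initialized ROM fails a check like this.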
Many implementation changes have been made, such as for performance,
portability, security (intentional reuse and thus rewrite of memory
where practical and optional zeroization elsewhere), and coding style.
This includes addition of optional SSE2 inline assembly code (a macro
with 8 instructions) to yescrypt-simd.c, which tends to slightly
outperform compiler-generated code, including AVX(2)-enabled code, for
yescrypt's currently recommended settings. This is no surprise since
yescrypt was designed to fit the 64-bit mode extended SSE2 instruction
set perfectly (including SSE2's lack of 3-register instructions), so for
its optimal implementation AVX would merely result in extra instruction
prefixes and not provide any benefit (except for the uses of Salsa20
inherited from scrypt, but those are infrequent).
The auxiliary files inherited from scrypt have been sync'ed with scrypt
1.2.1, and the implementation of PBKDF2 has been further optimized,
especially for its use in (ye)scrypt where the "iteration count" is 1
but the output size is relatively large. (The speedup is measurable at
realistically low settings for yescrypt, such as at 2 MiB of memory.)
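To see why an iteration count of 1 with a large output is a distinct case, note that PBKDF2-HMAC-SHA256 produces its output in 32-byte blocks, each costing one HMAC computation per iteration. The sketch below only does that arithmetic; r = 8 and p = 1 are example values, and the p * 128 * r buffer size is the one scrypt derives with PBKDF2. Function names are hypothetical.

```c
#include <stddef.h>

#define SHA256_DIGEST_BYTES 32

/* Number of 32-byte output blocks PBKDF2-HMAC-SHA256 must produce. */
static size_t pbkdf2_sha256_blocks(size_t dklen)
{
    return (dklen + SHA256_DIGEST_BYTES - 1) / SHA256_DIGEST_BYTES;
}

/* HMAC computations needed: one per output block per iteration. */
static size_t pbkdf2_sha256_hmac_calls(size_t dklen, size_t iterations)
{
    return pbkdf2_sha256_blocks(dklen) * iterations;
}

/* Size in bytes of the buffer that scrypt fills via PBKDF2. */
static size_t scrypt_b_bytes(size_t r, size_t p)
{
    return p * 128 * r;
}
```

With r = 8 and p = 1 the buffer is 1024 bytes, so a single-iteration PBKDF2 call still performs 32 HMAC computations, all with the same password and salt.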
The included tests have been revised and test vectors regenerated to
account for the ROM initialization/use updates and hash (re-)encryption.
The PHC test vectors have been compacted into a single SHA-256 hash of
the expected output of phc.c, but have otherwise remained unchanged as
none of the incompatible changes have affected the subset of yescrypt
exposed via the PHS() interface for the Password Hashing Competition.
The specification document and extra programs that were included with
the PHC submission and its updates are now excluded from this release.
The rest of the documentation files have been updated for the 1.0.0 release.

Changes made between 0.7.1 (2015/01/31) and 0.8.1 (2015/10/25).

pwxform became stateful, through writes to its S-boxes. This further
discourages TMTO attacks on yescrypt as a whole, as well as on pwxform
S-boxes separately. It also increases the total size of the S-boxes by
a factor of 1.5 (8 KiB to 12 KiB by default) and it puts the previously
mostly idle L1 cache write ports on CPUs to use.
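The 8 KiB and 12 KiB figures are consistent with going from two S-boxes to three of 4 KiB each. The sketch below reproduces that arithmetic under assumed default compile-time settings (an S-box index width of 8 bits and 2 64-bit lanes per entry); the macro and function names are hypothetical, and the settings are my assumption, not taken from the text above.

```c
#include <stddef.h>

/* Assumed defaults, for illustration only. */
#define SWIDTH_SKETCH 8     /* bits used to index one S-box */
#define PWX_SIMPLE_SKETCH 2 /* 64-bit lanes per S-box entry */

/* Bytes in one S-box: entries * lanes * 8 bytes per lane. */
static size_t sbox_bytes(void)
{
    return ((size_t)1 << SWIDTH_SKETCH) * PWX_SIMPLE_SKETCH * 8;
}

/* Total bytes for a given number of S-boxes. */
static size_t total_sbox_bytes(size_t count)
{
    return count * sbox_bytes();
}
```

Under these assumptions total_sbox_bytes(2) is 8192 and total_sbox_bytes(3) is 12288, a factor of 1.5.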
Salsa20/8 in BlockMix_pwxform has been replaced with Salsa20/2.
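In the Salsa20/r notation, r is the number of rounds, so this change goes from four double-rounds to one. The sketch below is a plain Salsa20 core with a configurable even round count, following the commonly published reference layout; it is not the project's SIMD code, and the function names are hypothetical.

```c
#include <stdint.h>

static uint32_t rotl32(uint32_t v, unsigned int n)
{
    return (v << n) | (v >> (32 - n));
}

/* Salsa20 core applied in place to 16 words; rounds must be even
 * (8 for Salsa20/8, 2 for Salsa20/2). */
static void salsa20_core_sketch(uint32_t B[16], unsigned int rounds)
{
    uint32_t x[16];
    unsigned int i;

    for (i = 0; i < 16; i++)
        x[i] = B[i];
    for (i = 0; i < rounds; i += 2) {
        /* Column round */
        x[ 4] ^= rotl32(x[ 0] + x[12],  7);  x[ 8] ^= rotl32(x[ 4] + x[ 0],  9);
        x[12] ^= rotl32(x[ 8] + x[ 4], 13);  x[ 0] ^= rotl32(x[12] + x[ 8], 18);
        x[ 9] ^= rotl32(x[ 5] + x[ 1],  7);  x[13] ^= rotl32(x[ 9] + x[ 5],  9);
        x[ 1] ^= rotl32(x[13] + x[ 9], 13);  x[ 5] ^= rotl32(x[ 1] + x[13], 18);
        x[14] ^= rotl32(x[10] + x[ 6],  7);  x[ 2] ^= rotl32(x[14] + x[10],  9);
        x[ 6] ^= rotl32(x[ 2] + x[14], 13);  x[10] ^= rotl32(x[ 6] + x[ 2], 18);
        x[ 3] ^= rotl32(x[15] + x[11],  7);  x[ 7] ^= rotl32(x[ 3] + x[15],  9);
        x[11] ^= rotl32(x[ 7] + x[ 3], 13);  x[15] ^= rotl32(x[11] + x[ 7], 18);
        /* Row round */
        x[ 1] ^= rotl32(x[ 0] + x[ 3],  7);  x[ 2] ^= rotl32(x[ 1] + x[ 0],  9);
        x[ 3] ^= rotl32(x[ 2] + x[ 1], 13);  x[ 0] ^= rotl32(x[ 3] + x[ 2], 18);
        x[ 6] ^= rotl32(x[ 5] + x[ 4],  7);  x[ 7] ^= rotl32(x[ 6] + x[ 5],  9);
        x[ 4] ^= rotl32(x[ 7] + x[ 6], 13);  x[ 5] ^= rotl32(x[ 4] + x[ 7], 18);
        x[11] ^= rotl32(x[10] + x[ 9],  7);  x[ 8] ^= rotl32(x[11] + x[10],  9);
        x[ 9] ^= rotl32(x[ 8] + x[11], 13);  x[10] ^= rotl32(x[ 9] + x[ 8], 18);
        x[12] ^= rotl32(x[15] + x[14],  7);  x[13] ^= rotl32(x[12] + x[15],  9);
        x[14] ^= rotl32(x[13] + x[12], 13);  x[15] ^= rotl32(x[14] + x[13], 18);
    }
    /* Feed-forward: add the original input to the permuted state. */
    for (i = 0; i < 16; i++)
        B[i] += x[i];
}
```

Calling it with rounds set to 2 instead of 8 runs the loop body once instead of four times.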
An extra HMAC-SHA256 update of the password buffer (which is eventually
passed into the final PBKDF2 invocation) is now performed right after
the pwxform S-boxes initialization.
Nloop_rw rounding has been adjusted to be the same as Nloop_all's.
This avoids an unnecessary invocation of SMix2 with Nloop = 2, which
would otherwise have occurred in some cases.
t is now halved per hash upgrade (rather than reset to 0 right away on
the very first upgrade, like it was in 0.7.1).
Minor corrections and improvements to the specification and the code
have been made.

Changes made between 0.6.4 (2015/01/30) and 0.7.1 (2015/01/31).

The YESCRYPT_PARALLEL_SMIX and YESCRYPT_PWXFORM flags have been removed,
with the corresponding functionality enabled along with the YESCRYPT_RW
flag. This change has simplified the SIMD implementation a little bit
(eliminating specialized code for some flag combinations that are no
longer possible), and it should help simplify documentation, analysis,
testing, and benchmarking (fewer combinations of settings to test).
Adjustments to pre- and post-hashing have been made to address subtle
issues and non-intuitive behavior, as well as in some cases to reduce
the impact of garbage collector attacks.
Support for hash upgrades has been added (the g parameter).
Extra tests have been written and test vectors re-generated.

Changes made between 0.5.2 (2014/03/31) and 0.6.4 (2015/01/30).

Dropped support for the ROM access frequency mask since it made little
sense when supporting only one ROM at a time. (It'd make sense with two
ROMs, for simultaneous use of a ROM-in-RAM and a ROM-on-SSD. With just
one ROM, the mask could still be used for a ROM-on-SSD, but only in lieu
of a ROM-in-RAM, which would arguably be unreasonable.)
Simplified the API by having it accept NULL for the "shared" parameter
to indicate no ROM in use. (Previously, a dummy "shared" structure had
to be created.)
Completed the specification of pwxform, BlockMix_pwxform, Salsa20 SIMD
shuffling, and potential endianness conversion. (No change to these has
been made - they have just been specified in the included document more
completely.)
Provided rationale for the default compile-time settings for pwxform.
Revised the reference and optimized implementations' source code to more
closely match the current specification document in terms of identifier
names, compile-time constant expressions, source code comments, and in
some cases the ordering of source code lines. None of these changes
affect the computed hash values, hence the test vectors have remained
the same.