Commit 7d435e1

state presented at PHDays V, 26 May 2015, + slides
1 parent 127b8c1 commit 7d435e1

2 files changed

+363
-0
lines changed

README.md

+31
@@ -54,8 +54,39 @@ Then the following will work:
5454

5555
After the first run (for each format), you need to `cd` into `JohnTheRipper/src/`, rerun the `./configure` script (otherwise you'll get "Unknown ciphertext format name requested"), and rerun the command.
5656

57+
## Slides for PHDays V
58+
59+
The source file of the slides for the talk about john-devkit at PHDays V is
60+
`slides_2015-05-26_phdays_v.src.org`
61+
62+
The slides contain a code example from Keccak (from [JohnTheRipper/src/KeccakF-1600-unrolling.macros](https://github.com/magnumripper/JohnTheRipper/blob/bleeding-jumbo/src/KeccakF-1600-unrolling.macros)) and a code example from NetBSD's libcrypt (from [here](https://github.com/rumpkernel/netbsd-userspace-src/blob/3280867f12bbd346f39d5a4efb41fcf9b087bf33/lib/libcrypt/hmac_sha1.c)). Everything else is under the following license:
63+
64+
`Copyright © 2015 Aleksey Cherepanov <[email protected]>`
65+
66+
`Redistribution and use in source and binary forms, with or without modification, are permitted.`
67+
68+
Code examples are between `>>>>` and `<<<<`. `#` in text is escaped with `\`, so `\#include` stands for plain `#include`.
69+
70+
TODO: link to .pdf file, the script to compile slides.
71+
72+
## Usage without John the Ripper
73+
74+
The idea of writing a hash algorithm in Python and getting optimized C code is very attractive, but john-devkit has limitations: it is not suitable for regular applications, and in most cases using it is a really bad idea.
75+
76+
It is possible to separate john-devkit from John the Ripper: use a custom C template, and either fix the output in `output_c.py` for the instructions that depend on `pseudo_intrinsics.h` and/or `johnswap.h`, or pull those headers into your application if their licenses permit.
77+
78+
Optimizations in john-devkit rely on the high parallelism of the attacker's position, so john-devkit is rather useless if you want to hash just 1 candidate at a time.
79+
80+
Also, implementing and optimizing 1 hash algorithm manually should be easier than using john-devkit (at least at the moment). john-devkit benefits mostly from scale: 200+ formats are still to be implemented.
81+
82+
There is one absolute "no-no" for regular applications: john-devkit does not care about security. There are no defensive tricks; for instance, john-devkit does not prevent information disclosure through timing, so the produced code is weak against a range of attacks.
83+
84+
Please use standard, well-established libraries for hashing, for instance phpass for PHP.
85+
5786
## License
5887

88+
Currently, files generated by john-devkit are subject to the original license of John the Ripper. See below.
89+
5990
Each file in john-devkit has its license written in a comment close to the beginning of the file. Usually it is the following cut-down BSD license:
6091
`Redistribution and use in source and binary forms, with or without modification, are permitted.`
6192

slides_2015-05-26_phdays_v.src.org

+332
@@ -0,0 +1,332 @@
1+
Abstract from phdays.com:
2+
3+
A lot of time was spent to improve hash cracking speed, but the
4+
results still leave much to be desired. However, what if it was
5+
possible to make computer optimize the code and to separate crypto
6+
primitives and optimizations? The most flexible and powerful solution
7+
is code generation. The speaker will make an overview of various
8+
approaches and demonstrate the code generation techniques used in
9+
john-devkit to improve John the Ripper, the famous password cracker.
10+
11+
Slides below:
12+
---
13+
14+
john-devkit: specialized compiler for hash cracking
15+
16+
Aleksey Cherepanov
17+
18+
---
19+
20+
General
21+
22+
---
23+
24+
john-devkit
25+
- is an experiment
26+
- not yet embraced by John the Ripper developer community
27+
- is a code generator
28+
- on input: algo written in special language and a list of
29+
optimizations to apply
30+
- on output: C file for John the Ripper
31+
32+
---
33+
34+
John the Ripper (JtR)
35+
- the famous hash cracker
36+
- primary purpose is to detect weak Unix passwords
37+
- supports 200+ hash formats (types)
38+
- supports several kinds of compute devices:
39+
- CPU, Xeon Phi
40+
- scalar
41+
- SIMD: SSE2+/AVX/XOP, AVX2, MIC/AVX-512, AltiVec, NEON
42+
- GPU
43+
- OpenCL, CUDA
44+
- FPGA, Epiphany
45+
- currently for bcrypt only
46+
47+
---
48+
49+
Problems of JtR development
50+
- scalability of programmers is low due to 200+ formats: sometimes it
51+
is hard to apply even 1 optimization to all formats:
52+
- important formats get the optimization first
53+
- each additional format to optimize eats more time
54+
- support for each device needs a separate implementation
55+
- readability degrades when various cases are handled by preprocessor
56+
57+
---
58+
59+
Aims of john-devkit
60+
- to separate crypto algorithms, optimizations, and output code for
61+
various devices
62+
- to include optimizations specific for hash cracking and John the Ripper
63+
- to provide better syntax
64+
- to retain or improve performance
65+
- to provide precise control over optimizations
66+
- to support various devices: CPU, GPU, FPGA
67+
- to give great output for great input (not for any input)
68+
- to be simple
69+
70+
---
71+
72+
Early results
73+
- john-devkit is not mature
74+
- 7 formats were implemented separating crypto primitives,
75+
optimizations, and device specific code
76+
- good speeds (over default implementation in JtR):
77+
- raw-sha256 +22%
78+
- raw-sha224 +20%
79+
- raw-sha512 +6%
80+
- raw-sha384 +5%
81+
- bad speeds (but expose interesting features of john-devkit):
82+
- raw-sha1 -1%
83+
- raw-md4 -11%
84+
- raw-md5 -15%
85+
- optimizations implemented: interleave, vectorization, unroll of
86+
loops, early reject, additional batching (loop around algo)
87+
- all formats got all optimizations without effort
88+
89+
---
90+
91+
Optimizations
92+
93+
---
94+
95+
Cracking process
96+
- we are in attacker's position
97+
- we have a lot of candidates to try
98+
- high parallelism
99+
- high level algo:
100+
- load hashes (once)
101+
- generate some candidates
102+
- compute hashes (or only parts)
103+
- reject most wrong candidates
104+
- check probable passwords precisely (rare case)
105+
- generate next batch of candidates and repeat
106+
- formats are integrated into this process using OOP-like calls over
107+
function pointers
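
The loop on this slide can be sketched in a few lines of Python. This is only an illustration: the `md5_format` record, its `load` and `compute` keys, and the `crack` function are hypothetical names, standing in for the function pointers a real format provides.

```python
import hashlib

# Hypothetical "format" record: a dict of callables standing in for the
# struct of function pointers that a real format would provide.
md5_format = {
    "load": lambda hex_hashes: {bytes.fromhex(h) for h in hex_hashes},
    "compute": lambda candidates: [hashlib.md5(c).digest() for c in candidates],
}

def crack(fmt, hex_hashes, batches):
    loaded = fmt["load"](hex_hashes)            # load hashes (once)
    cracked = {}
    for batch in batches:                       # next batch of candidates
        digests = fmt["compute"](batch)         # compute hashes for the batch
        for candidate, digest in zip(batch, digests):
            if digest in loaded:                # wrong candidates are dropped here
                cracked[digest.hex()] = candidate
    return cracked

target = hashlib.md5(b"letmein").hexdigest()
result = crack(md5_format, [target],
               [[b"123456", b"password"], [b"qwerty", b"letmein"]])
print(result)
```

The driver never looks inside the format; it only calls through the record, which is the property that lets many formats plug into one cracking loop.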
108+
109+
---
110+
111+
Optimizations
112+
- some optimizations do not affect internals of crypto algorithms in
113+
any way and may be added to any algorithm
114+
- additional loop around algo to process more candidates in 1 call
115+
- OpenMP support
116+
- other optimizations affect crypto algorithms
117+
- vectorization (SIMD)
118+
- precomputation
119+
- e.g. first few steps in MD*/SHA* for partially changed input
120+
- reversal of operations
121+
- e.g. last few steps in MD*/SHA*, DES final permutation
122+
- loop unrolling
123+
- interleaving
124+
- bitslicing
125+
- and others
126+
127+
---
128+
129+
Bitslice
130+
- splits numbers into bits and computes everything through bitwise
131+
operations
132+
- optimization focuses on minimization of Boolean formula (or circuit)
133+
- Roman Rusakov generated current formulas for S-boxes of DES used in
134+
John the Ripper with custom generator
135+
- it took 3 months
136+
- Billy Bob Brumley demonstrated application of simulated annealing
137+
algorithm to scheduling of DES S-box instructions
138+
- so code generation is not new for John the Ripper (not even speaking
139+
about C preprocessor)
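
As a small illustration of the bitslice idea (not the DES S-box code mentioned above), the sketch below adds 64 pairs of 8-bit numbers at once. Each Python integer holds one bit position of all 64 values, and the adder uses only XOR, AND, and OR. All function names here are made up for the example.

```python
import random

LANES = 64  # number of values processed in parallel

def to_slices(values, width):
    """Transpose: slices[b] holds bit b of every value, one lane per value."""
    slices = [0] * width
    for lane, v in enumerate(values):
        for b in range(width):
            slices[b] |= ((v >> b) & 1) << lane
    return slices

def from_slices(slices, lanes):
    """Inverse transpose: rebuild one integer per lane."""
    values = []
    for lane in range(lanes):
        v = 0
        for b, s in enumerate(slices):
            v |= ((s >> lane) & 1) << b
        values.append(v)
    return values

def bitslice_add(xs, ys):
    """Ripple-carry addition modulo 2**width using only bitwise operations."""
    out, carry = [], 0
    for x, y in zip(xs, ys):
        out.append(x ^ y ^ carry)
        carry = (x & y) | (carry & (x ^ y))
    return out

random.seed(0)
a = [random.randrange(256) for _ in range(LANES)]
b = [random.randrange(256) for _ in range(LANES)]
sums = from_slices(bitslice_add(to_slices(a, 8), to_slices(b, 8)), LANES)
assert sums == [(x + y) & 0xFF for x, y in zip(a, b)]
```

The adder body is a Boolean circuit, so making it faster means shrinking the circuit, which is the minimization problem the slide refers to.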
140+
141+
---
142+
143+
Other solutions
144+
145+
---
146+
147+
OpenCL
148+
- is the first thing I hear when I talk about output for both CPU and GPU
149+
- has quite heavy syntax (based on C)
150+
- knows nothing about John the Ripper
151+
- does not have automatic bitslicing
152+
153+
---
154+
155+
Dynamic formats in John the Ripper
156+
- were implemented by Jim Fougeron
157+
- separate crypto primitives from formats
158+
- so md5($p) and md5(md5($p)) have one code base
159+
- work at runtime
160+
- john-devkit aims to do a similar thing, but at compile time
161+
and with ability to optimize better
162+
- so md5(md5($p)) would get more optimizations (at the price of separate
163+
code)
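
A rough sketch of the "one code base, chosen at runtime" idea follows. The `PRIMITIVES` table and `run_format` are hypothetical, and feeding the inner digest to the outer call as lowercase hex is an assumption of this example, not a statement about the dynamic format internals.

```python
import hashlib

# Hypothetical table of primitives; each maps bytes to bytes (lowercase hex).
PRIMITIVES = {
    "md5": lambda data: hashlib.md5(data).hexdigest().encode("ascii"),
}

def run_format(steps, password):
    """Interpret a format description (a list of primitive names) at runtime."""
    data = password
    for name in steps:
        data = PRIMITIVES[name](data)
    return data.decode("ascii")

single = run_format(["md5"], b"abc")          # md5($p)
double = run_format(["md5", "md5"], b"abc")   # md5(md5($p)), same primitive reused
print(single, double)
```

A compile-time generator would instead emit separate specialized code for each list of steps, which is where the extra optimizations (and the extra code) come from.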
164+
165+
---
166+
167+
C Macros
168+
- let you do things, but are not smart
169+
- an example of loop unrolling in Keccak defining all useful variants:
170+
>>>>
171+
[...]
#elif (Unrolling == 3)
#define rounds \
    prepareTheta \
    for(i=0; i<24; i+=3) { \
        thetaRhoPiChiIotaPrepareTheta(i  , A, E) \
        thetaRhoPiChiIotaPrepareTheta(i+1, E, A) \
        thetaRhoPiChiIotaPrepareTheta(i+2, A, E) \
        copyStateVariables(A, E) \
    } \
    copyToState(state, A)
#elif (Unrolling == 2)
#define rounds \
    prepareTheta \
    for(i=0; i<24; i+=2) { \
        thetaRhoPiChiIotaPrepareTheta(i  , A, E) \
        thetaRhoPiChiIotaPrepareTheta(i+1, E, A) \
    } \
    copyToState(state, A)
[...]
191+
<<<<
192+
193+
---
194+
195+
X-Macro
196+
- is a tricky way to use macros, most likely with a separate file to
197+
be included multiple times:
198+
- the file has code with variable parts
199+
- these parts are defined before \#include
200+
- so \#include provides a "template engine"
201+
- example from NetBSD's libcrypt:
202+
>>>>
203+
[...]
#define HASH_Init SHA1Init
#define HASH_Update SHA1Update
#define HASH_Final SHA1Final
#include "hmac.c"
208+
<<<<
209+
210+
---
211+
212+
john-devkit technical details
213+
214+
---
215+
216+
From Python to C in john-devkit
217+
- bytecode is generated from algorithm description
218+
- bytecode is modified by several functions chosen by user
219+
- C code is generated from the modified bytecode using a template
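
The three stages can be sketched with a toy pipeline. Everything here is hypothetical (the instruction names other than `new_const`, the `inline_consts` filter, the `TEMPLATE` string); it only shows the shape of "description, then user-chosen filters, then a template".

```python
# Toy bytecode: each instruction is [opcode, name, *operands] (an assumed layout).
def describe_algo():
    return [
        ["input", "x"],
        ["new_const", "c1", "3"],
        ["add", "t", "x", "c1"],
        ["output", "t"],
    ]

def inline_consts(code):
    """Example filter: replace references to constants with their literal values."""
    consts = {ins[1]: ins[2] for ins in code if ins[0] == "new_const"}
    out = []
    for ins in code:
        if ins[0] == "new_const":
            continue
        out.append([ins[0], ins[1]] + [consts.get(a, a) for a in ins[2:]])
    return out

# The template supplies the function signature; "x" is the input it declares.
TEMPLATE = "unsigned int algo(unsigned int x)\n{{\n{body}\n}}\n"

def to_c(code):
    lines = []
    for op, *args in code:
        if op == "new_const":
            lines.append("    const unsigned int %s = %s;" % (args[0], args[1]))
        elif op == "add":
            lines.append("    unsigned int %s = %s + %s;" % tuple(args))
        elif op == "output":
            lines.append("    return %s;" % args[0])
        # "input" emits nothing: the parameter comes from the template
    return TEMPLATE.format(body="\n".join(lines))

code = describe_algo()
for bytecode_filter in [inline_consts]:     # filters chosen by the user
    code = bytecode_filter(code)
c_source = to_c(code)
print(c_source)
```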
220+
221+
---
222+
223+
bytecode
224+
- while algorithms are written in Python with modified environment,
225+
john-devkit uses flat representation of code using its own
226+
instruction language called bytecode
227+
- some instructions of this language express constructions specific to
228+
hash cracking
229+
- for instance, state variables of hash functions are defined by
230+
special instruction
231+
- bytecode is very simple
232+
- bytecode is intended to be rich to express common constructions
233+
natively to simplify optimization
234+
235+
---
236+
237+
Example of specific instruction
238+
- separate instruction is used to define state variable, so
239+
john-devkit uses a filter to replace initial state with values for
240+
SHA-224 having code for SHA-256:
241+
>>>>
242+
def override_state(code, state):
    consts = {}
    for l in code:
        # remember each constant-defining instruction by its name
        if l[0] == 'new_const':
            consts[l[1]] = l
        # overwrite the value of the constant that initializes this state variable
        if l[0] == 'new_state_var':
            consts[l[2]][2] = str(state.pop(0))
    return code
250+
<<<<
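
To see the filter in action, the sketch below runs it on a hand-made bytecode fragment. The instruction layout (`['new_const', name, value]` and `['new_state_var', variable, constant_name]`) is inferred from the indices the filter uses and is an assumption; the names `c0`, `s0`, and so on are invented.

```python
def override_state(code, state):
    consts = {}
    for l in code:
        if l[0] == 'new_const':
            consts[l[1]] = l
        if l[0] == 'new_state_var':
            consts[l[2]][2] = str(state.pop(0))
    return code

SHA256_IV = [0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
             0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19]
SHA224_IV = [0xc1059ed8, 0x367cd507, 0x3070dd17, 0xf70e5939,
             0xffc00b31, 0x68581511, 0x64f98fa7, 0xbefa4fa4]

# Hand-made fragment: one constant and one state variable per initial value.
code = []
for i, value in enumerate(SHA256_IV):
    code.append(['new_const', 'c%d' % i, str(value)])
    code.append(['new_state_var', 's%d' % i, 'c%d' % i])

# Pass a copy: the filter consumes the list with pop(0).
override_state(code, list(SHA224_IV))
new_values = [int(l[2]) for l in code if l[0] == 'new_const']
assert new_values == SHA224_IV
```

Note that the filter edits the instructions in place and empties the list it is given, so callers that need the values afterwards should pass a copy.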
251+
252+
---
253+
254+
Optimizations specific to password cracking
255+
- use knowledge not available to regular compiler:
256+
- code can be moved between some functions of format
257+
- the functions have different probabilities of being called
258+
- so main computation is always called
259+
- check of probable candidates is very rare
260+
- it almost implies a successful guess (for strong hashes)
261+
- also hashes are loaded only once while there are millions of
262+
candidates being hashed every second
263+
264+
---
265+
266+
Specific optimization: early reject
267+
- hashes are long
268+
- some output values may be computed a bit quicker than others
269+
- one 32-bit or 64-bit value is usually enough to reject almost all
270+
wrong candidates
271+
- so john-devkit drops instructions for computation of other output
272+
values in main working function and places full implementation into
273+
function for precise check of possible password
274+
- main implementation is vectorized while full implementation is
275+
scalar because it checks only 1 candidate
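
A minimal sketch of early reject, using a made-up four-word hash (`toy_word` and `toy_hash` are not a real algorithm). The main loop computes a single 32-bit word per candidate; the full hash is computed only for the rare survivors.

```python
MASK32 = 0xFFFFFFFF

def rol(x, n):
    """Rotate a 32-bit value left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK32

def toy_word(data, index):
    """One 32-bit output word of a made-up hash (not a real algorithm)."""
    h = 0x9E3779B9 ^ index
    for byte in data:
        h = (rol(h ^ byte, 5) * 0x01000193) & MASK32
    return h

def toy_hash(data):
    """Full output: four 32-bit words."""
    return tuple(toy_word(data, i) for i in range(4))

def crack(target, candidates):
    first_word = target[0]
    for candidate in candidates:
        if toy_word(candidate, 0) != first_word:   # main function: one word only
            continue
        if toy_hash(candidate) == target:          # precise check: all words
            return candidate
    return None

target = toy_hash(b"secret")
found = crack(target, [b"123456", b"password", b"secret"])
print(found)
```

In this toy the words happen to be independent, so skipping three of them is trivial; in a real hash the saving comes from dropping only those instructions that feed the unused output values.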
276+
277+
---
278+
279+
Specific optimization: steps reversal
280+
- some operations can be reversed
281+
- if r = i + C, we know r, and C is a constant, then i = r - C
282+
- John the Ripper learns "r" when it loads hashes
283+
- john-devkit can sometimes reverse operations, replacing "forward"
284+
computation during cracking (applied per candidate password) with
285+
reverse computation at startup (applied per hash)
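
The `r = i + C` case can be sketched as follows. The hash here is a stand-in: `core` reuses the first four bytes of MD5 purely as "everything before the last step", and `C` is a hypothetical constant; neither is taken from john-devkit.

```python
import hashlib

MASK32 = 0xFFFFFFFF
C = 0x67452301   # hypothetical constant added in the last step

def core(candidate):
    """Everything except the final addition (stand-in computation)."""
    return int.from_bytes(hashlib.md5(candidate).digest()[:4], "little")

def full_hash(candidate):
    return (core(candidate) + C) & MASK32      # forward: r = i + C

def crack(r, candidates):
    i_target = (r - C) & MASK32                # reverse once per hash: i = r - C
    for candidate in candidates:
        if core(candidate) == i_target:        # per candidate: addition skipped
            return candidate
    return None

r = full_hash(b"letmein")
found = crack(r, [b"123456", b"letmein"])
print(found)
```

The subtraction runs once when the hash is loaded, while the saved addition would have run for every candidate.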
286+
287+
---
288+
289+
Full Python
290+
- is available to define algorithms
291+
- the environment has some objects with overloaded instructions to
292+
produce bytecode in a global variable instead of running it right away
293+
- but any Python code can be used
294+
- it is evaluated fully before further steps of code generation
295+
- but to produce good output some additional markup may be needed
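
One way such an environment could work is sketched below: a `Var` class whose overloaded operators append instructions to a global list instead of computing anything. The class, the opcode names, and the temporary naming scheme are all invented for this example.

```python
import itertools

bytecode = []                 # global list collecting emitted instructions
_ids = itertools.count()      # source of fresh temporary names

class Var:
    def __init__(self, name=None):
        self.name = name or "t%d" % next(_ids)

    def _emit(self, op, other):
        result = Var()
        bytecode.append([op, result.name, self.name, other.name])
        return result

    def __add__(self, other):
        return self._emit("add", other)

    def __xor__(self, other):
        return self._emit("xor", other)

def new_input(name):
    bytecode.append(["input", name])
    return Var(name)

a, b, c = new_input("a"), new_input("b"), new_input("c")
r = (a + b) ^ c
for _ in range(2):            # ordinary Python loop: runs now, emits 2 adds
    r = r + a

for instruction in bytecode:
    print(instruction)
```

Because the Python loop runs to completion while the description is evaluated, the recorded bytecode contains two separate `add` instructions and no loop at all.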
296+
297+
---
298+
299+
Full Python, example
300+
- a part of MD4 definition adapted right from RFC 1320:
301+
>>>>
302+
import re

def make_round(func, code):
    res = ''
    # turn the letters a, b, c, d, k, s into {a}, {b}, ... placeholders
    func = re.sub('([abcdks])', r'{\1}', func)
    parts = re.compile(r'\[(.)(.)(.)(.)\s+(\d+)\s+(\d+)\]').findall(code)
    for a, b, c, d, k, s in parts:
        res += func.format(**vars()) + "\n"
    return res

# the call form of exec works in both Python 2 and Python 3
exec(make_round('a = rol((a + F(b, c, d) + X[k]), s)',
                ''' [ABCD  0  3] [DABC  1  7] [CDAB  2 11] [BCDA  3 19]
                    [ABCD  4  3] [DABC  5  7] [CDAB  6 11] [BCDA  7 19]
                    [ABCD  8  3] [DABC  9  7] [CDAB 10 11] [BCDA 11 19]
                    [ABCD 12  3] [DABC 13  7] [CDAB 14 11] [BCDA 15 19]
                '''))
316+
<<<<
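
To make the text-generation step visible, the sketch below calls the same function on the first two entries of the table and prints the Python source it produces, without executing it. Only the shortened table and the variable name `generated` are additions.

```python
import re

def make_round(func, code):
    res = ''
    func = re.sub('([abcdks])', r'{\1}', func)
    parts = re.compile(r'\[(.)(.)(.)(.)\s+(\d+)\s+(\d+)\]').findall(code)
    for a, b, c, d, k, s in parts:
        res += func.format(**vars()) + "\n"
    return res

generated = make_round('a = rol((a + F(b, c, d) + X[k]), s)',
                       '[ABCD  0  3]  [DABC  1  7]')
print(generated)
# A = rol((A + F(B, C, D) + X[0]), 3)
# D = rol((D + F(A, B, C) + X[1]), 7)
```

Each bracketed entry becomes one line of Python, so the round table can be pasted from the RFC almost verbatim.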
317+
318+
---
319+
320+
Conclusions
321+
- john-devkit demonstrates practical application of code generation
322+
approach
323+
- john-devkit is a real way to automate programmer's work at such
324+
scale
325+
326+
---
327+
328+
Thank you!
329+
- Thank you!
330+
- code: https://github.com/AlekseyCherepanov/john-devkit
331+
- more technical details will be on the john-dev mailing list
332+
- my email: [email protected]
