[![Build Status](https://travis-ci.com/Netflix-Skunkworks/jvmquake.svg?branch=master)](https://travis-ci.com/Netflix-Skunkworks/jvmquake)

# `jvmquake`
A JVMTI agent that attaches to your JVM and automatically signals and kills it
when the program has become unstable.

The name comes from "jvm earth`quake`" (a play itself on hotspot).

This project is heavily inspired by [`airlift/jvmkill`](https://github.com/airlift/jvmkill)
written by `David Phillips <david@acz.org>`, but adds a GC instability
detection algorithm for when a JVM is unstable but not quite dead yet
(aka "GC spirals of death").

**Production Quality**

This agent has a thorough test suite and error handling, and has been
demonstrated in production to be superior to the built in JVM options.
Netflix currently (2019-11-11) runs this software attached to a very large
number of Cassandra and Elasticsearch JVMs.

A detailed motivation is below. To just start using `jvmquake`, skip to
[Building and Usage](#building-and-usage).

# Motivation
Java applications, especially databases such as Elasticsearch and Cassandra,
can easily enter GC spirals of death, resulting in either an eventual OOM or
Concurrent Mode Failures (aka "CMF" in CMS parlance, although G1 has similar
issues with frequent mixed mode collections). Concurrent mode failures, where
the old gen collector runs frequently and expends a lot of CPU but still
reclaims enough memory that the application does not hit a full OOM, are
particularly pernicious: they appear as 10-30s "partitions" (duration is
proportional to heap size) which repeatedly form and heal ...

This grey failure mode *wreaks havoc* on distributed systems. In the case of
databases it can lead to degraded performance or even data corruption. General
JVM applications that use distributed locks to enter a critical section may
make incorrect decisions under the assumption they have a lock when they in
fact do not (e.g. if the application pauses for 40s and then continues
executing assuming it still holds a lock in ZooKeeper).

As pathological heap situations are so problematic, the JVM has various flags
to try to address these issues:

* `OnOutOfMemoryError`: Commonly used with `kill -9 %p`. This option sometimes
works but most often results in no action, especially when the JVM is out of
file descriptors and can't execute the command at all. As of Java 8u92 there
is a better option in the `ExitOnOutOfMemoryError` option below. This option
furthermore does not handle excessive GC.
* `ExitOnOutOfMemoryError` and `CrashOnOutOfMemoryError`: Both options were
added as part of [JDK-8138745](https://bugs.openjdk.java.net/browse/JDK-8138745).
Both work great for dealing with running out of memory, but do not handle other
edge cases such as [running out of threads](https://bugs.openjdk.java.net/browse/JDK-8155004).
Naturally, these also do nothing when you are in the "grey" failure mode of CMF.

There are also some options that are supposed to control GC overhead:

* `GCHeapFreeLimit`, `GCTimeLimit` and `+UseGCOverheadLimit`. These options
are supposed to cause an OOM in the case where we are not collecting enough
memory, or are spending too much time in GC. However in practice I've never
been able to get these to work well, and `GCOverheadLimit` is afaik only
supported in CMS.

**TLDR**: In my experience these JVM flags are **hard to tune** and **only
sometimes work**, if they work at all, and are often limited to a subset of JVMs
or collectors.

## Premise of `jvmquake`
`jvmquake` is designed with the following guiding principles in mind:

1. If my JVM becomes totally unusable (OOM, out of threads, etc), I want it to
   die.
2. If my JVM spends excessive time garbage collecting, I want it to die.
3. I may want to be able to debug why my JVM ran out of memory (e.g.
   heap dumps or core dumps). I may want jvmquake to signal me that the JVM is in
   trouble before it kills it so I can start gathering additional diagnostics.
4. This should work on any JVM (Java 6, Java 7, Java 8, Java 11, etc.).

These principles are in alignment with **Crash Only Software**
([background](https://www.usenix.org/legacy/events/hotos03/tech/full_papers/candea/candea.pdf))
which implores us to crash when we encounter bugs instead of limping along.

## Knobs and Options
`jvmquake` has three options passed as comma delimited integers
`<threshold>,<runtime_weight>,<action>`:

 * `threshold` (default: 30): the maximum GC "deficit" which can be
   accumulated before jvmquake takes action, specified in seconds.
 * `runtime_weight` (default: 5): the factor by which to multiply
   running JVM time when weighing it against GCing time. "Deficit" is
   accumulated as `gc_time - runtime * runtime_weight`, and is compared against
   `threshold` to determine whether to take action.
 * `action` (default: 0): what action should be taken when `threshold` is
   exceeded. If zero, jvmquake attempts to produce an OOM within the JVM
   (allowing standard OOM handling such as `HeapDumpOnOutOfMemoryError` to
   trigger). If nonzero, jvmquake raises that signal number as an OS-level
   signal. **Regardless of the action, the JVM is then forcibly killed via a
   `SIGKILL`.**
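
As a worked illustration of how `threshold` and `runtime_weight` interact, the
sketch below replays a made-up workload through the deficit rule described
above. The function name `cycles_until_action` and the sample workloads are
hypothetical and not part of `jvmquake`; the deficit is floored at zero, as in
the pseudocode under Algorithm Details.

```python
from typing import Optional

def cycles_until_action(run_s: float, gc_s: float,
                        threshold: float = 30, runtime_weight: float = 5,
                        max_cycles: int = 1000) -> Optional[int]:
    """Count run/GC cycles until the deficit exceeds threshold (None if never)."""
    deficit = 0.0
    for cycle in range(1, max_cycles + 1):
        # Running time pays the deficit down, weighted by runtime_weight
        deficit = max(0.0, deficit - run_s * runtime_weight)
        # Time spent paused in GC adds to the deficit
        deficit += gc_s
        if deficit > threshold:
            return cycle
    return None

# Hypothetical sick JVM: 1s of application time between 10s GC pauses
print(cycles_until_action(run_s=1, gc_s=10))
# Hypothetical healthy JVM: 10s of application time per 1s of GC
print(cycles_until_action(run_s=10, gc_s=1))
```

With the defaults, the first workload crosses the threshold on its sixth
cycle: the first cycle adds 10 seconds of deficit and each later one nets
`10 - 1 * 5 = 5` seconds. The second workload never accumulates a deficit, so
the function returns `None`.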

In addition, `jvmquake` supports keyword arguments passed as comma separated
`key=value` pairs following the three integers, so
`int,int,int,key1=value1,key2=value2`. The currently supported key value pairs
are:
 * `warn` (type: int, default: maxint): an amount of GC "deficit" (analogous
   to `threshold`) which will cause `jvmquake` to touch a file (see `touch`)
   before it kills the JVM. The default setting is not to warn.
 * `touch` (type: string, default: `/tmp/jvmquake_warn_gc`): The file path that
   jvmquake should open (creating if necessary) and update the access and
   modification time on when there is more than `warn` GC "deficit".
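
To make the option format concrete, here is a minimal sketch of how such a
string could be parsed. It is not the agent's actual parser (which is written
in C): the `Options` class and `parse_options` function are hypothetical,
`sys.maxsize` stands in for "maxint", and the handling of partial or malformed
strings is an assumption, since this README does not specify it.

```python
import sys
from dataclasses import dataclass

@dataclass
class Options:
    threshold: int = 30          # seconds of GC deficit before acting
    runtime_weight: int = 5      # weight of running time versus GC time
    action: int = 0              # 0 = in-JVM OOM, otherwise a signal number
    warn: int = sys.maxsize      # stand-in for "maxint": never warn by default
    touch: str = "/tmp/jvmquake_warn_gc"

POSITIONAL = ("threshold", "runtime_weight", "action")

def parse_options(spec: str) -> Options:
    """Parse '<threshold>,<runtime_weight>,<action>[,key=value...]'."""
    opts = Options()
    if not spec:
        return opts
    fields = spec.split(",")
    # Leading fields without '=' are the positional integers
    count = 0
    while count < len(fields) and "=" not in fields[count]:
        count += 1
    if count > len(POSITIONAL):
        raise ValueError("too many positional values")
    for name, value in zip(POSITIONAL, fields[:count]):
        setattr(opts, name, int(value))
    # Everything after that must be a known key=value pair
    for field in fields[count:]:
        key, _, value = field.partition("=")
        if key == "warn":
            opts.warn = int(value)
        elif key == "touch":
            opts.touch = value
        else:
            raise ValueError(f"unknown option: {field}")
    return opts

print(parse_options("30,1,9,warn=1,touch=/tmp/jvmquake"))
```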

## Algorithm Details
To achieve our goal, we build on `jvmkill`. In addition to dying when we see a
[`ResourceExhausted`](https://docs.oracle.com/javase/8/docs/platform/jvmti/jvmti.html#ResourceExhausted)
event, `jvmquake` keeps track of every GC entrance and exit that pauses the
application using
[`GarbageCollectionStart`](https://docs.oracle.com/javase/8/docs/platform/jvmti/jvmti.html#GarbageCollectionStart)
and
[`GarbageCollectionFinish`](https://docs.oracle.com/javase/8/docs/platform/jvmti/jvmti.html#GarbageCollectionFinish).
`jvmquake` then uses a *token bucket* algorithm to keep track of how
much time is spent GCing relative to running application code. Note that per
the `JVMTI` spec these events only track the *stop the world* pausing phases of
collections. The following pseudocode is essentially all of `jvmquake`:

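```python3
import time

# The bucket for keeping track of relative running and non running time
token_bucket: float = 0
# The amount of weight to give running seconds over GCing seconds. This defines
# our expected application throughput
runtime_weight: int = 5
# The amount of time that we must exceed the expected throughput by before
# triggering the signal and death actions
gc_threshold: int = 30

def current_time() -> float:
    # Monotonic clock in seconds, so wall clock changes cannot skew the bucket
    return time.monotonic()

def take_action() -> None:
    # Placeholder: the real agent triggers an OOM or raises a signal, then SIGKILLs
    raise SystemExit("GC deficit exceeded threshold")

# Time bookkeeping
last_gc_start: float = current_time()
last_gc_end: float = current_time()

def on_gc_start() -> None:
    global last_gc_start, token_bucket
    last_gc_start = current_time()
    time_running = last_gc_start - last_gc_end
    # Running time pays down the deficit, but the bucket never goes negative
    token_bucket = max(0, token_bucket - (time_running * runtime_weight))

def on_gc_end() -> None:
    global last_gc_end, token_bucket
    last_gc_end = current_time()
    time_gcing = last_gc_end - last_gc_start
    # GC time adds to the deficit
    token_bucket += time_gcing

    if token_bucket > gc_threshold:
        take_action()
```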

The `warn` and `touch` options just touch a file (specified by `touch`) when
the `token_bucket` exceeds the warning gc threshold instead of the kill
threshold.
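
A rough sketch of that warning step, under stated assumptions: `touch` and
`check_bucket` are hypothetical helpers, and the ordering shown here (the kill
check first, no touch on the kill path) is an assumption made for
illustration rather than something this README specifies about the agent.

```python
import os
import tempfile

def touch(path: str) -> None:
    """Create the file if needed, then set its atime and mtime to now."""
    with open(path, "a"):
        pass
    os.utime(path, None)

def check_bucket(token_bucket: float, warn: float, threshold: float,
                 touch_path: str) -> str:
    """Return which step a given deficit would trigger (illustrative only)."""
    if token_bucket > threshold:
        return "kill"
    if token_bucket > warn:
        touch(touch_path)
        return "warn"
    return "ok"

# Hypothetical values: warn after 1s of deficit, kill after 30s
warn_file = os.path.join(tempfile.mkdtemp(), "jvmquake_warn_gc")
print(check_bucket(0.5, warn=1, threshold=30, touch_path=warn_file))
print(check_bucket(12.0, warn=1, threshold=30, touch_path=warn_file))
print(check_bucket(31.0, warn=1, threshold=30, touch_path=warn_file))
```

The three calls print `ok`, `warn` and `kill` in turn; only the second one
creates or updates the file.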

# Building and Usage
As `jvmquake` is a JVMTI C agent (so that it lives outside the heap and cannot
be affected by GC behavior), you must compile it before using it against
your JVM. You can either do this on the machine running the Java project or,
more commonly, in an external build that generates the `.so` or a package such
as a `.deb`. The generated `.so` depends only on your architecture and libc and
should work with any JDK newer than the one you compiled it with on the same
platform (so e.g. `linux-x86_64` will work on all `x86_64` linux systems).

```bash
# Compile jvmquake against the JVM the application is using. If you do not
# provide the path, the environment variable JAVA_HOME is used instead
make JAVA_HOME=/path/to/jvm
```

For example, if the Oracle Java 8 JVM is located at `/usr/lib/jvm/java-8-oracle`:

```bash
make JAVA_HOME=/usr/lib/jvm/java-8-oracle
```

The agent is now available at `build/libjvmquake-<platform>.so`. For example,
on a linux machine you should get `libjvmquake-linux-x86_64.so` and on mac
you might see `libjvmquake-darwin-x86_64.so`.

*Note*: A `libjvmquake.so` built from source like this is portable to all JVMs
that implement the same [`JVMTI`](https://docs.oracle.com/javase/8/docs/platform/jvmti/jvmti.html)
specification. In practice I find the same `.so` works fine with Java 8, 9, 11
and I imagine it will work until Java changes the spec.

See [Testing](#testing) for the full set of platforms that we test against.

[![Build Status](https://travis-ci.org/Netflix-Skunkworks/jvmquake.svg?branch=master)](https://travis-ci.org/Netflix-Skunkworks/jvmquake)

## How to Use the Agent
Once you have the agent library, run your java program with `agentpath` or `agentlib`
to load it.

```
java -agentpath:/path/to/libjvmquake.so <your java program here>
```

If you have installed the `.so` to `/usr/lib` (for example using a debian
package) you can just do `java -agentpath:libjvmquake.so`.

The default settings are 30 seconds of GC deficit with a 1:5 gc:running time
weight, and the default action is to trigger an in JVM OOM. These defaults
are reasonable for a latency critical java application.

If you want different settings you can pass options per the
[option specification](#knobs-and-options).

```
java -agentpath:/path/to/libjvmquake.so=<options> <your java program here>
```

Some examples:

If you want to cause a java level `OOM` when the program exceeds 30 seconds of
deficit where running time is equally weighted to gc time:
```
java -agentpath:/path/to/libjvmquake.so=30,1,0 <your java program here>
```

If you want to trigger an OS **core dump** and then die when the program
exceeds 30 seconds of deficit where running time is 5:1 weighted to gc time:
```
java -agentpath:/path/to/libjvmquake.so=30,5,6 <your java program here>
```

If you want to trigger a `SIGKILL` immediately without any form of diagnostics:
```
java -agentpath:/path/to/libjvmquake.so=30,1,9 <your java program here>
```

If you want to trigger a `SIGTERM` without any form of diagnostics:
```
java -agentpath:/path/to/libjvmquake.so=30,1,15 <your java program here>
```

If you want to cause a java level `OOM` when the program exceeds 60 seconds of
deficit where running time is 10:1 weighted to gc time:
```
java -agentpath:/path/to/libjvmquake.so=60,10,0 <your java program here>
```
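
The `action` values in these examples are plain OS signal numbers (on Linux,
6 is `SIGABRT`, 9 is `SIGKILL` and 15 is `SIGTERM`). If you generate JVM flags
from a script, a small helper can build the argument from signal names
instead of hard-coded numbers. This is only a sketch; `agent_arg` is a
hypothetical helper and not something `jvmquake` ships.

```python
import signal

def agent_arg(so_path: str, threshold: int = 30, runtime_weight: int = 5,
              action: int = 0) -> str:
    """Build the -agentpath argument; action 0 means an in-JVM OOM."""
    return f"-agentpath:{so_path}={threshold},{runtime_weight},{int(action)}"

# signal.Signals members are IntEnums, so int() yields the OS signal number
print(agent_arg("/path/to/libjvmquake.so", 30, 5, signal.SIGABRT))
print(agent_arg("/path/to/libjvmquake.so", 30, 1, signal.SIGKILL))
print(agent_arg("/path/to/libjvmquake.so", 60, 10))
```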

If you want to trigger a `SIGKILL` immediately after a 30s GC deficit accrues
and touch `/tmp/jvmquake` after _any_ GC pause of 1s or more (presumably to inform
a watching process to fire off some kind of profiler or other diagnostics):

```
java -agentpath:/path/to/libjvmquake.so=30,1,9,warn=1,touch=/tmp/jvmquake <your java program here>
```
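
What the watching process looks like is up to you. One possible shape,
sketched below, is to poll the warn file's modification time and react when it
changes. The `warned_since` function is hypothetical, and the `os.utime` call
with a fixed timestamp only simulates the agent touching the file.

```python
import os
import tempfile
from typing import Optional

def warned_since(path: str, last_seen: Optional[float]) -> Optional[float]:
    """Return the file's mtime if it is newer than last_seen, else None."""
    try:
        mtime = os.stat(path).st_mtime
    except FileNotFoundError:
        return None
    if last_seen is None or mtime > last_seen:
        return mtime
    return None

# Simulate the agent touching the file, then a watcher noticing it once
path = os.path.join(tempfile.mkdtemp(), "jvmquake")
print(warned_since(path, None))          # file does not exist yet
open(path, "a").close()
os.utime(path, (100.0, 100.0))           # pretend the touch happened at t=100
seen = warned_since(path, None)
print(seen)
print(warned_since(path, seen))          # no new touch since the last check
```

A real watcher would call `warned_since` in a loop with a sleep between
checks and start its profiler or heap dump whenever a new timestamp comes
back.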

# Testing
`jvmquake` comes with a test suite of OOM conditions (running out of memory,
threads, gcing too much, etc) which you can run if you have a `jdk`, `tox` and
`python3` available:

```bash
# Run the test suite which uses tox, pytest, and plumbum under the hood
# to run jvmquake through numerous difficult failure modes
JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/ make test
```

If you have docker you can also run specific environment tests that bundle all
dependencies for a platform into a single dockerized build:

```bash
# Run the Ubuntu bionic openjdk8 test suite via Docker
make test_bionic_openjdk8
```

There is also a test suite in `tests/test_java_opts.py` which shows that
the standard JVM options do not work to remediate the situations `jvmquake`
handles.

## Automated Tests

We currently [test](.travis.yml) every commit and the released `.so` generated
with OpenJDK 8 against the following platforms:

* Ubuntu Xenial with OpenJDK8
* Ubuntu Bionic with OpenJDK8
* Ubuntu Bionic with OpenJDK11
* Ubuntu Bionic with Zulu8
* Ubuntu Bionic with Zulu11
* Ubuntu Focal with OpenJDK8
* Ubuntu Focal with OpenJDK11
* Centos7 with OpenJDK8

[![Build Status](https://travis-ci.org/Netflix-Skunkworks/jvmquake.svg?branch=master)](https://travis-ci.org/Netflix-Skunkworks/jvmquake)