Google Compute Engine (GCE) is Google's VM service, similar to Amazon's EC2 for those more familiar with that ecosystem. One of the neatest features of GCP (which might exist on EC2, I don't know) is Live Migration. This system, used by GCP sysadmins, allows Google to move a GCE workload from one physical host to another without the VM ever noticing. It lets the GCP team work on the underlying physical hosts (which may require a reboot) without causing the slightest bit of downtime for GCE users.
Convenient as it is, there are two scenarios in which that tool cannot be used:
- The instance has a GPU attached. Migrating GPU memory and workloads from one device to another is quite a challenge, and GPU vendors don't typically spend much time enabling such things.
- The instance uses Confidential Computing, one of Google's latest toys. Using AMD's Secure Encrypted Virtualization, found on EPYC processors, GCE and GKE workloads can have their data (that is, RAM) encrypted. Again, the current generation of those processors does not offer the features required to migrate workloads from one node to the next without decrypting them first.
In those two cases, when a physical host needs to be operated on, all its VMs are terminated, moved, then restarted: downtime!
Maintenance events for GPU hosts
Since that kind of service interruption is far from convenient (and a big red flag for many customers), GCP made an effort to make those events as manageable as possible. In scenario one (GPU attached), you can be warned in advance that such a shutdown is about to happen. This can be achieved using the GCE metadata server: a special HTTP endpoint through which VMs can interact with the broader GCP environment. Among the many things you can do with it, the endpoint allows you to receive maintenance notices.
When the coast is clear, the notification endpoint will return a simple NONE reply:
$ curl http://metadata.google.internal/computeMetadata/v1/instance/maintenance-event -H "Metadata-Flavor: Google"
NONE
This means no operation is planned. If you query the endpoint within the hour before the maintenance time slot, you will instead get:
$ curl http://metadata.google.internal/computeMetadata/v1/instance/maintenance-event -H "Metadata-Flavor: Google"
TERMINATE_ON_HOST_MAINTENANCE
More conveniently, there is also a way to wait for those events instead of polling:
$ curl http://metadata.google.internal/computeMetadata/v1/instance/maintenance-event?wait_for_change=true -H "Metadata-Flavor: Google"
There, curl will hang until the maintenance-event key changes. When that happens, curl will return with TERMINATE_ON_HOST_MAINTENANCE. Now all you have to do is put that in a systemd service and script whatever behaviour you want your VM to have before it is shut down: sync whatever you're doing to disk, notify another workload to take over, and so on.
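As a rough sketch, and ignoring the timeout caveat discussed next, the watcher such a service could run might look like this (handle-maintenance.sh is a hypothetical script holding whatever pre-shutdown logic you need):
#!/bin/bash
# Minimal sketch: block on the metadata server and react to maintenance
# notices. Meant to run forever under a systemd service.
MD_URL="http://metadata.google.internal/computeMetadata/v1/instance/maintenance-event"

while true; do
    # Hangs until the maintenance-event value changes
    event=$(curl -s "${MD_URL}?wait_for_change=true" -H "Metadata-Flavor: Google")
    if [ "$event" = "TERMINATE_ON_HOST_MAINTENANCE" ]; then
        /usr/local/bin/handle-maintenance.sh  # hypothetical: sync, hand over, etc.
    fi
done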
A note on long-running curls: using ETags
Actually, if you use the above curl, you may also end up with a timeout. To avoid long-running HTTP requests like those, GCP has implemented a workaround using ETags. Instead of querying for an indeterminate amount of time, you set a timeout yourself, say, 10 minutes:
$ curl "http://metadata.google.internal/computeMetadata/v1/instance/maintenance-event?wait_for_change=true&timeout_sec=600" -H "Metadata-Flavor: Google"
Then if you receive NONE, you simply loop and wait once again. To avoid missing notifications in between calls, a tagging system allows you to ensure the second curl takes over where the first one left off. In the HTTP response you get from the first call, you will find an ETag header:
$ curl -v "http://metadata.google.internal/computeMetadata/v1/instance/maintenance-event?wait_for_change=true&timeout_sec=600" -H "Metadata-Flavor: Google"
[...]
ETag: 411261ca6c9e654e
[...]
NONE
If you pass it to the second curl as a query parameter, you let the metadata server know where you left off, and it will make sure to send any event you might have missed in between your calls:
$ curl "http://metadata.google.internal/computeMetadata/v1/instance/maintenance-event?wait_for_change=true&timeout_sec=600&last_etag=411261ca6c9e654e" -H "Metadata-Flavor: Google"
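Putting it all together, a waiting loop that never misses an event could look something like this sketch (it recovers the ETag from curl's response headers with -D):
#!/bin/bash
# Sketch: wait in bounded increments, carrying the ETag across calls
# so no event is missed between requests.
MD_URL="http://metadata.google.internal/computeMetadata/v1/instance/maintenance-event"
etag=""

while true; do
    params="wait_for_change=true&timeout_sec=600"
    # From the second call on, resume where the previous one left off
    [ -n "$etag" ] && params="${params}&last_etag=${etag}"

    headers=$(mktemp)
    event=$(curl -s -D "$headers" "${MD_URL}?${params}" -H "Metadata-Flavor: Google")
    etag=$(awk 'tolower($1) == "etag:" {print $2}' "$headers" | tr -d '\r')
    rm -f "$headers"

    if [ "$event" = "TERMINATE_ON_HOST_MAINTENANCE" ]; then
        break  # maintenance incoming: react here
    fi
done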
Confidential Computing
If you are in situation #2, then you're a bit out of luck: that notification system does not apply to your workload. If you query the endpoint when a maintenance event happens, you will receive the TERMINATE_ON_HOST_MAINTENANCE response at the exact same time as your VM receives its power-off signal. If that machine needs no time to shut down, everything will happen in under a second.
Fortunately, you can actually "force" GCP to give you a little bit of time using your OS. The actual shutdown sequence for a GCP VM is as follows:
- GCP sends a regular shutdown signal to your VM, which Linux will perceive as the power key being pressed.
- If the VM's OS has not powered off within the next 60 seconds, GCP sends a rougher termination signal, akin to someone holding that power key down or pulling the plug. When that happens, brace yourself for possibly corrupt filesystems (and lost data) at reboot. Note that GCP's documentation actually mentions 90 seconds as the default delay for standard instances (less for preemptible ones), but my own testing taught me to be a little more conservative here.
In other words, if 60 seconds are enough for you, then you can use that termination delay to react to a shutdown. The trick is to give the OS something to do at that point, something that delays the shutdown. For example, you can define a systemd service that runs at shutdown, with something like:
[Unit]
Description=My Shutdown Script

[Service]
Type=oneshot
# Stay "active" after boot so that ExecStop= runs when the unit is stopped
RemainAfterExit=true
# Runs at shutdown, when systemd stops the unit
ExecStop=/path/to/script.sh
# Leave ~10s of margin before GCP's 60s hard cutoff (see below)
TimeoutStopSec=50s

[Install]
WantedBy=multi-user.target
Then, start and enable it:
$ systemctl daemon-reload
$ systemctl enable --now myunit.service
To avoid having the script violently interrupted by the forced power-off, you can set TimeoutStopSec to have systemd send it a SIGTERM when things are getting a little hot (here, 10s before GCP pulls the plug). You can then write an emergency signal handler that does some emergency magic in the remaining 10s.
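For illustration, here is what /path/to/script.sh could look like with such a handler (drain-workload.sh is a hypothetical placeholder for your own logic):
#!/bin/bash
# Sketch: do the slow, nice-to-have work first, and keep a fast path for
# the SIGTERM systemd sends once TimeoutStopSec expires.

emergency() {
    # Roughly 10s left before GCP pulls the plug: essentials only
    sync
    exit 1
}
trap emergency TERM

# Run the slow part in the background and wait: a trapped signal
# interrupts `wait` immediately, while a foreground child would not.
/usr/local/bin/drain-workload.sh &   # hypothetical: hand over, flush state...
wait $!
sync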
If systemd is not your thing, another option is to use a GCE shutdown script. This might be more or less convenient depending on how you manage your infrastructure and configurations.
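For reference, attaching a shutdown script to an existing instance goes through the instance metadata, along these lines (my-instance is a placeholder):
$ gcloud compute instances add-metadata my-instance --metadata-from-file shutdown-script=/path/to/script.sh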
Reacting to GCP maintenance events only
One drawback of the above solution is that you will trigger your script every time the VM shuts down. That includes when you shut it down yourself, voluntarily. In that case, you may not want to trigger your emergency script, since you've probably already taken precautions before shutting your machine down.
Again, you can use the GCE metadata server to check what caused the power-off. If GCP did it, you will see:
$ curl http://metadata.google.internal/computeMetadata/v1/instance/maintenance-event -H "Metadata-Flavor: Google"
TERMINATE_ON_HOST_MAINTENANCE
Include that curl call at the beginning of your script and you'll be able to exit it early if that shutdown is on you.
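In script form, that guard could be as simple as this sketch, placed at the top of the shutdown script:
#!/bin/bash
# Bail out early if this shutdown was not triggered by GCP maintenance
event=$(curl -s http://metadata.google.internal/computeMetadata/v1/instance/maintenance-event -H "Metadata-Flavor: Google")
if [ "$event" != "TERMINATE_ON_HOST_MAINTENANCE" ]; then
    exit 0  # voluntary shutdown: precautions already taken
fi
# ...emergency logic goes here...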