Keeping a Pet in a Cattle Yard - Spot Instances

By Jason Kölker
Aug 7, 2017

GPU Research Instances

AWS GPU instances are a great way for the casual data scientist to manipulate data and tinker with machine learning without the expense of a dedicated research machine. While the smallest instance type (p2.xlarge) is nothing to sneeze at, its larger brethren (the g3 instances) are a step above in terms of CPU and memory. However, the p2.xlarge costs $0.90/hour and the smallest g3 costs $1.14/hour. If tinkering is the goal, the expenditure may give pause.

Spot Instances

Spot instances are a great way to save money compared with on-demand pricing, but the savings come at a cost: the OS is not persistent. Data that must survive between coding sessions needs to live on an external EBS volume.

For the lazy (or time-constrained) researcher, spending 30-60 minutes on setup every time they want to play becomes cumbersome. Tools like Ansible and Terraform can automate the initial provisioning; however, a spot instance may be terminated at any time if the current spot price exceeds its bid price. An alternative is to use a custom AMI, but every spot instance will start back from the point where that AMI was created.

Keeping a pet in a cattle yard

Using a clever hack, we can turn a spot instance into a persistent instance. The flow is:

  1. Create an on-demand instance (or a spot instance, if brave and the bid is high enough) with a large EBS root volume.
  2. Install and configure it as you like.
  3. Terminate the instance but keep the EBS volume.
  4. Boot a spot instance with the same (or similar enough) AMI.
  5. Attach the EBS volume to the instance.
  6. Modify the GRUB config to boot the kernel with the root= param set to the attached volume.
  7. Reboot and ssh back into the server.
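Step 6 is the heart of the trick, so here is a minimal sketch of what that edit might look like. It assumes GNU sed, a GRUB config whose kernel lines start with `linux`, `linux16`, or `linux16`-style variants, and that the saved volume shows up as `/dev/xvdf1`; the function name `set_grub_root` and the device name are illustrative, not taken from the original scripts.

```shell
# Hypothetical helper: point every kernel line in a GRUB config at a new root.
# Assumes GNU sed (-i, -E) and a new_root value without '#' or '&' characters.
set_grub_root() {
    local cfg="$1"       # path to the GRUB config, e.g. /boot/grub2/grub.cfg
    local new_root="$2"  # device to boot from, e.g. /dev/xvdf1 (assumed name)

    # Only touch lines that load a kernel; replace the first root=<value> token.
    sed -i -E "/^[[:space:]]*linux(16|efi)?[[:space:]]/ s#root=[^[:space:]]+#root=${new_root}#" "$cfg"
}
```

On the spot instance this would be called as something like `set_grub_root /boot/grub2/grub.cfg /dev/xvdf1` before rebooting. Using a device path rather than a UUID matters when both volumes come from the same AMI, since their filesystem UUIDs may be identical.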

For this example, the latest Fedora 26 AMI was used.

Setting up the persistent volume

After setting up an instance with all the required research packages, and perhaps environments like Jupyter, a custom script needs to be installed to run on shutdown. This is because cloud-init writes the MAC address to the /etc/sysconfig/network-scripts/ifcfg-* files. When this image is booted on another instance, the MAC address will of course be different. A static ENI could be carried around from instance to instance, but it is easier to just delete the files.

/usr/bin/clear-network
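The original script body is not reproduced here. As a rough idea of what such a helper could do, the sketch below deletes the generated ifcfg-* files from a directory. The function name `clear_network` and the decision to keep `ifcfg-lo` (the loopback config) are assumptions of this sketch, not details from the original.

```shell
#!/bin/sh
# Hypothetical sketch of a clear-network helper.
# Removes cloud-init generated ifcfg-* files so a stale MAC address is not
# carried over to the next instance. Keeping ifcfg-lo is an assumption.
clear_network() {
    local dir="${1:-/etc/sysconfig/network-scripts}"
    local f

    for f in "$dir"/ifcfg-*; do
        [ -e "$f" ] || continue          # glob matched nothing
        case "$(basename "$f")" in
            ifcfg-lo) ;;                 # leave the loopback config alone
            *) rm -f -- "$f" ;;
        esac
    done
}
```

An installed script would end with a bare `clear_network` call so that it acts on the real directory when run at shutdown.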

/etc/sysconfig/systemd/clear-network.service
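The unit file itself is also missing from this copy. A common systemd pattern for "run something at shutdown" is a oneshot service that does nothing on start and runs the real command on stop, which would also explain why the service is started as well as enabled. The sketch below writes such a unit to a path of your choosing; the function name `write_clear_network_unit` and the unit contents are assumptions, not the original file.

```shell
# Hypothetical sketch: write a run-on-shutdown unit for clear-network.
# The unit stays "active" after a no-op start, so systemd runs ExecStop
# when the machine shuts down.
write_clear_network_unit() {
    local dest="$1"   # where to write the unit file

    cat > "$dest" <<'EOF'
[Unit]
Description=Remove cloud-init ifcfg files at shutdown (sketch)

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/bin/true
ExecStop=/usr/bin/clear-network

[Install]
WantedBy=multi-user.target
EOF
}
```

Note that systemd normally loads administrator-provided units from /etc/systemd/system, so the destination may need adjusting for your system.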

Place the files as specified, then run:

chmod +x /usr/bin/clear-network
systemctl start clear-network.service
systemctl enable clear-network.service

Terminate the instance, making sure to preserve the EBS root volume.
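By default, a root volume is usually deleted along with its instance, so the DeleteOnTermination flag has to be turned off first. Below is a minimal sketch using the AWS CLI, assuming the CLI is configured and that the root device is named `/dev/sda1` (check the instance's actual root device name). The function names are illustrative.

```shell
# Build the block-device mapping JSON that turns off DeleteOnTermination
# for one device.
keep_volume_mapping() {
    printf '[{"DeviceName":"%s","Ebs":{"DeleteOnTermination":false}}]' "$1"
}

# Hypothetical helper: keep the root volume, then terminate the instance.
# root_device defaults to /dev/sda1, which is an assumption of this sketch.
terminate_keeping_root() {
    local instance_id="$1"
    local root_device="${2:-/dev/sda1}"

    aws ec2 modify-instance-attribute \
        --instance-id "$instance_id" \
        --block-device-mappings "$(keep_volume_mapping "$root_device")" &&
    aws ec2 terminate-instances --instance-ids "$instance_id"
}
```

The same flag can also be set in the console when the instance is launched, which avoids the extra call.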

Automating the Petification

A script can be written to automate launching a spot instance with a user-data script that makes the spot instance boot from the saved volume.
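One detail a user-data script has to cope with is timing: the saved volume is attached after the instance starts, so the script cannot assume the device exists yet. A minimal sketch of a polling helper follows; the name `wait_for_path`, the one-second interval, and the default of 60 tries are all assumptions.

```shell
# Hypothetical helper: wait until a path (e.g. a block device) appears.
# Returns 0 as soon as the path exists, 1 after the given number of tries.
wait_for_path() {
    local path="$1"
    local tries="${2:-60}"   # roughly one try per second

    while [ "$tries" -gt 0 ]; do
        if [ -e "$path" ]; then
            return 0
        fi
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}
```

In user data this might gate the rest of the transformation, for example `wait_for_path /dev/xvdf1 120 || exit 1`, with the device name depending on how the volume is attached.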

$HOME/bin/spotter
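The spotter script itself is not included in this copy. To give a feel for its core, here is a sketch of requesting a single spot instance with user data via the AWS CLI. The IAM role name further down suggests the original goes through a spot fleet; this sketch swaps in the simpler `request-spot-instances` call instead. The function names and the minimal launch specification (no subnet, security group, or instance profile) are assumptions.

```shell
# Build a minimal launch specification. User data must be base64-encoded
# when passed inside the launch specification JSON.
launch_spec_json() {
    local ami="$1" itype="$2" user_data_file="$3"

    printf '{"ImageId":"%s","InstanceType":"%s","UserData":"%s"}' \
        "$ami" "$itype" "$(base64 < "$user_data_file" | tr -d '\n')"
}

# Hypothetical helper: place a one-instance spot request at a given bid.
request_spot() {
    local bid="$1" ami="$2" itype="$3" user_data_file="$4"

    aws ec2 request-spot-instances \
        --spot-price "$bid" \
        --instance-count 1 \
        --launch-specification "$(launch_spec_json "$ami" "$itype" "$user_data_file")"
}
```

A call might look like `request_spot 0.50 ami-xxxxxxxx p2.xlarge user-data.sh`, where the bid and AMI ID are placeholders.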

This script requires that the elements it needs (VPC, security group, volume, instance profile, etc.) be named and/or tagged. Check out the script for details. The IAM policies can be found below.

aws-ec2-spot-fleet-role

instance-profile