AWS GPU Instances: Turning Spot Instances Into Persistent Instances
GPU Research Instances
AWS GPU instances are a great way for the casual data scientist to perform data manipulation and tinker with machine learning without the expense of a dedicated research machine. While the minimum instance type (p2.xlarge) is nothing to sneeze at, its larger brethren (the g3 instances) offer a step up in terms of CPU and memory. However, the p2.xlarge is $0.90/hour and the smallest g3 is $1.14/hour. If tinkering is the goal, the expenditure may give pause.
Spot instances are a great way to save compared to on-demand pricing, but the savings come at a cost: the operating system is not persistent. Data that must carry over between coding sessions needs to live on an external EBS volume.
For the lazy (or time-constrained) researcher, spending 30-60 minutes on setup before every session quickly becomes cumbersome. Tools like Ansible and Terraform can automate the initial provisioning; however, a spot instance may be terminated at any time if the current price exceeds its bid price, and the work done since provisioning goes with it. An alternative is to use a custom AMI, but every spot instance then starts from the state captured when that AMI was created.
Keeping a pet in a cattle yard
Using a clever hack, we can turn a spot instance into a persistent instance. The flow is:
- Create an on-demand instance (or a spot instance if you’re brave and the bid is high enough) with a large EBS root volume.
- Install and configure it as you like.
- Terminate the instance but keep the EBS volume.
- Boot a spot instance with the same (or similar enough) AMI.
- Attach the EBS volume to the instance.
- Modify the grub config to boot the kernel with the root= parameter pointing at the attached volume.
- Reboot and ssh back into the server.
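The grub step in the flow above can be sketched as a small shell filter. This is a minimal sketch, not the script used here: the function name set_grub_root is hypothetical, and the device name /dev/xvdf1 and the config path /boot/grub2/grub.cfg in the usage note are assumptions that depend on how the volume is attached and how the AMI is laid out.

```shell
#!/bin/sh
# Rewrite the root= kernel parameter on every line read from stdin.
# $1 is the new root device; /dev/xvdf1 is only an assumed example.
set_grub_root() {
    new_root="$1"
    # Match root= only at line start or after whitespace, so parameters
    # that merely end in "root=" are left alone.
    sed -E "s#(^|[[:space:]])root=[^[:space:]]+#\1root=${new_root}#g"
}
```

It could be used as `set_grub_root /dev/xvdf1 < /boot/grub2/grub.cfg > /tmp/grub.cfg.new`, with the result copied back over the original once it looks right. A device path is used rather than a UUID because a volume created from the same AMI as the spot instance's own root may carry the same filesystem UUID.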
For this example, the latest Fedora 26 AMI was used.
Setting up the persistent volume
After setting up an instance with all the required research packages, and perhaps setting up environments like Jupyter, a custom script needs to be installed to run on shutdown. This is because the instance's MAC address gets written to the /etc/sysconfig/network-scripts/ifcfg-* files. When this image is booted on another instance, the MAC address will of course be different. A static ENI could be carried around from instance to instance, but it is easier to just delete the files.
Place the files as specified, then run:
chmod +x /usr/bin/clear-network
systemctl start clear-network.service
systemctl enable clear-network.service
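For illustration, the cleanup that a script like /usr/bin/clear-network performs might look like the sketch below. This is an assumption about its contents, not the actual file: the function name clear_network_configs is hypothetical, and keeping ifcfg-lo (the loopback config) is a choice made for this sketch.

```shell
#!/bin/sh
# Remove interface configs that embed the old instance's MAC address.
# $1 is the directory to clean; it defaults to the Fedora location.
clear_network_configs() {
    dir="${1:-/etc/sysconfig/network-scripts}"
    for f in "$dir"/ifcfg-*; do
        [ -e "$f" ] || continue          # glob matched nothing
        case "${f##*/}" in
            ifcfg-lo) ;;                 # assumption: keep loopback
            *) rm -f -- "$f" ;;
        esac
    done
}
```

Calling `clear_network_configs` with no argument cleans the default directory. The start-then-enable sequence above suggests a service that stays active and runs the script when it is stopped at shutdown, though that is an inference rather than something stated here.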
Terminate the instance, making sure to preserve the EBS root volume.
Automating the Petification
A script can be written to automate launching a spot instance with a user-data script that makes the instance boot from the saved volume.
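The attach step of such a user-data script might be sketched as follows. This is a rough sketch under stated assumptions, not the author's script: the function names wait_for_path and attach_saved_volume are hypothetical, the volume ID and region are supplied by the caller, and the sketch assumes that a volume attached as /dev/sdf shows up in the guest as /dev/xvdf with the root filesystem on its first partition.

```shell
#!/bin/sh
# Poll until a path exists, giving up after $2 one-second tries (default 60).
wait_for_path() {
    path="$1"
    tries="${2:-60}"
    while [ "$tries" -gt 0 ]; do
        [ -e "$path" ] && return 0
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}

# Attach the saved volume to this instance and wait for it to appear.
# $1 = volume ID, $2 = region; both are placeholders supplied by the caller.
attach_saved_volume() {
    volume_id="$1"
    region="$2"
    # The instance looks up its own ID from the metadata service.
    instance_id=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
    aws ec2 attach-volume --region "$region" --volume-id "$volume_id" \
        --instance-id "$instance_id" --device /dev/sdf || return 1
    # Assumed guest-side name of the first partition on the new volume.
    wait_for_path /dev/xvdf1 120
}
```

Once the device is present, the user-data script would rewrite the root= parameter in the grub config and reboot. The instance needs permission to attach the volume, which is where the instance profile comes in.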
The launch script requires that the elements it needs (VPC, security group, volume, instance profile, etc.) be named and/or tagged. Check out the script for details. The IAM policies can be found below: