Permission denied issues with AWS instances

Quick facts: the issue is caused by an unexpected change on the EC2 side; there is no solution or workaround yet, but we are working on it.

In the last week a number of people reported an issue with newly created EC2 instances of VyOS: they could not log in to their newly created instance. At first we thought it might be an intermittent fault in AWS, since the AMI had not changed and we could not reproduce the problem ourselves, but the number of reports grew quickly, and our own test instances started showing the problem as well.

Since EC2 instances don't provide any console access, it took us a bit of time to debug. By juggling EBS volumes, we finally managed to boot an affected instance from a disk image modified to include our own SSH keys.

The root cause is in our script that checks whether the machine is running in EC2. We wanted to produce the AMI from an unmodified image, which required including a script that checks whether the environment is EC2. Executing a script that obtains an SSH key from a remote (even if link-local) address is a security risk: in a less controlled environment, an attacker could set up a server that injects their keys into all VyOS systems.
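The fetch step itself is conceptually simple. Here is a minimal sketch (the helper name and layout are hypothetical, not the actual VyOS script): the EC2 metadata service serves the instance's public key at a link-local address, which is exactly why the fetch must be gated on a reliable "are we really on EC2?" check.

```shell
#!/bin/sh
# Hypothetical sketch of the key-fetch step; the real VyOS script differs.
METADATA_KEY_URL="http://169.254.169.254/latest/meta-data/public-keys/0/openssh-key"

install_key() {
    # $1: command that prints the public key (curl against the metadata
    #     service in production, something local in tests)
    # $2: destination authorized_keys path
    key="$($1)" || return 1
    mkdir -p "$(dirname "$2")"
    printf '%s\n' "$key" >> "$2"
    chmod 600 "$2"
}

# In production, something like:
# install_key "curl -sf $METADATA_KEY_URL" /home/vyos/.ssh/authorized_keys
```

If the EC2 detection check fails, this step is skipped entirely, which is precisely what bit the affected instances.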

The key observation was that in EC2, both the system-uuid and system-serial-number fields in the DMI data always started with "EC2". We thought this was a good enough condition, and for the few years we've been providing AMIs, it indeed was.
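In shell terms, the old detection logic amounted to something like this (a sketch, not the literal VyOS script):

```shell
#!/bin/sh
# Sketch of the old detection logic (not the literal VyOS script).
# Both DMI fields had to start with "ec2"/"EC2" for the machine to be
# treated as an EC2 instance.
is_ec2() {
    # $1: system-uuid, $2: system-serial-number
    # (in production these come from `dmidecode --string ...`)
    case "$1" in
        [Ee][Cc]2*) ;;
        *) return 1 ;;
    esac
    case "$2" in
        [Ee][Cc]2*) return 0 ;;
        *) return 1 ;;
    esac
}

# In production:
# is_ec2 "$(dmidecode --string system-uuid)" \
#        "$(dmidecode --string system-serial-number)"
```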

However, Amazon changed this without warning: the system-uuid may no longer start with "EC2" (serial numbers still do), so VyOS instances stopped executing their key-fetching script.
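Given that serial numbers still carry the prefix, one obvious interim fix is to drop the UUID half of the test. Again, this is a sketch rather than a committed patch:

```shell
#!/bin/sh
# Sketch of a relaxed check (not a committed VyOS patch): rely on the
# system-serial-number alone, case-insensitively, since only the UUID
# lost its "EC2" prefix.
is_ec2_serial() {
    case "$1" in
        [Ee][Cc]2*) return 0 ;;
        *) return 1 ;;
    esac
}

# In production:
# is_ec2_serial "$(dmidecode --string system-serial-number)"
```

The trade-off is a weaker check: matching one field by accident is less unlikely than matching two, which is why a signature-based check is discussed in the responses below.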

We are working on the 1.1.8 release now, but it will go through an RC phase, while a solution to the AWS issue is needed right now. We'll contact Amazon support to see what the options are; stay tuned.

9 responses
I worked on an AMI for 1.1.7 in the past and now work for Amazon. Feel free to let me know if you get stuck and I'll see if I can help.
Amos, do you happen to know why that change to the system UUID format was made?
So far I haven't found internal or external documentation that says there was a change. E.g., the following still states that it should start with "ec2", but warns that there is a non-zero chance that other systems would match this too, and suggests using the Instance Identity Document for a more reliable check. Could you provide more details (region, sample system UUIDs) so I can try to ask internally?
Checking the instance identity data signature against the cert sounds like the best way to go. I _think_ at the time the VyOS AMI was first introduced, it wasn't there yet (or we somehow missed it).

I agree, and that's why we used to check both the system UUID and the system serial number: having them both start with EC2 by accident would be an incredibly unlikely event (and if an attacker could do that on the hypervisor, they wouldn't even need to bother with injecting keys over the network). But it's the system UUID that, contrary to the Amazon docs, no longer starts with EC2. It seems to have started in the eu-west-1 (Ireland) region and then spread to other regions. I've just made an instance in the us-east-1 region and got this:

    vyos@VyOS-AMI:~$ sudo dmidecode --string system-uuid
    08732BEC-1E1A-B9B3-CCB5-FABA246C758A
    vyos@VyOS-AMI:~$ sudo dmidecode --string system-serial-number
    ec2b7308-1a1e-b3b9-ccb5-faba246c758a

Earlier I tested it with a few more instances in eu-west-1 and us-east-1 as well, and none of the UUIDs started with EC2. Also, when we got the first bug reports (on October the 12th, I believe), initially we couldn't reproduce it ourselves and believed it to be an intermittent fault; we could only replicate the issue a few days later.
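For reference, the Instance Identity Document mentioned above is a small JSON blob served by the metadata service at http://169.254.169.254/latest/dynamic/instance-identity/document, containing fields such as instanceId and region. A crude sketch of pulling the region out of it (field names are real; the extraction helper is hypothetical, and production code should use a real JSON parser and verify the document's signature):

```shell
#!/bin/sh
# Hypothetical helper: extract the "region" field from an instance
# identity document passed in as a string. Production code should use
# a real JSON parser and verify the document's PKCS7 signature against
# the AWS certificate before trusting any of it.
get_region() {
    printf '%s\n' "$1" \
        | sed -n 's/.*"region"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# In production, the document would come from:
# curl -sf http://169.254.169.254/latest/dynamic/instance-identity/document
```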
Thanks for the details. I asked on the AWS forums here. I'll also try to ask around internally.
In case anyone stumbles upon this issue before 1.1.8, the 1.1.7 vyos-cloudinit AMI continues to work and can successfully connect via SSH. The AMI IDs are listed here: