One constant source of amazement for me is people not using ssh keys / using passwords with ssh.
Especially at a BigCo, where there are different environments, with different passwords, and password expiry/rotation/complexity rules.
Like, when asking for help, or working together... you say to them "ok, let's ssh to devfoo1234", and they do it, and then type in their password, and maybe get it wrong, then need to reset it, or whatever... and it takes half a minute or more for them to just ssh to some host. Maybe there are several hosts involved, and it all multiplies out.
I mention to them "you know... I never use ssh passwords, I don't actually know my devfoo1234 password... maybe you should google for ssh-keygen, set it up, let me know if you have any problems?" and they're like "oh yeah, that's cool. I should do that sometime later!"... and then they never do, and they are forever messing with passwords.
For a few years now I've only ever used SSH private keys safely hidden behind an HSM with a tinier-than-tiny attack surface: YubiKeys do it for me (but other vendors would work too). My SSH keys do not have a password, but when I log in using SSH, it requires me to physically touch my YubiKey (well, one of my YubiKeys).
Windows has surprisingly great support for TPM-backed sk keys using Windows Hello and OpenSSH. They're protected with physical presence and anti-hammering at the hardware level, and easy to set up by just selecting an sk-type key.
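As a sketch of what "selecting an sk-type key" means in practice with stock OpenSSH (8.2 or newer): the first, commented command is the hardware-backed variant and needs an actual FIDO2 device plugged in; the file names and comment string here are illustrative.

```shell
# With a FIDO2 device (YubiKey, Windows Hello, etc.) present:
#   ssh-keygen -t ed25519-sk -O resident -f ~/.ssh/id_ed25519_sk
# Every use of the resulting key then requires a physical touch.
# Without such hardware, the software-only equivalent is generated the same way:
ssh-keygen -q -t ed25519 -f ./id_ed25519 -N '' -C "demo key"
```

The private half of an sk key is just a handle; the real secret never leaves the device, which is why touch is required per use.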
I only use password keys for things that need to be scripted.
Keys are great for individual use, or for company use if you have centralized key control (and issue one or more keys per user).
Often you either end up with one "dev ssh key" for all machines (which is bad) or you end up with people sharing around keys and unidentified keys on machines.
Passwords at least are "simple" for people to work with.
> Often you either end up with one "dev ssh key" for all machines (which is bad) or you end up with people sharing around keys and unidentified keys on machines
That hasn't been my experience at all. I've never encountered ssh key sharing in any environment; that would be insane.
We enforced different ssh keys per environment at my previous company: fingerprint of your key would get logged, and if the SIEM detected a reuse of keys across environments (dev, test, prod, etc) you'd get a stern talking to.
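For reference, the fingerprint such a SIEM rule would match on is easy to compute yourself; the key generated here is a throwaway stand-in for your real one.

```shell
# Make a throwaway key, then print the SHA256 fingerprint of its public half,
# which is the same value sshd logs on authentication
ssh-keygen -q -t ed25519 -f ./demo_key -N ''
ssh-keygen -lf ./demo_key.pub
```

The output looks like `256 SHA256:<base64> <comment> (ED25519)`, so reuse across environments is trivially detectable from auth logs alone.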
Unregulated/decentralized SSH key usage (i.e. allowing ssh-copy-id) is a dream for attackers looking to move laterally through a network. That's why many orgs disable it, yet still haven't invested the resources to set up a proper centralized CA/authz server.
Every couple of months someone re-discovers SSH certificates, and blogs about them.
I'm guilty of it too. My blog post from 15 years ago is nowhere near as good as OP's post, but if I thought the me of 15 years ago lived up to my standards of today, I'd be really disappointed: https://blog.habets.se/2011/07/OpenSSH-certificates.html
I think the scary reality is most people conflate "keys" and "certificates". I have worked with security engineers whom I need to remind that we do not use SSH certs but rather key auth, and they have to think it through before it clicks.
I'm consistently amazed how many developers and security professionals don't have a clear understanding of how public/private key auth even works conceptually.
Things like deploying dev keys to various production environments, instead of generating/registering them within said environment.
One of the worst recent security examples... You can't get this data over HTTPS from $OtherAgency, it's "not secure" ... then their suggestion is a "secure" read-only account to the other agency's SQL server (which uses the same TLS 1.3 as HTTPS). This is from person in charge of digital security for a government org.
In the example, it wasn't even that complex... I have used patterns to register allowed signer keys based on environment variables that an application runs under, initializing at startup... so "register" just meant assigning the correct values for 2-4 environment variables per allowed public signer... and removing the dev signer. (JWT-based auth)
One key technological cause is that PKCS#12 standardizes a format (you've most likely seen it as .PFX files) in which a certificate and its associated private key are bundled. This is in an effort to simplify the software...
So you get a situation where the lay person is given a "certificate", but it's not really just the certificate: it's a PFX file. So, e.g., no, they mustn't show it to you; it has their private key inside it, so you would learn that key, and if you're honest you've just ruined their day, because they need to start over...
I would say in my career I've had at least two occasions where I did that and felt awful for the person, because I had set out to help them but now things were worse. And I've had a good number of later occasions where I spent a lot more of their time and mine, because I knew I needed to be very sure whether their "certificate" was actually a certificate (which they can show me, e.g. Teams-message me the file) or a PFX file (thus containing their private key), and I had to caution them to show nobody the file while still trying to assist them.
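A sketch of how to tell the two apart safely with OpenSSL; the file names and password here are made up, and the point is that `-clcerts -nokeys` extracts only the shareable certificate from a PFX bundle.

```shell
# Make a throwaway key + self-signed cert, then bundle them as PKCS#12 (.pfx)
openssl req -x509 -newkey rsa:2048 -keyout key.pem -out cert.pem \
    -days 1 -nodes -subj "/CN=demo"
openssl pkcs12 -export -inkey key.pem -in cert.pem \
    -out bundle.pfx -passout pass:demo
# Extract ONLY the certificate: this part is safe to show someone
openssl pkcs12 -in bundle.pfx -passin pass:demo -clcerts -nokeys -out cert-only.pem
```

If someone sends you `cert-only.pem`, no harm done; if they send `bundle.pfx` plus its password, their private key is burned and they need to start over.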
Another useful feature of SSH certificates is that you can sign a user’s public key to grant them access to a remote machine for a limited time and as a specific remote user.
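A minimal sketch of that with plain ssh-keygen (the names and the 1-hour window are illustrative): the `-n` principal restricts which remote user the cert is valid for, and `-V` bounds its lifetime.

```shell
ssh-keygen -q -t ed25519 -f user_ca -N ''       # the CA keypair
ssh-keygen -q -t ed25519 -f id_alice -N ''      # the user's keypair
# Sign alice's public key: valid only for principal "deploy", only for 1 hour
ssh-keygen -q -s user_ca -I alice@example -n deploy -V +1h id_alice.pub
ssh-keygen -L -f id_alice-cert.pub              # inspect principals and validity
```

The `-I` identity string ends up in the server's auth log, so you get attribution even though the server never saw alice's individual key beforehand.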
The capacity to grant access as a specific remote user is present without certs as well, right? The typical authorized_keys file lives under a user's home directory and grants access only to that user.
Certs may still be the right approach, but OpenSSH also supports an AuthorizedKeysCommand which could be a secure HTTPS request to a central server to pull down a dynamically generated authorized_keys file content for the particular user and host.
If your endpoints can securely and reliably reach a central server, this gives you maximum control (your authorized_keys HTTPS server can have any custom business logic you want) without having to deal with certs/CAs.
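The sshd_config side of that is small; the fetch-script path below is hypothetical, while `%u` and `%f` are the standard tokens sshd substitutes for the requesting user and the offered key's fingerprint.

```
# /etc/ssh/sshd_config (fragment)
AuthorizedKeysCommand /usr/local/bin/fetch-authorized-keys %u %f
AuthorizedKeysCommandUser nobody
```

The command must print zero or more authorized_keys-format lines on stdout; running it as an unprivileged dedicated user limits the blast radius if the fetch logic is compromised.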
Exactly. This is really useful in larger organizations where you may want more complex rules on access. For example, you can easily build "break glass" or 2nd party approved access on demand. You can put whatever logic you need in a CA front-end.
You can also make all the certs short-lived (and only store them in ram).
And when your or someone else's infra is down to such a degree that you need SSH access, you do not want to depend on being able to touch that machine first. The same is true of custom AuthorizedKeysCommands that phone home.
I've known SSH certs for a while but never went through the effort of migrating away from keys. I'm very frustrated about manually managing my SSH keys across my different servers and devices though.
I assume you gathered a lot of thoughts over these 15 years.
A big problem I have with SSH certs is that they are not universally supported. For me, there is always some device or daemon (for example, tinyssh in the initramfs of my gaming PC so that I can unlock it remotely) that only works with "plain old ssh keys". And if I have to distribute and sync my keys onto a few hosts anyway, that takes away the benefits.
Adding to this: while certs are indeed well-supported by OpenSSH, it's not always the SSH daemon used on alternate or embedded platforms.
For example, OpenWRT uses Dropbear [1] instead, which does not support certs. Also, Java programs that implement SSH, like Jenkins, may do so using Apache Mina [2]; though the underlying library supports certs, it is buggy [3] and requires the application to add the UX to support it.
You can just replace dropbear with openssh on OpenWRT. That was one of the first things I did, since DropBear also doesn't support hardware backed (sk) keys. Just move it to 2222 and disable the service.
I reenabled DB on that alt port when I did the recent major update, just in case, but it wasn't necessary. After the upgrade, OpenSSH was alive and ready.
Might actually be a positive instead of a negative. Gaming use cases should not have any effect on security policies; these should be as separate as possible. Different auth mechanisms for your gaming stuff and your professional stuff ensure nothing gets mixed.
Hah? It being my gaming machine has nothing to do with the problem. It’s also my FPGA development machine, though it gets used less for that. It only happens to be the only Linux workstation in my home (the others are Macs or OpenBSD).
If your use case is such that you are frustrated about managing keys, host or user keys, then yes, it does sound like SSH certs would help you. E.g. when you have many users, servers, or a high enough Cartesian product of the two.
In environments where they don't cause frustration, they're not worth it.
Not really more to it than that, from my point of view.
I am keeping an eye on the new (and alpha) Authentik agent, which will allow IdP-based ssh logins. There's also SSSD, already supported, but it requires glibc (due to needing NSS), meaning it's not available on Alpine.
It depends on what you want to do. CA certs are easy to manage: you just put the CA public key (marked cert-authority) in authorized_keys instead of individual SSH public keys.
They also provide a way to get hardware-backed security without messing with SSH agent forwarding and crappy USB security devices. You can use an HSM to issue a temporary certificate for your (possibly temporary) public key and use it as normal. The certificate can be valid for just 1 hour, enough to not worry about it leaking.
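On the server side, trusting the CA can be as simple as one authorized_keys line; the key material and principal below are placeholders.

```
# ~/.ssh/authorized_keys on the target host: accept any cert signed by this CA,
# but only if the cert carries the principal "alice"
cert-authority,principals="alice" ssh-ed25519 AAAAC3Nza...placeholder... user-ca
```

For fleet-wide trust, the `TrustedUserCAKeys` directive in sshd_config does the same job globally, with principal-to-account mapping handled via AuthorizedPrincipalsFile.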
Yes. Caveat: It might not really be worth it if all your infrastructure is managed by these newfangled infrastructure-as-code-things that are quick to roll out (OpenShift/OKD, Talos, etc.) and you have only one repo to change SSH keys (single cluster or single repo for all clusters).
There are some serious security benefits for larger organizations but it does not sound as if you are part of one.
Despite the drawbacks of its grassroots nature, TOFU goes a looooong way.
With my own machines I can just physically check that the server host key matches what the ssh client sees. Once TOFU looks good I'm all set with that host because I don't change any of the keys ever.
In a no-frills corporate unix environment it's enough to have the internal servers' public keys listed on an internal website, accessible via SSL so the list is effectively signed by a known corporate identity. You only need to check this list once to validate the TOFU step, after which you can trust future connections.
In settings with a huge fleet of machines, or in a very dynamic environment where new machines are rolled out all the time, it probably makes things easier to use certificates. Of course, certificates come with some extra work and some extra features, so the amount of benefit depends on the case. But at that scale TOFU breaks down badly on multiple levels, so you can't afford a strong opinion against certificates, really.
I wish web browsers could remember server TLS host keys easily too and at least notify me whenever they change even if they'd still accept the new keys via ~trusted CAs.
I work in a corporate setting and the money and time we wasted because of Zscaler and its SSL inspection [1] is beyond your wildest imagination. Whenever I see a "SSL certificate problem: self-signed certificate in certificate chain" error, I know I'm in trouble.
The author lists all the advantages of CA certificates, yet doesn't list the disadvantages. OTOH, all the many steps required to set them up make the disadvantages rather obvious.
Also, I've never had a security issue due to TOFU, have you?
In our dev/stg environment we reinstall half our machines every morning (largely to test our machine setup automation), and SSH host certificates make that so much nicer than having to persist host keys or remove/replace them in known_hosts. Highly recommended.
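A sketch of that host-cert workflow (the CA name, hostname, and 90-day validity are illustrative): sign each freshly installed machine's host key, and clients trust the CA once instead of accumulating per-host known_hosts entries.

```shell
ssh-keygen -q -t ed25519 -f host_ca -N ''                 # the host CA
ssh-keygen -q -t ed25519 -f ssh_host_ed25519_key -N ''    # a machine's host key
# -h makes it a host certificate; -n lists the names clients will connect to
ssh-keygen -q -s host_ca -I dev1 -h -n dev1.example.com -V +90d \
    ssh_host_ed25519_key.pub
# Clients then need a single known_hosts line like:
#   @cert-authority *.example.com ssh-ed25519 <contents of host_ca.pub>
```

Reinstalled machines just get a fresh host key signed at provision time, and no client ever sees a "host key changed" warning.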
SSH certs quietly hurt in prod. Short-lived creds + centralized CA just moves complexity upward without solving the core problem: user management.
The system shifts from many small local states to one highly coupled control point. That control point has to be correct and reachable all the time. When it isn’t, failures go wide instead of narrow.
Example: a few boxes get popped and start hammering the CA. Now what? Access is broken everywhere at once.
Common friction points:
1. your signer that has to be up and correct all the time
2. trust roots everywhere (and drifting)
3. TTL tuning nonsense (too short = random lockouts, too long = what was the point)
4. limited on-box state makes debugging harder than it should be
5. failures tend to fan out instead of staying contained
Revocation is also kind of a lie. Just waiting for expiry and hoping that’s good enough.
What actually happens is people reintroduce state anyway: sidecars, caches, agents… because you need it.
We went the opposite direction:
1. nodes pull over outbound HTTPS
2. local authorized_keys is the source of truth locally
3. users/roles are visible on the box
4. drift fixes itself quickly
5. no inbound ports, no CA signatures (WELL, not strictly true*!)
You still get central control, but operation and failure modes are local instead of "everyone is locked out right now."
That’s basically what we do at Userify (https://userify.com). Less elegant than certs, more survivable at 2am. Also actually handles authz, not just part of authn.
And the part that usually gets hand-waved with SSH CAs:
1. creating the user account
2. managing sudo roles
3. deciding what happens to home directories on removal
4. cleanup vs retention for compliance/forensics
Those don’t go away - they're just not part of the certificate solution.
* (TLS still exists here, just at the transport layer using the system trust store. That channel delivers users, keys, and roles. The rest is handled explicitly instead of implied.)
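On the revocation point above: OpenSSH does ship a mechanism, key revocation lists, though as the comment implies, the hard part is distributing the KRL to every host. A minimal sketch:

```shell
ssh-keygen -q -t ed25519 -f leaked_key -N ''
ssh-keygen -k -f revoked.krl leaked_key.pub          # build a KRL from the key
ssh-keygen -Q -f revoked.krl leaked_key.pub || true  # reports the key as revoked
# sshd enforces the list via:  RevokedKeys /etc/ssh/revoked.krl
```

KRLs can also revoke certificates by serial number, which pairs well with short-lived certs when expiry alone isn't fast enough.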
With the recent wave of npm hacks stealing private keys, I wanted to limit my keys' lifetimes.
I've set up a couple of YubiKeys as SSH CAs on hosts I manage. I use them to create short-lived certs (say, 24h) at the start of the day. This way I only have to enter the YubiKey PIN once a day.
I could not find an easy way to limit maximum certificate lifetime in OpenSSH, except for using AuthorizedPrincipalsCommand, which feels very fragile.
Does anyone else have any experience with a similar setup? How do you limit cert max lifetime?
The experience might be better right up until you're running it in prod and someone happens to ask about:
Cert revocation (or even expiration)
Sudo roles
User removal and process termination
Is the cert server HA and locked down
How you log in when the cert server is down or under attack (rich target!)
How to easily add Alice to server group A, Bob to B, and Carlos to both A and B, and then remove them.
(disclaimer we're celebrating our 15th anniversary at https://Userify.com, but those are actually legit concerns and not only a sales pitch. You certainly can build a solid and secure ssh cert infra, but doing it in production is just not an easy set-it-and-forget-it sort of thing.)
We added SSH certificates support to pico.sh [1] and it's been great. Utilizing principals gave us the ability to implement a RBAC like system for specific parts of the pico.sh ecosystem. Users get the flexibility they want with limited complexity.
>then I don’t need to type the target user’s password; instead I enter the key’s passphrase, a hopefully much more complicated combination of words, to unlock the private key.
This sentence is a bit of a red flag. It looks like the author is making a (subtle) mistake in the category of too much security, or at least misjudging the amount of security (objectively measurable entropy) needed. This is, of course, a less consequential error than too little entropy, but anyone who wants to be a cybersecurity professional, especially one with influence, must know exactly the right amount needed, because our resources are limited. Each additional bit of entropy and each additional security step costs not only the time of the admin who implements it, but of the users who have to follow it, and this can even hurt security itself by fatiguing users and causing them to circumvent measures or ignore alerts.
On to specifically what's wrong:
Either a key file or a password can be used to log in to a server, or to authenticate to any service in general. Besides the technical implementation, the main difference is whether the secret is stored on the device or in the user's brain. One is not more correct than the other; there are tradeoffs. One can ensure more bits and is more ergonomic; the other is not stored on the device, so it cannot be compromised that way.
That said, a 2FA approach, in whatever format, is (generally speaking) safer than any individual method, in that both secrets are necessary to be granted access. In this scenario one needs both the file and the password to authenticate; even if the password is 4 digits long, that increases the security of the system compared to no password. An attacker would have to set up a brute-force attempt along with a way to verify that decryption was successful. If local decryption confirmation is not possible, then such a brute-force attack would require submitting erroneous logins to the server, potentially activating tripwires or alerting a monitoring admin.
There's nothing special about the second factor being equal or equivalent in entropy to the first, and there's especially no requirement that a password have more entropy when it's a second factor; in fact, it's the other way around.
tl;dr You can consider each security mechanism in the wider context rather than in isolation and you will see security fatigue go down without compromising security.
shameless plug: I've been tinkering with a tool to make SSH certificate-based login a bit easier. it's called Sshifu.
basically, you set up a sshifu-server that acts as a certificate authority + SSO server. then on your SSH servers, you configure them to trust this CA (there are helper npx commands / bash scripts to make this easy).
after that, for each user who wants access, they just run:
npx sshifu
this starts the SSO login flow, sets up the CA public key if needed, and immediately opens an SSH session.
npx is just the easiest way to get started, there are other install options too.
I built this as a smaller alternative to Smallstep / Teleport. it's still very early and mostly vibe-coded, but it's already scratching my own itch.
I just don't get it.
> Often you either end up with one "dev ssh key" for all machines (which is bad)
Or, conversely: with one "dev password" for all machines.
> Things like deploying dev keys to various production environments, instead of generating/registering them within said environment.
I can see this happening when a developer is authorized to generate, but not to register. So, they just reuse an already-registered one.
What I've done is generate a cert for the host(s) the user needs, for the time-span they need (subject to authorization logic).
Should I invest in making the switch?
[1] https://matt.ucc.asn.au/dropbear/dropbear.html
[2] https://mina.apache.org/sshd-project/
[3] I've been dealing for years with NullPointerExceptions causing the connection to crash when presented with certain ed25519 certificates.
The workflows around SSH CAs are extremely janky and insecure. With some creative use of AuthorizedKeysCommand you can make SSH key rotation painless and secure. With SSH certificates you have to go back to the "keys to the kingdom" antipattern and just hope for the best.
Thanks for writing it!
[1] https://www.zscaler.com/resources/security-terms-glossary/wh...
There really needs to be a definitive best practices guide published by a trusted authority.
OpenSSH supports checking the DNSSEC signature (SSHFP records) in the client, in theory, but it's a configure option and I'm not sure if distros build with it.
[1] https://pico.sh/access-control#ssh-certificates
would love to hear what you guys think
repo: github.com/azophy/sshifu