The SSH Key Chaos Problem
Every engineering team starts the same way: someone generates an SSH key pair, copies the public key to a server, and gets to work. It's fast, it's simple, and it works. Then the team grows. Five engineers become fifty. One server becomes two hundred. And suddenly, those innocent key pairs have metastasized into a sprawling, untracked web of persistent access credentials scattered across your entire infrastructure.
This is SSH key sprawl, and it's one of the most underestimated security risks in modern infrastructure. Unlike passwords, SSH keys don't expire. Unlike API tokens, they're rarely rotated. Unlike SSO sessions, there's no centralized dashboard telling you which keys grant access to which servers. They just sit there — in ~/.ssh/authorized_keys files on hundreds of machines — quietly waiting.
The scale of the problem is staggering. Most organizations have no idea how many SSH keys exist in their environment, who created them, or what they unlock.
Consider a typical scenario: an engineer joins your company, generates a key pair, and distributes the public key to every server they need. Over the next two years, they accumulate access to production databases, CI/CD runners, internal tools, and staging environments. Then they leave the company. Their laptop is wiped, but their public keys remain on every server they ever touched. Those keys are now orphaned credentials — valid access paths that your offboarding process never cleaned up because nobody knew they existed.
Security Warning
A single orphaned SSH key on a production server is a persistent backdoor. Unlike expired passwords or revoked OAuth tokens, static SSH keys remain valid indefinitely unless someone manually removes them from every server where they were installed. Most organizations never do.
The root cause isn't carelessness — it's the architecture of SSH key-based authentication itself. It was designed for a world where one sysadmin managed a handful of servers. It was never designed for dynamic, elastic infrastructure operated by distributed teams shipping code dozens of times a day.
The Real Risks of Poor SSH Key Management
SSH key sprawl isn't a theoretical risk. It creates concrete, exploitable attack surfaces that modern adversaries actively target. Here are the categories of risk that unmanaged SSH keys introduce:
Lateral Movement
Once an attacker compromises a single server — through an application vulnerability, a misconfigured container, or a phishing attack on an engineer's workstation — they harvest the private keys stored on that machine. Because engineers routinely reuse the same key pair across multiple servers, a single compromised key can unlock dozens of systems. The attacker moves laterally through your infrastructure, escalating privileges at each hop, and your monitoring sees nothing unusual because the access looks like legitimate SSH traffic from an authorized key.
This is exactly how many of the most damaging breaches unfold. The initial compromise is small — a single endpoint. But SSH key reuse transforms it into full infrastructure access.
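One way to gauge this exposure is to count how many authorized_keys files each public key appears in. The sketch below assumes you have already collected copies of those files into a local directory (for example through configuration management); the function name find_reused_keys is illustrative, and the option parsing is deliberately simple.

```shell
# find_reused_keys FILE...
# Prints "<count> <key-type> <comment>" for every public key that appears in
# more than one of the given authorized_keys files, most widespread first.
# Assumption: each FILE is a copy of one account's authorized_keys.
find_reused_keys() {
  awk '
    /^[ \t]*(#|$)/ { next }                      # skip comments and blank lines
    {
      # Options may precede the key type, so locate the type field first.
      for (i = 1; i < NF; i++) {
        if ($i ~ /^(ssh-|ecdsa-|sk-)/) {
          blob = $(i + 1)
          if (!((blob, FILENAME) in seen)) {     # count each file once per key
            seen[blob, FILENAME] = 1
            count[blob]++
            label[blob] = $i " " (i + 2 <= NF ? $(i + 2) : "(no comment)")
          }
          break
        }
      }
    }
    END {
      for (blob in count)
        if (count[blob] > 1) print count[blob], label[blob]
    }
  ' "$@" | sort -rn
}
```

Running find_reused_keys collected/* on the gathered copies lists the keys that would give an attacker the widest reach if the matching private key were stolen.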
Orphaned Keys and Ghost Access
When employees leave, contractors finish engagements, or teams restructure, their SSH keys don't automatically disappear. Every authorized_keys file on every server retains those entries. In an organization with hundreds of servers, manually tracking down and removing departed users' keys is operationally infeasible. Most teams simply don't do it.
The result: former employees, former contractors, and former team members retain valid access credentials to production infrastructure for months or years after their departure. These orphaned keys are invisible to your identity provider, your HR system, and your access reviews.
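A rough way to surface such entries is to compare key comments against a list of current staff. This sketch assumes key comments carry an owner identifier such as an email address, which is only a convention and is not enforced by OpenSSH; the function name and file layout are hypothetical.

```shell
# find_orphaned_keys ACTIVE_LIST FILE...
# ACTIVE_LIST holds one active owner identifier per line. Prints
# "<file>: <owner>" for every key whose comment is not on that list.
# Assumption: the key comment identifies the owner; keys without a comment
# are reported as "(no comment)" so they can be reviewed by hand.
find_orphaned_keys() {
  local active_list
  active_list=$1
  shift
  awk -v list="$active_list" '
    BEGIN {
      while ((getline line < list) > 0)
        if (line != "") active[line] = 1
    }
    /^[ \t]*(#|$)/ { next }                      # skip comments and blank lines
    {
      owner = "(no comment)"
      for (i = 1; i < NF; i++)
        if ($i ~ /^(ssh-|ecdsa-|sk-)/) {         # key type; comment follows the blob
          if (i + 2 <= NF) owner = $(i + 2)
          break
        }
      if (!(owner in active)) print FILENAME ": " owner
    }
  ' "$@"
}
```

Because comments can be edited or omitted, treat the output as a list of entries to investigate, not as proof of ownership.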
Compliance and Audit Gaps
Frameworks like SOC 2, ISO 27001, PCI DSS, and HIPAA all require organizations to demonstrate control over access to sensitive systems. They mandate access reviews, entitlement documentation, and evidence that access is revoked when no longer needed. Static SSH keys fail every one of these requirements:
- No centralized inventory — You can't produce a list of who has access to what.
- No expiration — Keys are valid until manually removed, violating least-privilege principles.
- No session logging — You know a key was used, but not what the user did after authenticating.
- No approval workflow — There's no record of who authorized the access or why.
Auditors will ask you to prove that only authorized personnel have SSH access to production. If your answer involves grepping authorized_keys files on 200 servers, you've already failed the audit.
Shared and Unattributable Keys
In many teams, engineers share SSH keys for "convenience" — a shared deploy key for CI/CD, a root key stashed in a wiki, a service account key emailed to the team. These shared keys make it impossible to attribute actions to individual users. When something goes wrong in production, you can't determine who connected or what they did. You just see that a key was used.
Traditional Solutions and Their Limits
Organizations have tried several approaches to tame SSH key sprawl. Each addresses a piece of the problem, but none solve it completely.
Manual Key Rotation
The most basic approach: periodically generate new key pairs and replace old ones on every server. In practice, this means writing scripts that SSH into each machine, update authorized_keys, and verify the change didn't lock anyone out. It's fragile, error-prone, and doesn't scale. Most teams that attempt manual rotation either abandon it after the first cycle or rotate so infrequently (annually, if ever) that it provides minimal security benefit.
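```shell
# Typical manual rotation: fragile at scale.
# OLD_KEY_COMMENT is a placeholder for text that identifies the old entry,
# such as its comment field; authorized_keys stores key blobs, not fingerprints.
for host in $(cat server_list.txt); do
  ssh "admin@$host" "cat >> ~/.ssh/authorized_keys" < new_key.pub
  ssh "admin@$host" "grep -vF 'OLD_KEY_COMMENT' ~/.ssh/authorized_keys \
    > ~/.ssh/authorized_keys.new && mv ~/.ssh/authorized_keys.new ~/.ssh/authorized_keys"
done
```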
This approach also requires the person running the script to already have root or sudo access on every machine — which is itself a privileged access problem.
LDAP / Directory-Based SSH Authentication
Some organizations integrate OpenSSH with LDAP or Active Directory, storing SSH public keys as user attributes. This centralizes key storage and ties SSH access to directory group membership. When a user is disabled in the directory, their SSH access is revoked.
The downsides are significant: LDAP adds infrastructure complexity, creates a single point of failure for SSH access, and still relies on static keys that don't expire. If the LDAP server goes down, nobody can SSH into anything. And you still need to distribute the AuthorizedKeysCommand configuration to every server.
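For context, sshd runs the AuthorizedKeysCommand program with the login name and expects authorized_keys lines on standard output. A minimal sketch of such a helper follows. It assumes keys are stored in an attribute named sshPublicKey, and the LDAP URI and base DN are placeholders; the function names are illustrative.

```shell
# ldif_to_keys: read LDIF on stdin, print one public key per line.
# Assumptions: keys live in an attribute named sshPublicKey, values are not
# base64-encoded, and the LDIF was produced without line wrapping.
ldif_to_keys() {
  sed -n 's/^sshPublicKey: //p'
}

# lookup_keys USER: intended as the body of an AuthorizedKeysCommand script.
# The server URI and base DN below are placeholders, not real endpoints.
lookup_keys() {
  # Reject anything that is not a plain login name before building the filter.
  case $1 in
    '' | *[!a-z0-9_-]*) return 1 ;;
  esac
  ldapsearch -x -LLL -o ldif-wrap=no \
    -H "ldaps://ldap.example.com" \
    -b "ou=people,dc=example,dc=com" \
    "(uid=$1)" sshPublicKey | ldif_to_keys
}
```

A script wrapping lookup_keys "$1" would be referenced from sshd_config with AuthorizedKeysCommand followed by the script path and %u, together with AuthorizedKeysCommandUser set to an unprivileged account. Every login then depends on the directory being reachable, which is the single point of failure described above.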
Bastion Hosts / Jump Servers
A bastion host acts as a single chokepoint: all SSH traffic flows through one hardened server. This simplifies network security (only one ingress point) and provides a central logging point.
But bastion hosts are themselves high-value targets. If compromised, the attacker gains a pivot point to your entire network. They also create bottlenecks, add latency, and don't solve the underlying key management problem — engineers still manage static keys on the bastion itself. They add a layer of indirection without fixing the root cause.
| Approach | Centralized | Keys Expire | Session Audit | Scales |
|---|---|---|---|---|
| Manual rotation | | | | |
| LDAP integration | | | | |
| Bastion host | | | | |
| Certificate-based SSH | | | | |
Keyless SSH Explained: Certificate-Based Authentication
OpenSSH has natively supported certificate-based authentication since version 5.4, released in 2010. Despite being available for over fifteen years, most organizations still rely on raw key pairs. Certificate-based SSH fundamentally changes the authentication model by introducing a trusted Certificate Authority (CA) that issues short-lived, scoped credentials instead of static keys.
How It Works
Instead of distributing public keys to individual servers, you configure servers to trust a CA. When a user needs SSH access, they authenticate through an identity provider (SSO, OIDC, SAML), and the CA issues a signed SSH certificate. This certificate includes:
- Identity — The authenticated user's identity (username, email, groups)
- Principals — Which usernames on which servers the certificate authorizes
- Validity period — A short expiration window (minutes to hours, not years)
- Extensions — Optional permissions such as port forwarding, agent forwarding, or PTY allocation, which the CA can grant or withhold
- Critical options — Forced commands or source address restrictions
Servers validate the certificate against the CA's public key. If the signature is valid, the certificate hasn't expired, and the principals match, access is granted. No authorized_keys file needed. No key distribution. No orphaned credentials.
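The certificate fields listed above can be seen in a throwaway setup. The following sketch creates a demo CA and user key in a temporary directory (empty passphrases, for illustration only), issues a one-hour certificate restricted by source address with all permissions except PTY allocation removed, and prints the result. The identity, principal, and network range are example values.

```shell
demo=$(mktemp -d)

# Throwaway CA and user keys with empty passphrases (demo only).
ssh-keygen -q -t ed25519 -N '' -f "$demo/ca" -C 'demo CA'
ssh-keygen -q -t ed25519 -N '' -f "$demo/user" -C 'demo user'

# Issue a one-hour certificate for principal "sre". "-O clear" drops the
# default permissions, "-O permit-pty" adds back terminal allocation, and
# "-O source-address" is a critical option limiting where it may be used.
ssh-keygen -q -s "$demo/ca" \
  -I 'alice@example.com' \
  -n sre \
  -V +1h \
  -O clear -O permit-pty \
  -O source-address=10.0.0.0/8 \
  "$demo/user.pub"

# Print the signed fields: key ID, serial, validity, principals,
# critical options, and extensions.
cert_text=$(ssh-keygen -L -f "$demo/user-cert.pub")
printf '%s\n' "$cert_text"
```

The output shows each field under its own heading, which makes it straightforward to confirm what a given certificate actually permits before relying on it.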
Why This Changes Everything
Access is ephemeral. Certificates expire automatically — typically in 8 to 24 hours. Even if an attacker steals a certificate, it becomes useless within hours. Contrast this with static SSH keys that remain valid for years or decades.
Access is identity-bound. Certificates encode the user's identity. Every SSH session is attributable to a specific person via their SSO login. Shared keys and anonymous access become impossible.
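This attribution shows up in the server's own logs: when a certificate is used, the sshd "Accepted publickey" line includes the certificate's key ID and serial. The sketch below pulls the key ID and target account out of such lines. It assumes the log format of recent OpenSSH releases at the default log level, and the function name is illustrative.

```shell
# cert_logins: read sshd log lines on stdin and print "<key-id> -> <account>"
# for each successful certificate login.
# Assumption: lines look like
#   ... Accepted publickey for ACCOUNT from IP port N ssh2: TYPE-CERT FP ID KEYID (serial N) CA ...
cert_logins() {
  awk '
    /Accepted publickey for/ && / ID / {
      account = ""; id = ""
      for (i = 1; i < NF; i++) {
        if ($i == "for" && account == "") account = $(i + 1)
        if ($i == "ID" && id == "") id = $(i + 1)
      }
      print id " -> " account
    }
  '
}
```

On a systemd host, something like journalctl -u ssh | cert_logins (the unit name varies by distribution) would list which identity logged in as which account.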
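Revocation is centralized. Disable a user in your identity provider, and they can no longer obtain new certificates. Certificates already issued stay valid until they expire, which with short lifetimes is a matter of hours, and a revocation list (Step 6) covers cases that cannot wait. No need to hunt down keys on individual servers.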
Onboarding is zero-touch. New engineers authenticate via SSO, receive a certificate, and can immediately access the servers their role permits. No ticket to IT. No waiting for someone to copy a key to a server.
Step-by-Step: Setting Up Secure, Keyless SSH
Whether you're building a certificate-based SSH setup from scratch or evaluating a managed solution, understanding the underlying mechanics is essential. Here's how to implement certificate-based SSH authentication using OpenSSH's native capabilities.
Step 1: Generate a Certificate Authority Key Pair
Create a dedicated CA key pair that will sign all user certificates. This key must be protected with extreme care — it's the root of trust for your entire SSH infrastructure. Store it offline or in a hardware security module (HSM).
```shell
# Generate the CA key pair (use a strong passphrase)
ssh-keygen -t ed25519 -f /etc/ssh/ca_user_key -C "SSH User CA"
# This creates:
#   /etc/ssh/ca_user_key      (private: guard with your life)
#   /etc/ssh/ca_user_key.pub  (public: distribute to servers)
```
Critical: Protect the CA Private Key
If the CA private key is compromised, an attacker can issue valid certificates for any user on any server. Store it in an HSM, a vault, or at minimum an air-gapped machine. Never store it on a server that's reachable from the internet.
Step 2: Configure Servers to Trust the CA
On every server that should accept certificate-based authentication, add the CA's public key to the SSH daemon configuration. This tells OpenSSH to trust any certificate signed by your CA.
```shell
# From the CA host: copy the CA public key to each server
scp /etc/ssh/ca_user_key.pub target-server:/etc/ssh/
# On the target server (as root): add to the server's sshd_config
echo "TrustedUserCAKeys /etc/ssh/ca_user_key.pub" >> /etc/ssh/sshd_config
# Set an AuthorizedPrincipalsFile for role-based control (used in Step 3)
echo "AuthorizedPrincipalsFile /etc/ssh/auth_principals/%u" >> /etc/ssh/sshd_config
# Validate the configuration, then restart sshd to apply
sshd -t && systemctl restart sshd
```
With configuration management tools like Ansible, Puppet, or Terraform, you can automate this across your entire fleet in minutes.
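If you script this step yourself, appending with echo on every run would duplicate the directive. A minimal idempotent helper might look like the following; the function name ensure_directive is hypothetical, and the match is an exact, case-sensitive line comparison even though sshd itself treats keywords case-insensitively.

```shell
# ensure_directive FILE KEYWORD VALUE
# Appends "KEYWORD VALUE" to FILE unless that exact line is already present.
# Simplification: does not detect the same keyword with a different value
# or different letter case.
ensure_directive() {
  local file keyword value
  file=$1
  keyword=$2
  value=$3
  if ! grep -qxF -- "$keyword $value" "$file" 2>/dev/null; then
    printf '%s %s\n' "$keyword" "$value" >> "$file"
  fi
}
```

For example, ensure_directive /etc/ssh/sshd_config TrustedUserCAKeys /etc/ssh/ca_user_key.pub can be run repeatedly without growing the file. For most keywords sshd uses the first value it reads, so an earlier conflicting line would still win and needs to be removed separately.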
Step 3: Define Principals for Role-Based Access
Principals determine which certificate holders can log in to which accounts on each server. Create a principals file for each Unix user on each server, listing one allowed principal per line:
```shell
# On the target server, create principals directory
mkdir -p /etc/ssh/auth_principals
# Allow the "webdev" and "sre" roles to log in as "deploy"
printf '%s\n' webdev sre > /etc/ssh/auth_principals/deploy
# Allow only "sre" and "dba" roles to log in as "root"
printf '%s\n' sre dba > /etc/ssh/auth_principals/root
# Allow all engineers to log in as "ubuntu" (default cloud user)
printf '%s\n' engineers > /etc/ssh/auth_principals/ubuntu
```
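The matching rule sshd applies can be approximated in a few lines: a login succeeds when at least one principal in the certificate appears in the target account's principals file. The function below is a simplified model for reasoning about access, not a reimplementation of sshd; its name is illustrative, and it ignores the per-line options that sshd also accepts in these files.

```shell
# principal_allowed CERT_PRINCIPALS PRINCIPALS_FILE
# Succeeds if any comma-separated certificate principal appears as a whole
# line in PRINCIPALS_FILE.
# Simplification: per-line options and comments in the file are not handled.
principal_allowed() {
  local cert_principals file p
  cert_principals=$1
  file=$2
  local IFS=,
  for p in $cert_principals; do
    if grep -qxF -- "$p" "$file"; then
      return 0
    fi
  done
  return 1
}
```

With the files above, principal_allowed "sre,engineers" /etc/ssh/auth_principals/root succeeds because of the sre entry, while a certificate carrying only webdev and engineers would be refused for root.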
Step 4: Issue Short-Lived Certificates
When a user needs access, sign their public key with the CA. The critical parameter is -V, which sets the certificate's validity window. Keep it as short as operationally feasible.
```shell
# Sign a user's public key with an 8-hour validity window
ssh-keygen -s /etc/ssh/ca_user_key \
  -I "alice@example.com-$(date +%s)" \
  -n "sre,engineers" \
  -V +8h \
  -z "$(date +%s)" \
  ~/.ssh/id_ed25519.pub
# This creates ~/.ssh/id_ed25519-cert.pub
# The certificate contains:
#   Identity:   alice@example.com-1712851200
#   Principals: sre, engineers
#   Valid:      from now to +8 hours
#   Serial:     Unix timestamp (for revocation tracking)

# Verify the certificate
ssh-keygen -L -f ~/.ssh/id_ed25519-cert.pub
```
The user can now SSH into any server that trusts the CA and has matching principals — no authorized_keys entry required. After 8 hours, the certificate becomes invalid automatically.
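On the client side it can be useful to check whether the current certificate is still valid before connecting. The sketch below reads the expiry from ssh-keygen -L output; it assumes the "Valid: from ... to ..." line format printed by current OpenSSH releases (timestamps in local time), and both function names are illustrative.

```shell
# cert_expiry: read `ssh-keygen -L` output on stdin and print the "to"
# timestamp of the Valid line, or "forever" for non-expiring certificates.
cert_expiry() {
  awk '$1 == "Valid:" {
    v = ($2 == "forever") ? "forever" : $5
    print v
    exit
  }'
}

# cert_is_current EXPIRY [NOW]: succeed while NOW (default: the current local
# time) is earlier than EXPIRY. Timestamps in YYYY-MM-DDTHH:MM:SS form sort
# correctly as plain strings, so a string comparison is enough.
cert_is_current() {
  local expiry now
  expiry=$1
  now=${2:-$(date +%Y-%m-%dT%H:%M:%S)}
  [ "$expiry" = forever ] || [ "$now" \< "$expiry" ]
}
```

A wrapper could run ssh-keygen -L -f ~/.ssh/id_ed25519-cert.pub | cert_expiry, pass the result to cert_is_current, and request a fresh certificate only when the check fails.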
Step 5: Automate Certificate Issuance with SSO
Manual certificate signing doesn't scale. The real power comes from integrating certificate issuance with your identity provider. When a user authenticates via SSO (Okta, Google Workspace, Azure AD, etc.), a service automatically issues a signed certificate based on their identity and group memberships.
```
# Example: automated cert issuance API (pseudocode)
# User authenticates via OIDC, the backend issues a cert:
POST /api/ssh/certificate
Authorization: Bearer <oidc_token>

Response:
{
  "certificate": "ssh-ed25519-cert-v01@openssh.com AAAA...",
  "valid_until": "2026-04-13T20:00:00Z",
  "principals": ["sre", "engineers"],
  "identity": "alice@example.com"
}
```
This is the pattern that tools like OnePAM implement: SSO-authenticated, automated certificate issuance with short validity windows and full audit logging.
Step 6: Implement Certificate Revocation
Even with short-lived certificates, you need the ability to revoke access immediately (e.g., when an employee is terminated mid-day). OpenSSH supports a Key Revocation List (KRL):
```shell
# Create a revocation list from a certificate
# (add -u on later runs to update the existing KRL instead of replacing it)
ssh-keygen -k -f /etc/ssh/revoked_keys -s /etc/ssh/ca_user_key.pub \
  ~/.ssh/compromised_cert.pub
# Add to sshd_config
echo "RevokedKeys /etc/ssh/revoked_keys" >> /etc/ssh/sshd_config
# Validate the configuration, then restart sshd to apply
sshd -t && systemctl restart sshd
# The KRL file can be distributed to all servers via
# configuration management or a pull-based sync
```
With short certificate lifetimes (8–24 hours), revocation is primarily a safety net. In most cases, you simply don't issue a new certificate, and the old one expires on its own.
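Revocation does not require a copy of the certificate itself: a KRL can also be built from a specification file that lists serial numbers, which is where the -z serial from Step 4 becomes useful. The sketch below demonstrates this in a temporary directory with a throwaway CA (empty passphrases, demo only); the serial 42 and file names are example values.

```shell
krl_demo=$(mktemp -d)

# Throwaway CA, user key, and a certificate with serial 42 (demo only).
ssh-keygen -q -t ed25519 -N '' -f "$krl_demo/ca" -C 'demo CA'
ssh-keygen -q -t ed25519 -N '' -f "$krl_demo/user" -C 'demo user'
ssh-keygen -q -s "$krl_demo/ca" -I 'alice@example.com' -n sre -V +8h -z 42 \
  "$krl_demo/user.pub"

# KRL specification: revoke serial 42. Revoking by serial requires the CA
# public key to be given with -s.
printf 'serial: 42\n' > "$krl_demo/revoke.spec"
ssh-keygen -q -k -f "$krl_demo/revoked_keys" -s "$krl_demo/ca.pub" \
  "$krl_demo/revoke.spec"

# -Q tests keys against the KRL and exits non-zero if any of them is revoked.
if ssh-keygen -Q -f "$krl_demo/revoked_keys" "$krl_demo/user-cert.pub" >/dev/null 2>&1; then
  cert_status=valid
else
  cert_status=revoked
fi
echo "$cert_status"
```

Recording the serial of each issued certificate in your issuance logs means a terminated user's outstanding certificates can be revoked by number, without needing to retrieve anything from their machine.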
How OnePAM Simplifies Secure SSH Access
The step-by-step setup above works, but it requires you to build and maintain significant infrastructure: a CA, an issuance service, SSO integration, principal management, revocation distribution, and logging. OnePAM handles all of this as a managed platform, eliminating the operational burden while providing capabilities that go beyond what DIY setups can achieve.
Browser-Based SSH Terminal
OnePAM provides a fully functional SSH terminal directly in the browser. Engineers authenticate via your existing identity provider, and OnePAM establishes the SSH connection using an auto-provisioned, short-lived certificate. No local SSH client configuration. No key pair management. No ~/.ssh/config files to maintain. Just open a browser, authenticate, and connect.
Auto-Provisioned Certificates
When a user initiates an SSH session through OnePAM, the platform automatically generates a short-lived certificate signed by a managed CA. The certificate is scoped to the specific server and user account being accessed, valid for the duration of the session only. After the session ends, the certificate expires. No credential persists beyond the immediate need.
Session Recording and Playback
Every SSH session through OnePAM is recorded — not just connection metadata, but the full terminal session. Administrators and auditors can replay any session to see exactly what commands were executed, what output was returned, and what files were modified. This provides the forensic capability that compliance frameworks demand and that incident response teams need.
Just-in-Time Access with Approval Workflows
OnePAM supports just-in-time (JIT) access models where engineers request access to specific servers for a defined time window. Requests can be auto-approved based on policy or routed to a manager for approval. Access is granted for exactly the requested duration and automatically revoked when the window closes. No standing privileges. No forgotten access.
Infrastructure Discovery and Inventory
OnePAM automatically discovers servers in your cloud environments (AWS, GCP, Azure) and maintains a real-time inventory of what exists and who has access. You always have a current, accurate answer to the question auditors love to ask: "Show me who has access to what."
The Result
With OnePAM, there are zero static SSH keys to manage, rotate, or audit. Every SSH session is identity-bound, time-limited, fully recorded, and automatically cleaned up. Your attack surface shrinks, your compliance posture strengthens, and your engineers spend zero time managing keys.
SSH Security Best Practices Checklist
Whether you implement certificate-based SSH yourself or use a platform like OnePAM, these are the practices that separate secure SSH infrastructure from vulnerable infrastructure:
- Eliminate static SSH keys — Migrate to certificate-based authentication. Static keys are persistent, unscoped, and invisible to your identity provider.
- Set certificate lifetimes under 24 hours — Short-lived certificates dramatically reduce the window of exposure if a credential is compromised. Aim for 8 hours for interactive sessions.
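- Bind SSH access to your identity provider — Certificates should be issued only after SSO authentication. When a user is disabled in your IdP, no new certificates are issued and outstanding ones expire within hours; use a revocation list when access must stop immediately.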
- Use principals for role-based access — Map certificate principals to team roles (sre, developer, dba) rather than individual users. This scales with your organization and simplifies access reviews.
- Disable password authentication entirely — Set PasswordAuthentication no and ChallengeResponseAuthentication no (KbdInteractiveAuthentication no on OpenSSH 8.7 and later) in sshd_config. Password-based SSH is brute-forceable and has no place in production.
- Record all SSH sessions — Maintain full session recordings for compliance and incident response. Connection logs alone are insufficient. Auditors and investigators need to see what happened during each session.
- Implement just-in-time access — Eliminate standing privileges. Require engineers to request access for a specific duration with a stated reason. Auto-revoke when the window expires.
- Audit authorized_keys files regularly — Until you've fully migrated to certificates, run periodic sweeps of authorized_keys files across your fleet to identify orphaned, shared, or unauthorized keys.
- Restrict SSH agent forwarding — Agent forwarding lets a compromised intermediate host use your agent to authenticate as you for as long as the connection is open. Disable it by default and use ProxyJump instead of ProxyCommand with agent forwarding.
- Monitor and alert on anomalous SSH activity — Set up alerts for SSH connections from unusual IPs, at unusual times, or to servers the user hasn't accessed before. Combine with session recording for full context.
The Path Forward: SSH Without Keys
SSH key management is one of those problems that feels manageable at small scale and becomes a crisis at medium scale. By the time you have fifty engineers and three hundred servers, the number of key-to-server relationships has grown into the thousands. Tracking them manually is impossible. Auditing them is a multi-week project. And cleaning them up without causing an outage is a nightmare.
Certificate-based SSH eliminates this entire category of risk. Access becomes ephemeral, identity-bound, auditable, and automatically scoped. You stop managing keys and start managing policies. Your identity provider becomes the single source of truth for who can access what. And your compliance evidence generates itself — every access request, every certificate issuance, every session recording is logged automatically.
The technology has been available in OpenSSH for over fifteen years. What's changed is that platforms like OnePAM make it operationally trivial to adopt. You don't need to build CA infrastructure, write certificate issuance services, or maintain revocation lists. You connect your identity provider, register your servers, and your engineers get secure, keyless SSH access from day one.
The question isn't whether to move away from static SSH keys — it's how long you can afford to wait.