Introducing the Corelight SSH Inference Package

By Anthony Kasza, Security Researcher, Corelight Labs

Corelight has recently released a new package, focusing on SSH inferences, as part of our Encrypted Traffic Collection. The package installs on sensors with a few clicks and provides network traffic analysis (NTA) inferences on live SSH traffic. Which SSH connections transferred files? Which SSH connections transferred keystrokes? And approximately how many commands were typed during the connection? The SSH Inference package provides these insights as well as others, detailed below. These new insights bring valuable context to threat hunters and incident responders who struggle with visibility in encrypted environments.

What is an SSH inference? Richard Bejtlich provided a great explanation in a previous blog post. Here is the analogy I like to use. If you break your arm, your doctor doesn’t take a cross section of your arm; there’s no need to cut it open and inspect the bone. She takes an x-ray. Similarly, purposely breaking or downgrading encryption is often overkill and a violation of privacy. You don’t need to see what’s inside an encrypted tunnel to infer what’s occurring within it.

What Does the SSH Inference Package Do?

By loading the SSH Inference package on a Corelight sensor, customers automatically get access to a bunch of new capabilities and insights around SSH traffic. These new features are briefly outlined below. If you’re a customer and would like a more detailed look at the feature set, see the technical documentation.

Inference tags based on SSH usage – SSH can be used in many different ways, including transferring files, executing a single command, or providing an interactive terminal. The following tags are written to a newly added field, inferences, in the SSH log when the corresponding behavior is observed during an SSH connection:

  • Client Authentication Bypass (ABP) – The connection did not adhere to certain expectations of SSH according to the RFCs. This can occur when a client exploits a server or when a client and server switch to a protocol other than SSH once encryption begins.
    What’s interesting about this inference is that the exploit occurs within encrypted packets, which means we identify the exploit not by its content but by its behavior. We cannot link the exploit to a CVE, but we can tell that an exploit occurred. This approach may be useful for identifying some types of zero-day exploits, because it doesn’t require knowledge of the specifics of the attack, relying instead on a general property of how some such attacks manifest when they succeed.
    This inference is related to MITRE ATT&CK techniques:
    • T1210 (Exploitation of Remote Services)
    • T1190 (Exploit Public-Facing Application)
    • T1212 (Exploitation for Credential Access)
  • Keystrokes (KS) – An interactive session where the client sent user-driven keystrokes to the server.
    This inference is related to MITRE ATT&CK technique:
    • T1071 (Standard Application Layer Protocol)
  • Client File Upload (FU) – A file transfer occurred during the session where the client sent a sequence of bytes to the server.
    This inference is related to MITRE ATT&CK techniques:
    • T1074 (Data Staged)
    • T1105 (Remote File Copy)
    • T1071 (Standard Application Layer Protocol)
    • T1020 (Automated Exfiltration)
    • T1041 (Exfiltration Over Command and Control Channel)
  • Client File Download (FD) – A file transfer occurred during the session where the server sent a sequence of bytes to the client.
    This inference is related to MITRE ATT&CK techniques:
    • T1074 (Data Staged)
    • T1105 (Remote File Copy)
    • T1071 (Standard Application Layer Protocol)
    • T1020 (Automated Exfiltration)
    • T1041 (Exfiltration Over Command and Control Channel)
  • Client Bruteforce Guessing (BF) – A client was seen attempting to authenticate more times than a configured threshold. This threshold is per connection, not per host.
    This inference is related to MITRE ATT&CK techniques:
    • T1133 (External Remote Services)
    • T1110 (Brute Force)
    • T1201 (Password Policy Discovery)
    • T1021 (Remote Services)
  • Client Bruteforce Success (BFS) – A client was seen attempting to authenticate more times than a configured threshold and then successfully authenticated.
    This inference is related to MITRE ATT&CK techniques:
    • T1133 (External Remote Services)
    • T1110 (Brute Force)
    • T1201 (Password Policy Discovery)
    • T1021 (Remote Services)
  • Version Scanning (SV) – A client exchanged version strings with a server but then disconnected.
    This inference is related to MITRE ATT&CK technique:
    • T1046 (Network Service Scanning)
  • Capabilities Scanning (SC) – A client exchanged capabilities with a server but then disconnected.
    This inference is related to MITRE ATT&CK technique:
    • T1046 (Network Service Scanning)
  • Other Scanning (SP) – A client and server exchanged no encrypted packets, but the client was not a version or capabilities scanner.
    This inference is related to MITRE ATT&CK technique:
    • T1046 (Network Service Scanning)
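To make the tag list concrete, here is a minimal Python sketch, not the package’s implementation, that records the tags and their related ATT&CK techniques and assigns the scanning and brute-force tags from a per-connection summary. The ConnSummary fields, the default threshold of 3, and the choice to emit BF alongside BFS are assumptions made for illustration only.

```python
from __future__ import annotations

from dataclasses import dataclass

# Inference tag -> (name, related MITRE ATT&CK technique IDs), as listed above.
INFERENCE_TAGS = {
    "ABP": ("Client Authentication Bypass", ["T1210", "T1190", "T1212"]),
    "KS": ("Keystrokes", ["T1071"]),
    "FU": ("Client File Upload", ["T1074", "T1105", "T1071", "T1020", "T1041"]),
    "FD": ("Client File Download", ["T1074", "T1105", "T1071", "T1020", "T1041"]),
    "BF": ("Client Bruteforce Guessing", ["T1133", "T1110", "T1201", "T1021"]),
    "BFS": ("Client Bruteforce Success", ["T1133", "T1110", "T1201", "T1021"]),
    "SV": ("Version Scanning", ["T1046"]),
    "SC": ("Capabilities Scanning", ["T1046"]),
    "SP": ("Other Scanning", ["T1046"]),
}


@dataclass
class ConnSummary:
    """Hypothetical per-connection summary; these fields are illustrative only."""
    versions_exchanged: bool
    capabilities_exchanged: bool
    encrypted_packets: int
    auth_attempts: int
    auth_success: bool


def scan_and_bruteforce_tags(s: ConnSummary, bf_threshold: int = 3) -> set[str]:
    """Assign the scanning and brute-force tags for one connection."""
    tags: set[str] = set()
    # The threshold is applied per connection, not per host.
    if s.auth_attempts > bf_threshold:
        tags.add("BF")
        if s.auth_success:
            tags.add("BFS")
    # Scanning tags only apply when no encrypted packets were exchanged.
    if s.encrypted_packets == 0:
        if s.capabilities_exchanged:
            tags.add("SC")
        elif s.versions_exchanged:
            tags.add("SV")
        else:
            tags.add("SP")
    return tags


def techniques(tags: set[str]) -> list[str]:
    """Sorted union of the ATT&CK technique IDs related to a set of tags."""
    return sorted({t for tag in tags for t in INFERENCE_TAGS[tag][1]})
```

For example, a client that swaps version strings and hangs up, ConnSummary(True, False, 0, 0, False), gets {"SV"}, which maps to T1046.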

An aside on keystroke inferences: 

It would be very useful to infer the approximate size of commands a client sent to a server. For example, the string “sudo su” will always be 7 characters long (and will rarely be completed using the tab key), the server response size will generally be small, and the client-provided password will not be line-buffered. This could lead to more complex analyses identifying password lengths. Corelight Labs attempted to add such an inference to this package but found too many edge cases to consider our prototype robust enough to release. Ben Reardon, now a member of the Corelight team and the developer of packetStrider, shares this assessment. Visual editors, screen and tmux, ncurses applications, tab completion, the backspace key, history navigation (up/down arrow keys), command aliases, and client-side keystroke buffering all make determining client command size difficult. (Note: I did not say “impossible”.)
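To see why, consider a deliberately naive estimator. The sketch below assumes every keystroke, backspace included, travels in one client packet of a fixed hypothetical size (KEYSTROKE_SIZE) and is echoed in one server packet of the same size, while Enter is answered by larger output; none of these sizes come from real traffic. It counts echoed keystrokes until a keystroke draws a larger reply.

```python
from __future__ import annotations

# Hypothetical on-the-wire size of an encrypted packet carrying one keystroke
# or its echo; the real value depends on the negotiated cipher and MAC.
KEYSTROKE_SIZE = 36


def simulate(typed: str, output_size: int = 120) -> list[int]:
    """Toy signed sequence: client packets positive, server packets negative.

    Each keystroke (backspace included) is echoed in one same-sized server
    packet, except Enter ("\\r"), which is answered with larger output.
    """
    seq: list[int] = []
    for ch in typed:
        seq.append(KEYSTROKE_SIZE)
        seq.append(-output_size if ch == "\r" else -KEYSTROKE_SIZE)
    return seq


def naive_command_lengths(seq: list[int]) -> list[int]:
    """Estimate command lengths by counting echoed keystrokes before Enter.

    Assumes strict client/server alternation, which real sessions do not
    guarantee.
    """
    lengths: list[int] = []
    count = 0
    for client, server in zip(seq[0::2], seq[1::2]):
        if client != KEYSTROKE_SIZE:
            continue  # not a keystroke-sized client packet
        if server == -KEYSTROKE_SIZE:
            count += 1  # pure echo: one more character typed
        else:
            lengths.append(count)  # larger reply: treat as Enter
            count = 0
    return lengths
```

Here naive_command_lengths(simulate("sudo su\r")) returns [7], but a single typo fixed with backspace, simulate("sudp\x7fo su\r"), returns [9] for the same seven-character command. That is only the simplest of the edge cases listed above.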

An improved SSH authentication result – open-source Zeek employs some very clever packet-level processing in its core SSH analyzer, which raises authentication attempt, success, and failure events to scriptland. We developed some improvements to this logic and to the logic around logging the authentication result. Users should see fewer unset auth_success fields in their SSH logs.

Tunable configuration options – what may be worth an analyst’s attention at one site may be very normal at another. This is one of the core tenets of Zeek’s policy-neutral event system. By exposing tunable knobs to customers, we let you decide which inferences are worth turning on or being notified about.
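As an illustration of the idea only, site-specific knobs could be modeled as below. InferenceConfig, its field names, and its defaults are hypothetical and are not the package’s actual configuration options; customers should consult the technical documentation for those.

```python
from __future__ import annotations

from dataclasses import dataclass, field

ALL_TAGS = {"ABP", "KS", "FU", "FD", "BF", "BFS", "SV", "SC", "SP"}


@dataclass
class InferenceConfig:
    """Hypothetical site-specific knobs; names and defaults are illustrative."""
    enabled: set[str] = field(default_factory=lambda: set(ALL_TAGS))
    notify: set[str] = field(default_factory=lambda: {"ABP", "BFS"})
    bruteforce_threshold: int = 3


def exceeds_bruteforce(auth_attempts: int, cfg: InferenceConfig) -> bool:
    """Per-connection check against the site's brute-force threshold."""
    return auth_attempts > cfg.bruteforce_threshold


def apply_config(tags: set[str], cfg: InferenceConfig) -> tuple[set[str], set[str]]:
    """Return (tags to log, tags that should also raise a notification)."""
    logged = tags & cfg.enabled
    return logged, logged & cfg.notify
```

A site that is scanned constantly might use InferenceConfig(enabled=ALL_TAGS - {"SV", "SC", "SP"}, bruteforce_threshold=10), while a quiet internal network could keep the defaults.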

Demo Demo Demo! 

The following video demonstrates, at a high level, how the SSH Inference package analyzes the lengths, order, and direction of SSH encrypted packets. By hooking the ssh_encrypted_packet() event and printing each packet’s size to the screen, we can see what an SSH sequence looks like for an interactive session containing keystrokes. Positive sizes are packets transmitted by the client, while negative sizes are packets transmitted by the server.
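In Zeek, the ssh_encrypted_packet() event reports, for each encrypted packet, whether it came from the connection originator (orig) and its length. Rather than reproduce the demo’s Zeek script, this Python sketch takes those two values as (from_client, length) pairs and turns them into the signed sequence shown in the video; the function names are our own.

```python
from __future__ import annotations

from typing import Iterable


def signed_lengths(events: Iterable[tuple[bool, int]]) -> list[int]:
    """Map (from_client, length) pairs to signed lengths.

    Positive values are client-to-server packets; negative values are
    server-to-client packets.
    """
    return [length if from_client else -length for from_client, length in events]


def show(seq: list[int], scale: int = 8) -> None:
    """Print one packet per line with a crude bar scaled by size."""
    for v in seq:
        side = "client" if v > 0 else "server"
        print(f"{v:+6d} {side:<6} {'#' * (abs(v) // scale)}")
```

For instance, show(signed_lengths([(True, 36), (False, 36), (True, 36), (False, 120)])) prints a keystroke, its echo, a second keystroke, and a larger server reply.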

Source: https://asciinema.org/a/XKYUVycHNCi9G5mNKvfKCVXpd

The following video demonstrates the extensions to the SSH log that the SSH Inference package makes. The client issues three commands to the server via keystrokes.

Source: https://asciinema.org/a/TuXVmz6A4BN7iEaW7u1cv8eQ7

If you’d like to see a live demo of the SSH Inference package in action on your network, contact us!

How Are Inferences Made?

Inferences are based on the concept of a sequence of lengths. During an SSH connection, packets are exchanged between clients and servers. By analyzing the size, order, and direction of these packets, the SSH sub-protocols’ state machines can be modeled and tracked throughout the life of a connection, even without the ability to parse content due to encryption. Once the SSH connection sub-protocol begins, a client’s mode-of-use can be inferred from the structure of the packet sequences.

I began by visualizing the first 30 encrypted packets of all SSH connections from sample SSH traffic. These encrypted packets are exchanged immediately after NewKeys messages are sent. In Figure 1, each line represents a single SSH connection. The x-axis represents the order of the packets, while the y-axis represents the direction and size of the packet. Positive values are packets sent by the client and negative values are packets sent by the server.
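Preparing data like Figure 1 might look like the following sketch, which assumes each connection’s encrypted packets are already available as a signed sequence (client positive, server negative) beginning right after the NewKeys messages. Padding short connections with 0 is our choice for the sketch, not necessarily what was done for the figure.

```python
from __future__ import annotations


def first_n(seq: list[int], n: int = 30, pad: int = 0) -> list[int]:
    """Truncate or pad a signed sequence to exactly n encrypted packets.

    Padding with 0 marks positions after the connection ended.
    """
    head = seq[:n]
    return head + [pad] * (n - len(head))


def to_matrix(connections: list[list[int]], n: int = 30) -> list[list[int]]:
    """One fixed-length row per connection: x = packet order, y = signed size."""
    return [first_n(seq, n) for seq in connections]
```

Each row can then be drawn as one line with low opacity (for example, matplotlib’s plot(range(30), row, alpha=0.05)), so that overlapping connections darken into visible clusters.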

Figure 1 – The first 30 encrypted packets of a few hundred SSH connections

By making each connection’s line slightly transparent, natural clusters visually emerge. These clusters were then manually teased out (sounds like a job for machine learning) and labeled. The SSH RFCs were then reviewed to attempt to identify what each cluster of connections could be representing. Recall that SSH consists of three sub-protocols (transport, user authentication, and connection), and these sub-protocols interact in a specific way. Figure 2 illustrates an approximate overlay of the three sub-protocols on the SSH connection’s sequence of packet lengths.

Figure 2 – Figure 1 with approximate SSH sub-protocols overlaid

The next step in creating inferences on SSH traffic was to identify patterns and codify them. Some examples of patterns follow:

  • A file transfer reaches maximum packet size very quickly and often lasts the entire connection.
  • Keystrokes exhibit an echo pattern where a client transmits a keystroke to the server and the server echoes the keystroke back to the client.
  • Exploits and other odd traffic exhibit patterns that demonstrate the SSH sub-protocols did not properly interact, for example a file transfer beginning before authentication has occurred.
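The three patterns above can be codified roughly as follows. This is a sketch, not the package’s rules: MAX_PACKET, the within, min_fraction, and min_ratio thresholds, and the auth_done_index input (the position where authentication is believed to have completed, which would come from tracking sub-protocol state) are all hypothetical.

```python
from __future__ import annotations

# Hypothetical maximum encrypted packet size for this sketch; real limits
# depend on the implementation and negotiated parameters.
MAX_PACKET = 32768


def looks_like_transfer(seq: list[int], direction: int, max_size: int = MAX_PACKET,
                        within: int = 10, min_fraction: float = 0.8) -> bool:
    """File transfer: packets in one direction reach max size quickly and stay there.

    direction=+1 checks client-to-server (upload), -1 server-to-client (download).
    """
    sizes = [abs(v) for v in seq if v * direction > 0]
    first_max = next((i for i, s in enumerate(sizes) if s >= max_size), None)
    if first_max is None or first_max >= within:
        return False
    rest = sizes[first_max:]
    return sum(s >= max_size for s in rest) / len(rest) >= min_fraction


def looks_like_keystrokes(seq: list[int], min_ratio: float = 0.5) -> bool:
    """Keystrokes: client packets are often answered by a same-sized server echo."""
    client_packets = sum(1 for v in seq if v > 0)
    if client_packets == 0:
        return False
    echoed = sum(1 for a, b in zip(seq, seq[1:]) if a > 0 and b == -a)
    return echoed / client_packets >= min_ratio


def transfer_before_auth(seq: list[int], auth_done_index: int,
                         max_size: int = MAX_PACKET) -> bool:
    """Odd ordering: bulk-sized packets appear before authentication finished."""
    return any(abs(v) >= max_size for v in seq[:auth_done_index])
```

Keeping each pattern as a separate predicate makes it easy to tune thresholds independently as assumptions are tested against larger data sets.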

As our SSH traffic sample was small and manageable, it provided a good initial data set for building institutional knowledge around SSH behavior. Once we had some understanding of SSH behaviors, we tested that understanding across our Polaris deployments. Evaluating our inferences built from a few hundred connections against a few hundred thousand connections exposed many assumptions we made during the initial prototype of this package. Iteratively testing the package and incorporating changes from real networks allowed us to develop robust and scalable inferences. We’d like to thank all of our Polaris partners and champions for helping shape the foundation of this package as well as other research ideas.

Wrapping Up

Corelight is releasing the SSH Inference package to customers as part of the Encrypted Traffic Collection preview. We’re calling it a preview because more is to come. While length, order, and direction were used to build the SSH Inference package, we did not incorporate timing into the analyses; doing so potentially unlocks additional inferences. New features based on timing are currently in the planning stages for the next release of the package.

How frequently is SSH being used on your network? Are the connections long-lived or short? Are the SSH connections primarily interactive or bulk transfers? Do you want to know more about the existing SSH traffic on your network? If so, Corelight can provide you with the insights outlined above.
