Corelight Blog

We make the world's networks safer.

First, Do No Harm — March 14, 2019

By Richard Bejtlich, Principal Security Strategist, Corelight

When we hear the phrase “first, do no harm,” most of us think of the Hippocratic Oath and its guidance for physicians. I was surprised to learn that the phrase as translated does not actually appear in the Greek, and that the origins are more modern, dating from the 17th century, and likely simply congruent with the Greek theme. I pondered these deep mysteries because I realized “first, do no harm” should be a requirement for anyone responsible for network security monitoring (NSM). In this post I will examine its relevance and discuss why implementing NSM via passive instrumentation delivers on the promise of the Hippocratic Oath.

I was inspired to write this story after reading an online security discussion. The original poster (OP) said he wanted to use a Raspberry Pi to monitor a family member’s home network. The OP was not sure how to do this. As I read the post, I imagined the common home network environment consisting of an Internet service provider (ISP) all-in-one access router and WiFi gateway. One end of the box connects to the ISP line from outside the residence, and users connect to the router either via a few (often four) Ethernet ports or, more commonly, via WiFi on several bands.

I was disturbed by several of the replies to the OP’s question. One person recommended making the Pi a new gateway, such that all traffic had to route through it. In this scenario, the Pi is either a so-called “one-legged router” with its one native interface, or by adding a second network interface card (NIC) via a USB port, it works like a regular multi-interface router. Once the Pi is routing traffic (and I’m not even sure how this might work, given that the ISP device must still provide WiFi access), one could run NSM software on the Pi to inspect network traffic.

Another person recommended ARP spoofing the gateway, such that the Pi pretends to be some upstream device in the path from the WiFi client to the ISP network. In this scenario, the Pi is again either a one-legged or two-legged router, except that it uses what is essentially a “hacking” technique to fool clients into believing it is the device to which traffic should be sent when a client wishes to visit the Internet. Again, once it is seeing traffic, the Pi inspects it by running NSM software.

Another person contributed to the conversation by saying the OP could run a full packet capture application on the Pi once it saw traffic. Remember that NSM suggests three main data types for collecting and analyzing data — full packet capture (FPC), transaction or session logs, and alert data. FPC is self-explanatory, whereas transaction data is the sort of logging provided by Zeek and Corelight, and alert data is provided by an intrusion detection system like Suricata or Snort.

As an NSM professional, to be honest, I was horrified by these recommendations. I realized that I had internalized the maxim “first, do no harm” when introducing NSM to an environment.

NSM should strive to be a passive, non-interfering element of network security. Ideally, NSM has zero impact on the network, aside from a safe physical access point. This is most effectively created by a network tap, preferably one engineered to fail open in the absence of electrical power. The NSM platform connects to a passive interface on the tap, and in no way interferes with the network. I do not recommend using network taps with ports that allow injecting traffic on the monitoring ports. I would much rather be able to tell the C-suite that their new NSM enterprise sensor grid is incapable of interfering with the network, by design. If one cannot deploy taps, then a secondary option is to use switch SPAN ports. These must be carefully configured and monitored for changes. One must be careful that network administrators neither disable the SPAN port nor affect the behavior of their switch due to the load introduced by spanning network traffic.

With this in mind, it is easy to see why using a Raspberry Pi in the manner just described is a terrible idea for NSM. One should not deploy a low-cost, albeit cool, device into a network and force it to become the weakest link in the Internet access chain by putting it physically or virtually inline. In the gateway or the even more terrible ARP spoofing scenarios, if and when the Pi fails, the family member loses Internet access. If the family member is remote, returning him or her to the network will require a walk at best or a flight at worst.

A device like a Pi might run Zeek or Suricata to produce logs, but it is likely to suffer as a full packet capture device. A cheap SD card is not designed for constant writing, as required by an FPC solution. Robert Graham recently posted a series of tweets on SD cards, if you would like to know more about using them with a Pi.

To summarize, NSM is always your next best move, assuming you didn’t start with NSM by building visibility into your enterprise. However, when making that move, never forget to “do no harm.” Don’t introduce fragility into your architecture by making Internet or other network access dependent on the NSM hardware or software. Ideally, NSM is completely passive, and better yet, invisible to users and intruders. In this way, security staff enjoy the benefits of knowing they are improving visibility on a platform that is more trustworthy, thanks to the way it was designed and deployed.

The Elephant in the SIEM War Room — March 12, 2019

By Brian Dye, Chief Product Officer, Corelight

Last week’s RSA announcements included a pair of new entrants into the SIEM space, Google Chronicle’s Backstory and Microsoft’s Azure Sentinel. While the entry of larger players into the SIEM space is an eyebrow-raiser on its own, in conjunction with the existing competitive fray it is pretty amazing. The good news is that this level of competitive intensity is a very good thing for customers and defenders. That said, it is worth looking at the main angles of innovation that are playing out across all the form factors (on-prem, MSSP, and SaaS) … and the elephant in the room that goes with them:

  • Ecosystem: Under the “decentralized innovation” theme, Splunk (and more recently PANW) has focused on creating a range of complementary analytics solutions to help get the most out of the aggregated data (and acquiring a few of them as well).
  • Analytics: Most notably, players like Exabeam have put a real premium on novel analytics focused on key IR issues. In the more recent announcements, Google Chronicle touts not only their internally-developed analytics but also a real focus on query speed (the phrase “coffee break query” is unfortunately an industry term at this point).
  • Pricing: While Splunk had tremendous success with its “buy as you go” pricing model, it is no secret that this model is a struggle for infosec budgets now. Elastic and Humio offer structurally different alternatives, and Chronicle’s Backstory announcement squarely focused on this issue as well.

What’s missing in this discussion? The DATA ITSELF. As any data scientist will tell you, the best tools in the world are accelerated (or limited!) by the data. Furthermore, getting the data “right” is the most time-consuming part of many data-intensive projects … and the SOC is one big data analysis project. In talking to customers, I’ve seen three key trends that underscore how important the data is to the success of defenders using any of these technologies:

  1. Security Data Science teams: In the large enterprise, we are seeing true data science teams (in many cases seeded with folks from other internal data science efforts) being staffed for security. This helps defenders up their game, and use the same spread of analytics tools that the attackers are taking advantage of already.
  2. Career Development: Even when full data science teams aren’t being staffed, I see defenders taking their own steps through classes – most commonly in Python analytics frameworks like SciKit-Learn or TensorFlow. For the same reasons above, this is a great step and an unqualified positive for both the career of the individual and the defensive posture of the organization.
  3. Post-processing: As defenders use those data analytics skills, they often work to improve, augment, or customize the data in their environment. This often starts by getting *really* good at data joins in their SIEM, but can extend to tools like Kafka Streams or full ETL-style post-processing environments.

All three of these often result in teams looking for an alternative to the “by-product data” they have today. What does that mean? Most of the logs in the SOC were never meant for large scale security analytics … they are operational or alerting logs from a protection or detection technology. This search for better data often leads defenders and data scientists to Corelight (based on the Zeek (fka Bro) open source project), because it has:

  • Security Effectiveness: Because Corelight is deployed passively behind a tap, you benefit from a fast and non-disruptive deployment that gives very broad environment coverage. Just as importantly, the data itself is highly compact so organizations can cost-effectively keep data for years of coverage, not weeks or months.
  • Native structure: Speaking of which, Corelight’s data spans dozens of protocols, and results from 20 years of evolution focused on the needs of incident responders and threat hunters – enabling great insight into everything from behavioral movement to encrypted traffic to extracted files. The key is that it is all linked with a common identifier, allowing both analysts and machines to deterministically connect what used to be isolated pools of insight (breaking a historical SIEM problem of “raw vs. normalized format” tradeoffs). This saves people time (less chair swiveling!) and dramatically streamlines data science and analytics work.
  • Extensibility: Zeek was built from the start to be changed and improved, both to create new data types and derive new insights from the data already there. Defenders throughout the open source community already take advantage of this, as (a) fixing the data is often far easier at the source than with heroic post processing and (b) without the right incoming signal, no amount of post-processing will ever succeed.

In the end, the increased competition in the SIEM space is a great thing for people and organizations charged with defending networks and information, and we at Corelight are happy to partner with all of them. No matter which technology you are using today (or considering tomorrow) to drive critical security outcomes in your SOC, come check out Corelight. Getting the right data from the start accelerates almost everything in your IR process, from tools to people. That’s why we believe Corelight is your next best move in security. Put succinctly in the words of one of our customers, “If I didn’t have this data I wouldn’t sleep well at night. I like to sleep well at night.”

#winning — March 5, 2019

By Alan Saldich, Chief Marketing Officer, Corelight

2018 was undoubtedly a banner year for Corelight. We closed out 2018 with many successes under our belt that reflect the hard work of our people: We more than quadrupled our sales year-over-year and more than doubled our customer base and employee count; we strengthened our balance sheet and board of directors with a $25 million Series B round; we added new leaders with deep security industry experience to our executive team; we expanded our product portfolio with the Corelight AP 200 Sensor and AP 3000 Sensor as well as the Corelight Virtual Sensor.

With our largest customers deploying fleets of Corelight Sensors approaching 100 units, managing them has become much more challenging. To help them, last week we introduced the Corelight Fleet Manager to enable the management of up to 250 sensors from one central console. We also announced new product features in Corelight v16. And, we’re expanding our global footprint with a new sales office in EMEA.

On top of all that, today, we are happy to share some industry recognition and a few awards that are now under our belt as well:

First of all, Business Insider asked a group of venture capitalists to identify open source startups that are likely to “blow up” in 2019 (in a good way), and I’m happy to report Corelight was included in the list of 19 open source software startups to watch in 2019.

We were also pleased to be recognized with a couple of awards:

The Corelight Sensor was named the “Most Innovative Network Security and Management” solution by CyberDefense Magazine. The awards program, now in its seventh year, honors unique and compelling products in the information security space.

Corelight received the “Cybersecurity Excellence Award – Silver” for Network Traffic Analysis (NTA) products. Cybersecurity Excellence Awards recognize companies, products and professionals that demonstrate excellence, innovation and leadership in information security.

As we head into the RSA Conference this week it’s important to realize that although we saw spectacular growth last year, we have only just begun the climb toward becoming a major enterprise security vendor, and we cannot do it without the support of great customers, partners and employees.

If you’re interested in learning more about Corelight and will be in San Francisco for RSA this week, stop by to see us at booth 4308 (we’re between the North and South Halls, underneath Howard Street). Or, if you’d like to join the Corelight team, check out our career opportunities. You can also find us on Twitter @corelight or LinkedIn: https://www.linkedin.com/company/corelight/.

Astronomers and Chemists — February 28, 2019

By Brian Dye, Chief Product Officer, Corelight

Scale is a great word, because its meaning is truly in the eye of the beholder.  To an astronomer, it might mean millions of light years. To a chemist, nanometers.  In the network security monitoring (NSM) world, Corelight is enabling scale in two different senses of the word: management (at enterprise scale) and data (when 30% less is a beautiful thing).

Management is the more straightforward of the two. As NSM deployments grow, and in particular as they expand beyond physical sensors to include virtual and cloud environments, the ability to administer those environments easily is critical.  From its beginning, Corelight has helped our customers focus on their data … accelerating incident response, finding advanced threats, uncovering new behavioral patterns … rather than systems administration.

Now that our largest customers are approaching hundreds of deployed sensors, our mission is broadening. It has led us to develop and launch our new Corelight Fleet Manager, which allows organizations to (wait for it!) manage fleets of Corelight sensors. In doing so, we are not trying to reinvent the wheel of distributed systems management. Quite the opposite – we have taken the highest impact workflows and delivered them in a streamlined user experience. That includes grouping sensors across the environment, providing role-based access control to those groups, and then automating deployment of configuration policies to them. And yes, since you asked – of course we have an available dark mode. Across light or dark, we measure our success by how few clicks are required and whether you need to open the manual. This is a user experience designed for the administrator, not the sales demo!  

Enabling data scale is more nuanced, but just as compelling. We already provide tremendous flexibility in how data is both generated and exported, including:

  • Fork-and-filter architecture, so you can send some or all of your logs to different destinations in parallel (a SIEM vs. S3 for example)
  • Filter language, so you can easily customize the results of any given log – just keeping the entries you need
  • Custom packages, which allow you to create logs with new or unique information as you need it

These options are critical to helping organizations deliver the data they need to where they need it – which is especially important for those on a volume-based SIEM pricing model.

Even with those capabilities, we heard customers ask for more – not just capabilities, but out-of-the-box content based on real-world experience. Content that looks within the data streams themselves to screen out log entries with lower security value and maximize the ROI of their data. As a result, we have created the Data Reduction packages. They create new versions of our six most popular logs (conn, http, dns, ssl, files, and weird) that apply a cost-driven filter to the information created. For example, if a host is making repeated DNS queries hundreds of times per second, then we temporarily stop producing repeats of that log entry. If identical files and certificates are moving through the network, we stop repeating those log entries for a while as well. These targeted data reductions were developed in cooperation with incident responders at some of the world’s leading organizations, and (for many) represent an attractive trade-off between downstream SIEM cost and security data coverage. Specifically, to date we have seen a ~30% reduction in data with little loss of security insight, which is a powerful combination.
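If you are curious how much repetition exists in your own environment before turning such a package on, a rough count of repeated source/query pairs in your archived dns.log gives a useful estimate. This is only a back-of-the-envelope sketch, not the Data Reduction logic itself; it assumes JSON-format logs with the default dns.log field names and that jq is available wherever the logs live.

# Count repeated (source host, DNS query) pairs in the archived dns logs.
# High counts indicate entries a deduplicating filter could suppress.
zcat dns*.log.gz | jq -r '[."id.orig_h", .query] | @tsv' | sort | uniq -c | sort -nr | head -20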

As we said at the start, scale is in the eye of the beholder. If your view of data scale is “I need it all” then we can provide it. If you are budget constrained (or not on a SIEM site license!) and want it all in nearline storage but only the reduced version in your SIEM, we can provide that, too.  To each their own, we say!

Whether you are looking at management scale or data scale, we are happy to deliver these new capabilities to you. For our current customers, thanks again for your confidence, partnership, and feedback. If you haven’t worked with Corelight (or Zeek!) yet, welcome to the movement – we are looking forward to helping you at whatever scale we can.

Examining aspects of encrypted traffic through Zeek logs — February 19, 2019

By Richard Bejtlich, Principal Security Strategist, Corelight

In my last post I introduced the idea that analysis of encrypted HTTP traffic requires different analytical models. If you wish to preserve the encryption (and not inspect it via a middlebox), you have to abandon direct inspection of HTTP payloads to identify normal, malicious, and suspicious activity.

In this post I will use Zeek logs to demonstrate alternative ways to analyze encrypted HTTP traffic. The goal is to reduce a sea of uncertainty to a subset of activity worth investigating. If we can resolve the issue with Zeek data, wonderful. If we cannot, at least we have decided where we need to apply additional investigation, perhaps by applying threat intelligence, host-based log data, or other resources.

Because we are talking about encryption woes, I start with Zeek’s x509.log. X.509 is an Internet standard which defines the format of public key certificates. These certificates are an important element of Secure Sockets Layer (SSL) and Transport Layer Security (TLS) encryption used with HTTPS traffic.

Zeek’s x509.log

In the following example I want to profile the algorithms used to sign x509 certificates.

me@mine:/nsm/bro/logs/2019-02-09$ zcat x509*.gz | jq -c '[."certificate.sig_alg"]' | sort | uniq -c | sort -nr

  24549 ["sha256WithRSAEncryption"]
    646 ["ecdsa-with-SHA256"]
     31 ["sha512WithRSAEncryption"]
     20 ["sha1WithRSAEncryption"]

The last result is worrisome. I would prefer not to see any certificates signed by the SHA1 algorithm in use in my environment. As explained by Mozilla, SHA1 suffers many problems that render it unsuitable in modern environments. Is this perhaps suspicious or malicious? I could imagine a scenario where an intruder doesn’t worry about signing collisions, because his malware doesn’t care about being ranked lower by Google’s web page search algorithms.

Next I search for Zeek x509.log entries with the SHA1 algorithm.

me@mine:/nsm/bro/logs/2019-02-09$ zgrep sha1WithRSAEncryption x509*.gz

x509.12:00:00-13:00:00.log.gz:{"ts":"2019-02-09T12:12:51.826252Z","id":"FTGvvp4TC5GHCel6ad","certificate.version":3,"certificate.serial":"00","certificate.subject":"CN=http.l.root-servers.org,OU=LROOT,O=ICANN,L=New Taipei,C=TW","certificate.issuer":"CN=http.l.root-servers.org,OU=LROOT,O=ICANN,L=New Taipei,C=TW","certificate.not_valid_before":"2018-11-27T17:33:54.000000Z","certificate.not_valid_after":"2028-11-24T17:33:54.000000Z","certificate.key_alg":"rsaEncryption","certificate.sig_alg":"sha1WithRSAEncryption","certificate.key_type":"rsa","certificate.key_length":2048,"certificate.exponent":"65537","basic_constraints.ca":true}
...trimmed…

I collect several bits of important information here, in addition to a specific log containing a match. First, I get a file identifier, FTGvvp4TC5GHCel6ad, which I will leverage shortly. This file identifier uniquely identifies the x509 certificate that Zeek observed during an encrypted session. Second, I see this certificate was issued by CN=http.l.root-servers.org,OU=LROOT,O=ICANN,L=New Taipei,C=TW. I do not know if that is a problem in and of itself. I also note that the certificate appears to have been issued in late 2018, which is odd given the warnings against using SHA1 for x509 certificates.

Using the file ID, I begin looking for other Zeek log entries. This demonstrates the real power of Zeek logs: they can be linked by entries like the file ID. I will examine each in turn as they appear. (Note that I could have searched for the certificate identifier for other log entries. I could have also turned to sources outside my logs for more information on this identifier.)

me@mine:/nsm/bro/logs/2019-02-09$ zgrep FTGvvp4TC5GHCel6ad *

files.12:00:00-13:00:00.log.gz:{"ts":"2019-02-09T12:12:51.826252Z","fuid":"FTGvvp4TC5GHCel6ad","tx_hosts":["199.7.83.80"],"rx_hosts":["10.10.40.48"],"conn_uids":["CJDF553HmA2WdUq1Af"],"source":"SSL","depth":0,"analyzers":["X509","SHA1","MD5"],"mime_type":"application/x-x509-user-cert","duration":0.0,"local_orig":false,"is_orig":false,"seen_bytes":1036,"missing_bytes":0,"overflow_bytes":0,"timedout":false,"md5":"59783b47c36f4360f3dea9f075fea5e4","sha1":"f341046ee214c2dfdcdd1685f66a2b45686cc947"}

Above we have a files.log entry.

Zeek’s files.log

This was generated by Zeek in the process of tracking the encrypted session and writing the x509.log. This is a key log because it provides the connection ID, CJDF553HmA2WdUq1Af, which we can use to look for additional Zeek logs. The files.log also contains the source and destination IP addresses, but we will rely on linked logs for more information on the session.

ssl.12:00:00-13:00:00.log.gz:{"ts":"2019-02-09T12:12:51.631894Z","uid":"CJDF553HmA2WdUq1Af","id.orig_h":"10.10.40.48","id.orig_p":36780,"id.resp_h":"199.7.83.80","id.resp_p":443,"version":"TLSv12","cipher":"TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256","curve":"secp256r1","resumed":false,"established":false,"cert_chain_fuids":["FTGvvp4TC5GHCel6ad"],"client_cert_chain_fuids":[],"validation_status":"self signed certificate","ja3":"f42d3af1334b98cbe5690adfc5b574a2","ja3s":"174e7e4992a63f6d419626d97363adb8"}

Next we have the ssl.log.

Zeek’s ssl.log

This log entry offers details on the nature of the encryption used in the session of interest. We have the same IP addresses seen earlier, as well as ports. Again, I will turn to these later. Note for the moment the last two fields, ja3 and ja3s. I will return to those shortly as well. The most important part of this log, for immediate use once we finish the results of this search, is the uid of CJDF553HmA2WdUq1Af. This is a connection identifier that we will search for shortly.

x509.12:00:00-13:00:00.log.gz:{"ts":"2019-02-09T12:12:51.826252Z","id":"FTGvvp4TC5GHCel6ad","certificate.version":3,"certificate.serial":"00","certificate.subject":"CN=http.l.root-servers.org,OU=LROOT,O=ICANN,L=New Taipei,C=TW","certificate.issuer":"CN=http.l.root-servers.org,OU=LROOT,O=ICANN,L=New Taipei,C=TW","certificate.not_valid_before":"2018-11-27T17:33:54.000000Z","certificate.not_valid_after":"2028-11-24T17:33:54.000000Z","certificate.key_alg":"rsaEncryption","certificate.sig_alg":"sha1WithRSAEncryption","certificate.key_type":"rsa","certificate.key_length":2048,"certificate.exponent":"65537","basic_constraints.ca":true}

The last log is the x509.log again. I show it here to demonstrate that searching for the file ID results in the three types of logs just shown — files.log, ssl.log, and x509.log. In order of logical creation, they would be listed as ssl.log, x509.log, and files.log.

Returning to the results of the ssl.log, you will remember we found a connection ID. Let’s search for it and see what we find. Again, I will show one entry at a time and explain the pertinent aspects.

me@mine:/nsm/bro/logs/2019-02-09$ zgrep CJDF553HmA2WdUq1Af *

conn.12:00:00-13:00:00.log.gz:{"ts":"2019-02-09T12:12:51.440728Z","uid":"CJDF553HmA2WdUq1Af","id.orig_h":"10.10.40.48","id.orig_p":36780,"id.resp_h":"199.7.83.80","id.resp_p":443,"proto":"tcp","service":"ssl","duration":0.576199,"orig_bytes":123,"resp_bytes":1477,"conn_state":"SF","local_orig":true,"local_resp":false,"missed_bytes":0,"history":"ShADadFf","orig_pkts":7,"orig_ip_bytes":423,"resp_pkts":5,"resp_ip_bytes":1689,"sensorname":"mine-ens33"}

The conn.log is sort of the “top level” Zeek log.

Zeek’s conn.log

Zeek creates conn.log entries for “connections,” whether they are connection-oriented (like TCP) or connectionless (like UDP). This entry shows us flow details about the connection, like the source IP (10.10.40.48), the destination IP (199.7.83.80), and the source and destination ports (36780 and 443), along with the IP protocol (TCP).

conn-summary.12:00:00-13:00:00.log.gz:Invalid starting time on line: {"ts":"2019-02-09T12:12:51.440728Z","uid":"CJDF553HmA2WdUq1Af","id.orig_h":"10.10.40.48","id.orig_p":36780,"id.resp_h":"199.7.83.80","id.resp_p":443,"proto":"tcp","service":"ssl","duration":0.576199,"orig_bytes":123,"resp_bytes":1477,"conn_state":"SF","local_orig":true,"local_resp":false,"missed_bytes":0,"history":"ShADadFf","orig_pkts":7,"orig_ip_bytes":423,"resp_pkts":5,"resp_ip_bytes":1689,"sensorname":"mine-ens33"}

files.12:00:00-13:00:00.log.gz:{"ts":"2019-02-09T12:12:51.826252Z","fuid":"FTGvvp4TC5GHCel6ad","tx_hosts":["199.7.83.80"],"rx_hosts":["10.10.40.48"],"conn_uids":["CJDF553HmA2WdUq1Af"],"source":"SSL","depth":0,"analyzers":["X509","SHA1","MD5"],"mime_type":"application/x-x509-user-cert","duration":0.0,"local_orig":false,"is_orig":false,"seen_bytes":1036,"missing_bytes":0,"overflow_bytes":0,"timedout":false,"md5":"59783b47c36f4360f3dea9f075fea5e4","sha1":"f341046ee214c2dfdcdd1685f66a2b45686cc947"}

ssl.12:00:00-13:00:00.log.gz:{"ts":"2019-02-09T12:12:51.631894Z","uid":"CJDF553HmA2WdUq1Af","id.orig_h":"10.10.40.48","id.orig_p":36780,"id.resp_h":"199.7.83.80","id.resp_p":443,"version":"TLSv12","cipher":"TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256","curve":"secp256r1","resumed":false,"established":false,"cert_chain_fuids":["FTGvvp4TC5GHCel6ad"],"client_cert_chain_fuids":[],"validation_status":"self signed certificate","ja3":"f42d3af1334b98cbe5690adfc5b574a2","ja3s":"174e7e4992a63f6d419626d97363adb8"}

I have slightly reordered these three results in order to group and skip them. The conn-summary log is basically a repeat (in this instance) of the conn.log, and we have already seen the files.log and ssl.log. Let’s continue our interpretation with the next unique result.

notice.12:00:00-13:00:00.log.gz:{"ts":"2019-02-09T12:12:57.016988Z","uid":"CJDF553HmA2WdUq1Af","id.orig_h":"10.10.40.48","id.orig_p":36780,"id.resp_h":"199.7.83.80","id.resp_p":443,"proto":"tcp","note":"SSL::Invalid_Server_Cert","msg":"SSL certificate validation failed with (self signed certificate)","sub":"CN=http.l.root-servers.org,OU=LROOT,O=ICANN,L=New Taipei,C=TW","src":"10.10.40.48","dst":"199.7.83.80","p":443,"actions":["Notice::ACTION_LOG"],"suppress_for":3600.0,"dropped":false}

Above we see the notice.log. Zeek generated this entry for the connection of interest because it was a self-signed certificate. By itself, this does not tell us if the event is normal, suspicious, or malicious, but it is still unwanted.

If we wanted to think about these logs as a chain, I would order them thusly (ignoring the conn-summary.log as it is a “meta” log in most cases).

conn.log, ssl.log, x509.log, files.log, notice.log

Let’s pivot on two items of interest from the ssl.log, the ja3 and ja3s entries. JA3 refers to a wonderful addition to the Zeek code base, donated by engineers from Salesforce.com. JA3 fingerprints connections based on aspects of the client and server TLS connections. A ja3 entry reflects the client and a ja3s entry reflects the server. For our ssl.log, we had these elements:

"ja3":"f42d3af1334b98cbe5690adfc5b574a2","ja3s":"174e7e4992a63f6d419626d97363adb8"

First we will look for the ja3 client fingerprint. What other systems are offering the same TLS session aspects to their servers? I omitted the server we already looked at from the following results, and show only new information.

me@mine:/nsm/bro/logs/2019-02-09$ zgrep f42d3af1334b98cbe5690adfc5b574a2 * | grep -v 199.7.83.80

ssl.05:00:00-06:00:00.log.gz:{"ts":"2019-02-09T05:03:39.177330Z","uid":"C3egFw3XMtTnvpPZF6","id.orig_h":"10.10.40.48","id.orig_p":39122,"id.resp_h":"193.0.6.139","id.resp_p":443,"version":"TLSv12","cipher":"TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384","curve":"secp256r1","resumed":false,"established":false,"cert_chain_fuids":["FuiHJD2G1gdzQaP3vk","FR3p934xqAWpKYrSH"],"client_cert_chain_fuids":[],"validation_status":"ok","ja3":"f42d3af1334b98cbe5690adfc5b574a2","ja3s":"7770094a92b1cbfa5a6de2017cfb682a"}

ssl.05:00:00-06:00:00.log.gz:{"ts":"2019-02-09T05:04:42.160828Z","uid":"CHdKRn27IavFmqSHx3","id.orig_h":"10.10.40.48","id.orig_p":36806,"id.resp_h":"193.0.6.158","id.resp_p":443,"version":"TLSv12","cipher":"TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384","curve":"secp256r1","resumed":false,"established":false,"cert_chain_fuids":["FlyIxckLGAHiM1Hf9","FyGulK2VyBjEK1ibrk"],"client_cert_chain_fuids":[],"validation_status":"ok","ja3":"f42d3af1334b98cbe5690adfc5b574a2","ja3s":"7770094a92b1cbfa5a6de2017cfb682a"}

It looks like our host of interest, 10.10.40.48, is the only system on our network in play, but we have found two other servers with which 10.10.40.48 communicates — 193.0.6.139 and 193.0.6.158. We could pivot on those IP addresses if we so chose. Note the new ja3s values as well.

Now let’s look at the server side to see if any other servers offer similar TLS connection aspects to their clients. We grep for the ja3s value from the earlier ssl.log.

me@mine:/nsm/bro/logs/2019-02-09$ zgrep 174e7e4992a63f6d419626d97363adb8 * 

ssl.02:00:00-03:00:00.log.gz:{"ts":"2019-02-09T02:04:15.713472Z","uid":"Ch1pX52BJ8igoekQgg","id.orig_h":"10.10.40.26","id.orig_p":49170,"id.resp_h":"17.154.66.73","id.resp_p":443,"version":"TLSv12","cipher":"TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256","server_name":"p17-buy.itunes.apple.com","resumed":true,"established":true,"ja3":"244b38f0cab862325e0574a86f6d8854","ja3s":"174e7e4992a63f6d419626d97363adb8"}

ssl.02:00:00-03:00:00.log.gz:{"ts":"2019-02-09T02:30:38.366012Z","uid":"CYHv9u3rWcPo7VEWMk","id.orig_h":"10.10.40.13","id.orig_p":59865,"id.resp_h":"17.154.66.154","id.resp_p":443,"version":"TLSv12","cipher":"TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256","server_name":"p22-buy.itunes.apple.com","resumed":true,"established":true,"ja3":"244b38f0cab862325e0574a86f6d8854","ja3s":"174e7e4992a63f6d419626d97363adb8"}

ssl.02:00:00-03:00:00.log.gz:{"ts":"2019-02-09T02:43:19.510856Z","uid":"CWYYzb2YqYM3B5fBA3","id.orig_h":"10.10.40.21","id.orig_p":62475,"id.resp_h":"17.173.66.180","id.resp_p":443,"version":"TLSv12","cipher":"TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256","server_name":"p8-buy.itunes.apple.com","resumed":true,"established":true,"ja3":"7570245c781d7d7a68e31419177e728d","ja3s":"174e7e4992a63f6d419626d97363adb8"}

How interesting! It appears we have three Apple iTunes servers which use the same TLS connection aspects as the server accepting connections from 10.10.40.48, and a different internal client connecting to each of them. This is likely normal, but interesting nevertheless. Remember that if we wanted to pivot off these results, we could pick one session and search for the connection ID. In the following example I look for the connection ID from the first of those three results.

me@mine:/nsm/bro/logs/2019-02-09$ zgrep Ch1pX52BJ8igoekQgg *

conn.02:00:00-03:00:00.log.gz:{"ts":"2019-02-09T02:04:15.638062Z","uid":"Ch1pX52BJ8igoekQgg","id.orig_h":"10.10.40.26","id.orig_p":49170,"id.resp_h":"17.154.66.73","id.resp_p":443,"proto":"tcp","service":"ssl","duration":31.178606,"orig_bytes":3163,"resp_bytes":1202,"conn_state":"SF","local_orig":true,"local_resp":false,"missed_bytes":0,"history":"ShADadFf","orig_pkts":13,"orig_ip_bytes":3707,"resp_pkts":11,"resp_ip_bytes":1654,"sensorname":"mine-ens33"}

conn-summary.02:00:00-03:00:00.log.gz:Invalid starting time on line: {"ts":"2019-02-09T02:04:15.638062Z","uid":"Ch1pX52BJ8igoekQgg","id.orig_h":"10.10.40.26","id.orig_p":49170,"id.resp_h":"17.154.66.73","id.resp_p":443,"proto":"tcp","service":"ssl","duration":31.178606,"orig_bytes":3163,"resp_bytes":1202,"conn_state":"SF","local_orig":true,"local_resp":false,"missed_bytes":0,"history":"ShADadFf","orig_pkts":13,"orig_ip_bytes":3707,"resp_pkts":11,"resp_ip_bytes":1654,"sensorname":"mine-ens33"}

ssl.02:00:00-03:00:00.log.gz:{"ts":"2019-02-09T02:04:15.713472Z","uid":"Ch1pX52BJ8igoekQgg","id.orig_h":"10.10.40.26","id.orig_p":49170,"id.resp_h":"17.154.66.73","id.resp_p":443,"version":"TLSv12","cipher":"TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256","server_name":"p17-buy.itunes.apple.com","resumed":true,"established":true,"ja3":"244b38f0cab862325e0574a86f6d8854","ja3s":"174e7e4992a63f6d419626d97363adb8"}


As you can see, Zeek provides a wealth of identifier-linked logs to make it possible to pull on various threads.

In this example, I was not able to determine the nature of the usage of the SHA1 certificate signing algorithm from within the Zeek logs themselves.

However, the Zeek logs provided information that I could use to do additional investigation. I have the source and destination IP addresses as well as information about the encryption certificates in play. At the very least, I have found a way to focus a microscope on a problem; I’m not stuck wondering where I should look for problems.

For example, I could simply choose to look at other Zeek logs for the odd host in question, 10.10.40.48. What other protocols does it use? To whom does it connect, and how? The Zeek dns.log could be specifically interesting. Perhaps we will turn to those in the next blog entry.
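Even before that, a first pass can be as simple as listing the DNS queries the host has made. This is only a sketch, assuming JSON-format logs, jq, and the default dns.log field names.

# List the DNS queries made by the host of interest, most frequent first.
zcat dns*.gz | jq -r 'select(."id.orig_h" == "10.10.40.48") | .query' | sort | uniq -c | sort -nr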

This concept of using network-level data in the face of encryption to identify issues of interest is my main point, and I hope you enjoyed the review of Zeek logs along the way!

Network security monitoring is dead, and encryption killed it. — January 29, 2019

By Richard Bejtlich, Principal Security Strategist, Corelight

This post is part of a multi-part series on encryption and network security monitoring. This post covers a brief history of encryption on the web and investigates the security analysis challenges that have developed as a result.

I’ve been hearing this message since the late 2000s, and wrote a few blog posts about network security monitoring (NSM) and encryption in 2008.

I’ve learned to recognize that encryption is a potentially vast topic, but often a person questioning the value of NSM versus “encryption” has basically one major use case in mind: Hypertext Transfer Protocol (HTTP) within Transport Layer Security (TLS), or Hypertext Transfer Protocol Secure (HTTPS).

Those worrying about NSM vs encryption usually started their security career when websites mainly advertised their services over HTTP, without encryption. Gmail, for example, has always offered HTTPS, but only in 2008 did it give users the ability to redirect access to its HTTPS service if they initially tried the HTTP version. In 2010, Gmail enabled HTTPS access as the default.

Today, Google strives to encrypt all of its web properties, and the “HTTPS encryption on the web” section of Google’s Transparency Report makes for fascinating reading. Unfortunately, properly implementing HTTPS seems to be a challenge for most organizations, as shown by the prevalence of “mediocre” and outright “bad” ratings at the HTTPSWatch site. The 2017 paper Measuring HTTPS Adoption on the Web offers a global historical view that is also worth reading.

Prior to widespread adoption of HTTPS, security teams could directly inspect traffic to and from web servers. This is the critical concern of the “encryption killed NSM” argument. For example, consider this transcript of web traffic taken from a presentation David Bianco and I delivered to ShmooCon in 2006. (Incidentally, when we spoke at this conference, it was the first time we had ever met in public!)

David investigated a suspected intrusion, and was able to systematically inspect transcripts of web traffic to confirm that a host had been attacked via malicious content but not compromised. (His original blog post is still online.)

Example web traffic transcript from David Bianco’s investigation

Using the Zeek network security monitor (formerly “Bro”), we could have produced similar analysis using the conn.log, the http.log, and possibly the files.log.

Because David could see all of the activity affecting the victim system, and directly inspect and interpret that traffic, he could decide whether it was normal, suspicious, or malicious.

Encryption largely eliminates this specific method of investigation. When one cannot directly inspect and interpret the traffic, one is left with fewer options for validating the nature of the activity. Encryption, however, did not introduce this problem. One could argue that modern web technologies have rendered many web sites incomprehensible to the average security analyst.

Consider the “simple” Google home page. Rendered in a web browser, it does indeed look fairly simple.

 

The Google home page as rendered in a browser

Inspecting the source for the web page shows a different story: over 33 pages, or nearly 100,000 characters, of mostly JavaScript code.

The source of the Google home page: mostly JavaScript
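You can reproduce a rough version of this measurement yourself, assuming curl is available; the exact count will vary with request headers, locale, and time.

# Fetch the Google home page and count the characters in its source.
curl -s https://www.google.com/ | wc -c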

How could any security analyst visually inspect and properly interpret the content of this page? I submit that the very nature of modern websites killed the security methodology that allowed an analyst to manually read web traffic and understand what it meant. Yes, tools have been introduced over the years to assist analysts, but the web content of 2018 is vastly different from that of 2006.

Even if modern websites were unencrypted, they are generally beyond the capability of the average security analyst to understand in a reliable and repeatable manner. This means that without encryption, security teams would need alternatives to direct inspection and interpretation to differentiate among normal, suspicious, and malicious activity involving web traffic.

In the next article I will discuss some of those alternative models, placed within the context of HTTPS. I will likely expand beyond HTTPS in a third post. Please let me know if you want to see me discuss other aspects of this problem as well in the comments below or over on Twitter. You can find me at @taosecurity.

Monitoring. Why Bother? — January 15, 2019

By Richard Bejtlich, Principal Security Strategist, Corelight

In response to my previous article in this blog series, some readers asked “why monitor the network at all?” This question really struck me, as it relates to a core assumption of mine. In this post I will offer a few reasons why network owners have a responsibility to monitor, not just the option to monitor.

Please note that this is not a legal argument for monitoring. I am not a lawyer, and I can’t speak to the amazing diversity of regulations and policies across our global readership. I write from a practical standpoint. I consider how monitoring will help network owners fulfill their responsibilities as custodians of data, computational power, and organizational assets.

I learned a lot about network security monitoring when I started as a midnight shift analyst at the Air Force Computer Emergency Response Team (AFCERT). Monitoring the network was integral to our operations. However, this wasn’t always the case.

Prior to 1993, each Air Force base was responsible for its own security. There was no centralized “managed security service provider” (MSSP) offering global visibility. When the AFCERT deployed trial versions of Todd Heberlein’s Network Security Monitor (NSM) software in the early 1990s, officials were shocked to find intruders in their enterprise.

From a practical standpoint, monitoring is a way to validate the assumptions one makes about the computing environment. In the case of the Air Force in the 1990s, officials assumed that intruders weren’t active in the enterprise. The Air Force had just pummeled the world’s fourth largest army in the first Gulf War. How could intruders be present? The AFCERT’s deployment of Todd’s NSM software provided irrefutable evidence to the contrary.

The first responsibility to monitor, then, is to provide evidence to support or deny one’s assumptions. Assumptions matter because they are the basis for decision making. If leaders make decisions based on faulty assumptions, then they will likely make poor choices. Those decisions can result in harm to the organization and its constituents. Significantly, that constituency can extend well beyond the organizational boundary, to include customers and other third parties who may unknowingly depend on the decisions made by the network owner.

Beyond understanding what is happening on the network, one has a duty to know what is not happening on the network. This sort of “negative knowledge” becomes critical when one is accused of nefarious activities one did not commit, or of ignoring activity that did not occur.

Let’s address the first case. Consider instances where rogue actors flood false Border Gateway Protocol (BGP) routes into the Internet routing plane. If other service providers carry those routes, then the rogue actors can perform BGP hijacking. From the perspective of downstream network users whose ISPs carry the rogue routes, the BGP hijacker is, for all intents and purposes, the owner of the hijacked Internet protocol (IP) addresses. This means that if a victim sees an attack from another party’s hijacked IP addresses, the victim may accuse the authorized owner of the IP addresses of being the perpetrator.

In this BGP hijack scenario, which occurs on a daily basis, monitoring egress traffic from the hijacked IP address space can show, by omission, that no attack took place. Remember, in reality the offending traffic is generated by the party conducting the BGP hijacking. Records of traffic from the legitimate network owner would not show any attack traffic. One could argue that the BGP hijack victim could have altered his or her logs to remove evidence of attack. However, forensic analysis could show that altering the evidence, while possible, would have introduced artifacts tipping a forger’s hand.

Now imagine the second scenario: ignoring activity that did not occur. My first work after the AFCERT involved helping to create a managed security service provider in Texas. One Monday morning, one of our clients, a financial institution, called me to complain that we had not caught the penetration test they had scheduled for the previous weekend. They were quite upset with me, but I managed to review all of the activity to their IP address space over the weekend, thanks to our deployment of NSM software and processes. I found a single instance of an Nmap scan that occurred on Saturday afternoon, which our analysts had reported as a reconnaissance event with no need for follow-on reporting. NSM data showed no other unusual activity to the customer that weekend.

I asked my customer if their “penetration tester” used a cable modem registered to a certain provider, and I offered the IP address. The customer confirmed that I had located the correct IP address, and I explained that the totality of the activity my customer had paid the “penetration tester” to perform was an Nmap scan. I asked how much money that scan had cost, and I remember the answer being a five digit number. The customer then excused himself to make another call, which was to the firm that had tried to pass off an Nmap scan as a penetration test.

In these instances, NSM data is the best way to show not only what has happened, but what has not happened. This benefit derives from the fact that NSM is not alert-centric or alert-dependent. While one should incorporate detection methods into NSM operations, remember that NSM does not depend upon alerts alone.

I have advocated NSM for two decades because I found that the decision to capture network activity details, in a neutral way, is an incredibly powerful tool. To understand why, consider an alternative that depends upon alert creation. If one’s operation assumes alerts will always provide information on network activity, what happens when activity does not trigger an alert? Similarly, how does one expect to address the “negative knowledge” question — by not generating an alert?

In brief, because network operators have a responsibility to make decisions based on proper assumptions, and because operators also have a responsibility to know what is, and what is not, happening on their networks, implementing NSM via Corelight and Zeek data is indispensable.

Network Security Monitoring: Your best next move — December 11, 2018

By Richard Bejtlich, Principal Security Strategist, Corelight

Welcome to the first in a regular series of blog posts on network security monitoring (NSM).

In 2002 Bamm Visscher and I defined NSM as “the collection, analysis, and escalation of indications and warnings to detect and respond to intrusions.” We were inspired by our work in the late 1990s and early 2000s at the Air Force Computer Emergency Response Team (AFCERT), and the operations built on the NSM software written by Todd Heberlein. Although NSM methodology applies to any sort of evidence or environment, these posts will largely describe NSM for network traffic in the enterprise.

As might be appropriate for the first post in a series on NSM, I will explain why I believe NSM is the first step one should take when implementing a security program. This may sound like a bold claim. Shouldn’t you collect logs from all your devices first, or perhaps roll out a shiny new endpoint detection and response (EDR) agent? While those steps may indeed benefit your security posture, they are not the first steps you should take.

In 2001 Bruce Schneier neatly summarized a shared vision for security: “monitor first”. I concur with this strategy, because I advocate basing security decisions on evidence, not faith. In other words, before making changes to one’s security posture, it is more efficient and effective to determine what is happening, and address the resulting discoveries first. My 2005 post Soccer Goal Security expands on this concept.

If one accepts the need to gather evidence, and identify what is happening in one’s environment as a necessary precursor to making changes, then we must determine how best to gather that evidence. Elsewhere I have advocated for four rough categories of intelligence, which I repeat here. They are ordered by increasing difficulty of implementation, but also likely increasing granularity of information.

The first way to identify what is happening in your environment is to rely on third party notification. As Mandiant’s M-Trends reports have been documenting for years, as of 2018, 38% of the firm’s incident response workload began with the victim learning of an intrusion via a third party. This is a cheap way to get insights into your security posture, as law enforcement, or worse, reporter Brian Krebs, is acting as your free threat intelligence provider. However, you are already days or weeks behind the intruder, and you must soon hire a consultancy to instrument and protect your network. It is important to maintain good relations with law enforcement and the media, but you should not rely on them for network intelligence.

The second method, and the focus of this blog series, is network security monitoring. Begin by deploying an NSM sensor collecting, at a minimum, Zeek data at the gateway connecting your environment to the public Internet. This will see so-called “north-south” traffic (visibility for “east-west” traffic will be covered in a later post). By collecting NSM data, one has not interrupted daily IT operations or users, other than perhaps a brief outage to install a network tap. If administrators decide to (temporarily) use a switch SPAN port to see network traffic, users will suffer no interruption of service whatsoever. With a simple deployment, security teams gather a wealth of data about their environment and threat activity. I will address the specific benefits in future posts.
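If you want to experiment before committing to a sensor deployment, the open source tool makes a minimal start easy. The sketch below assumes a Linux host with Zeek installed and a monitoring interface (eth1 here is a placeholder) fed by a tap or SPAN port.

# Run Zeek against a monitoring interface with its default "local" policy.
# Logs such as conn.log, dns.log, and ssl.log appear in the working directory.
sudo zeek -i eth1 local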

The third method is to collect logs from systems, servers, network infrastructure, and other devices throughout the network. This step requires deploying not only a log management platform to collect, store, and present the data, but also reconfiguring each device to send its logs to the log management platform. Unlike the NSM deployment, installing and configuring a log management system is a demanding project. While the benefits are ultimately worthwhile, the project is much more involved, hence its status as the third step one should take.

The fourth way to learn about threat activity in the enterprise is to instrument the endpoints with an EDR agent. This is an even bigger project than the log collection effort, as the EDR agent could interfere with business operations while trying to observe and possibly interdict malicious activity. As with log management, I am not arguing against EDR. EDR is a tool that yields wonderful benefits for visibility and control. EDR is especially attractive the more mobile and distributed one’s workforce is, and the greater the amount of encrypted network traffic one encounters. However, the level of effort and return associated with NSM means I prefer network-centric visibility strategies prior to installing log management or EDR.

At this point you may ask “isn’t third party visibility the first step when trying to learn about threat activity? You listed NSM as second!” That is true, but I don’t consider third parties as a reliable method, or an especially proactive one. When called by the FBI, one should be able to reply “yes, thank you for calling, but I already detected the activity and we are handling it now.”

Some of you may also ask “how can NSM be first, when I already have a security program?” In that case, I suggest you make “NSM next!” In other words, augment your existing environment with NSM, and let the data help guide future security decisions.

Finally, you might ask if this is a workable solution. Has anyone ever done this? I’ve used or recommended the methodology in this blog series at dozens of organizations, from small start-ups of fewer than 100 people to the largest corporate entities with global presence and half a million identities under management.

In future posts I will expand upon all things NSM. I look forward to you joining me on this journey.

The last BroCon. It’ll be Zeek in 2019! — November 5, 2018

By Robin Sommer, CTO at Corelight and member of the Zeek Leadership Team

I’m back in San Francisco after the last ever BroCon! Why the last BroCon? Because the Bro Leadership Team has announced a new name for the project. After two years of discussion, no shortage of suggestions, and a final shortlist going through legal review, it was time to commit: It’ll be Zeek! For an explanation of the rationale & background behind the choice, make sure to read Vern Paxson’s blog post or watch him skillfully reveal the new name at the conference.

By holding BroCon in the Washington DC area this year, we were hoping to broaden participation—and that worked: 260 people attended, up over 35% from last year.  We also had the support of eleven corporate sponsors—more than ever!—which we deeply appreciate. These companies offered attendees a chance to learn about a variety of products and services helping people use and implement Zeek, either in its open source form or as part of commercial offerings.

I think BroCon’s program was particularly strong this year. Marcus Ranum kicked it off with an entertaining and provocative keynote. The main technical program then offered a terrific set of presentations covering a variety of organizations and topics. Some of the conference highlights for me were:

  1. The sheer number of use cases. In the sessions, we saw things like:
    1. using weirds to diagnose split routing problems
    2. using the conn_long log to identify exfiltration / C2 / rogue IT activity
    3. using JA3S to extend SSL fingerprinting to the server side
    4. using SMB logs to find named pipes in the Belgacom attack.  
  2. Watching Salesforce and Morgan Stanley stand up and explain how they use Bro to defend themselves was inspirational.
  3. The depth of technical expertise among attendees was really impressive. Folks keep pushing the boundary of how to scale Zeek clusters and come up with clever use cases of its various frameworks.
  4. Selling Bro posters to benefit Girls Who Code was fantastic.
  5. Vern’s “Zeek” name reveal moment and the positive reception of the name change by the broader community.

We received permission to record most of the talks and are currently editing the material to synchronize videos with slide sets. As soon as that’s finished, we’ll upload them to the Bro YouTube channel.

As we look to next year, the Zeek Leadership Team will begin planning the 2019 event soon. If you attended this year, please take a moment to fill out the attendee survey; you should have received a link to provide us with feedback about the program and logistics. In 2019 we’ll also hold another European workshop. Registration details will come soon, but you can save the date already: We’ll be at CERN, Switzerland, from April 9-11.

Lastly, it will take some time to really make the change from Bro to Zeek. The soon-to-be-released version 2.6 will still be “Bro”—from then on it’ll be “Zeek.” Over the coming weeks and months you will start seeing changes, but rest assured we’ll be careful: There’s a lot to update, and we certainly don’t want to break your deployments.

Thanks for attending the last ever BroCon!

Log enrichment with DNS host names — October 25, 2018

Log enrichment with DNS host names

By Christian Kreibich, Senior Engineer, Corelight

One of the first tasks for any incident responder when looking at network logs is to figure out the host names that were associated with an IP address in prior network activity. With Corelight’s 1.15 release we help automate the process and I would like to explain how this works.

Zeek (formerly known as Bro) provides a logging framework that gives users great control over summarization and reporting of network activity. Equipped with dozens of logs by default, it provides convenient features to extend these logs with additional fields, filter log entries according to user-defined criteria, create new log types, and hook new activity into logging events. Several log types provide identifiers that allow convenient pivoting from one log type to another, such as conn.log’s UID that many other log types use to link app-layer activity to the underlying TCP/IP flows.

Other information is only implicitly linked across log types, so analysts need to reveal it in manual SIEM-based post-processing. One example of such implicitly available information is host naming, which lets analysts look past IP addresses like 216.218.185.162 to corresponding (and often revealing) DNS names like ujkwwvftddjk.ru, a recent example from Spamhaus’s DBL. While Zeek’s dns.log closely tracks address-name associations, other logs do not repeat this information. Manually establishing the cross-log linkage can prove tedious since offline resolution of those names generally does not provide accurate results. Instead, one needs to identify historic name lookups that temporally most closely preceded TCP/IP flows to/from resulting IP addresses. (Other approaches, such as leveraging HTTP Host headers, also exist but here we were looking for the most generic approach.)

Zeek’s stateful network-oriented scripting language makes it ideally suited to automate such linkage: we can enrich desired logs with DNS host names in response to network events unfolding in real time. In Corelight’s 1.15 release we provide this ability via the Namecache feature. When enabled, Zeek starts monitoring forward and reverse DNS name lookups and establishes address-name mappings that allow subsequent conn.log entries to include names and the source of the naming (here, DNS A or PTR queries). For analysts requiring immediate access to host names, conn.log now readily provides this information. The following (slightly pruned) log snippet using Zeek’s JSON format shows an example:

{
"ts":1531868622.082572,
"uid":"C4J4Th3PJpwUYZZ6gc",
"id.orig_h":"192.168.1.113",
"id.orig_p":38194,
"id.resp_h":"192.150.187.12",
"id.resp_p":80,
…,
"id.orig_h.name.src":"DNS_A",
"id.orig_h.name.vals":["christian.local"],
"id.resp_h.name.src":"DNS_A",
"id.resp_h.name.vals":["icir.org"]
}

Our data analysis shows that for the most relevant addresses — those outside of local networks — Namecache can establish names in more than 90% of log entries. In addition to the conn.log enrichment the feature adds a separate log, reporting operational statistics (powered by the SumStats framework) such as the cache hit rate in various contexts. Starting with the 1.16 release, you’ll see local vs non-local hit rates for your network as well.
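If you would like to check the hit rate in your own environment, a rough count over conn.log is straightforward. This sketch assumes JSON-format logs, jq, and the enrichment field names shown in the example above; it counts local and non-local responders alike.

# Count conn.log entries where the responder address was enriched with a name.
zcat conn*.gz | jq -r 'if ."id.resp_h.name.vals" then "named" else "unnamed" end' | sort | uniq -c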

None of the above required patching the core Zeek distribution. All functionality exists in form of new event handlers and state managed via the scripting language. Nevertheless, implementing Namecache posed some interesting technical challenges. Most immediately, Bro’s multiprocessing architecture and flow distribution mean that in a cluster setting (which we do use in our Sensors) the Zeek worker observing a DNS lookup most likely is not the one observing the TCP/IP connection to the resulting IP address. Moreover, since their respective processing is fully asynchronous we also cannot guarantee that processing the DNS query finishes prior to that of the subsequent TCP/IP connection. Finally, to approach global visibility of the address–name mappings, we need to communicate the mappings across the cluster via Bro events, raising questions about event communication patterns, sustainable event rates, and processing races.

One key observation immediately simplified the problem: Zeek writes conn.log entries only when it expires its state for a given flow, i.e., at the very end of the flow’s lifetime. This means we have at least several seconds to propagate naming information for this flow across the cluster before needing to access it.

This left the event flow to tackle. In a first iteration we decided to centralize mapping ownership in the manager process: workers communicate new mappings to the manager process, which propagates additions to other workers and tracks mapping size and age. When mapping state needs to get pruned, the manager sends explicit pruning events to the workers. This proved clearly inferior to a distributed approach where the workers manage mappings autonomously, including expirations, and only communicate new mappings to the manager. The manager in turn only relays additions across the workers, saving the memory needed for an extra copy of the mappings. This approach worked quite well but induced a few percent of packet loss on our most heavily loaded AP-1000 appliances. In a final tweak, we tuned the rate at which workers transmit mapping additions. With this change we no longer observed any operational overhead of the activated Namecache feature while preserving its effectiveness.

The Namecache feature is only one example of a wide range of log enrichments we envision. We’ll soon migrate the cluster communication to the new Broker framework, add improved multicast DNS support, and we’re considering other sources of naming as well as inverse mappings where names get enriched with corresponding IP addresses.