How we decide what Bro capabilities to include in our Sensor

By Seth Hall, Co-Founder & Chief Evangelist at Corelight

We started Corelight to bring the power of Bro network monitoring to an audience that is interested in security, stability, and long-term sustainability. Even though we created and built Bro over the last 20 years, when we developed our commercial product we made some design decisions that make running the Corelight Sensor slightly different from running open-source Bro… changes to improve performance, security, and deployability.  Here we’d like to explain the rationale behind a few of these decisions.

We take a diligent approach to new features on our platform because Bro’s biggest benefit, programmability, can also be a liability.  From the beginning, we wanted to expose the full richness of Bro analysis to our customers, but we also chose to limit some functionality at first, because it did not meet the high quality and security standards required by an enterprise-class commercial product.  As we continue along our development roadmap, we’re revisiting these items to find new and better ways to implement these features in our sensors and potentially contribute back to open-source Bro.

Let’s go through a few of these differences, along with some of the extra features that you get with the Corelight sensor that distinguish it from open source Bro.


Users of open source Bro know how to run Bro with broctl (BroControl) because it helps with managing everything from a single process to large multi-system or even multi-location clusters. This works very well in the open-source community where users are familiar with running software at the command line, but we felt that we shouldn’t offer this same interface for our commercial customers, because with it comes a lot of complexity. Our appliance is fully API-driven, and we currently expose that API through two mechanisms: directly from our command-line client, or over SSH through our terminal-based GUI. Broctl gives our open-source users a great deal of capability, but we took the approach of wrapping most of that functionality in other mechanisms to simplify management for our customers. We are currently exploring how to take what we’ve learned from offering this experience to our customers back into the open-source world because a core mission of the company is to continue improving Bro.

Intelligence (Intel) framework

The Intel framework is used by Bro to read in indicators of compromise (IOCs) from external sources at runtime and do matching deep into network traffic. For example, if you load in an email address, Bro will watch for it in, among other places, the “emailAddress” object identifier of X.509 certificates used in SSL/TLS session establishment.

The problem is that people typically load intelligence data into Bro using a format I specified in 2013, when I created the Intel framework, in which literal tabs separate the fields. It’s like CSV, but using an invisible character! This causes trouble for people who are hand-creating these files, as you can imagine. We left the intelligence integration out of our Bro appliance from the beginning because of the number of problems we’ve seen people encounter with formatting the files correctly. We also noticed that many of our enterprise customers were instead applying their IOCs to the logs in their downstream data analysis system.

Eventually, we will want to cleanly integrate with threat intelligence management platforms where data can be pulled in more quickly and there are no concerns with file format accuracy. In the interim, we are now working on integrating the Intel framework into our appliance with some guardrails. Our platform API will enable you to load in the “normal” Bro intelligence files, but our system will extensively sanity-check them so you get a response immediately from the API if you made a mistake. This should provide a nice balance for people who want to sit as close to the metal of Bro as possible while still getting the enterprise approach to stability that we press so hard for at Corelight.
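To make the idea of sanity-checking an intelligence file concrete, here is a minimal Python sketch. It is not Corelight's implementation. It assumes the conventional Intel file layout: a `#fields` header line followed by tab-separated rows that include `indicator`, `indicator_type`, and `meta.source` columns. The function name `check_intel_file` and the specific checks are hypothetical.

```python
# Hypothetical validator for a tab-separated Bro Intel file (illustrative only).
# Assumes a "#fields" header that names at least the three columns below.
REQUIRED = ("indicator", "indicator_type", "meta.source")


def check_intel_file(text):
    """Return a list of human-readable problems; an empty list means none found."""
    # Keep the original line numbers so error messages point at the right line.
    rows = [(n, ln) for n, ln in enumerate(text.splitlines(), start=1) if ln.strip()]
    if not rows or not rows[0][1].startswith("#fields"):
        return ["missing '#fields' header on the first non-blank line"]

    head_no, head_line = rows[0]
    header = head_line.split("\t")[1:]  # drop the '#fields' token itself
    problems = [
        f"line {head_no}: required field '{name}' not found (are the separators tabs?)"
        for name in REQUIRED
        if name not in header
    ]
    if problems:
        return problems

    type_col = header.index("indicator_type")
    for number, line in rows[1:]:
        cells = line.split("\t")
        if len(cells) != len(header):
            hint = " (spaces used instead of tabs?)" if " " in line else ""
            problems.append(
                f"line {number}: expected {len(header)} tab-separated fields, "
                f"found {len(cells)}{hint}"
            )
        elif not cells[type_col].startswith("Intel::"):
            # Heuristic assumption: indicator types are written as Intel::<TYPE>.
            problems.append(
                f"line {number}: indicator_type '{cells[type_col]}' "
                "does not look like an Intel:: type"
            )
    return problems


good = "#fields\tindicator\tindicator_type\tmeta.source\nevil@example.com\tIntel::EMAIL\tmy-feed\n"
bad = "#fields\tindicator\tindicator_type\tmeta.source\nevil@example.com Intel::EMAIL my-feed\n"
print(check_intel_file(good))
print(check_intel_file(bad))
```

The second file is the classic hand-editing mistake: spaces where tabs belong. A checker along these lines can report the exact line right away instead of leaving the indicator silently unloaded.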

Bro scripts and Sandboxing

One of the absolute best features of Bro is the scripting language. It’s the true differentiator between it and many other network monitoring systems. Users can write their own scripts which add new chunks of functionality to Bro, even as far as creating logs that reflect their own environment and unique challenges. For an example of this, take a look at the script Salesforce published for fingerprinting SSL/TLS clients. The team at Salesforce was driven by their internal needs to understand SSL/TLS usage on their network, and that need resulted in a script that any Bro user can now load.

The problem at Corelight was that the intense customizability provided by Bro could be a liability for our customers because scripts can have adverse effects on the stability and correct behavior of the software. Because of that, our initial stance was to not make custom scripts available, while keeping an eye toward eventually doing so in a safe and robust manner. Ultimately, after a lot of discussion, we designed a sandbox that the scripts run inside so that you can feel confident that the scripts you are loading on our appliance are safe. The sandbox restricts a number of functions from being used, along with some behaviors that we know to be non-performant. For example, we bar the new_packet event from use due to its potential for causing major performance issues. This trades a small reduction in capability for removing a feature fraught with side effects.

The sandbox took quite a bit longer to create than we initially expected but we felt that rushing it out would fail to live up to the level of quality we strive to always provide. One thing I’ve always found interesting is that sometimes through restrictions, the most creative results arrive. By creating an environment that doesn’t give you every possibility in the world we’ve created a bounding box where creativity and problem solving can thrive.

Specialized Hardware

Bro doesn’t natively support any specialized hardware. Typically, when Bro users have something like a specialized NIC (network card) they take advantage of it through libpcap wrappers that hide the NIC complexity behind an API that Bro does natively support. This tends to work fine, but it doesn’t provide the ability to take advantage of the full specialized capabilities that the NIC provides. At Corelight, we viewed this as our chance to work with a vendor to really push the state of the art in terms of integration. We formed a great partnership with Accolade Technology where we’ve pushed each other to develop ideas for offloading processing and creatively using their NICs in previously unexpected ways.

Our customers benefit from the tight integration with the specialized NIC hardware in two ways.  First, there are performance benefits due to our use of specialized capabilities on the card such as high performance injection of packets into system memory.  Second, our customers get to experience the benefit of this non-commodity NIC without having to spend the time understanding it and integrating it into their own deployment.


Bro’s flexibility has given us the ability to create a network monitoring appliance that is truly ready to use “out of the box” but continues to have a number of doors that can be opened for further exploration and modification.  As we move into the future and continue developing Bro and the Corelight Sensor, we will continue our deliberative approach to providing production-ready features with the full programmability of Bro.

Announcing The New Corelight for Splunk App

We’re proud to announce the Corelight for Splunk app is available!  Using the new app (and its associated Technology Add-on (TA)), you can now monitor the health and performance of Corelight Sensors in Splunk and explore the rich data Bro provides through a series of dashboards.


The Corelight for Splunk App, associated TA, and Q&A page are all on Splunkbase now.

If you’re using open-source Bro and you want to use Corelight’s app, you need to send your Bro logs to Splunk in a streaming format using JSON. To do so, install the json-streaming-logs Bro package using the Bro Package Manager, also directly available via GitHub.

In the next few months, we’ll be publishing more information about the app, including an FAQ and a longer blog post dedicated to highlighting its functionality and benefits.  

In the meantime, let us know if you have any questions or concerns installing or using the new app.

The Corelight Team

Joining a New Company Selling 20-Year-Old Software

By Brian Dye, Chief Product Officer at Corelight

I’ve enjoyed meeting many companies and leaders in the Bay Area over the past few months. The best surprise I had in doing so was with Corelight (where I recently joined as their chief product officer). Despite many years in security, when they proudly proclaimed “we’re bringing an easier, faster, commercially supported version of Bro to the market” I had to respond with a less than glorious “OK … but what is Bro?”

To find out, the first people I talked to were top incident responders … the ones with battle scars, the SANS trainers, the folks you call when “it” hits the fan. This was my first surprise:  The immediate answer was “of course I know Bro. I use it all the time, even to teach SANS security incident investigations classes.” It turns out, Bro creates a uniquely useful set of insights out of network data; insights that are far richer than NetFlow but far more concise and searchable than a full PCAP. Bro is the “Goldilocks” insight level for security investigations.

Next, I talked to CISOs I especially respected. They knew about Bro too, for how valuable the data was and for what it helped their teams do. These CISOs knew that Bro was taking off as part of the industry focus on improving SOC effectiveness and better arming their investigators. What they didn’t like was that open source Bro was a complex “roll your own” solution and deploying it required expert-level UNIX people, so getting access to the valuable data Bro provides meant taking their (scarce!) talent and putting them on infrastructure management. Those same people were often the very incident responders and threat hunters who should be focusing on defending networks, not installing technology. That is where Corelight comes in: the Corelight Sensor radically simplifies the deployment and operation experience, resulting in a lower hardware and operational cost. The number and caliber of customers signing on with Corelight as we speak is proof positive of that value.

After all that, there was a lingering question in my mind… if Bro is so awesome, why isn’t everyone using it? (While Bro is well known by some, its adoption by enterprises has lagged behind government agencies, universities and web-scale companies). The answer is actually pretty simple: Bro’s capabilities, while critical for 20 years at national labs, intelligence agencies and other organizations with existential threats from determined adversaries, were not needed by typical enterprises in the 1990s and even 2000s. Bro was created before the cloud, mobile, SaaS and high bandwidth links were in common use by “normal” companies. And most companies didn’t have SOCs or the level of security + technical expertise to get Bro working. Obviously, all that has changed: the problems faced by enterprises have grown into the long-extant capabilities of Bro.

My last question was “where can we go from here?” One of the clearest long-term trends in cyber is that better data enables better security. As a result, at Corelight there is a wide range of opportunities to both give organizations new insights and solve existing problems in far better ways. One example, driven by a mindset shift: many organizations wrestle to make their data analysis and investigation as seamless and effective as possible … but they treat the incoming data as immutable.  Corelight proves that it isn’t, and by improving both the quality and structure of that data the entire investigation stack gets better – and we can continue enriching the quality of that data. It reminds me of the old BASF ads … “we don’t make the things you buy, we make the things you buy better.”

This, of course, is just the beginning. I’m excited to join the Corelight team, and can’t wait to show you what we can do for you. Better security starts with better data.

Runtime Options: the Bro Configuration Framework

By Johanna Amann, Senior Engineer at Corelight

If you are familiar with Bro scripts you have probably encountered redefs, which allow you to change a number of Bro settings. One commonly used redef is Site::local_nets, which lists the networks that Bro considers local.

As the name redef implies, redefs allow the re-definition of already defined constants in Bro. This is often done in local.bro (but can be done in any loaded script file). To modify Site::local_nets, you can use code similar to this:

redef Site::local_nets += { 10.0.0.0/8 };  # illustrative subnet only; list your own networks

A disadvantage of redefs is that these redefinitions can only be performed when Bro first starts. Afterwards, Site::local_nets is just a normal constant and can no longer be modified.

However, it is clearly desirable to be able to change many of the configuration options that Bro offers at runtime. Having to restart Bro causes Bro to lose all connection state and knowledge that it accumulated. To solve this problem, Corelight has created the Bro configuration framework, which allows changing configuration options at runtime. We designed the configuration framework in a way that is easy to use and unobtrusive, while also giving great power and flexibility when needed. Using the configuration framework in your script only requires minimal changes. To declare a configuration option, you just prefix it with the newly introduced option keyword:

module OurModule;

export {
    option known_networks: set[subnet] = {};
    option enable_feature: bool = F;
    option system_name: string = "testsystem";
}

Options lie in between variables and constants. Like constants, options cannot be assigned to at runtime; trying to manipulate an option will result in an error. However, special calls can modify options at runtime. The scripts that power the configuration framework use these calls internally, as we discuss further below.

Given those three options defined above, we just need to tell Bro where to find the configuration file. Simply add something akin to this to local.bro:

redef Config::config_files += { "/path/to/config.dat" };

config.dat contains a mapping between the option names and their values:

OurModule::enable_feature  T
OurModule::system_name  prod-1

Now the options are updated automatically each time that config.dat is changed. Additionally, a new log file, config.log, contains information about the configuration changes that occurred during runtime.

Behind the scenes, the config framework uses the Bro input framework with a new custom reader. Users familiar with the Bro input framework might be aware that the input framework is usually very strict about the syntax that it requires. This is not true for configuration files: the files need no header lines and either tabs or spaces are accepted as separators.
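To show how forgiving this format is, here is a small Python sketch of a parser for such files. It is not the Bro reader itself. The function name `parse_config` is hypothetical, values are kept as raw strings rather than converted to Bro types, and skipping lines that start with `#` is an assumption made for this sketch.

```python
def parse_config(text):
    """Map option names to raw string values from a config.dat-style file."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # assumption: blank lines and '#' comment lines are ignored
        # Split on the first run of whitespace, so tabs and spaces both work.
        parts = line.split(None, 1)
        name = parts[0]
        value = parts[1] if len(parts) == 2 else ""
        options[name] = value
    return options


sample = "OurModule::enable_feature  T\nOurModule::system_name\tprod-1\n"
print(parse_config(sample))
```

Splitting on the first run of whitespace means the same code handles both separators, and everything after the option name is treated as the value.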

For more advanced use-cases, it is possible to be notified each time an option changes:

function system_change_handler(ID: string, new_value: string): string {
    # Called before the assignment, so the option still holds its old value here.
    print fmt("Value of %s changed from %s to %s", ID, OurModule::system_name, new_value);
    return new_value;
}

event bro_init() {
    Option::set_change_handler("OurModule::system_name", system_change_handler);
}

This code registers a change handler for the OurModule::system_name option. Each time that the option value is changed, the system_change_handler function will be called before the change is performed. As you might already have deduced from the function signature, the change handler can also change the value before it is finally assigned to the option. This allows, for example, checking parameter values to reject invalid input. It is also possible to chain together multiple change handlers: Option::set_change_handler takes an optional third argument that can specify a priority for the handlers.

Note that change handlers are also extensively used internally by the configuration framework. If you look at the script level source code of the config framework, you can see that change handlers are used for logging the option changes to config.log.

If you inspect the scripts further, you will also notice that the script-level config framework simply catches events from the Input framework and then calls Option::set to set an option to the new value. If you want to change an option yourself during runtime, you can call Option::set directly from a script.

The following figure shows the data flow and the different components that make up the entirety of the config framework:

[Figure: data flow and components of the config framework]

You can try the configuration framework today! It has been merged into Bro and will be part of Bro 2.6. To try it, either install Bro from source or install one of the nightly builds.

That’s a Wrap! The Bay Area’s First Open-Source Bro Meetup

By John Gamble, Director of Marketing at Corelight

Last Tuesday Corelight hosted the Bay Area’s first meetup for the open-source Bro network security monitor and we saw a great turnout of Bro fanatics and first-timers alike at our San Francisco headquarters.

Meetup attendees mingled over pizza, salad and drinks before Vern Paxson, the creator of Bro, kicked off the discussion, followed by engaging Bro lightning talks by Aashish Sharma of Lawrence Berkeley National Laboratory (Berkeley Lab) and Seth Hall, a core contributor to the open-source project.

Notably, all three individuals are members of the Bro Leadership team.  

Aashish walked the audience through Berkeley Lab’s network architecture and showed how Bro plays a critical role, providing them with network insights for cybersecurity. They have had Bro running in their environment since 1996!

[Photo: Aashish explaining Berkeley Lab’s network architecture]

Aashish observed that security vendors and incident responders tend to focus on specific threat indicators, but vendor alerts don’t usually explain WHY they fired, leaving the analyst to fill in the gap as part of a lengthy investigation.  He urged attendees to evolve from an indicator-centric detection approach to a more attack-centric approach that attempts to identify malicious behaviors at every step of the attack, from scanning to data exfiltration and misuse. 

“Bro allows us to design attack-centric detections,” Aashish said. He used the example of a phishing attack to show how Bro can see every step of the attack as it passes through the network, from the URL click to the phishing form, to the victim’s entry of stolen credentials. With corresponding Bro detection scripts, you could alert at every stage of the attack and light it up “like a Christmas tree”. Aashish closed his talk by calling on the Bro community to share more intel and best practices, including attacker M.O.s and methods so that we can collaboratively develop more effective Bro detection scripts.

Seth Hall’s lightning talk covered the use of flame graphs to analyze Bro performance and resource utilization trends and anomalies that are not readily apparent from looking at the logs alone. Flame graphs are an open-source visualization tool developed by Brendan Gregg (currently at Netflix) and they can help identify the most frequent code-paths of an analyzed piece of software.  

[Photo: Seth walking through a flame graph!]

Seth remarked that “real traffic is never like sampled PCAP…there is always something to surprise you” and showed attendees a number of flame graphs he produced of Bro processes running on real network traffic in production.

Seth showed an eye-catching plateau in one flame graph that revealed a particular Bro process behaving abnormally by spending 80% of its execution time in a single function. When he dug in, he found that a set of tables was filling up, which caused the behavior, and he was able to troubleshoot it successfully.

Corelight has made strong commitments to supporting and promoting the open-source Bro project and community: we’re a sponsor of the project and recently hired our first employee whose sole responsibility is open-source project development. You can learn more about Bro on the project’s website and sign up for the mailing lists there to get in touch with other Bro enthusiasts and experts.

If you’re in the Bay Area, I’d encourage you to join our open source Bro meetup group and attend the next meetup event!

Extensibility as a Guiding Principle

By Christian Kreibich, Senior Engineer at Corelight

If you’ve ever used Bro, you’ve likely noticed that it’s rather more flexible than other network monitoring solutions. This is no coincidence: it reflects a core principle that has underpinned the evolution of the Bro platform since its beginnings two decades ago. This principle has afforded users a wealth of benefits that continue to shape today’s product vision here at Corelight.

Let’s start with Bro’s basic design. It uses a traffic-parsing core to feed protocol events into its built-in script interpreter, which separates mechanism (the nitty-gritty parsing of traffic) from policy (what to do in response to observed activity). These events (450+ different types) cover a vast semantic range and span the entire protocol stack. Combined with the Bro scripting language, designed specifically to simplify network-typical compute and state-keeping tasks (a key difference from other popular languages such as Lua), extensibility is not just a feature: it defines the system. Let that sink in for a moment: while Bro is a fantastic network flight recorder (consider the breadth and depth of the traffic logs the Corelight Sensor produces), this log production is merely one configuration of the system. Behavioral profiling of your end-hosts? Check. Arbitrary cross-flow/protocol state-keeping? Check. Lateral movement? Check. The system provides the building blocks, you provide the analysis, either in real time on the appliance or in the form of actionable data in your broader analytics pipeline.

Extensibility doesn’t stop at Bro’s core design. Bro is designed from the ground up to support clustering at the process and machine levels and features a powerful communication and data-persistence infrastructure to scale your deployment as your network grows.

Your needs go beyond those 450 event types? Bro has you covered. Its plugin architecture supports adding compiled code that allows you to add new functionality — for example a new protocol analyzer or packet source — on your own and without ever needing to patch the Bro source tree. No more worrying about release cycles, licensing, or development workflow. The core is designed to support extensibility. At the script level, you can derive new event types as needed. Concerned about managing those extensions? The Bro package manager makes it just as easy to maintain your in-house Bro feature set as it is to manage other distributions in your infrastructure.

What about integrating Bro with the rest of your infrastructure? BroControl provides a handy remote control for your Bro installation and supports plugins. On the network side, the NetControl framework provides a wide range of connectors to mesh Bro into your enforcement infrastructure. Naturally, the framework is designed for extensibility so you can add additional connectors with ease. Thinking of leveraging your existing threat intel feeds in Bro? The input framework makes it easy. Your file processing pipeline to check whether those PDFs are malicious? Bro’s file analysis framework feeds right into it. Finally, Bro’s forthcoming osquery integration allows it to include host-based events.

As a Corelight customer, you benefit from all of these features: we’re committed to running open-source Bro on our Sensors. We’ve streamlined custom script installation via our APIs, taken care of data export to Splunk, Kafka, or your favorite SIEM, and taken the hassle out of managing and tuning fast packet analysis pipelines for you. To ensure stability, the Sensor doesn’t yet expose all of Bro’s functionality — for example, customers cannot currently deploy their own plugins — but we’re aiming for feature convergence over time.

To us, extensibility is not an afterthought that we try to tack on in a few release cycles. It permeates the way we think about network monitoring and has enabled scalability, visibility, profiling, learning, and detections battle-tested over two decades of real-world use in some of the world’s most demanding network environments.

Finding Very Damaging Needles in Very Large Haystacks

By Vern Paxson, Chief Scientist at Corelight

Some of the most costly security compromises that enterprises suffer manifest as tiny trickles of behavior hidden within an ocean of other site activity.  Finding such incidents, and unraveling their full scope once detected, requires far-ranging network visibility, such as provided by Corelight Sensors, or, more broadly, the open-source Bro system at the heart of our appliances.

Wading through vast amounts of data to find genuine threats can prove intractable without aids for accelerating the process. An important technique for enabling detection is the development of highly accurate algorithms that can automate much of the task.  In this post, I sketch one such algorithm that I recently worked on at UC Berkeley along with two Ph.D. students, a member of the security team at the Lawrence Berkeley National Laboratory (LBL), and a fellow UCB computer science professor.  The research concerns detection of spearphishing and we published it last month at the USENIX Security Symposium, where it won both a Distinguished Paper award and this year’s Internet Defense Prize, a $100,000 award sponsored by Facebook.

As you probably know, spearphishing attacks are a form of social engineering where an attacker manually crafts a fake message (sent via email or social networking) targeting a specific victim.  The message often includes carefully-researched details that improve the likelihood that the victim will believe the message is legitimate, when in fact it isn’t.  We call this facet of spearphishing the lure.  The message entices the victim to take some unsafe action (the exploit), such as providing login credentials to a fake web page posing as an important site (e.g., corporate Gmail), opening a malicious attachment, or wiring money to a third party.

In our work, we collaborated closely with the cybersecurity team at LBL to tackle the problem of detecting spearphishing attacks that result in victims entering their credentials into fake web pages.  The Lab – an enterprise with thousands of users – maintains rich and extensive logs of past network activity generated by Bro, and for real-time detection operates numerous Bro instances. (I worked at LBL when I first developed Bro in the mid-1990s, and the Lab has been a key user of it ever since.)

For our recent work, we drew upon LBL’s Bro logs of 370 million emails, along with all HTTP traffic transiting their border, and LDAP logs recording the authentications of Lab users to the corporate Gmail service.  (To get a sense of the richness of Bro data, check out the information it provides for SMTP and HTTP.)  We were also able to cross-reference with their security incident database to assess the accuracy of the different approaches we explored and developed.  All of this data spanned 4 years of activity.

The key idea we leveraged was to extract all of the URLs seen in incoming emails and then look for later fetches of those URLs by Lab users.  Such a pattern of activity matches that of a common type of credential spearphishing, where the lure is a forged email seemingly from a trusted party.  The lure exhorts the recipient to follow a link to take some sort of (urgent) action.  However, this activity pattern also matches an enormous volume of benign activity.  Thus, the art for successfully performing such detection is to find ways to greatly winnow down the raw set of activity matching the behavioral pattern to a much smaller set that an analyst can feasibly assess – without discarding any actual attacks during the winnowing.
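The email-then-click pattern can be sketched as a simple join between two record streams. The Python below is an illustration only, not the code used in the research. The record shapes (tuples of timestamp, identifier, and URL) and the function name `find_email_clicks` are hypothetical stand-ins for what would really come from SMTP and HTTP logs.

```python
from collections import defaultdict


def find_email_clicks(emails, fetches):
    """Pair each emailed URL with later fetches of that same URL.

    emails:  iterable of (arrival_time, email_id, url)
    fetches: iterable of (fetch_time, user, url)
    Returns a list of (email_id, user, url) candidate instances.
    """
    arrivals = defaultdict(list)
    for arrival_time, email_id, url in emails:
        arrivals[url].append((arrival_time, email_id))

    pairs = []
    for fetch_time, user, url in fetches:
        for arrival_time, email_id in arrivals.get(url, []):
            if fetch_time > arrival_time:  # only clicks after the email arrived
                pairs.append((email_id, user, url))
    return pairs


emails = [(100, "e1", "http://a.example/login"), (200, "e2", "http://b.example/")]
fetches = [
    (150, "alice", "http://a.example/login"),  # after e1 arrived: a candidate
    (150, "bob", "http://b.example/"),         # before e2 arrived: ignored
    (50, "carol", "http://a.example/login"),   # before e1 arrived: ignored
]
print(find_email_clicks(emails, fetches))
```

Every pair this produces is only a candidate. As the paragraph above notes, nearly all such pairs are benign, which is why the later filtering and scoring stages matter.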

In previous projects, I’ve likewise tackled some needle-in-haystack problems.  I worked with students and colleagues on developing detectors for surreptitious communication over DNS queries and for attacks broadly and stealthily distributed across many source machines.  From these efforts, as well as this new effort on detecting spearphishing, several high-level themes have emerged.

First, it is very difficult to apply machine learning to these problems.  The most natural machine learning techniques to use, “supervised” ML, require labeled data to work from.  For spearphishing, this would be examples of both benign email+click instances and malicious ones, which comprise two separate classes. The ML would then analyze a number of features associated with email+click instances to find an effective classifier that, when given a new instance, uses that instance’s features to accurately associate it with either the benign or the malicious class.

While supervised ML can prove very effective for some problem domains, such as detecting spam emails based on their contents, for needle-in-haystack problems it runs into major difficulties due to the enormous “class imbalance”.  For the spearphishing problem, for example, we can provide an ML algorithm with 370 million examples of benign email+click instances, but fewer than 20 malicious instances, since such attacks only very rarely succeed at LBL.  In these situations, the ML will very often overfit to the class with very few members, emphasizing features among its instances (such as the specific names used in emails) that have no general power.

Detectors for highly rare attacks, whether or not based on ML, face another problem concerning the base rate of the attacks.  A simple way to illustrate this is to consider a detector for email-based spearphishing that has a false positive rate of only one-in-1,000 (0.1%).  When processing 370 million emails, this seemingly highly accurate detector will generate 370,000 false positives, completely overwhelming the security analysts with bogus alerts.
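The base-rate arithmetic is easy to check. The short Python sketch below reproduces the 370,000 figure and, as an added assumption for illustration, spreads those alerts evenly over the 4 years (taken as 365 days each) that the data covers. The function name is hypothetical.

```python
def expected_false_positives(benign_events, one_in):
    """Expected false alarms for a detector that misfires once per `one_in` events."""
    return benign_events / one_in


emails = 370_000_000
false_alarms = expected_false_positives(emails, 1000)  # 0.1% false positive rate
days = 4 * 365  # assumption: alerts spread evenly over 4 years of 365 days

print(round(false_alarms))         # total false alarms over the whole period
print(round(false_alarms / days))  # average false alarms per day
```

That works out to roughly 250 bogus alerts per day on average, from a detector that sounds very accurate on paper.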

In working on past needle-in-haystack problems, I’ve come to appreciate the power of (1) not trying to solve the whole problem, but rather finding an apt subset to focus on, (2) devising extensive filtering stages to reduce the enormous volume of raw data to a much smaller collection that will still contain pretty much all of the instances of the (apt subset of the) activity we’re trying to detect, and (3) aiming not for 100% detection, but instead to present the analyst with high-quality “leads” to further investigate.

For our spearphishing work, the subset of the problem we went after was email-based attacks that involve duping the target into clicking on a URL, and for which the target indeed did wind up clicking.  We framed our approach around an analyst “budget” of 10 alerts per day.  That is, on average, on any given day our detector will not produce more than that many alerts.  We set the number of alerts to 10 because the LBL security staff deals with a couple hundred monitoring alerts per day, so getting 10 more does not add an appreciable burden – assuming that alerts that are false positives are cheap enough to deal with, a point I return to below.

We then identified a set of features to associate with emails containing URLs and any subsequent clicks on those URLs.  This part took a great deal of exploration – indeed, the entire project spanned two years of effort.

For the most part, the features draw upon the site’s history of activity.  For example, for emails one feature is for how many days in the past a given email From name (e.g., “Vern Paxson”) was seen together with a given From address (e.g., “<>”).  For clicked URLs, one example of a feature is how many clicks the domain that hosts the URL received prior to the arrival of the associated email.  To detect spearphishing sent from already-compromised site accounts, we also analyze LDAP logs to correlate emails with the preceding corporate Gmail authentication used by the account that sent the email, drawing upon features such as how many of the site’s employees have previously authenticated from an IP address located in the same city as was used this time.

We wound up identifying eight such features (though it turns out we don’t use all of them together).  For each email+click instance, we compute the values of the relevant features and score the combination in terms of how many previous instances were strictly less anomalous than it (i.e., had more benign values for every one of the features).  Our detector then flags the instances with the highest such scores for the analyst to investigate further, staying within the budget of an average of no more than 10 alerts per day.
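The scoring idea described above can be sketched in a few lines of Python.  This is only an illustration under stated assumptions, not the detector's actual code: the two features, the toy history values, and the convention that a larger value means "more benign" are all hypothetical, and the budget here is a simple top-N cut rather than a true per-day average.

```python
from typing import List, Sequence, Tuple

Instance = Tuple[int, ...]  # one value per feature; ASSUMPTION: larger = more benign


def anomaly_score(candidate: Sequence[int], history: Sequence[Sequence[int]]) -> int:
    """Count past instances that were more benign than `candidate` on every feature."""
    return sum(
        all(past_value > value for past_value, value in zip(past, candidate))
        for past in history
    )


def top_alerts(instances: List[Instance], history: List[Instance], budget: int) -> List[Instance]:
    """Return the `budget` highest-scoring instances for the analyst to review."""
    ranked = sorted(instances, key=lambda inst: anomaly_score(inst, history), reverse=True)
    return ranked[:budget]


# Hypothetical features: (days the From name was seen with this address,
#                         prior clicks on the domain hosting the URL)
history = [(120, 4000), (30, 250), (2, 10), (365, 90000)]
new_instances = [(45, 300), (0, 0), (400, 100000)]

for inst in new_instances:
    print(inst, anomaly_score(inst, history))

print(top_alerts(new_instances, history, budget=1))
```

Note that an instance only counts against the candidate if it is more benign on every feature at once, so an email that looks odd on one feature but ordinary on another does not automatically rise to the top.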

The detector has proven to be extremely accurate.  In our evaluation of 4 years of activity, it found 15 of the 17 known attacks in LBL’s incident database.  In addition, it found 2 attacks previously unknown to the site.  It achieved this with a false positive rate less than 0.005%.

Finally – and this is critical – we measured how long it takes an analyst to deal with a false positive, and it turns out that the vast majority can be discarded in just a few seconds.  This is because, for most of the false positives, it’s immediately clear just from their Subject line or their sender that surely they do not represent a carefully crafted spearphish.  An attacker will not dupe a user into clicking on a link and typing in their credentials using a Subject line such as “DesignSpark – Boot Linux in a second” (an actual example from our study).  As soon as an analyst scans that Subject line, they can discard the alert from further consideration.  As a result, it typically takes an analyst only a minute or two per day to deal with the alerts from our detector.

In summary, our work showed that we can make significant strides towards combating spearphishing attacks by (1) cross-correlating different forms of network activity (URLs seen in emails, subsequent clicks on those seen in HTTP traffic, LDAP authentication) in order to (2) find activity that we deem suspicious because, for a carefully engineered set of features, the activity manifests a constellation of values rarely seen in historical data.  We also, crucially, (3) aim not for 100% perfect detection, but to present a site’s analyst with a manageable volume of high-quality alerts to then further investigate, and (4) find that these investigations take very little time if the alert is a false positive.

This sort of detection underscores some of the major benefits Bro can provide: illuminating disparate forms of network activity, producing rich data streams that sites can archive for later examination, and enabling analysts to zero in on problematic behavior.

Another cool thing about Bro: tracking files!

By Vincent Stoffer, Director of Customer Solutions at Corelight

You probably know that Bro generates real-time data about network flows, highly valued by threat hunters & incident responders around the world.  But Bro can do a lot more, and in this blog series, we’ll highlight lesser-known features from time to time.

Today: tracking files!


First the problem statement: how do you monitor the files that go back and forth across your network? Of course, there are logs for some of your enterprise services, and maybe you’re getting info in the form of URLs or hashes from your proxies or other security tools…but what about everything else?  If you were given the hash of a file that you knew was malicious, how would you figure out if it had ever been on your network? What if that file never triggered an alert or system log?

Visibility into all files – not just network flows – is a powerful, under-appreciated feature of open-source Bro. Bro’s file analysis capabilities are pretty amazing, and the data it captures is a great resource for detection, response, and prevention.

Here’s how the feature works: whenever a file is transferred over the network using a protocol that Bro knows about, the file is tracked, hashes are computed, and detailed data about the file and its associated connections is logged.

As an example, here’s a visit to the Slashdot web page by a browser, including an AJAX post as recorded by Bro’s files.log:

This is all part of a single HTTP connection and includes the HTML, favicon, some plain text, plus the JSON.  All these components have been recorded in the Bro logs with a number of important details:

  • The first field is the UNIX timestamp.  The Corelight Sensor outputs this in a standard ISO 8601 date/time format and it’s super precise, helping to pinpoint exactly when a specific event involving a file happened.
  • The second field is the file UID.  This is a unique ID string generated for each file seen.  You can reference this to look up other connections which transferred the exact same file.
  • The third and fourth fields are the transmitting and receiving hosts for the file.
  • The fifth field is a list of the UIDs of all the connections over which this file was transferred.  Often it’s just one, but a file can be transferred over a series of connections.  Each of these UIDs can be used to track an individual connection across any of Bro’s logs.
  • The sixth field is the protocol source from which Bro’s analyzers saw and extracted the file.

A few other interesting fields include file type, file name (if available), byte counts of various types, calculation of the entropy of the file, and hashes – MD5, SHA1, and optionally on the Corelight Sensor, SHA256.
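To answer the earlier question of whether a known-bad hash has ever crossed your network, here is a minimal Python sketch that searches a files.log for a digest.  It assumes the tab-separated log format written by open-source Bro, where a "#fields" header line names the columns; if your logs are in JSON you would parse them differently.  The sample rows, UIDs, and addresses below are made up for illustration, and the column list is trimmed to keep the example short.

```python
from typing import Dict, Iterable, Iterator, List

HASH_FIELDS = ("md5", "sha1", "sha256")


def read_bro_log(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per record, using the '#fields' header line for column names."""
    fields: List[str] = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#fields"):
            fields = line.split("\t")[1:]
        elif line and not line.startswith("#"):
            yield dict(zip(fields, line.split("\t")))


def find_hash(lines: Iterable[str], digest: str) -> List[Dict[str, str]]:
    """Return every files.log record whose MD5, SHA1, or SHA256 equals `digest`."""
    wanted = digest.strip().lower()
    return [rec for rec in read_bro_log(lines)
            if wanted in {rec.get(name) for name in HASH_FIELDS}]


# Made-up sample data with a trimmed set of columns (illustration only).
FIELDS = ["ts", "fuid", "tx_hosts", "rx_hosts", "conn_uids", "source", "mime_type", "md5"]
SAMPLE_LOG = "\n".join([
    "#path\tfiles",
    "#fields\t" + "\t".join(FIELDS),
    "\t".join(["1500000000.000000", "FExample1", "198.51.100.7", "192.0.2.10",
               "CExample1", "HTTP", "text/html", "5d41402abc4b2a76b9719d911017c592"]),
    "\t".join(["1500000001.000000", "FExample2", "198.51.100.7", "192.0.2.10",
               "CExample1", "HTTP", "image/x-icon", "d41d8cd98f00b204e9800998ecf8427e"]),
])

for hit in find_hash(SAMPLE_LOG.splitlines(), "D41D8CD98F00B204E9800998ECF8427E"):
    print(hit["ts"], hit["fuid"], hit["tx_hosts"], hit["rx_hosts"], hit["conn_uids"])
```

From each hit, the connection UIDs give you a pivot into the other Bro logs for the same connection.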

Without delving into all the details of what’s available from Bro’s file analyzer, we see that a whole lot of actionable info is created for each of these files.  Remember that this same detail is recorded for EVERY FILE on your network.  And even better, it doesn’t matter what protocol the file was transferred over… as long as Bro can decode it, the file can be extracted – that includes HTTP, SMTP, FTP, IRC, SMB, etc.  In fact, Bro has 50 protocol analyzers. You can perform indicator matching and hunting across everything from web traffic to email attachments.

That’s an amazing amount of data, and as an incident responder, I relied heavily on the files log to help paint a picture of what might have happened for a particular event or series of file transfers.

But what if you need more than just the derived data about the transfer?  The Corelight Sensor can also extract all of the associated files and export them to a file server.  You can keep them for future investigations or plumb them into a static or dynamic analysis pipeline – providing not just data about the connection and transfer but also indicators and data extracted from the file itself.

Bro doesn’t stop there. The same level of forensic detail is available for individual protocols as well…we’ll get into some of the other logs in a future blog post.  
Do you have some unique ways you use the files.log or questions about how it could help your security team?  Drop us a line –

Securing the Corelight Sensor

By Steve Smoot, VP Customer Success @ Corelight

Have you ever considered how security tools can be a source of risk? They process untrusted data 24/7, have access to sensitive flows, and (like everything on the Internet) can be exploited if not patched regularly.  

At Corelight, we want our products to be a source of visibility and insight, not risk, and we’ve done a lot of thinking about how to secure them. In my first post, I’d like to take the opportunity to explain some of the techniques we use.

We work hard to limit the attack surface for each sensor.

Except for initial login over a physical connection, all access is disabled by default. Even after initial configuration, access is limited to SSH, HTTPS or the Corelight API (which uses TLS), according to the options you choose.  Each of these modes is password/key protected, and all disk volumes are encrypted.

Corelight’s user environment is sandboxed from the execution environment, and we use features in the Linux kernel to provide security isolation between major functional components. We don’t run Bro as root, a habit we’ve seen in many open-source deployments. Scripts run in a separate, security-isolated environment from Bro itself.  And we’ve implemented dozens of additional features for your protection, features we prefer to keep under the hood.

Third-party software requires special care, and we monitor and address vulnerabilities actively.  Fortunately, most CVEs don’t even apply to our products, because we strive to minimize the number of packages used.  In the spirit of transparency, we list resolved CVEs on the Corelight support site and announce high-profile CVEs on the public web site.  In addition, for extra assurance, we call out high-profile but inapplicable CVEs (i.e., the ones that hit the news but that we’re not vulnerable to) on the customer support site.

Everyone knows that unpatched systems give rise to many breaches, and automatic updates are a real boon to operational security. The Corelight Sensor can of course help you find those unpatched systems on your network, but we’ve also made automatic updates simple and painless.  In fact, we default to automatically updating our software when new releases are available. Although we give customers configuration knobs to control when to apply updates, most of them choose to update immediately after each new release.  That’s good!

We also strive to be transparent.  It’s really disappointing when a partner you trust hides a potential problem from you. Responsible disclosure requires a careful balancing act, but it’s important for us to be as transparent as we can about risk.  Hence we will be posting to this blog and to the other customer-alerting mechanisms we employ, including regular email updates and security bulletins.

Finally, we set high standards internally to protect our development environments and to shield you from third-party attacks or undesired leakage of information through your relationship with Corelight.  We’ll itemize those internal standards, such as code signing, peer review, and security best practices for internal infrastructure, in an upcoming blog post. Stay tuned.

Please rest assured, we’re constantly thinking about how to improve the security of our products.

What’s the riskiest part of your Bro deployment? It may be you.

By Seth Hall, Co-founder & Chief Evangelist at Corelight

Don’t overlook the obvious: the answer may be you 😉

Let me explain, because I’ve watched the following story unfold many times.  A curious person gets super excited about Bro, deploys it widely in their organization, and makes a big impact on the local SOC.  Everyone on that team becomes more effective, because Bro data helps them understand and respond to security incidents so much faster. Over time, this Bro advocate becomes the local Bro expert – responsible for configuration, tuning, documentation, patching, integration, etc.  It’s a full-time job.  And that’s OK, until the local Bro expert is hired away for the experience he or she just acquired!

It happens. Just think of all the skills that person gained along the way: about Bro itself, specialized network cards, BIOS/UEFI firmware options, network stack tuning, file systems, memory allocators, etc.  This means your ‘local Bro expert’ is an asset but also a risk.  Because so many companies are looking for Bro experts now, it cuts both ways.  I’ve seen wonderful Bro deployments fall into disrepair when a key person leaves.

In fact, watching that pattern unfold several times is what led us to develop the enterprise-ready, turnkey Corelight Sensor about 18 months ago: we had identified that just creating Bro wasn’t quite enough.  We sweated many, many details so that customers could confidently deploy Bro in less than 30 minutes and focus their effort on incident response, forensics, and threat hunting.

As a very small example of the tiny details we take care of for you on the Corelight Sensor, and because I’d like to provide a useful tidbit for people running Bro on their own, I’d like to finish the post with a note about using tcmalloc.  Tcmalloc is an alternative memory allocator that was originally created by Google as part of their Google Perftools package for memory debugging.  The package has since been renamed to gperftools and is no longer officially maintained by Google.  Tcmalloc is intended to perform especially well in multithreaded applications, and it has a number of other tweaks that make it an appealing choice as a memory allocator.  A number of years ago we discovered that Bro performs noticeably better when tcmalloc is the memory allocator.  This led to a change in the build system to use tcmalloc by default on Linux if it is found.  Bro has been doing this for a long time, but we’ve never publicly told everyone that they should be using it.

You should use whatever package system your OS uses to install gperftools and tcmalloc.  On CentOS, it’s named “gperftools” and on Ubuntu it’s named “google-perftools”.  After you install the package, you will want to reconfigure Bro with whatever configure arguments you used previously.  If tcmalloc was found, you will see the following toward the end of the configure output:


If it shows that both gperftools and tcmalloc were found, then you’re all set to build and reinstall.  If you’ve had trouble getting rid of the last few percentage points of packet loss in your own Bro deployment, this easy change could possibly get rid of it right away!  As you remove more and more of these small problems and Bro’s output becomes better, all of your downstream analysis is improved.  Better data in equals better data out.
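If you want a quick sanity check after reinstalling, one option on Linux is to look at which libraries a running Bro process has mapped.  The Python sketch below is my own suggestion rather than part of Bro's tooling: it scans the text of /proc/<pid>/maps for a mapped file whose path contains "tcmalloc".  It assumes tcmalloc is linked as a shared library (a statically linked build would not show up this way), and the sample paths are illustrative only.

```python
from pathlib import Path


def maps_show_tcmalloc(maps_text: str) -> bool:
    """Return True if any mapped file path in /proc/<pid>/maps text mentions tcmalloc."""
    for line in maps_text.splitlines():
        # Columns: address range, permissions, offset, device, inode, [pathname]
        parts = line.split(None, 5)
        if len(parts) == 6 and "tcmalloc" in parts[5]:
            return True
    return False


def process_uses_tcmalloc(pid: int) -> bool:
    """Check a live process (Linux only; needs permission to read its maps file)."""
    return maps_show_tcmalloc(Path(f"/proc/{pid}/maps").read_text())


# Illustrative sample of what the maps file might contain (paths are assumptions).
SAMPLE_MAPS = "\n".join([
    "55d0c0000000-55d0c0100000 r-xp 00000000 08:01 1311 /usr/local/bro/bin/bro",
    "7f10a0000000-7f10a0040000 r-xp 00000000 08:01 2622 /usr/lib/libtcmalloc.so.4",
    "7f10b0000000-7f10b0021000 rw-p 00000000 00:00 0",
])

print(maps_show_tcmalloc(SAMPLE_MAPS))
```

To use it against a real deployment you would call process_uses_tcmalloc() with the PID of one of your Bro worker processes.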

On the Corelight Sensor we are already using tcmalloc along with many other specialized configurations and an accelerated FPGA network card.  This is all maintained and updated with zero effort from you so that you can focus on data and discovering intrusions.
And that’s just one example of how you’re covered if your Bro expert disappears one day.