Crosspost: [Not Really About] Top 10 SIEM Log Sources in Real Life

This was originally posted on the Anton on Security blog.

One of the most common questions I received in my analyst years of covering SIEM and other security monitoring technologies was “what data sources to integrate into my SIEM first?”

And of course the only honest answer to this question is: it depends on your security monitoring use cases and how you prioritize them. Naturally, some people then ask “ok, so then what are my use cases?” (and then there are these challenges too). Finally, in this paper we made a list of popular log sources aggregated from many organizations. Admittedly, the list may end up being useless for organizations with different security needs and challenges.

Joking aside, big organizations often make the decision to integrate a log source into their SIEM / UEBA based on factors other than the pure security necessity.

Overall, such factors may include:

  • Necessity for detection
  • Necessity for alert triage and incident response
  • Necessity as context data for another log source
  • Compliance requirements to collect and retain this log type
  • Compliance requirements to monitor this data source and/or system
  • Ease of integration of the log source
  • Parser availability from the vendor
  • Ability to actually transfer the log data to a SIEM
  • Other planned log sources that compete for attention
  • Data volume of the log source

And of course for users of those sad SIEM products that charge per gigabyte or EPS [oh… wait … this is still almost everybody! :-)], the cost of introducing a new data source into the platform may be one of the BIG deciding factors.

Be honest: will you include a data source that will eat up 10% of your overall SIEM license if you only plan to use it as context — valuable though it may be — for another data source? Namely, if you don’t plan to write any detection rules or other logic based on this telemetry (DHCP being my favorite example here — how many detections rely solely on DHCP logs? None or very few at most).
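To make the license arithmetic concrete, here is a minimal Python sketch. All numbers are hypothetical, and the rule that a context-only source must stay below a fixed share of the license (`max_context_share`) is an illustrative assumption, not a vendor formula or a recommendation from this post.

```python
# Minimal sketch; all figures and the threshold are hypothetical assumptions.

LICENSED_GB_PER_DAY = 200.0  # assumed daily ingestion allowance in a per-GB license


def license_share(source_gb_per_day: float,
                  licensed_gb_per_day: float = LICENSED_GB_PER_DAY) -> float:
    """Fraction of the licensed daily volume that one log source would consume."""
    if licensed_gb_per_day <= 0:
        raise ValueError("licensed volume must be positive")
    return source_gb_per_day / licensed_gb_per_day


def worth_ingesting(source_gb_per_day: float,
                    used_for_detection: bool,
                    max_context_share: float = 0.05) -> bool:
    """Toy rule: sources feeding detections are accepted; context-only
    sources are accepted only if they stay under an assumed cost threshold."""
    if used_for_detection:
        return True
    return license_share(source_gb_per_day) <= max_context_share


# Hypothetical DHCP feed: 20 GB/day, used only as context for other logs.
dhcp_share = license_share(20.0)
print(f"DHCP would use {dhcp_share:.0%} of the license")
print("ingest as context only?", worth_ingesting(20.0, used_for_detection=False))
```

With these assumed figures, a 20 GB/day DHCP feed used only as context consumes 10% of the license and fails the toy threshold, while the same volume feeding actual detection rules would pass. The point is not the threshold value but that a per-GB model forces this kind of trade-off at all.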

As a result, my experience with SIEM deployments (going back to 2002, if you are curious) has taught me that few people will include DNS or DHCP logs during the initial phases of SIEM roll-out. In fact, some will never include them in their SIEM! When asked why, those people explain that while they are convinced of the general utility of DNS logs, they do not see much value in each individual message that costs money to collect. And there are so many of those messages! Over the years, I’ve usually called them “sparse value logs”: the value is in having the bulk rather than in any particularly valuable individual message, such as, say, Windows Security Event ID 1102 (audit log cleared).

As a result, SIEM operators have doubts about paying for inclusion of this data into their SIEM. The same doubt occasionally appears even for firewall logs, netflow records and many other high volume sources. Thus, web proxy logs, netflow, DNS, DHCP historically ended up in few SIEMs. I recall a client story from a few years back where adding web proxy logs would have 3X’d the volume of log data flowing into a SIEM. That is, web proxy logs were twice the volume of all other logs they collected.
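The “3X” figure in that story implies the proxy share directly: if adding a source multiplies total volume by k, the new source is (k - 1) times everything already collected. A short Python check, using hypothetical volumes (only the 3X ratio comes from the story):

```python
# Minimal sketch; volumes are hypothetical, only the 3X ratio comes from the story.


def volume_multiplier(existing_gb: float, new_source_gb: float) -> float:
    """Total volume after adding a source, relative to the volume before it."""
    if existing_gb <= 0:
        raise ValueError("existing volume must be positive")
    return (existing_gb + new_source_gb) / existing_gb


def new_source_ratio(multiplier: float) -> float:
    """Size of the new source relative to all previously collected logs."""
    return multiplier - 1.0


existing = 100.0  # assumed GB/day of all other logs
proxy = 200.0     # assumed GB/day of web proxy logs
k = volume_multiplier(existing, proxy)
print(f"total volume grows {k:.0f}x; proxy logs are {new_source_ratio(k):.0f}x the rest")
```

Any pair of volumes with the same 1:2 ratio gives the same result, which is why “3X’d the volume” and “twice the volume of all other logs” describe the same situation.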

Even more so, very few people will toss all EDR telemetry into a SIEM, and usually limit themselves to EDR alerts. Admittedly, sysmon records are becoming more popular, but perhaps more so in “free” Elastic vs paid SIEM (and this will still cost you in either hardware or public cloud costs — sometimes eye-watering cloud costs at that).

In fact, this gave rise to an architecture where one product is used for high-value logs while another product augments it by storing more voluminous logs. However, such architectures usually have no technical merit and bring complexity, fragmentation and thus fragility. They do work if the products have good APIs (for example, to query one telemetry repository from another), but it is useful to remember that they offer no advantage other than cost.
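A minimal Python sketch of the split-storage idea, assuming a hypothetical set of “high-value” source names and two hypothetical destinations called `siem` and `bulk`. Real deployments would typically do this in a log shipper or pipeline, and which sources count as high value is each organization’s own call.

```python
# Minimal sketch of a two-tier log split; source names and tier names are hypothetical.
from dataclasses import dataclass
from typing import Dict, List

# Assumed classification of "high-value" sources; each organization decides its own.
HIGH_VALUE_SOURCES = {"windows_security", "edr_alerts", "authentication"}


@dataclass
class LogRecord:
    source: str
    message: str


def route(records: List[LogRecord]) -> Dict[str, List[LogRecord]]:
    """Send high-value records to the SIEM tier and everything else to bulk storage."""
    tiers: Dict[str, List[LogRecord]] = {"siem": [], "bulk": []}
    for record in records:
        tier = "siem" if record.source in HIGH_VALUE_SOURCES else "bulk"
        tiers[tier].append(record)
    return tiers


sample = [
    LogRecord("windows_security", "Event ID 1102: audit log cleared"),
    LogRecord("dns", "query example.com A"),
    LogRecord("dhcp", "lease 10.0.0.5 to aa:bb:cc:dd:ee:ff"),
]
for tier, recs in route(sample).items():
    print(tier, [r.source for r in recs])
```

The routing itself is trivial; the fragility shows up later, when an alert from the `siem` tier needs DNS or DHCP context that now lives only in the `bulk` tier and must be fetched through whatever cross-repository query API the two products offer.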

To summarize, in some perfect world I want to make log integration decisions based ONLY on the value of such logs to my security goals and, specifically, use cases. However, today’s “popular” licensing models make this very hard.

Let’s change something!

Agreed that SIEM vendors need to encourage thorough data collection in their pricing... SIEM vendors also need to be prescriptive about how they’re adding value to incoming data sources. Making EDR telemetry searchable is cool, but often prohibitively expensive for the value (hunting suspicious execution/persistence). Leaving the expertise onus on the customer creates a disconnect from their needs. It’s the same reason DNS/DHCP wasn’t fed in until the advent of user/entity behavior analytics.

Amen... but if it isn't the cost of ingested data per time unit, it is the cost of log data per retention time... logging will always cost a fair bit of dirhams, and we need to find a way to justify that investment against whatever reasons we have to do or not to do it: security, compliance, etc. Maybe roll back a bit and start building those use cases, maybe use fanciness like ATT&CK matrices to map out what we are interested in and build upon that... before you know it you have a decent log solution rolled out with dashboards & alerts for the stuff you should care about, and it doesn't have to cost an arm and a leg to get there... so yeah, totally agree, but it's a bumpy ride to get there from where we are today...
