Moar and Moar Agents – sthap!

July 27, 2019 in EDR, Preaching

$Vendors love agents.

  • One does the AV
  • One does the DFIR
  • One does the EDR
  • One does the CIDS
  • One does the DLP
  • One does the FIM
  • One does the IAM
  • One does the SSO
  • One does the Event Forwarding
  • One does the Asset Inventory
  • One does the Client Proxy
  • One does the Managed Updates
  • One does the Vulnerability Management
  • One does the Employee Monitoring, on demand
  • One does the Conferencing
  • etc.

Some claim they are agent-less, but under the hood they use WMI, psexec, GPO, SCCM, etc.

Every single agent adds to a list of events that are generated and collected by a system/and often other agents. Every single one steals CPU, RAM, HDD cycles. Almost every single agent runs other programs. Almost every single agent works by spawning multiple processes at regular intervals. Almost every agent that is noisy renders all Mitre Att&ck’s Discovery tactic detections useless.

A quick digression: I used to have a work laptop with 4GB RAM. At least once a day my work would come to a halt. I always had Outlook, Chrome, and Microsoft Teams opened. At that special time of a day an agent would kick off its work and my computer’s CPU/RAM usage would jump to 100%. I couldn’t switch between apps, and literally had to wait each time for good 5-10 minutes for the agent to stop, before I could resume my work.

This has to sthap.

We all know that we need that Magic Unicorn single-vendor solution that works for Win/OSX/Lin + offers AV+EDR+DFIR+FIM+DLP+CIDS+VM+SSO+IAM in one + uses minimum resources + is cheap :). Atm all of these features are typically addressed by solutions from different vendors & the moar of them make a claim to your box, the worse the performance will be.

Let me focus on EDR here for a moment as they ARE one of the worse resource hogs, especially these ‘solutions’ that rely on polling. IMHO tools that primarily use this approach to collect data have to go, and pronto & I would personally never (re-)invest in them; polling is not only very 2011, but it literally misses stuff, adds a lot of stress to the endpoint, data synchronization and accuracy are questionable, and so on and so forth — ah, and these solutions often piss off analysts a lot – it’s so often that they want to do triage the system & they can’t, cuz the system is offline.

To elaborate on the ‘synchronization and accuracy ‘ bit:

  • system offline or on a different network –> no data accessible at all –> delays in triage/analysis
  • if you are doing env sweeps, you end up polling a few times to ensure you collect data from ‘all’ systems; the ‘all’ is just a wishful thinking — you have no control over it; also, as a result, some systems that are always online end up being polled more than once (resources wasted)
  • datasets are not synchronized & you got duplicates since you will get a few batches with different timestamps

So… IMHO polling will always give you an imperfect data to work with; it just doesn’t work in a field that is so close to Digital Forensics + doesn’t help to answer questions that will be asked by management:

  • how many systems in our env have this and that artifact present? you will never be able to answer with a 100% certainty
  • is our env. clean? yeah, right… 75% of boxes replied to our query with a negative result, others didn’t, so… we are 75% clean

Plus, they often rely on third party/OS binaries to do the job + are often using interpreted language (slow, cuz interpreters are often executed as external programs that add to the event noise, especially the ‘Process creation’ event pool).

What I find the most hilarious is the fact actual malware can squeeze in system info collection, password grabbing, screen grabbing, video recording, vnc modules, shell, etc in <100KB of code; most of vendors use RAD, Java, scripts and end up with awful bloatware.

What I am trying to say is that EDR tools that are worth looking at are:

  • tools that integrate with the OS on the lowest possible level — AV is integrating on a low-level for a reason (also, look at Sysmon)
  • collect all real-time events
  • send data off the box ASAP (any data stored on the box can be compromised/deleted/modified)
  • send data out by any means necessary (multiple protocols?)
  • send stuff to cloud anytime box goes online (no matter what network)
  • use native code (machine code) for main event collector modules instead of interpreted language –> performance / minimal footprint
  • single service process (supported by kernel driver, when necessary) instead of multiple processes
  • doesn’t spawn other processes — native code-based modules collect data as per need, loaded as DLL or always present (the interception of events is a code that can be VERY lean; the bulkier the code, the crappier the solution; red flags: .NET, Java, Powershell, VBScript, Python, WMI, psexec, etc.)
  • run queries on data / analyze outside of the endpoint

Basically: the agent intercepts, collects, caches, sends out to cloud when any network is available & asap, then sleeps until the next event occurs.

Of course, the solution may have extra modes for deploying heavy-weight stuff e.g. scripts, DFIR modules (memory dumping, artifacts collection, etc) + prevention modules etc., but this is used only during actual analysis, not triage.

So, what I covered is a basic architectural requirement:

  • An agent acts as a event forwarder ONLY & sends them to a Collector + can launch heavy ‘forensic’ modules / programs as per necessity
    • Events that are collected must be configurable, ideally (pre-processing –> less events –> better performance/less storage/less bandwidth)
  • Collector acts as a repository of events
    • Just store & index
    • Perhaps apply some generic out of the box rules/tests (VT, vendors’s IOCs, yara, etc.) and trigger alerts
  • Console allows to query Collector events, set up watch lists, manage rulesets, etc.

Coming back to agents as a whole — it’s time for some consolidation to happen… As usual, big players will be the winners as only they can afford to acquire and integrate.

Comments are closed.