Blue teaming – it’s DATa complicated…

A decade ago blue teaming was … easy (this is a really bad joke, I know!).

In fairness, we had fewer targets, fewer programming languages to deal with, fewer platforms, fewer architectures, consoles, less … of everything…

In 2023 the life of a SOC/CERT person is a nightmare. In this Twitter thread I tried to summarize the state of affairs when it comes to the data that comes our way… in many forms…

It comes in binary form, it comes in textual form, using a variety of data formats, data encodings, encryption schemes, protocol-driven encapsulations, languages of telemetry, languages of defense, languages of offense; hidden, manipulative, and both driving us nuts and making us all love it…

There are so many forms in which information reaches us today:

  • assembly: x86, x64, arm, sparc, ppc, IoT-specific
  • bytecode: IL, python, java, autoit, nullsoft, inno
  • actual executables: PE, ELF, COM, SYS, DRV, OCX, DLL
  • archives/images: ZIP, TAR, GZ, RAR, 7z, Xz, Bzip2, KGB, ARJ, LHA, ISO, BIN, NRG, DMG, PKG, RPM, DEB, MSI, DLL, OVR, VMDK
  • macros: VBA, OpenOffice BASIC
  • c, cpp, C#, other .NET languages, vb, delphi, rust, go, nim
  • scripts: bat, vbs, js, applescript, mof, idc, idl, rc, bash, powershell
  • encrypted scripts: jse, vbe
  • web scripts: php, perl, asp, jsp
  • python (IDAPython), perl, ruby, winbatch, autoit
  • exotic malware files: fas (AutoDesk/AutoCAD)
  • autorun scripts: autorun.inf
  • Sigma
  • SPL
  • KQL
  • AQL
  • PowerQuery
  • Linq
  • SQL (including cache files)
  • Yara (*.yar, *.yara)
  • Detect It Easy
  • Snort
  • ClamAV
  • Tanium Signals
  • Synapse Storm
  • Sublime Security email rules language
  • R
  • pseudo-code (IDA, Ghidra, etc.)
  • config files: ini, yaml, linux config files (/etc/*), program-specific config files (too many to list)
  • event logs: evt, evtx
  • URL shortcuts: url
  • binary shortcuts: lnk files
  • data formats: sql, csv, tsv, json, xml
  • plug-ins: from total commander, nmap, burp, windbg, notepad++, xdbg, etc. to regripper, kape, plaso, etc.
  • network dumps: pcap
  • files using character encoding: ascii, utf7, utf8, utf16, utf32, ebcdic, KOI etc.
  • files and streams using data encodings: base64, Ascii85, uuencode, etc.
  • message encodings: mime
  • memory dumps: raw, core, dmp (per process and full-physical)
  • highlight files: uew, tmLanguage, bt
  • registry files: .reg
  • quarantined files
  • EDR logs in many formats, offering different levels of telemetry
  • web logs (f.ex. both HTTP and HTTPS)
  • mail logs
  • mailbox files (ost, pst, mbox, msg, eml)
  • (S)ftp logs
  • AWS CloudTrail logs
  • AWS GuardDuty logs
  • command line syntax: lin, win, mac
  • ‘randomly accessible (per company)’ feeds: f.ex. jamf
  • proprietary and less-known log streams (msad, ossec, SaaS, FIM, etc.)
  • browser extensions: xpi, crx
  • microsoft / office files (rtf, doc*, xls*, ppt*, pps*, one, mdb, accdb)
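To make the 'many forms' point a bit more tangible, here is a minimal sketch of a first-pass triage step: look at the leading bytes of a blob, and if nothing matches, try peeling one layer of base64 and look again. The signature table is a deliberately tiny, hand-picked subset, and the names MAGIC, sniff, peel_base64 and describe are made up for this illustration; real tooling handles far more formats, offsets and nested layers.

```python
import base64
import binascii
from typing import Optional

# Hypothetical, deliberately tiny signature table: (magic bytes, label).
# All signatures are checked at offset 0 only.
MAGIC = [
    (b"MZ", "PE/DOS executable"),
    (b"\x7fELF", "ELF executable"),
    (b"PK\x03\x04", "ZIP container"),
    (b"\x1f\x8b", "gzip stream"),
    (b"7z\xbc\xaf\x27\x1c", "7z archive"),
    (b"Rar!\x1a\x07", "RAR archive"),
    (b"ElfFile\x00", "Windows event log (evtx)"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 compound file"),
    (b"{\\rtf", "RTF document"),
    (b"L\x00\x00\x00\x01\x14\x02\x00", "Windows shortcut (lnk)"),
]


def sniff(data: bytes) -> str:
    """Return a label for the first matching signature, else 'unknown'."""
    for magic, label in MAGIC:
        if data.startswith(magic):
            return label
    return "unknown"


def peel_base64(data: bytes) -> Optional[bytes]:
    """Try to strictly decode one base64 layer; return None on failure."""
    compact = b"".join(data.split())  # drop whitespace and line breaks
    try:
        return base64.b64decode(compact, validate=True)
    except (binascii.Error, ValueError):
        return None


def describe(data: bytes) -> str:
    """Sniff the raw bytes; if unknown, retry after one base64 layer."""
    label = sniff(data)
    if label != "unknown":
        return label
    decoded = peel_base64(data)
    if decoded:
        inner = sniff(decoded)
        if inner != "unknown":
            return "base64 -> " + inner
    return "unknown"


if __name__ == "__main__":
    print(describe(b"MZ\x90\x00"))
    print(describe(b"TVqQAA=="))
    print(describe(b"hello world"))
```

Even this toy version shows why the list above hurts: a single 'ZIP container' hit may really be an Office document, a browser extension or a Java archive, and every encoding layer doubles the guessing.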