Filighting (FIle highLIGHTING) is a proof-of-concept idea that I implemented in Perl: a naive clustering and data reduction algorithm modeled on the way software is built on the Windows platform.
TL;DR: The algo is as follows:
- enumerate all the files in a directory
- read the files one by one and check whether any of them contains references to the names of the other files
- cross-reference these
- profit
Yup. It’s that simple.
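The steps above could be sketched roughly as follows. This is Python, not the original Perl script, and it makes no claim to reproduce it: the function name find_references, the search for both 8-bit and UTF-16LE strings, and the decision to ignore a file mentioning its own name are all assumptions made for illustration.

```python
import os
from collections import defaultdict


def find_references(root):
    """Return {lowercase file name: set of files whose content mentions it}."""
    # Step 1: enumerate all the files under the directory.
    paths = [
        os.path.join(dirpath, name)
        for dirpath, _dirs, files in os.walk(root)
        for name in files
    ]
    names = sorted({os.path.basename(p).lower() for p in paths})

    referenced_by = defaultdict(set)
    for path in paths:
        source = os.path.basename(path)
        # Step 2: read each file and look for the names of the other files.
        with open(path, "rb") as fh:
            data = fh.read().lower()  # bytes.lower() only touches ASCII letters
        for name in names:
            if name == source.lower():
                continue  # assumption: a file naming itself is not a link
            # Assumption: names appear either as plain 8-bit text or as
            # UTF-16LE, the two encodings commonly seen in Windows binaries.
            needles = (name.encode("utf-8"), name.encode("utf-16-le"))
            if any(needle in data for needle in needles):
                # Step 3: cross-reference.
                referenced_by[name].add(source)
    return referenced_by
```

Calling find_references on an installation folder gives a mapping that can then be sorted, clustered, or filtered. The sketch reads every file fully into memory and matches plain substrings, so short names can produce false hits.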
How is Windows software built?
Windows software can be built in many ways, using various programming languages, platforms and frameworks.
For the purpose of this post we will focus on the most typical software packages, which consist of several components:
- Main program file – the actual program – portable executable (.exe)
- Additional executable files – typically libraries, but sometimes other .exe and kernel mode drivers (.exe, .dll, .sys, .ocx, etc.)
- Localization/Language files (e.g. .lng, .mui, etc.)
- Configuration files (.cfg, etc.)
- Templates (.template, .theme, etc.)
- Databases (.db, .sql, etc.)
- Readme files, Help files (.txt, .hlp, etc.)
- GFX files (.jpg, .png, etc.)
- Plug-ins (.dll, etc.)
- and whatever else that is required for the program to offer some functionality
- plus Registry entries (which I skip in this post)
Notably, there are programs that are basically a single executable. Many OS programs used to be just a simple .exe, e.g. Notepad.exe or Calc.exe. In newer versions of Windows they rely on additional localization files (.mui), or are just links to other programs, e.g. Calc.exe on Windows 10 linking to a Metro application. While the programs that are just single executables are not the focus of this post, they certainly could be highlighted as possible ‘orphans’ by the very same algorithm, or its spin-offs.
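One possible reading of ‘orphans’ is a file that neither references another file nor is referenced by one. A minimal sketch under that assumption is below; the function name find_orphans and the shape of the referenced_by mapping (file name to the set of files that mention it) are hypothetical.

```python
def find_orphans(all_files, referenced_by):
    """Return the files that take part in no reference at all."""
    linked = set()
    for target, sources in referenced_by.items():
        if sources:
            # Both ends of a reference count as linked.
            linked.add(target.lower())
            linked.update(source.lower() for source in sources)
    return sorted(f for f in all_files if f.lower() not in linked)
```

A stand-alone executable dropped into a folder full of interlinked files would show up in this list, which is exactly the kind of file worth a closer look.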
Okay, what can we do with this knowledge?
Knowing that software contains many files gives us a hint that there must be some links between them all that are somehow established during the compilation, installation, or program use phases.
- The building process may compile hardcoded file names into the final main program file and/or its libraries, configuration files, etc.
- The installation program drops the files in respective folders and creates configuration files, registry entries, etc.
- Program use is the activity that the user or the application performs, and it affects how files are created, added, modified, etc.
While it is hard to keep track of it all, it certainly makes sense to try to imagine these interconnections and attempt to create a hidden graph that connects all these components together.
It is also tempting to imagine that recognizing these connections would allow us to cluster files into buckets that could then be hidden from the ‘view’ during analysis!
This is not an easy thing to do for the whole file system, but it works pretty well for selected scenarios, and for single directories in particular. There are also a lot of ways to improve it, especially if the file format is taken into account and if links between files and the Registry are considered in addition to links between files.
As usual: subject to further research!
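The ‘hidden graph’ and the buckets can be made concrete by treating every reference as an undirected edge and collecting connected components. The sketch below assumes a mapping from each file name to the set of files that mention it; the name cluster_files and that input shape are illustrative choices, not part of the original script.

```python
from collections import defaultdict, deque


def cluster_files(referenced_by):
    """Group file names into connected components of the reference graph."""
    graph = defaultdict(set)
    for target, sources in referenced_by.items():
        graph[target].update(sources)  # also registers unreferenced targets
        for source in sources:
            graph[source].add(target)

    seen = set()
    clusters = []
    for start in sorted(graph):
        if start in seen:
            continue
        # Breadth-first walk over everything reachable from this file.
        seen.add(start)
        queue = deque([start])
        component = set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        clusters.append(component)
    return clusters
```

File names are compared exactly as given, so they should be normalized (for example lowercased) before the call. Each returned set is one candidate bucket.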
Weaknesses
It’s very easy to abuse. You just need to drop files that reference each other and, to make it even trickier, reference ‘good’ files on the system.
Installations that cover more than one folder are also problematic (the ‘Common Files’ subfolder is a good example of a ‘multi-folder’ installation).
Protected files, usually compressed or virtualized main program executables, won’t reveal references to other files.
There are probably more…
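The protected-file weakness is easy to reproduce on a toy scale. In the sketch below zlib stands in for a real packer or protector, and the payload, the file names and the helper mentions are all made up for the demonstration.

```python
import zlib


def mentions(data: bytes, name: str) -> bool:
    """Case-insensitive check for a file name stored as plain 8-bit text."""
    return name.lower().encode("utf-8") in data.lower()


# Hypothetical program body that names two companion files.
plain = b"LoadLibrary helper.dll; open config.ini\n" * 20

# Stand-in for a packed or compressed executable.
packed = zlib.compress(plain)

print(mentions(plain, "HELPER.DLL"), mentions(packed, "HELPER.DLL"))
```

The name is found in the plain bytes but should no longer be found in the compressed ones, so a packed main executable would contribute no links to the graph.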
Still… I do believe this is the future of DFIR tools, even if the possible implementations may vary a lot from the idea I am discussing here.
‘Known hashes’ is good.
‘Known hashes+files’ is good+.
Time for a simple example
Okay, just writing about stuff is not enough.
Let’s see how it works in practice.
In this test I install Total Commander – the latest 32-bit version from http://www.ghisler.com/download.htm
Once installed, the installation folder contains the following list of files:
- CABRK.DLL
- CGLPT64.SYS
- CGLPT9X.VXD
- CGLPTNT.SYS
- DEFAULT.BAR
- DESCRIPT.ION
- FRERES32.DLL
- HISTORY.TXT
- KEYBOARD.TXT
- NO.BAR
- NOCLOSE.EXE
- REGISTER.RTF
- SFXHEAD.SFX
- SHARE_NT.EXE
- SIZE!.TXT
- TC7Z.DLL
- TC7ZIPIF.DLL
- TCMADMIN.EXE
- TCMDLZMA.DLL
- TCMDX64.EXE
- TCUNINST.EXE
- TCUNINST.WUL
- TCUNZLIB.DLL
- TcUsbRun.exe
- TOTALCMD.CHM
- TOTALCMD.EXE
- TOTALCMD.EXE.MANIFEST
- TOTALCMD.INC
- UNACEV2.DLL
- UNRAR.DLL
- UNRAR9X.DLL
- WC32TO16.EXE
- WCMICONS.DLL
- WCMICONS.INC
- WCMZIP32.DLL
- WCUNINST.WUL
- wcx_ftp.ini
- wincmd.ini
- LANGUAGE\WCMD_CHN.INC
- LANGUAGE\WCMD_CHN.LNG
- LANGUAGE\WCMD_CHN.MNU
- LANGUAGE\WCMD_CZ.INC
- LANGUAGE\WCMD_CZ.LNG
- LANGUAGE\WCMD_CZ.MNU
- LANGUAGE\WCMD_DAN.INC
- LANGUAGE\WCMD_DAN.LNG
- LANGUAGE\WCMD_DAN.MNU
- LANGUAGE\WCMD_DEU.INC
- LANGUAGE\WCMD_DEU.LNG
- LANGUAGE\WCMD_DEU.MNU
- LANGUAGE\WCMD_DUT.INC
- LANGUAGE\WCMD_DUT.LNG
- LANGUAGE\WCMD_DUT.MNU
- LANGUAGE\WCMD_ENG.MNU
- LANGUAGE\WCMD_ESP.INC
- LANGUAGE\WCMD_ESP.LNG
- LANGUAGE\WCMD_ESP.MNU
- LANGUAGE\WCMD_FRA.INC
- LANGUAGE\WCMD_FRA.LNG
- LANGUAGE\WCMD_FRA.MNU
- LANGUAGE\WCMD_HUN.INC
- LANGUAGE\WCMD_HUN.LNG
- LANGUAGE\WCMD_HUN.MNU
- LANGUAGE\WCMD_ITA.INC
- LANGUAGE\WCMD_ITA.LNG
- LANGUAGE\WCMD_ITA.MNU
- LANGUAGE\WCMD_KOR.INC
- LANGUAGE\WCMD_KOR.LNG
- LANGUAGE\WCMD_KOR.MNU
- LANGUAGE\WCMD_NOR.LNG
- LANGUAGE\WCMD_NOR.MNU
- LANGUAGE\WCMD_POL.LNG
- LANGUAGE\WCMD_POL.MNU
- LANGUAGE\WCMD_ROM.INC
- LANGUAGE\WCMD_ROM.LNG
- LANGUAGE\WCMD_ROM.MNU
- LANGUAGE\WCMD_RUS.INC
- LANGUAGE\WCMD_RUS.LNG
- LANGUAGE\WCMD_RUS.MNU
- LANGUAGE\WCMD_SK.LNG
- LANGUAGE\WCMD_SK.MNU
- LANGUAGE\WCMD_SVN.INC
- LANGUAGE\WCMD_SVN.LNG
- LANGUAGE\WCMD_SVN.MNU
- LANGUAGE\WCMD_SWE.LNG
- LANGUAGE\WCMD_SWE.MNU
This is quite a lot of files. If you come across them during an exam, you won’t be able to tell at a glance which ones are legit and which are not. You need to browse through it all, and that takes a lot of human cycles.
Using a simple script that implements the aforementioned algo, I was able to generate the following list of links established between all these files. Files are sorted by popularity, or in other words by how frequently each file is referenced by the others; each file name is followed by the number of files that reference it and then by those files:
- wcmzip32.dll 21
  - DESCRIPT.ION
  - HISTORY.TXT
  - WCMD_CHN.LNG
  - WCMD_CZ.LNG
  - WCMD_DAN.LNG
  - WCMD_DEU.LNG
  - WCMD_DUT.LNG
  - WCMD_ESP.LNG
  - WCMD_FRA.LNG
  - WCMD_HUN.LNG
  - WCMD_ITA.LNG
  - WCMD_KOR.LNG
  - WCMD_NOR.LNG
  - WCMD_POL.LNG
  - WCMD_ROM.LNG
  - WCMD_RUS.LNG
  - WCMD_SK.LNG
  - WCMD_SVN.LNG
  - WCMD_SWE.LNG
  - TCUNINST.WUL
  - TOTALCMD.EXE
- tcuninst.exe 20
  - DESCRIPT.ION
  - HISTORY.TXT
  - WCMD_CHN.LNG
  - WCMD_CZ.LNG
  - WCMD_DAN.LNG
  - WCMD_DEU.LNG
  - WCMD_DUT.LNG
  - WCMD_ESP.LNG
  - WCMD_FRA.LNG
  - WCMD_HUN.LNG
  - WCMD_ITA.LNG
  - WCMD_KOR.LNG
  - WCMD_NOR.LNG
  - WCMD_POL.LNG
  - WCMD_ROM.LNG
  - WCMD_RUS.LNG
  - WCMD_SK.LNG
  - WCMD_SVN.LNG
  - WCMD_SWE.LNG
  - TOTALCMD.EXE
- descript.ion 18
  - HISTORY.TXT
  - WCMD_CHN.LNG
  - WCMD_DEU.LNG
  - WCMD_DUT.LNG
  - WCMD_ESP.LNG
  - WCMD_FRA.LNG
  - WCMD_HUN.LNG
  - WCMD_ITA.LNG
  - WCMD_KOR.LNG
  - WCMD_NOR.LNG
  - WCMD_POL.LNG
  - WCMD_ROM.LNG
  - WCMD_RUS.LNG
  - WCMD_SK.LNG
  - WCMD_SVN.LNG
  - WCMD_SWE.LNG
  - TCUNINST.WUL
  - TOTALCMD.EXE
- totalcmd.inc 14
  - DESCRIPT.ION
  - HISTORY.TXT
  - WCMD_CHN.INC
  - WCMD_CZ.INC
  - WCMD_DAN.INC
  - WCMD_DEU.INC
  - WCMD_FRA.INC
  - WCMD_FRA.LNG
  - WCMD_HUN.INC
  - WCMD_KOR.INC
  - WCMD_ROM.LNG
  - WCMD_RUS.INC
  - TCUNINST.WUL
  - TOTALCMD.EXE
- wcx_ftp.ini 6
  - HISTORY.TXT
  - WCMD_CZ.LNG
  - WCMD_RUS.LNG
  - WCMD_SK.LNG
  - TCUNINST.EXE
  - TOTALCMD.EXE
- noclose.exe 5
  - DESCRIPT.ION
  - HISTORY.TXT
  - KEYBOARD.TXT
  - TCUNINST.WUL
  - TOTALCMD.EXE
- unrar.dll 5
  - DESCRIPT.ION
  - HISTORY.TXT
  - TCUNINST.WUL
  - TOTALCMD.EXE
  - UNRAR9X.DLL
- totalcmd.exe 5
  - DESCRIPT.ION
  - HISTORY.TXT
  - TCUNINST.EXE
  - TCUNINST.WUL
  - TcUsbRun.exe
- tc7z.dll 4
  - DESCRIPT.ION
  - TC7ZIPIF.DLL
  - TCUNINST.WUL
  - TOTALCMD.EXE
- sfxhead.sfx 4
  - DESCRIPT.ION
  - HISTORY.TXT
  - TCUNINST.WUL
  - TOTALCMD.EXE
- tcmdx64.exe 4
  - DESCRIPT.ION
  - HISTORY.TXT
  - TCUNINST.WUL
  - TOTALCMD.EXE
- wcmicons.dll 4
  - DEFAULT.BAR
  - DESCRIPT.ION
  - TCUNINST.WUL
  - TOTALCMD.EXE
- cglptnt.sys 4
  - CGLPT64.SYS
  - DESCRIPT.ION
  - HISTORY.TXT
  - TCUNINST.WUL
- tcmadmin.exe 4
  - DESCRIPT.ION
  - HISTORY.TXT
  - TCUNINST.WUL
  - TOTALCMD.EXE
- unrar9x.dll 4
  - DESCRIPT.ION
  - HISTORY.TXT
  - TCUNINST.WUL
  - TOTALCMD.EXE
- tcunzlib.dll 4
  - DESCRIPT.ION
  - HISTORY.TXT
  - TCUNINST.WUL
  - TOTALCMD.EXE
- tcusbrun.exe 3
  - DESCRIPT.ION
  - HISTORY.TXT
  - TCUNINST.WUL
- freres32.dll 3
  - DESCRIPT.ION
  - TCUNINST.WUL
  - TOTALCMD.EXE
- share_nt.exe 3
  - DESCRIPT.ION
  - TCUNINST.WUL
  - TOTALCMD.EXE
- cabrk.dll 3
  - DESCRIPT.ION
  - TCUNINST.WUL
  - TOTALCMD.EXE
- wc32to16.exe 3
  - DESCRIPT.ION
  - TCUNINST.WUL
  - TOTALCMD.EXE
- wincmd.ini 3
  - HISTORY.TXT
  - TCUNINST.EXE
  - TOTALCMD.EXE
- default.bar 3
  - DESCRIPT.ION
  - TCUNINST.WUL
  - TOTALCMD.EXE
- tc7zipif.dll 3
  - DESCRIPT.ION
  - TCUNINST.WUL
  - TOTALCMD.EXE
- unacev2.dll 3
  - DESCRIPT.ION
  - TCUNINST.WUL
  - TOTALCMD.EXE
- tcmdlzma.dll 3
  - DESCRIPT.ION
  - TCUNINST.WUL
  - TOTALCMD.EXE
- cglpt9x.vxd 3
  - DESCRIPT.ION
  - TCUNINST.WUL
  - TOTALCMD.EXE
- wcuninst.wul 2
  - DESCRIPT.ION
  - TCUNINST.WUL
- history.txt 2
  - DESCRIPT.ION
  - TCUNINST.WUL
- tcuninst.wul 2
  - DESCRIPT.ION
  - TCUNINST.EXE
- register.rtf 2
  - WCMD_FRA.LNG
  - TCUNINST.WUL
- size!.txt 2
  - DESCRIPT.ION
  - TCUNINST.WUL
- totalcmd.chm 2
  - TCUNINST.EXE
  - TCUNINST.WUL
- totalcmd.exe.manifest 2
  - DESCRIPT.ION
  - TCUNINST.WUL
- cglpt64.sys 2
  - DESCRIPT.ION
  - TCUNINST.WUL
- wcmd_deu.lng 2
  - HISTORY.TXT
  - TCUNINST.WUL
- wcmicons.inc 2
  - DESCRIPT.ION
  - TCUNINST.WUL
- no.bar 2
  - DESCRIPT.ION
  - TCUNINST.WUL
- wcmd_deu.mnu 1
  - TCUNINST.WUL
- wcmd_pol.mnu 1
  - TCUNINST.WUL
- wcmd_hun.mnu 1
  - TCUNINST.WUL
- wcmd_kor.inc 1
  - TCUNINST.WUL
- wcmd_dut.lng 1
  - TCUNINST.WUL
- wcmd_rom.inc 1
  - TCUNINST.WUL
- wcmd_swe.lng 1
  - TCUNINST.WUL
- wcmd_swe.mnu 1
  - TCUNINST.WUL
- wcmd_svn.inc 1
  - TCUNINST.WUL
- wcmd_cz.lng 1
  - TCUNINST.WUL
- wcmd_dut.inc 1
  - TCUNINST.WUL
- wcmd_kor.lng 1
  - TCUNINST.WUL
- wcmd_kor.mnu 1
  - TCUNINST.WUL
- wcmd_cz.inc 1
  - TCUNINST.WUL
- wcmd_fra.inc 1
  - TCUNINST.WUL
- wcmd_rus.inc 1
  - TCUNINST.WUL
- wcmd_cz.mnu 1
  - TCUNINST.WUL
- wcmd_fra.mnu 1
  - TCUNINST.WUL
- wcmd_ita.mnu 1
  - TCUNINST.WUL
- wcmd_nor.mnu 1
  - TCUNINST.WUL
- wcmd_esp.mnu 1
  - TCUNINST.WUL
- wcmd_rom.mnu 1
  - TCUNINST.WUL
- wcmd_dan.inc 1
  - TCUNINST.WUL
- wcmd_deu.inc 1
  - TCUNINST.WUL
- wcmd_rus.mnu 1
  - TCUNINST.WUL
- wcmd_hun.lng 1
  - TCUNINST.WUL
- wcmd_chn.mnu 1
  - TCUNINST.WUL
- wcmd_eng.mnu 1
  - TCUNINST.WUL
- wcmd_ita.lng 1
  - TCUNINST.WUL
- wcmd_dan.mnu 1
  - TCUNINST.WUL
- wcmd_sk.lng 1
  - TCUNINST.WUL
- wcmd_pol.lng 1
  - TCUNINST.WUL
- wcmd_sk.mnu 1
  - TCUNINST.WUL
- keyboard.txt 1
  - TCUNINST.WUL
- wcmd_dan.lng 1
  - TCUNINST.WUL
- wcmd_esp.lng 1
  - TCUNINST.WUL
- wcmd_chn.inc 1
  - TCUNINST.WUL
- wcmd_nor.lng 1
  - TCUNINST.WUL
- wcmd_fra.lng 1
  - TCUNINST.WUL
- wcmd_rom.lng 1
  - TCUNINST.WUL
- wcmd_esp.inc 1
  - TCUNINST.WUL
- wcmd_chn.lng 1
  - TCUNINST.WUL
- wcmd_svn.lng 1
  - TCUNINST.WUL
- wcmd_ita.inc 1
  - TCUNINST.WUL
- wcmd_rus.lng 1
  - TCUNINST.WUL
- wcmd_dut.mnu 1
  - TCUNINST.WUL
- wcmd_hun.inc 1
  - TCUNINST.WUL
- wcmd_svn.mnu 1
  - TCUNINST.WUL
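A popularity listing of this kind can be produced from a mapping of file name to the set of files that mention it. The sketch below is one way to do it; the function name popularity_report is hypothetical, and breaking ties alphabetically is an assumption, since the listing above does not appear to order ties that way.

```python
def popularity_report(referenced_by):
    """Render files ordered by how many other files reference them."""
    ranked = sorted(
        referenced_by.items(),
        key=lambda item: (-len(item[1]), item[0]),  # most referenced first
    )
    lines = []
    for name, sources in ranked:
        lines.append(f"- {name} {len(sources)}")
        for source in sorted(sources, key=str.lower):
            lines.append(f"  - {source}")
    return "\n".join(lines)
```

Files that nothing else references either get a count of 0 or do not appear at all, depending on whether the mapping has an entry for them.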
The simple example – what does it tell us?
While simple, the example above allows us to link all of the files produced during the installation of Total Commander and build a cluster which we could call ‘totalcmd’.
I’d love to see a DFIR tool that would allow me to implement this sort of clustering and then help me hide such filighted files with a click of a mouse. Applying the same logic to other directories (e.g. Program Files) one by one could then allow us to build such clusters automatically and exclude these files from the ‘view’ as well.
Utilizing such automatically generated clusters together with clusters of whitelisted/blacklisted software (potentially focused on problematic cases) could significantly reduce analysis time (on top of other data reduction techniques).
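Hiding a cluster from the ‘view’ then amounts to a simple filter. The sketch below assumes clusters are kept as a dictionary from a cluster label (such as ‘totalcmd’) to the set of file names in it; both that layout and the function name visible_files are illustrative only.

```python
def visible_files(all_files, clusters, hidden_labels):
    """Return the files left in view after hiding the chosen clusters."""
    hidden = set()
    for label in hidden_labels:
        # Unknown labels are ignored rather than treated as an error.
        hidden.update(name.lower() for name in clusters.get(label, ()))
    return [f for f in all_files if f.lower() not in hidden]
```

For example, with a ‘totalcmd’ cluster hidden, only the files that did not join that cluster remain for the analyst to review.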
See the second part here.