Using GUIDs to guide the ID of samples’ capabilities or unique (attributable) properties…

A few days ago Karsten asked me what tool I used for GUID extraction. I replied that it was my own old tool, written waaaay before yara’s birth.

In this post I will elaborate on this bit a bit…

That old GUID extraction tool was written in perl – yeah, I know… It basically read the content of the whole sample into memory and then searched that content for… known GUIDs. It was badly written, superslow, but… at that time… superuseful!

Why?

Because my short GUID list was curated. My tool looked only for GUIDs associated with known adware/spyware, plus popular GUIDs associated with COM interfaces abused by malware at that time. So, it was very ‘focused’: it helped me to quickly ID samples belonging to 180Solutions, Zango, BetterInternet, Ezula, Bonzi, ClearSearch, VirtuMonde and many others, and… it also highlighted some potentially interesting features of the triaged samples, such as references to COM interfaces operating on shortcut files (IShellLink) or generic methods for saving files (IPersistFile)…

A GUID is a very interesting IOC on its own. In theory, it is supposed to act as a globally unique identifier. In practice, it is not just an identifier, but also a capability determinant, amongst other things.

In my old post I dumped a lot of ‘GUID to <something>’ mappings that any data hoarder should find useful… For example, taking just that list, validating it (it actually had some bugs!), and converting it to a set of yara rules is a step we can take to partially duplicate the features of my old perl tool.

The conversion process walks through all GUIDs from the input file and creates a small yara rule for each of these GUIDs, where each of them is converted to 3 strings:

  • GUID string as ASCII
  • GUID string as a Wide string (UTF-16)
  • binary representation of the GUID

The resulting file looks like this.

The rules written this way take care of any textual references to a GUID present inside the sample (ASCII and Unicode/Wide), plus they recognize the most popular way of storing GUIDs – the 16-byte binary form. That is, they will pick up known GUID references inside the resources, embedded IDL files, as well as any actual code/data strings and, of course, the binary form of a GUID that programmers (often unknowingly) introduce to their programs.
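A minimal sketch of what such a conversion could look like, in Python. This is not the actual script used for this post; the function name guid_to_yara_rule and the string identifiers ($txt_a, $txt_w, $bin) are made up for illustration. It assumes the binary form follows the Windows GUID struct layout, which is what Python's uuid bytes_le gives us.

```python
import re
import uuid


def guid_to_yara_rule(name: str, guid: str) -> str:
    """Build one yara rule matching a GUID as ASCII text, wide text and raw bytes.

    Hypothetical helper: rule and string names are illustrative only.
    """
    u = uuid.UUID(guid)                      # accepts the GUID with or without braces
    text = str(u).upper()                    # canonical 8-4-4-4-12 textual form
    # bytes_le mirrors the in-memory GUID struct: the first three fields
    # are little-endian, the remaining 8 bytes are stored as-is.
    raw = " ".join(f"{b:02X}" for b in u.bytes_le)
    rule_name = "guid_" + re.sub(r"\W", "_", name)   # keep the identifier yara-safe
    return (
        f"rule {rule_name}\n"
        "{\n"
        "    strings:\n"
        f'        $txt_a = "{text}" ascii nocase\n'
        f'        $txt_w = "{text}" wide nocase\n'
        f"        $bin = {{ {raw} }}\n"
        "    condition:\n"
        "        any of them\n"
        "}\n"
    )


if __name__ == "__main__":
    # IUnknown is a well-known interface ID, used here as sample input.
    print(guid_to_yara_rule("IUnknown", "00000000-0000-0000-C000-000000000046"))
```

Looping this over a validated ‘name, GUID’ list and concatenating the output would give a single .yar file. The textual strings are stored without braces, so they match both the braced and the bare form.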

Now that we have this yara file, we can test it by applying it to f.ex. win11’s Notepad.exe:

yara guids.yar notepad.exe

The results are:

guid_IUnknown notepad.exe
guid_IMarshal notepad.exe
guid_IAsyncInfo notepad.exe
guid___FIAsyncOperationCompletedHandler_1_Windows__CSystem__CLaunchQuerySupportStatus notepad.exe
guid_IPropertyDescriptionList notepad.exe
guid___FIAsyncOperationCompletedHandler_1_Windows__CSecurity__CEnterpriseData__CFileProtectionInfo notepad.exe
guid___x_ABI_CWindows_CStorage_CIStorageItem notepad.exe
guid_IFileDialog notepad.exe
guid_IShellItem notepad.exe
guid___x_ABI_CWindows_CFoundation_CIUriRuntimeClassFactory notepad.exe
guid___FIEventHandler_1_Windows__CSecurity__CEnterpriseData__CProtectedContentRevokedEventArgs notepad.exe
guid___x_ABI_CWindows_CSecurity_CEnterpriseData_CIFileProtectionManagerStatics notepad.exe
guid___x_ABI_CWindows_CStorage_CIStorageFileStatics notepad.exe
guid___x_ABI_CWindows_CSystem_CILauncherStatics2 notepad.exe
guid_IAccPropServices notepad.exe
guid_IFileSaveDialog notepad.exe
guid_IAgileObject notepad.exe
guid_CAccPropServices notepad.exe
guid___x_ABI_CWindows_CSecurity_CEnterpriseData_CIProtectionPolicyManagerStatics2 notepad.exe
guid_FileSaveDialog notepad.exe
guid___x_ABI_CWindows_CSecurity_CEnterpriseData_CIProtectionPolicyManagerStatics notepad.exe
guid___FIEventHandler_1_IInspectable notepad.exe
guid___x_ABI_CWindows_CApplicationModel_CDataTransfer_CIClipboardStatics notepad.exe
guid_IFileOpenDialog notepad.exe
guid___x_ABI_CWindows_CApplicationModel_CDataTransfer_CIDataPackagePropertySetView3 notepad.exe
guid_FileOpenDialog notepad.exe
guid___FIAsyncOperationCompletedHandler_1_Windows__CStorage__CStorageFile notepad.exe
guid_IFileDialogCustomize notepad.exe
guid_LocalAppData notepad.exe

Even without a single second spent in a disassembler or decompiler we can already see what sort of GUIDs Notepad.exe references. Some of them are related to COM functionality (f.ex. guid_IFileSaveDialog), some are just GUIDs passed as function arguments (f.ex. guid_LocalAppData).

Is it very useful?

I guess… it depends…

If you had a good adware/spyware GUID database back in 2005-2008 you could quickly identify a lot of adware/spyware samples w/o even looking at their code. It worked really nicely.

There are also existing plug-ins for disassemblers/decompilers that try to recognize known GUIDs inside the code/data and rename the matching data chunks with the appropriate names of classes/interfaces or associated artifacts (f.ex. Known Folder IDs).

GUID values are also present inside the PDB / RSDS structure included in some PE files – they link the .EXE file with its .PDB file. The Module Version ID (MVID) and TypeLib ID are both GUIDs that are present inside compiled .NET assemblies and can be extracted & collected. Their unique values can be used to link samples coming from the same Visual Studio instance and/or build environment. Last, but not least – it was allegedly a GUID that linked the first iteration of the Melissa virus to its author, who eventually got arrested.
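To illustrate the PDB / RSDS bit: the CodeView record starts with the ‘RSDS’ signature, followed by a 16-byte GUID, a 4-byte age and a null-terminated PDB path. Below is a rough Python sketch that simply scans raw file content for that signature. It is a naive approach (a proper parser would walk the PE debug directory instead), and find_rsds_records is a hypothetical name.

```python
import struct
import uuid


def find_rsds_records(data: bytes):
    """Yield (guid, age, pdb_path) for each plausible RSDS record in raw bytes.

    Naive signature scan; it does not validate the PE debug directory,
    so false positives are possible.
    """
    pos = data.find(b"RSDS")
    while pos != -1:
        header_end = pos + 24                 # 4 (sig) + 16 (GUID) + 4 (age)
        if header_end <= len(data):
            guid = uuid.UUID(bytes_le=data[pos + 4:pos + 20])
            (age,) = struct.unpack_from("<I", data, pos + 20)
            end = data.find(b"\x00", header_end)
            if end != -1:
                # Assumption: the path is decoded as UTF-8, bad bytes replaced.
                path = data[header_end:end].decode("utf-8", errors="replace")
                yield guid, age, path
        pos = data.find(b"RSDS", pos + 1)


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as f:
        for guid, age, path in find_rsds_records(f.read()):
            print(guid, age, path)
```

Collecting these (GUID, age, path) tuples across a sample set is one way to spot files that came out of the same build.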

GUIDs are great artifacts and it’s wise both to collect all the extractable instances of them and to look for the presence of the known ones in the analyzed samples.
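For the ‘collect everything’ part, a simple regex pass over the raw content already goes a long way. The sketch below (Python, hypothetical names) pulls out every textual GUID, both ASCII and UTF-16LE; it deliberately ignores the binary form, as any 16 bytes could be a GUID and there is no reliable way to tell without context.

```python
import re

HEX = rb"[0-9A-Fa-f]"


def guid_pattern(char_suffix: bytes) -> bytes:
    """Regex for an 8-4-4-4-12 GUID; char_suffix follows every character.

    Use b"" for ASCII and rb"\x00" for UTF-16LE (wide) strings.
    """
    digit = HEX + char_suffix
    dash = rb"-" + char_suffix
    return dash.join(b"(?:%s){%d}" % (digit, n) for n in (8, 4, 4, 4, 12))


ASCII_GUID_RE = re.compile(guid_pattern(b""))
WIDE_GUID_RE = re.compile(guid_pattern(rb"\x00"))


def extract_text_guids(data: bytes) -> set:
    """Return all textual GUIDs found in data, normalized to upper case."""
    found = {m.group().decode("ascii").upper() for m in ASCII_GUID_RE.finditer(data)}
    found |= {m.group().decode("utf-16-le").upper() for m in WIDE_GUID_RE.finditer(data)}
    return found


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as f:
        for guid in sorted(extract_text_guids(f.read())):
            print(guid)
```

The output can then be diffed against a list of known GUIDs, or simply hoarded for later.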

Copyright banners – re-visited

Over a decade ago I posted some random copyright banner stats from my (relatively small by today’s standards) malware repo. I really liked these stats back then and I still like them today.

Why?

These banners are great ‘low-hanging fruit’ that may help with sample analysis, as they immediately draw the analyst’s attention to features responsible for data compression/decompression, data coding/encoding, media coding/encoding, archive file creation/processing, etc.

So I decided to check what has changed since.

One of the obvious and expected changes was that banners now cover years 201x and 202x:

  • 1995-2013 Jean-loup Gailly and Mark Adler
  • 1995-2017 Jean-loup Gailly and Mark Adler
  • copyright 1997-2021 Simon Tatham
  • Copyright (c) 2021 Richard L. Wolf
  • Copyright (C) 2006-2021 WIBU-SYSTEMS AG
  • Copyright 2021 Google Inc. All rights reserved.

I also noticed that some malware authors try to modify some of these very recognizable copyright banners to make them less useful for yara signatures and static detection engines that rely on hardcoded strings, f.ex.:

Copyright 1935-2022 Jean-loop Gai1ly and Merk Adler

Not only is the starting year waaaaay beyond the acceptable norm, the authors’ names are modified too. You can see the sample doing so here.
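Such tampering can be flagged with a bit of fuzzy matching. The Python sketch below is only an illustration: the KNOWN_HOLDERS list, the 1970 cut-off year and the 0.8 similarity threshold are arbitrary assumptions of mine, not values derived from any data set. It reports banners whose starting year looks implausible, or whose holder name is a near-miss of a well-known one.

```python
import difflib
import re

# Hypothetical, tiny reference list; a real one would be much longer.
KNOWN_HOLDERS = ["Jean-loup Gailly and Mark Adler", "Simon Tatham"]

BANNER_RE = re.compile(
    r"copyright\s+(?:\(c\)\s+)?(\d{4})(?:-(\d{4}))?\s+(.+)", re.IGNORECASE
)


def inspect_banner(banner: str, min_start_year: int = 1970, min_ratio: float = 0.8):
    """Return a list of reasons why a copyright banner looks tampered with."""
    m = BANNER_RE.search(banner)
    if m is None:
        return []
    findings = []
    start_year = int(m.group(1))
    holder = m.group(3).strip()
    if start_year < min_start_year:
        findings.append(f"implausible start year {start_year}")
    for known in KNOWN_HOLDERS:
        ratio = difflib.SequenceMatcher(None, holder.lower(), known.lower()).ratio()
        # Very similar, but not identical: likely a deliberate modification.
        if min_ratio <= ratio < 1.0:
            findings.append(f"near-miss of known holder '{known}'")
    return findings


if __name__ == "__main__":
    print(inspect_banner("Copyright 1935-2022 Jean-loop Gai1ly and Merk Adler"))
    print(inspect_banner("Copyright 1995-2017 Jean-loup Gailly and Mark Adler"))
```

The modified banner above trips both checks, while the genuine zlib one passes clean.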

We also see more ‘novelty’ copyright banners f.ex. associated with cryptomining:

Copyright (C) 2016-2017 xmrig.com
Copyright (C) 2016-2018 xmrig.com
Copyright (C) 2016-2019 xmrig.com
Copyright (C) 2016-2020 xmrig.com
Copyright (C) 2016-2021 xmrig.com

and lots more Google banners:

Copyright (C) 2011 Google Inc. All rights reserved.
Copyright 2012 Google Inc. All rights reserved.
Copyright (C) 2013 Google Inc. All rights reserved.
Copyright 2016 Google Inc. All Rights Reserved.
Copyright 2017 Google Inc.
Copyright 2017 Google Inc. All rights reserved.
Copyright 2018 Google LLC
Copyright 2019 Google LLC. All rights reserved.
Copyright 2020 Google LLC. All rights reserved.
Copyright 2021 Google LLC. All rights reserved.

and there are also some random copyrights like the ones below:

  • Copyright 2017 Gr0wh4x All rights reserved.
  • Copyright (c) Black.Hacker
  • Copyright 2021 InsiderHack Inc. All rights reserved.
  • Copyright (C) 2016 Weijie Gao hackpascal@gmail.com

In general though, we see less and less reliance on old, well-established, statically linked libraries, and fewer and fewer copyright banners as a result. Times are changing, and the old protectors, packers and compression libraries are now out of fashion…