Analyzing malicious Word documents usually relies on two types of tools. Some analyze the file structure with the intention of extracting macros for further analysis. Others support dynamic analysis with the aim of extracting run-time IOCs. Some sandboxes and reversers try to deobfuscate the VBA code as well, but it’s quite a tedious job if done manually. Luckily, this area of research is now advanced enough that tools supporting this mundane task exist, and sandboxes (e.g. Joe Sandbox) have for a while now been tracing VBA code execution in a manner similar to a classic API Monitor.
There is one more avenue we can pursue – using the VBA code itself. I have used it many times in the past and found it pretty useful.
Once you load a malicious document into Word and access the malicious VBA code (provided there is no trickery that prevents this or makes it harder), you can inject your own snippet of code into the analyzed document. You can then run it in the context of the active document structure. It’s trivial – just add a new module, or paste the code into the same module where the malware code is present. While such injected code cannot answer all the reversing questions, it can sometimes help to extract certain values and attribute them to specific objects (e.g. strings or blobs seen in the file may be hidden in some less-obvious property) – an important association that otherwise may be much harder to establish.
For instance, the snippet below walks through all shapes in the document and prints out the link type and source URL associated with each of them:
Sub DumpShapeLinks()
    Dim S As Shape
    Dim ish As InlineShape
    Dim i As Long

    ' Not every shape carries link data; ignore the resulting errors and keep going.
    On Error Resume Next

    ' Floating shapes
    For i = 1 To ActiveDocument.Shapes.Count
        Set S = ActiveDocument.Shapes.Item(i)
        Debug.Print i & " " & S.LinkFormat.Type
        Debug.Print S.LinkFormat.SourceFullName
    Next i

    ' Inline shapes
    For i = 1 To ActiveDocument.InlineShapes.Count
        Set ish = ActiveDocument.InlineShapes.Item(i)
        Debug.Print i & " " & ish.LinkFormat.Type
        Debug.Print ish.LinkFormat.SourceFullName
    Next i
End Sub
One can easily expand this snippet to cover all the objects that include URL-related properties. And of course, we can go a step further and enumerate all major document objects and their properties in general (e.g. normal, custom or advanced properties, their names and values, including blobs of data that may be well hidden in some objects’ properties, etc.). This bypasses the UI limitations, and it is much faster too.
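As a minimal sketch of that idea, the routine below dumps the built-in properties, custom properties and document variables of the active document to the Immediate window. The routine name and the 200-character preview limit are arbitrary choices for illustration, and the list of collections is by no means complete.

```vba
Sub DumpDocumentProperties()
    Dim prop As Object      ' Office DocumentProperty, late-bound
    Dim v As Variable       ' Word document variable

    ' Some built-in properties have no value and raise an error when read;
    ' skip those and carry on with the rest.
    On Error Resume Next

    Debug.Print "== Built-in properties =="
    For Each prop In ActiveDocument.BuiltInDocumentProperties
        Debug.Print prop.Name & " = " & prop.Value
    Next prop

    Debug.Print "== Custom properties =="
    For Each prop In ActiveDocument.CustomDocumentProperties
        Debug.Print prop.Name & " = " & prop.Value
    Next prop

    ' Document variables are not shown in the UI at all,
    ' which makes them a handy place to stash strings or blobs.
    Debug.Print "== Document variables =="
    For Each v In ActiveDocument.Variables
        ' Print the length and only a preview, in case the value is a large blob.
        Debug.Print v.Name & " (" & Len(v.Value) & " chars) = " & Left$(v.Value, 200)
    Next v
End Sub
```

The Immediate window keeps only a limited number of lines, so for documents with many entries it may be more practical to write the output to a file instead.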
Apart from the specifically malicious items, such a script can help with actor attribution as well. Some of the file properties may be stored in different languages, or refer to a different language (names of the properties, text encodings, hidden texts, historical entries, etc.) – they are normally not that easy to discover using the UI, yet with direct access to the document structure they become immediately available once enumerated.
And we don’t need to use VBA all the time. Using VBS to open Office documents is a trick as old as Office itself. We can open malicious documents via the Word Application object and apply the data extraction code immediately (note that this will obviously trigger the execution of the malware in most cases).
Some examples of useful VBS code that can be re-purposed for this task are here, and here. Lots of old code doing this sort of stuff can be easily found via Google.
Of course, the technique can be used for all Office documents supporting automation / VBA.
Last, but not least – I mentioned in my older post that opening macro-malware in Office 2003 sometimes gives access to more data than we can get through the eye-candy interface of the latest Office versions (newer versions remove access to some properties that you could still see via the GUI in Office 2003). Using VBS we can run a quick scripted conversion to other formats, so that a malicious document can be opened and saved as .rtf, .doc, .docx and .txt in one simple batch job. The resulting files can be made available to other tools for further processing.
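Such a conversion job could look roughly like the sketch below. It is written in VBA, to be run from a separate clean document, to stay consistent with the earlier snippet; a VBS version would go through the same Word object model but would need untyped variables and numeric constants. The sample path and the routine name are hypothetical, and the sketch assumes Word 2010 or later (SaveAs2 and the .docx format are not available in Office 2003). The AutomationSecurity line is an extra precaution that asks Word not to run macros in the opened file; it is no substitute for doing all of this inside an isolated VM.

```vba
Sub ConvertSample()
    ' Hypothetical path; point it at the sample inside an isolated analysis VM.
    Const SAMPLE As String = "C:\samples\sample.doc"

    Dim formats As Variant
    Dim exts As Variant
    Dim doc As Document
    Dim i As Long

    ' WdSaveFormat values and the file extension to use for each of them.
    formats = Array(wdFormatRTF, wdFormatDocument, wdFormatXMLDocument, wdFormatText)
    exts = Array(".rtf", ".doc", ".docx", ".txt")

    ' Ask Word not to run macros in documents opened through automation,
    ' and suppress prompts that would otherwise stall the batch job.
    Application.AutomationSecurity = msoAutomationSecurityForceDisable
    Application.DisplayAlerts = wdAlertsNone

    For i = LBound(formats) To UBound(formats)
        ' Reopen the original every time so each output is converted
        ' from the sample itself, not from a previous conversion.
        Set doc = Documents.Open(FileName:=SAMPLE, ReadOnly:=True, AddToRecentFiles:=False)
        ' Appending the extension (sample.doc.rtf, sample.doc.doc, ...) keeps the original intact.
        doc.SaveAs2 FileName:=SAMPLE & exts(i), FileFormat:=formats(i)
        doc.Close SaveChanges:=wdDoNotSaveChanges
    Next i

    Application.DisplayAlerts = wdAlertsAll
End Sub
```

Reopening the sample for each format is a deliberate choice: after a save, the Document object points at the newly written file, so chaining the saves would convert an already converted copy.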