Beyond good ol’ Run key, Part 107

This is a persistence and code injection trick in one. It only affects environments where the NVIDIA CUDA Toolkit is present. If that is the case, the system will have these two environment variables set:

  • CUDA_INJECTION32_PATH
  • CUDA_INJECTION64_PATH

They typically point to legitimate NVIDIA DLLs, but their values could be replaced with a path to any other DLL. Whatever DLL they point to is loaded via LoadLibrary.

This is not a backdoor of any sort – just a legitimate profiler interface.
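Defenders can check a host for this by looking at whether either variable is set and where it points. The Python sketch below takes an environment mapping and flags values that fall outside an allowlist of directory prefixes. The prefix in `EXPECTED_PREFIXES` is a hypothetical placeholder, not a confirmed NVIDIA install location, so adjust it to wherever the legitimate DLLs live on the systems you audit.

```python
import ntpath
import os
from typing import Dict, Mapping

# The two variables discussed above.
CUDA_INJECTION_VARS = ("CUDA_INJECTION32_PATH", "CUDA_INJECTION64_PATH")

# Hypothetical allowlist (lower-case, trailing backslash): replace with the real
# directories that hold the legitimate NVIDIA DLLs in your environment.
EXPECTED_PREFIXES = ("c:\\program files\\nvidia gpu computing toolkit\\",)


def audit_cuda_injection(env: Mapping[str, str]) -> Dict[str, str]:
    """Return {variable: value} for every CUDA injection variable that is set
    but does not point inside one of the expected directories."""
    suspicious: Dict[str, str] = {}
    for name in CUDA_INJECTION_VARS:
        value = env.get(name)
        if not value:
            continue  # absent or empty: nothing to check
        # Collapse "..", unify slashes and lower-case, so that tricks like
        # "<toolkit dir>\..\..\evil.dll" cannot pass the prefix test.
        normalized = ntpath.normcase(ntpath.normpath(value))
        if not normalized.startswith(EXPECTED_PREFIXES):
            suspicious[name] = value
    return suspicious


if __name__ == "__main__":
    for var, path in audit_cuda_injection(os.environ).items():
        print(f"{var} points outside the expected location: {path}")
```

Passing a plain mapping rather than reading `os.environ` directly makes it easy to run the same check against environment dumps collected from other machines.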

Playing with Delay-Loaded DLLs…

Delay-loaded DLLs are a PE file feature that is almost obsolete today. In the past, programmers who wanted to benefit from this mechanism would write a program that used Windows APIs as usual. During linking, however, they would tell the linker to defer some DLLs and their imports, so that these are resolved some time after the program starts rather than at load time. This was meant to speed up program loading and reduce memory use. In practice, it’s just a convenient mechanism that resolves APIs dynamically, ‘just in time’, so that programmers can call APIs transparently and don’t need to write their own wrappers (or call LoadLibrary/GetProcAddress directly).
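The ‘just in time’ resolution can be modeled in a few lines. This Python sketch is only a conceptual analogy, not the actual MSVC delay-load helper: each imported symbol starts out pointing at a stub, and the first call asks a loader callback for the real function, then overwrites the slot so later calls go straight to the target. The `loader` callback and the table layout are hypothetical stand-ins for LoadLibrary/GetProcAddress and the delayed import address table.

```python
from typing import Callable, Dict, Tuple

# loader(dll_name, func_name) -> function; stands in for LoadLibrary + GetProcAddress.
Loader = Callable[[str, str], Callable[..., object]]


class DelayImportTable:
    """Toy model of a delayed import address table."""

    def __init__(self, loader: Loader, imports: Dict[str, Tuple[str, str]]):
        self._loader = loader
        self._imports = imports  # symbol -> (dll_name, func_name)
        # Every slot initially points at a stub that resolves on first use.
        self._slots: Dict[str, Callable[..., object]] = {
            sym: self._make_stub(sym) for sym in imports
        }

    def _make_stub(self, sym: str) -> Callable[..., object]:
        def stub(*args, **kwargs):
            dll, func = self._imports[sym]
            target = self._loader(dll, func)  # the "DLL" is only touched now
            self._slots[sym] = target         # patch the slot: later calls skip the stub
            return target(*args, **kwargs)
        return stub

    def call(self, sym: str, *args, **kwargs):
        return self._slots[sym](*args, **kwargs)


if __name__ == "__main__":
    def fake_loader(dll: str, func: str) -> Callable[..., object]:
        print(f"resolving {dll}!{func}")  # printed only on the first call
        return lambda x: x * 2

    table = DelayImportTable(fake_loader, {"Double": ("example.dll", "Double")})
    print(table.call("Double", 3))
    print(table.call("Double", 4))
```

The important property for what follows is that nothing is loaded until the first call, and whatever name sits in the table is handed to the loader as-is.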

We can take advantage of this mechanism to implement a simple beacon: add a DLL name to the delayed import table, then use a hex editor (or a quick script) to replace that DLL name with a UNC path, the same way as in the PDB example.
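The ‘quick script’ route boils down to an in-place string replacement. The DLL name is stored in the file as a NUL-terminated ASCII string, and unless you rebuild the import table, the new value has to fit in the space the original occupied. The Python sketch below works on raw bytes under those assumptions: the placeholder name must occur exactly once and be at least as long as the UNC path, and the shorter replacement is NUL-padded. It does not parse any PE structures.

```python
def patch_dll_name(image: bytes, placeholder: str, replacement: str) -> bytes:
    """Overwrite a NUL-terminated ASCII DLL name in a PE image with another string.

    Assumes the placeholder was chosen long enough to hold the replacement and
    occurs exactly once in the file; no PE headers are parsed or updated.
    """
    old = placeholder.encode("ascii") + b"\x00"
    new = replacement.encode("ascii") + b"\x00"
    if len(new) > len(old):
        raise ValueError("replacement does not fit in the placeholder's slot")
    if image.count(old) != 1:
        raise ValueError("placeholder must occur exactly once in the image")
    # Pad with NULs so the patched string occupies exactly the same bytes,
    # which keeps every offset and RVA in the file valid.
    return image.replace(old, new.ljust(len(old), b"\x00"))
```

For example, with a hypothetical placeholder name linked into the test program, `patch_dll_name(data, "AAAAAAAAAAAAAAAAAAAAAAAA.dll", r"\\192.0.2.1\x\a.dll")` returns the patched bytes (192.0.2.1 is a documentation-only address).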

Once the program is executed and the first call to a delay-loaded function is reached, the delay-load helper linked into the program will attempt to load that particular DLL (in our case, it will try to resolve the UNC path and, as a result, contact the destination address). The DLL could, of course, be present on that remote share, or the code that triggers the dynamic loading could be wrapped in an exception handler so that the program can continue even if loading fails…
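The ‘attempt the load, survive the failure’ pattern can be illustrated with Python’s `ctypes`. This is only an analogy for wrapping the loader in an exception handler, not the MSVC delay-load exception mechanism itself. `ctypes.CDLL` raises `OSError` when a library cannot be loaded; on Windows it goes through a LoadLibrary-family call, so a UNC path passed here would make the OS try to reach the remote host before the error comes back. The function name `try_load` is hypothetical.

```python
import ctypes
from typing import Optional


def try_load(path: str) -> Optional[ctypes.CDLL]:
    """Try to load a shared library; return None instead of crashing on failure."""
    try:
        return ctypes.CDLL(path)
    except OSError as exc:
        # The load attempt (and any network access it caused) has already
        # happened by now; the program simply carries on without the library.
        print(f"load failed, continuing without it: {exc}")
        return None
```

Returning `None` lets the caller decide whether to fall back to other code or skip the feature entirely, which is the behavior the paragraph above describes.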

Interestingly, Dependency Walker shows the UNC path when asked to view the imports used by the test .exe file. That doesn’t stop it, though, from trying to load the delayed DLL from that UNC path. As a result, the actual ‘call home’ ping can be made without executing the sample at all!

Once again, this should act as a warning against analyzing samples on systems that are online. Even static analysis can sometimes be harmful, especially if you use tools that blindly trust the input or, even worse, use LoadLibrary to load the DLLs that the analyzed program is linked against.
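In the spirit of not trusting the input, a first-pass triage can look for this trick without loading anything: scan the file’s raw bytes for printable, NUL-terminated ASCII strings that look like UNC paths ending in `.dll`. The Python sketch below is a heuristic of that kind. The regular expression is an assumption about what such a path looks like; it ignores UTF-16 strings and will not catch names assembled at run time.

```python
import re
from typing import List

# Heuristic: two backslashes, a host, at least one more path component, ".dll",
# all printable ASCII and followed by a NUL, as in an import table entry.
UNC_DLL_RE = re.compile(rb"\\\\[\x21-\x7e]+?\\[\x20-\x7e]*?\.dll(?=\x00)", re.IGNORECASE)


def find_unc_dll_names(image: bytes) -> List[str]:
    """Return UNC-style DLL names found in a binary, without loading or executing it."""
    return [m.group().decode("ascii") for m in UNC_DLL_RE.finditer(image)]
```

Reading the file with `open(path, "rb").read()` and passing the bytes in is all it takes. No DLL is ever loaded, so no network access is triggered.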