Dictionary files (.dctx)

This is not very important research, really. Just a ‘blurb’ about what I observed during my quick tests.

So…

First of all, I noticed that .dctx files are being handled by this program:

  • C:\Windows\System32\IME\shared\IMEWDBLD.EXE

These are source dictionary files that get compiled into some other binary format (.dctc, AFAICT). These dictionaries seem to be heavily used (and needed?) for Asian languages, so most of the information about them can be found on forums discussing Japanese and Chinese keyboard input.

Examples: here, and here.

When you open a .dctx file on Windows 10, you will be presented with this dialog box:

When we click OK, we will see another dialog box:

I have not figured out what that means, but it seems to be a very common error that many users report. I couldn’t bypass it despite toying around with various parameters embedded inside my test .dctx file. I tried variations of English (US vs. UK), different encodings, etc., but it always came back with the same error.

Also, after looking at IMEWDBLD.EXE, I noticed that it takes a -v <logfile> command line argument (where -v stands for ‘verbose’, I guess). Using it during testing is a better alternative to the non-descriptive dialog box shown above. After trying to open the very same .dctx file with IMEWDBLD.EXE and the -v flag enabled, I observed this in the log file:

Error: Encountered fatal error(0x80070057:The parameter is incorrect.).
Error: There is a problem with the dictionary file. Please try to download again.

Unfortunately, this error is raised in many places inside the binary (IMEWDBLD.EXE), so I didn’t spend too much time trying to figure it out. Okay, if you must know, 0x80070057 is E_INVALIDARG, i.e. an invalid argument. It would be really handy to know which argument triggered it… hmm…
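For the curious, the HRESULT itself can be decoded mechanically. The Python sketch below pulls the code out of a log line shaped like the one above and splits it into the standard HRESULT fields (severity bit, 11-bit facility, 16-bit code). The regex only assumes the "fatal error(0x…:message)" shape seen in this one log; other IMEWDBLD.EXE messages may be formatted differently, and the small error-name table is an illustrative excerpt, not a complete list. Note that decoding only confirms the category (facility 7 = Win32, code 87 = ERROR_INVALID_PARAMETER); it still says nothing about which argument was wrong.

```python
import re

# Matches the "fatal error(0x80070057:The parameter is incorrect.)" shape seen in
# the log above; other IMEWDBLD.EXE messages may be formatted differently.
FATAL_RE = re.compile(r"fatal error\((0x[0-9A-Fa-f]{8}):([^)]*)\)")

FACILITY_WIN32 = 7  # HRESULT_FROM_WIN32 places Win32 error codes in this facility

# Tiny excerpt of well-known Win32 error codes (winerror.h), for illustration only.
WIN32_ERRORS = {
    2: "ERROR_FILE_NOT_FOUND",
    5: "ERROR_ACCESS_DENIED",
    87: "ERROR_INVALID_PARAMETER",
}


def decode_hresult(hr: int) -> dict:
    """Split a 32-bit HRESULT into its severity, facility and code fields."""
    info = {
        "severity": (hr >> 31) & 0x1,    # 1 = failure
        "facility": (hr >> 16) & 0x7FF,  # bits 16..26
        "code": hr & 0xFFFF,             # bits 0..15
    }
    if info["facility"] == FACILITY_WIN32:
        # For Win32-wrapped HRESULTs the low 16 bits are the original Win32 error.
        info["win32_name"] = WIN32_ERRORS.get(info["code"], "unknown")
    return info


def fatal_errors(log_text: str):
    """Yield (hresult, message) for every 'fatal error(...)' entry in a log."""
    for m in FATAL_RE.finditer(log_text):
        yield int(m.group(1), 16), m.group(2)


if __name__ == "__main__":
    log = (
        "Error: Encountered fatal error(0x80070057:The parameter is incorrect.).\n"
        "Error: There is a problem with the dictionary file. Please try to download again.\n"
    )
    for hr, msg in fatal_errors(log):
        print(f"0x{hr:08X} {msg} {decode_hresult(hr)}")
```

Running it against the log above reports severity 1 (failure), facility 7 (Win32) and code 87, which is exactly why the message reads "The parameter is incorrect."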

So, that’s it really.

If you want to play around, this is a minimal sample .dctx file you can try to import on your Windows 10 system. Download it and double-click. That’s it.

Bonus

I think the IME components are not well researched and could potentially offer mechanisms for lesser-known attacks focused on:

  • persistence
  • bypassing security controls
  • RCE

Why?

They seem to be developed for a niche (but, given the numbers, not negligible!) group of users in Asia (Japanese, Chinese), and most likely have been poorly tested. The most recent IME-related research I could find is here.

Why?

If you look at the IMEWDBLD.EXE binary, you will notice a bunch of flags that are not documented anywhere on the Internet. They could be limited to a test environment at MS, or only honored on OS versions that require an IME. The narrower the scope, the lower the testing priority. In other words: if it is not documented on the Internet, it’s likely internal.

Some food for thought:

  • HKLM\SOFTWARE\Microsoft\IME\PlugInDict
  • EncryptAllPlugInDict
  • DisableAllPlugInDict

Command line arguments for IMEWDBLD.EXE:

  • -encrypt <unknown>
  • -pluginguid <guid>
  • -w <unknown>
  • -pm <unknown>
  • -v <logfile> – saves the verbose info to logfile
  • -nofilter <unknown>
  • -testing <unknown>
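Switches like the ones above show up when you look at the binary. A quick, strings-like way to shortlist such candidates yourself is sketched below in Python. It assumes the switches are stored as literal narrow (ASCII) or wide (UTF-16LE) strings starting with "-" or "/"; a program could just as well compare them case-insensitively, build them at run time, or store them without the dash, in which case this heuristic misses them. It will also produce false positives that need manual review.

```python
import re
import sys
from pathlib import Path

# Runs of 2+ printable ASCII characters (narrow strings).
ASCII_RE = re.compile(rb"[\x20-\x7e]{2,}")
# Runs of 2+ printable characters stored as UTF-16LE, i.e. each character is
# followed by a NUL byte; wide literals in Windows binaries usually look like this.
WIDE_RE = re.compile(rb"(?:[\x20-\x7e]\x00){2,}")
# Heuristic for a command-line switch: '-' or '/' followed by a letter,
# then only letters, digits or underscores (no spaces).
FLAG_RE = re.compile(r"[-/][A-Za-z][A-Za-z0-9_]*")


def extract_strings(data: bytes):
    """Yield printable narrow and wide strings found in a binary blob."""
    for m in ASCII_RE.finditer(data):
        yield m.group().decode("ascii")
    for m in WIDE_RE.finditer(data):
        yield m.group().decode("utf-16-le")


def candidate_flags(data: bytes) -> list:
    """Return a sorted, de-duplicated list of strings that look like switches."""
    return sorted({s for s in extract_strings(data) if FLAG_RE.fullmatch(s)})


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python flags.py C:\Windows\System32\IME\shared\IMEWDBLD.EXE
    for flag in candidate_flags(Path(sys.argv[1]).read_bytes()):
        print(flag)
```

Scanning for both encodings matters on Windows: a wide string such as "-v" is stored as the bytes 2D 00 76 00, which a plain ASCII strings pass would never report.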

Enter Sandbox part 24: Intercepting Buffers #3 – The Punto H & magic points

I mentioned before that monitoring buffers is the key to quickly understanding a program’s inner workings. It doesn’t work every time, but in the majority of cases it does. Moreover, in ‘desperately’ challenging cases it may give access to the internals of highly obfuscated, sometimes even virtualized, code, and may help to understand large, bulky programs that are really hard to analyze using ‘static’ tools.

Now, we are so used to monitoring primarily APIs, and the buffers these APIs handle, that we often forget there are many other places where monitoring could take place.

I have listed a lot of examples in the past, and there are always more ideas. Think of it – your sandbox is your baby. You know every single bit of it. You control its existence. You can extract hard-coded addresses for certain functions, or patch some code. You can modify the OS any way you want. You can even replace every single OS file, disable OS anti-tampering code, introduce clever redirections and callbacks – the sky is the limit, really. It is a controlled environment. Let’s be adventurous with that.

And yes, this is hard, and perhaps sounds like a very abstract idea, but these are just some of the available possibilities, and they may actually work well if applied to modern sandboxes, which typically inspect the guest system from the outside (as opposed to old API monitors).

You may ask: it all sounds nice, but why don’t I provide a more specific example? I am glad you asked. That is the topic of this post.

I personally find the Punto.H / Point.H trick to be one of the best examples of such cleverly placed breakpoints. The trick was developed by a community mainly focused on the art of software cracking, and… a very looooooong time ago (it is often attributed to Ricardo Narvaja). And yes, it sounds archaic, and it really is.

How does it work?

Old shareware applications usually asked for a serial key. Most of them, especially in the early days, would just ask for a string provided by the user. Once entered, the serial would be retrieved from the UI control (an edit box) and checked by the program’s serial verification routine. If the serial was OK, the program would reconfigure itself as ‘registered’.

Shareware programs were very popular back then, but many of them were quite bulky, there were no decompilers yet, and analyzing them was quite a pain. Observant reversers noticed that by intercepting calls to the internal Windows function ‘hmemcpy’ they could see all the data being passed between a program’s UI and its internals. The first letter of the function gave the technique its name: ‘Punto.H’ (since it was very popular among Spanish-speaking crackers, I opted to use the Spanish name in this article instead of the English ‘Point.H’).
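The idea generalizes well beyond hmemcpy: find the one routine that all interesting data must pass through, and watch it. Below is a toy Python model of that idea, not of the real Windows internals. The names copy_buffer, get_control_text and install_copy_hook are hypothetical; copy_buffer merely stands in for a memcpy-like routine that every edit-box read goes through, and the hook plays the role of the breakpoint by recording each buffer before letting the copy proceed.

```python
import functools

# Toy model of a program (all names are hypothetical). Every string read from
# a UI control passes through one low-level copy routine, much as edit-box
# text passed through hmemcpy on old Windows versions.


def copy_buffer(dst: bytearray, src: bytes, n: int) -> None:
    """Copy n bytes from src into dst (stand-in for a memcpy-like routine)."""
    dst[:n] = src[:n]


def get_control_text(control_text: str) -> str:
    """Simulate retrieving the text of an edit box via the copy routine."""
    src = control_text.encode("latin-1")
    dst = bytearray(len(src))
    # copy_buffer is looked up as a module global at call time,
    # so a hook installed later is picked up here.
    copy_buffer(dst, src, len(src))
    return dst.decode("latin-1")


def install_copy_hook(log: list) -> None:
    """Wrap copy_buffer so every buffer it copies is appended to log.

    Instead of hunting for the serial check directly, we 'break' on the one
    routine that all UI data must pass through.
    """
    global copy_buffer
    original = copy_buffer

    @functools.wraps(original)
    def hooked(dst, src, n):
        log.append(bytes(src[:n]))  # the 'breakpoint': record the buffer
        return original(dst, src, n)

    copy_buffer = hooked


if __name__ == "__main__":
    seen = []
    install_copy_hook(seen)
    get_control_text("John Doe")        # user name field
    get_control_text("1234-5678-ABCD")  # serial field
    print(seen)  # [b'John Doe', b'1234-5678-ABCD']
```

The program itself is unchanged; only the copy routine is wrapped, which is the same trade-off a sandbox makes when it hooks a chokepoint instead of instrumenting every caller.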

So, catching these buffers was pretty much the first step in cracking serials. Once you had the buffer, you could track it and eventually reach the routine that actually processed it. From there, you could either patch the code to bypass the serial check or, if you were a more advanced reverser, write a serial generator that produced strings the program would accept. It sounds pretty simple, but it typically required many hours of work. The Punto.H trick simplified the cracking process a lot.
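To make the two outcomes concrete, here is a toy Python sketch. The serial scheme (a 16-bit weighted checksum of the user name, plus the same value XORed with an arbitrary constant) is invented purely for illustration and does not come from any real shareware. Once the verification routine is understood, a patch simply makes it always succeed, while a serial generator runs the same math to produce strings the check accepts.

```python
# Toy serial scheme, invented for illustration only; not taken from any real
# program. It shows the two outcomes: patching the check vs. writing a keygen.


def name_checksum(name: str) -> int:
    """Weighted sum of the upper-cased name's character codes, kept to 16 bits."""
    total = 0
    for i, ch in enumerate(name.upper()):
        total = (total + (i + 1) * ord(ch)) & 0xFFFF
    return total


def check_serial(name: str, serial: str) -> bool:
    """The 'verification routine' a reverser would reach by following the buffer."""
    try:
        first, second = serial.split("-")
        a, b = int(first, 16), int(second, 16)
    except ValueError:
        return False  # wrong number of parts or non-hex characters
    c = name_checksum(name)
    return a == c and b == c ^ 0x5A5A


def patched_check_serial(name: str, serial: str) -> bool:
    """Outcome 1: the check is patched so that every serial is accepted."""
    return True


def keygen(name: str) -> str:
    """Outcome 2: reimplement the check's math to produce a serial it accepts."""
    c = name_checksum(name)
    return f"{c:04X}-{c ^ 0x5A5A:04X}"


if __name__ == "__main__":
    serial = keygen("John Doe")
    print(serial, check_serial("John Doe", serial))
```

The keygen is only possible because the checksum can be recomputed from the name alone; the patch needs no such understanding, which is why it was the usual first step.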

Again, software protection is very different now, but this technique still illustrates the point: you need to look for good places to put breakpoints for monitoring. Punto.H was so popular that even today there are many plug-ins that implement it and its clones, often adding further breakpoints for other software platforms, for example: