A short wishlist for signature/rules writers

January 21, 2019 in Preaching

Oh, no… yet another rant.

Not really.

Today I will try to discuss a phenomenon that I observe in the signature/rules writing space – one that used to be predominantly occupied by antivirus companies. Today it is daily bread for malware analysts, sample hoarders, threat intel folks, and the threat hunting crowd as well. Plus, lots of EDR and forensics/IR solutions use these rules, and they come in really handy in memory analysis, as well as in retrohunting and sorting sample collections.

The phenomenon I want to talk about relies on these two factors:

  • writing signatures / rules is very easy
  • there is no punishment for writing bad ones

This phenomenon is, as you probably guessed by now, the uneven quality of signatures / rules. And I am very careful with words here. Some of these rules are excellent, and some could be better.

Before I proceed further, let me ask you a question:

What is the most important part of a good yara signature?

  • Is it a unique string, or a few of them?
  • A clever Boolean Condition?
  • A filter that cherry-picks scanned files, e.g. one that for Windows executables looks for the ‘MZ’ magic, then the ‘PE’ header?

These are all equally important, and used in most of the yara rules you will find today. I find it interesting though that most rules don’t include the ‘filesize’ condition. And I wonder why. This filter helps to exclude tons of legitimate files, as well as malicious files that are outside of the file size range used by the family the specific yara rule covers. If applied properly, it will potentially skip expensive searches inside the whole file.

Update: I stand corrected, it would seem the ‘filesize’ is checked _after_ the strings are checked (thx Silas and Wesley). This is a poor performance optimization choice, in my view. Still, see what I wrote below about exactly this scenario – it doesn’t matter if yara performs poorly on this condition today, they may improve it tomorrow. Additionally, the way we use yara rules and how they are compiled matters! In a curated ruleset the issues I am referring to don’t make much difference. They do make a difference with individual scans of e.g. a file system. In my experience, many of the rules I get from public sources can’t be combined/compiled into one single bulky rule file, because of conflicts. So I tend to run yara.exe many times, each time using a different yara rule file. See the Twitter convo for some interesting back and forth between us. Thanks guys!

I think the practical reason why many analysts forget about this condition is pretty basic. It’s very rare for any of us to write rules that must be optimized, and checked for quality. While the signatures in the AV industry go through a lot of testing before they are released, our creations are often deployed as soon as they are written, and tested on a bunch of sample files only. They are very rarely tested on a larger sample set that includes lots of files: large ones, clean ones, and tricky ones (intended to break parsers & rules, e.g. corrupted files).

Update: It is important to mention that, based on our Twitter convo, it again depends very much on the circumstances. It is possible that your environment or your needs do not require checking this.

Our rules are typically for local consumption, so performance and accuracy are not necessarily a priority. But performance is important. And even more so – a different mindset.

We write rules to detect i.e. include matching patterns, but not to exclude non-matching ones. And the latter is important – the faster we can detect that the file doesn’t match our rule, the faster the yara engine can finish testing the file and move on to the next.

And even if the yara engine were the worst searching engine ever, and actually read the whole file, and the ‘filesize’ condition did not really help performance, it would still make sense to write rules in ‘the best effort’ way. There is always a new version of the engine, the authors take the feedback in, and one day a future version may optimize code and comparisons for exactly this condition.

Coincidentally, this is actually one of the principles that most antivirus engineers learn very early in their career: learn to exclude non-matching stuff, and learn to do it early in your detection rule.
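
To make the "exclude early" idea a bit more tangible, here is a minimal sketch in Python. It is not how the yara engine evaluates a rule (see the update above), only an illustration of the mindset for a stand-alone scanner. The function names, the needles and the size limits are all made up for the example.

```python
import struct

def looks_like_pe(data: bytes) -> bool:
    """Cheap structural check: 'MZ' magic, then 'PE\\0\\0' at e_lfanew."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    # e_lfanew (offset of the PE header) is stored at 0x3C in the DOS header.
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    return data[e_lfanew:e_lfanew + 4] == b"PE\x00\x00"

def matches(data: bytes, needles, min_size: int, max_size: int) -> bool:
    """Hypothetical rule: bail out as early and as cheaply as possible."""
    # 1. Cheapest test first: is the size within the family's range?
    if not (min_size <= len(data) <= max_size):
        return False
    # 2. Next cheapest: is it a Windows executable at all?
    if not looks_like_pe(data):
        return False
    # 3. Only now pay for searching the whole buffer.
    return all(needle in data for needle in needles)

# Tiny synthetic "PE" buffer, just enough to exercise the checks.
sample = (b"MZ" + b"\x00" * 0x3A + struct.pack("<I", 0x40)
          + b"PE\x00\x00" + b"evil_marker")
print(matches(sample, [b"evil_marker"], 64, 4096))
```

The order of the three checks is the whole point: a file that is too big or is not a PE never reaches the substring searches.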

The sole intention of this post is to highlight the importance of thinking of signatures / rules not only as ways to quickly detect stuff, but in a wider context – as a way to ignore the non-matching stuff. The earlier in the rule, the better.

Update: After the Twitter convo I now know I chose a wrong example to illustrate my point. I should have used f.ex. common ‘good’ strings that we can sometimes find in public yara rules (these strings can be found in both malware and good files because they are part of a library). The hits that these strings generate on ‘good’ files can be avoided by testing rules on larger corpora of samples, including ‘good’ files. There are plenty of other examples.
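
A rough sketch of what such a test could look like, in Python. Everything here is hypothetical: the two strings, the two toy "rules" (plain functions standing in for real yara rules) and the two-file "clean corpus". A real run would use a large collection of known-good files.

```python
# Hypothetical strings: one that ships with a common library,
# one that is specific to the malware family.
LIB_STRING = b"ExampleCompressLib v1.0"
UNIQUE_STRING = b"x9_c2_beacon_cfg"

def weak_rule(data: bytes) -> bool:
    """Fires on the library string alone, so it also hits good files."""
    return LIB_STRING in data

def stricter_rule(data: bytes) -> bool:
    """Requires the family-specific string as well."""
    return LIB_STRING in data and UNIQUE_STRING in data

def false_positives(rule, clean_corpus):
    """Return the names of known-clean samples that the rule flags."""
    return [name for name, data in clean_corpus.items() if rule(data)]

# Stand-in for a corpus of 'good' files.
clean_corpus = {
    "archiver.exe": b"header " + LIB_STRING + b" trailer",
    "notepad.exe": b"nothing interesting here",
}

print(false_positives(weak_rule, clean_corpus))
print(false_positives(stricter_rule, clean_corpus))
```

Any name returned by false_positives() is a reason to revisit the rule before it is shared.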

A short wishlist for tool writers

January 20, 2019 in Preaching

2019-01-20: Updated to add Brian‘s suggestion.

Prologue

We have got so many tools now. It’s like almost every week there is a new tool, a plug-in, or an update to one of them announced.

Happy days!

Yet, many of them still surprise us with deficiencies that should be very well understood by now. Ones that we should not really see in 2019 anymore.

Exhibit #1.

No binaries.

Rant:

Yes, we know that everyone loves compiling binaries from sources, but seriously… who does, really?

Especially since it compiles in your build environment, but it doesn’t in others. And obviously everyone who wants to test your program is an experienced developer.

Re-creating your build environment means installing the same compilers, make tools, and dependencies, often libraries or packages that are no longer obtainable in the versions you have installed. At this stage the build environment is already different from yours.

And once the environment is ready, everyone loves spending time fixing various code issues, suppressing or ignoring warnings, adding missing header files, and often even modifying building commands.

It often takes a few hours. (In fairness, I must bow here to Mimikatz and SQLITE3 authors – their code compiles like a charm!)

Wishlist item:

If you build a tool for yourself, keep it to yourself. If you write for everyone then please ship binaries that give everyone a chance to try your software w/o wasting time on building. Not everyone who will, is a developer. Not everyone who is a developer, will.

Exhibit #2.

Missing dependencies.

Rant:

@#$%^&*

Wishlist item:

Static linking, or adding a prerequisite to install the Microsoft Visual C++ Redistributable Package, may help in this case.

But actually…

This is not a library problem. It is a testing problem.

Yes, all works perfectly in your build environment, but have you tried running it outside of it? The dependency issue would be immediately visible if you tried to run it on a plain vanilla OS, ideally a few of them, and lo and behold – on non-English OS versions too.

Exhibit #3.

Portability and backward compatibility are dead.

Your program only runs on Windows version XYZ.

Rant:

So, you write the tool with the latest, shiniest compiler. It just happens to include requirements or dependencies that make it work only on a specific Windows version, or up. Nothing in this program actually uses any of the specific features of Windows version XYZ or up, as it relies on Windows API available since Windows 95, but… the program won’t work on older versions of Windows.

Wishlist item:

Of course, test it on old versions. If it doesn’t work, find out why:

  • Is it the compiler? See if you can change flags, or settings. Can you use an older version of the compiler?
  • Is it a static link to a DLL or an API introduced in recent years? Load the DLL / resolve the API dynamically. Note that many Sysinternals tools are still backward compatible, because they do it exactly this way!
  • Is it a very demanding value of MajorOperatingSystemVersion / MinorOperatingSystemVersion (the OS version required to run the program that is simply too high w/o any reason)? Adjust them during the build process in an automatic fashion.
  • Is it a dependency on a library that doesn’t work on older versions of Windows? Fair enough, you can either find a different library, or make it clear that this is the reason the software doesn’t work on older versions.
  • Also, perhaps double-check if the troublesome library provides the service that is used all the time, or only in certain, rare instances; consider making certain features available dependent on the availability of such library that could be loaded dynamically; this way the program can still do most of its work on older OS versions.
  • Test the final product on older versions of Windows.
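
For the version-field check from the list above, a small Python sketch like the one below can show what a built binary actually asks for. It reads the OS version pair from the PE optional header and, as an extra, the subsystem version pair that sits right next to it. The function name is made up, error handling is minimal, and the synthetic header exists only so that the example runs without a real file.

```python
import struct

def required_versions(pe: bytes) -> dict:
    """Read the OS / subsystem version fields from a PE optional header."""
    if pe[:2] != b"MZ":
        raise ValueError("not an MZ file")
    (e_lfanew,) = struct.unpack_from("<I", pe, 0x3C)
    if pe[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("PE signature not found")
    # Optional header follows the 4-byte signature and the 20-byte COFF header.
    opt = e_lfanew + 4 + 20
    # Both PE32 and PE32+ keep these fields at offsets 40..51.
    os_major, os_minor = struct.unpack_from("<HH", pe, opt + 40)
    sub_major, sub_minor = struct.unpack_from("<HH", pe, opt + 48)
    return {"os": (os_major, os_minor), "subsystem": (sub_major, sub_minor)}

# Synthetic header for demonstration purposes only.
buf = bytearray(0x40 + 4 + 20 + 64)
buf[:2] = b"MZ"
struct.pack_into("<I", buf, 0x3C, 0x40)
buf[0x40:0x44] = b"PE\x00\x00"
struct.pack_into("<HH", buf, 0x40 + 24 + 40, 10, 0)
struct.pack_into("<HH", buf, 0x40 + 24 + 48, 6, 1)
print(required_versions(bytes(buf)))
```

Running something like this as part of the build is one way to notice when a compiler upgrade silently raised the required version.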

Exhibit #4.

Tools crashing. Often during the first run.

Rant:

@#$%^&(*

Wishlist:

  • detect & resolve dependencies (f.ex. on specific .NET versions)
  • handle these exceptions; this is so much easier now than 20 years ago – it’s a built-in feature for us to use
  • more code review?
  • more error checking?
  • more testing in general?
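
As a loose analogue of the first two bullets (in Python rather than .NET), a tool can check its dependencies up front and wrap its entry point, so that the user sees a readable message instead of a crash. The function names and the exit codes below are assumptions made for this sketch.

```python
import importlib
import sys

def missing_modules(names):
    """Return the subset of module names that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing

def run(main, required):
    """Run main() only if dependencies are present; never die with a traceback."""
    missing = missing_modules(required)
    if missing:
        print("Missing dependencies: " + ", ".join(missing), file=sys.stderr)
        return 2
    try:
        main()
    except Exception as exc:  # last-resort handler for the sketch
        print(f"Error: {exc}", file=sys.stderr)
        return 1
    return 0

# 'json' ships with Python, so this reports success.
print(run(lambda: None, ["json"]))
```

A catch-all handler is no replacement for proper error checking, but it at least turns a first-run crash into a message the user can report.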

Exhibit #5.

Tools showing so much that they eventually show nothing.

Rant:

We know that many tools are POC so it’s hard to let go. Anytime you look at the problem e.g. file format parsing, you do need to include all the possible fields, and highlight all the nuances your tool can extract or understand, and of course send it to the output. Users will enjoy it.

You also need to make the output fancy: format it, colorize it, add ASCII Art logo, a copyright banner, use Unicode output characters that show up on your terminal configured to use a Unicode font, etc.

Wishlist item:

By all means, add verbose/debug logs to your tool, but think of the users. What is it that they want from your tool? How do they use it? Who really are these tool users? Noobs, experienced practitioners, advanced pros, hardcore hackers? For example, for a PE parser, will they need to know all the gory characteristics of the file when they only want to know its very basic properties (the ones that can drive their next steps in analysis)? What are the real and most common use cases?

Less is often more.

  • Add debug/verbosity, but make it optional.
  • Consider saving such logs to a file, not to standard output.
  • Add options to disable copyright banners.
  • Avoid whistles and fireworks. UI metaphors went down the drain in the last few years, but we can still try to make the UI user-friendly.
  • Think of the audience. Ask the audience what works, and what doesn’t.
  • Learn from other tool makers.
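
The first three bullets translate almost directly into command line options. Below is a minimal Python sketch using argparse and logging; the tool name "petool" and the option names are invented for the example.

```python
import argparse
import logging

def build_parser():
    parser = argparse.ArgumentParser(prog="petool")  # hypothetical tool name
    parser.add_argument("file", help="file to analyse")
    # Quiet by default; each -v adds more detail.
    parser.add_argument("-v", "--verbose", action="count", default=0)
    # Diagnostics go to a file when asked, not to standard output.
    parser.add_argument("--log-file", help="write diagnostics to this file")
    parser.add_argument("--no-banner", action="store_true",
                        help="do not print the copyright banner")
    return parser

def log_level(verbosity: int) -> int:
    """Map the number of -v flags to a logging level."""
    return {0: logging.WARNING, 1: logging.INFO}.get(verbosity, logging.DEBUG)

def configure(args):
    # With filename=None, basicConfig logs to stderr, leaving stdout clean.
    logging.basicConfig(filename=args.log_file, level=log_level(args.verbose))
    if not args.no_banner:
        print("petool (demo banner)")

args = build_parser().parse_args(["-vv", "--no-banner", "sample.exe"])
configure(args)
logging.debug("only visible because -vv was given")
```

The defaults matter most here: with no options the user gets the short answer, and everything else is opt-in.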

Exhibit #6.

Help.

Rant:

How do I use your tool? Twitter animations are cool, a YouTube video with your presentation is great, screenshots are fantastic. But… can I have basic written documentation, please?

What are the problems it tries to solve? This should be the first line of any documentation. You can’t assume everyone who visits your page knows.

What file formats are supported? What is the desired output? What are known issues? What is the competition, or if you don’t want to write about them – how do I verify output of your tool? What are the references to documents, functional specifications, older work that you relied on?

Finally, what are the command line arguments? Give a loooong list of examples. Be generous. Treat users like they have never seen a computer before. It will save them a lot of time.
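
One cheap way to be generous with examples is to put them straight into the built-in help, so they ship with every copy of the program. A small Python sketch with argparse follows; the tool name, the options and the example lines are all hypothetical.

```python
import argparse

EXAMPLES = """\
examples:
  petool sample.exe                 print basic properties
  petool -v sample.exe              add informational messages
  petool --log-file run.log a.exe   write diagnostics to run.log
"""

def build_documented_parser():
    parser = argparse.ArgumentParser(
        prog="petool",  # hypothetical tool name
        description="Print basic properties of a PE file.",
        epilog=EXAMPLES,
        # Keep the line breaks of the examples exactly as written.
        formatter_class=argparse.RawDescriptionHelpFormatter,
    )
    parser.add_argument("file", help="file to analyse")
    parser.add_argument("-v", "--verbose", action="count", default=0)
    parser.add_argument("--log-file", help="write diagnostics to this file")
    return parser

print(build_documented_parser().format_help())
```

The first line of the description answers "what problem does it solve", and the examples sit where users look first: the output of -h.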

Seriously, this is such a pain that it boggles my mind how much time I sometimes spend looking for an example usage for some tools. One that actually works. Because the ones that don’t work are endless. And how many times I actually had to reverse engineer some binary to discover the proper usage for this particular version…

And if you want a good example of how to do it right – look at the programs written by Nirsoft. They all share a similar GUI. They also have very good documentation pages, all of which are quite uniform, so having seen one makes it easy to read the others. They include very detailed information about the command line arguments the programs support. You will also find a lot of information about known issues, requests for feedback, licensing info, and lots and lots of useful hints on how to use the programs.

Exhibit #7.

Missing Feedback opportunities.

Brian provided a good suggestion to add to the list – a suggestion that is aimed more at users than tool developers, but it reminds me that feedback is not always easy to provide, because developers simply forget to tell us how.

Rant:

If your program doesn’t work on my system, or with my samples, how can you learn about it if there is no way to provide feedback?

Wishlist item:

An email address, a Twitter handle, or enabled bug logging/comments for your repo/blog will do wonders.

And if you are the user, please provide the feedback to the tool writers!

Exhibit #8.

Not everything is a tool.

Rant:

So, you used 20 libraries, and wrote 50 lines of code that reads JSON, converts it to XML, transforms it with AI, and then outputs a movie rendering a unicorn printed with a virtual 3D printer galloping over the rainbow.

It’s great you are proving yourself. It’s great you are trying. It’s great the program you wrote works for you. There is no sarcasm here. This is how we all learn.

You are in luck. There are so many libraries available now that writing code that does extremely complex tasks in just a few lines of code is trivial. We should respect that.

AND

If there are 50 other programs doing exactly the same thing, and often doing it better. If the tool is half-cooked for the sake of a demo during the con, and/or it only works well for your test data. If the whole code is so simple that any average developer could implement it w/o much trouble. If the actual snippets can be found on Stack Overflow. Then please, please do not call it a tool. Calling it a POC is enough.

This is not to discourage you from coding. This is to encourage you to assess your code quality & usefulness in the grand scheme of things. If you claim that your software allows one to do a specific thing, or assess a certain quality of some data, or extract certain properties, and someone can quickly prove that these are not done right, because of the limited scope of the original idea that drove the development, then as a POC it’s still a very valuable asset, but as a part of somebody’s toolbox – completely useless. Hence, not a tool.

Epilogue:

Okay, it’s easy to rant and play the blame game. I coded lots of bad programs myself, and some of them are still available on this web site. Making everyone happy is very hard. Programming itself is actually very hard. Testing is even harder. And writing documentation is the most undesirable task coders have to face.

BUT.

I think there is a minimum responsibility to bear for anyone who releases programs publicly, and announces them to the world. It is basic empathy for the users, and their needs.

And if you release code, a POC, a tool, a suite, and it actually can be quickly tested, won’t crash on its first run, and will deliver expected output — you will actually make a dent in the industry.