Hunting for the warez & other dodgy stuff people install / download, part 2

In the first part of this series we explored some basic search terms that can be used to find ‘unwanted’ software being installed on company endpoints. Today, I’d like to take this research a step further and look at other ‘questionable content’.

People download pirated video content from many questionable places. Finding these downloads is not difficult, because much of this activity involves multimedia files with the following extensions:

  • ‘3g2’, ‘3gp’, ‘amv’, ‘asf’, ‘avi’, ‘bdjo’, ‘bdmv’, ‘clpi’, ‘divx’, ‘drc’, ‘f4a’, ‘f4b’, ‘f4p’, ‘f4v’, ‘flv’, ‘gif’, ‘gifv’, ‘M2TS’, ‘m2v’, ‘m4p’, ‘m4v’, ‘mkv’, ‘mng’, ‘mov’, ‘mp2’, ‘mp4’, ‘mpe’, ‘mpeg’, ‘mpg’, ‘mpls’, ‘mpv’, ‘MTS’, ‘mxf’, ‘nsv’, ‘ogg’, ‘ogv’, ‘qt’, ‘rm’, ‘rmvb’, ‘roq’, ‘svi’, ‘TS’, ‘viv’, ‘vob’, ‘webm’, ‘wmv’, ‘yuv’

As you can guess, searching for file creation events that reference these media file extensions is a good way to discover users who download multimedia content that may need to be reviewed.
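A minimal Python sketch of such a search follows. It assumes file creation events are available as records with hypothetical `user` and `path` fields, so adapt it to whatever your EDR/SIEM actually exports. It only checks the final extension, case-insensitively. Expect some noise from entries like 'gif' (ordinary images) and 'ts' (also used for TypeScript source files).

```python
from pathlib import PureWindowsPath

# Extensions from the list above, lower-cased so matching is case-insensitive.
VIDEO_EXTENSIONS = {
    "3g2", "3gp", "amv", "asf", "avi", "bdjo", "bdmv", "clpi", "divx", "drc",
    "f4a", "f4b", "f4p", "f4v", "flv", "gif", "gifv", "m2ts", "m2v", "m4p",
    "m4v", "mkv", "mng", "mov", "mp2", "mp4", "mpe", "mpeg", "mpg", "mpls",
    "mpv", "mts", "mxf", "nsv", "ogg", "ogv", "qt", "rm", "rmvb", "roq",
    "svi", "ts", "viv", "vob", "webm", "wmv", "yuv",
}

def is_video_file(path: str) -> bool:
    """Return True if the path's final extension is in VIDEO_EXTENSIONS."""
    suffix = PureWindowsPath(path).suffix  # e.g. '.MKV', or '' if there is none
    return suffix[1:].lower() in VIDEO_EXTENSIONS

def video_hits(events):
    """Yield (user, path) for file creation events that look like video files.

    `events` is an iterable of dicts with hypothetical 'user' and 'path' keys.
    """
    for event in events:
        if is_video_file(event["path"]):
            yield event["user"], event["path"]
```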

And, as usual, if we dig deeper, we can create complementary detection logic that focuses on a different set of file extensions, one that is VERY closely associated with pirated video content:

  • ass – Advanced Sub Station Alpha
  • dfxp – Flash XML (Distribution Format Exchange Profile)
  • inqscr – InqScribe transcript
  • itt – iTunes Timed Text
  • jss – JACOsub
  • sami – Synchronized Accessible Media Interchange
  • sbv – YouTube format
  • scc – Scenarist Closed Captions
  • smi – Synchronized Accessible Media Interchange
  • srt – SubRip
  • ssa – Sub Station Alpha
  • stl – Spruce Subtitle File
  • sup – Blu-ray PGS
  • sup – SonicDVD Creator
  • ttml – Timed Text Markup Language
  • usf – Universal Subtitle Format
  • vtt – Web Video Text Tracks (WebVTT)

If you don’t know what these are, where have you been for the last 3 decades?? 🙂

These are subtitle files that often accompany pirated media files. So, it goes without saying that the presence of these files is low-hanging fruit that can lead us to other undesirable goodies in the folders that host them.
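One way to follow that lead is to pivot from subtitle files to the folders that contain them. The sketch below assumes a flat list of created file paths and uses a deliberately short, illustrative subset of video extensions. Folders holding both subtitles and videos are probably the best candidates for review. Note that 'stl' also collides with 3D-printing/CAD models, so some FPs are to be expected.

```python
from collections import defaultdict
from pathlib import PureWindowsPath

SUBTITLE_EXTENSIONS = {
    "ass", "dfxp", "inqscr", "itt", "jss", "sami", "sbv", "scc", "smi",
    "srt", "ssa", "stl", "sup", "ttml", "usf", "vtt",
}
# Illustrative subset only; reuse the full video extension list in practice.
VIDEO_EXTENSIONS = {"avi", "mkv", "mp4", "m4v", "mov", "wmv", "webm"}

def ext(path: str) -> str:
    """Lower-cased final extension without the dot ('' if none)."""
    return PureWindowsPath(path).suffix[1:].lower()

def folders_to_review(paths):
    """Map each folder that contains subtitle files to a tuple of
    (subtitle_count, video_count). Folders where both counts are non-zero
    are the strongest leads."""
    stats = defaultdict(lambda: [0, 0])
    for p in paths:
        e = ext(p)
        parent = str(PureWindowsPath(p).parent)
        if e in SUBTITLE_EXTENSIONS:
            stats[parent][0] += 1
        elif e in VIDEO_EXTENSIONS:
            stats[parent][1] += 1
    # Keep only folders with at least one subtitle file.
    return {folder: tuple(c) for folder, c in stats.items() if c[0] > 0}
```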

Archives are another type of warez file we should look at.

I have mentioned them a few times in the past, but let’s be more systematic this time and focus on telemetry that references container files created by the most popular archiving software, which is very often used by the ‘scene’ that ‘releases’ warez to the public:

  • .rar, .7z, .zip, .cab, and
  • .arj, .lha, .kgb, .xz, and
  • multi-volume archives like
    • .7z.000, .7z.001, …,
    • .rar.000, .rar.001, …,
    • .part1.rar, .part2.rar, …,
    • .r.01, .r.02, …,
    • .z.01, .z.02, …,
    • .z01, .z02, …,
    • .zx01, .zx02, …,
    • .zip.001, .zip.002, …,
    • .cab, .part2.cab, …,
  • and older or less common archive formats: https://en.wikipedia.org/wiki/List_of_archive_formats

Hunting for file creation events that refer to files with these file extensions may lead to some very interesting discoveries.
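A rough Python classifier for the names listed above might look like this. The regexes are a sketch transcribed from that list, not an exhaustive description of how every archiver names its volumes. Multi-volume patterns are checked first, because a name like '.part1.rar' would otherwise be reported as a plain archive.

```python
import re

# Single-file archive extensions from the list above.
ARCHIVE_EXTENSIONS = {"rar", "7z", "zip", "cab", "arj", "lha", "kgb", "xz"}

# Multi-volume naming patterns from the list above, anchored at the end of
# the name. A sketch: tune it to the naming you actually see in telemetry.
MULTI_VOLUME = re.compile(
    r"""(
        \.(7z|rar|zip)\.\d{3}      # .7z.001, .rar.000, .zip.002
      | \.part\d+\.(rar|cab)       # .part1.rar, .part2.cab
      | \.[rz]\.\d{2}              # .r.01, .z.01
      | \.z\d{2}                   # .z01, .z02
      | \.zx\d{2}                  # .zx01, .zx02
    )$""",
    re.IGNORECASE | re.VERBOSE,
)

def classify_archive(name: str):
    """Return 'multi-volume', 'archive', or None for a file name."""
    if MULTI_VOLUME.search(name):
        return "multi-volume"
    if "." in name and name.rsplit(".", 1)[1].lower() in ARCHIVE_EXTENSIONS:
        return "archive"
    return None
```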

And yes, as usual, there is more:

  • Any file creation event referencing the .torrent file extension is of interest
  • Any command line invocation referencing a “magnet:” link is of interest
  • Any DNS request for known torrent/magnet sites is of interest
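The three checks could be sketched in Python as below. `TORRENT_DOMAINS` is a hypothetical placeholder watchlist (the `.example` names are reserved and are not real sites); substitute your own intel. The magnet parser pulls the `xt=urn:btih:` info-hash out of each link, which may be handy for pivoting across hosts that fetched the same content.

```python
import re
from urllib.parse import parse_qs, urlparse

# A magnet link runs until whitespace or a closing quote on the command line.
MAGNET_RE = re.compile(r"magnet:\?[^\s\"']+", re.IGNORECASE)

# Hypothetical watchlist: replace with the torrent/magnet domains you track.
TORRENT_DOMAINS = {"torrent-site.example", "magnet-index.example"}

def torrent_file_created(path: str) -> bool:
    return path.lower().endswith(".torrent")

def magnet_infohashes(command_line: str):
    """Extract BitTorrent info-hashes (xt=urn:btih:...) from magnet links."""
    hashes = []
    for link in MAGNET_RE.findall(command_line):
        for xt in parse_qs(urlparse(link).query).get("xt", []):
            if xt.lower().startswith("urn:btih:"):
                hashes.append(xt[len("urn:btih:"):])
    return hashes

def dns_hit(query: str) -> bool:
    """True if the queried name is a watched domain or one of its subdomains."""
    q = query.lower().rstrip(".")
    return any(q == d or q.endswith("." + d) for d in TORRENT_DOMAINS)
```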

As we explore this particular topic, we may be tempted to leverage this approach to hunt for more specific content like pr0n & CSAM, but I do not want to cover these here, because handling them properly requires a completely different approach, one that is better left to experienced DFIR teams working together with Legal and HR departments. That’s because in True Positive cases employees lose their jobs and/or go to prison.

Now… as we come to the end of this quick & dirty hunting guide, I need to be fair and mention a little caveat. While hunting for Acceptable Use Policy violations is pretty easy, the actual remediation is extremely difficult. Some of these findings (often in bulk) end up as items added to the company’s Risk Register. And everything listed there gets prioritized, with AUP violations always marked LOW. Moreover, exploring AUP violations in your environment will inevitably lead you to discover violations committed by security personnel, including CISOs. There is no clear way to solve this long-term without a serious commitment from the company’s security committee…

Hunting for the warez & other dodgy stuff people install / download, part 1

It is a sad IT fact, but employees install pirated/dodgy software on a regular basis and download & execute whatever they want. There is no way to stop them… other than implementing a very strict software installation/program execution policy. Such a policy always ricochets and triggers a domino effect of negative user experience in any larger company, because employees simply need to install and/or use a lot of different random software, frequently, and often software that has to be installed quickly, ad hoc, to complete the task at hand, often in response to client requests.

And support teams tasked with approving these ‘critical’ installs, even in exemplary, tightly controlled environments, are never able to keep up, let alone accurately (‘beyond any doubt’) confirm that this or that ‘badly needed’ software is ‘safe’. And without such an assessment, all these installs must be treated as a risk…

Increasingly popular App Stores make it a bit easier for IT teams to manage this widespread issue, but the reality is that we still have to deal with a lot of uncontrolled/unmanaged software installation events that are initiated when users freely download and install their ‘program du jour’, often from questionable sources.

So…

How do we tackle this widespread Acceptable Use Policy (AUP) violation without being seen as a ‘nanny state’? And, assuming the worst (an environment where installing random software is a common event), how do we specifically find the users who try to install pirated or otherwise unwanted software? After all, we want to find them not to punish them, but to reduce the attack surface.

The most basic, naive method will focus on telemetry searches for keywords like:

  • keygen
  • crack
  • serial
  • warez
  • patch
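For reference, this naive search amounts to a case-insensitive substring match. The Python sketch below assumes the telemetry has been flattened into one string per event (a file path, a command line, etc.). Note that a plain substring match also fires on ordinary names like `dispatch.dll`.

```python
import re

# The five naive keywords; no boundaries, so 'patch' also hits 'Dispatch'.
NAIVE = re.compile(r"keygen|crack|serial|warez|patch", re.IGNORECASE)

def naive_hits(lines):
    """Return (line, first matched keyword) for every telemetry line that
    contains at least one of the naive keywords."""
    hits = []
    for line in lines:
        m = NAIVE.search(line)
        if m:
            hits.append((line, m.group(0).lower()))
    return hits
```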

Of course, if you have ever searched for these keywords in any larger org’s telemetry, you are intimately familiar with lots of these False Positives:

  • red team and pentesting tools (AirCrack, L0phtcrack)
  • audio files (Nutcracker by Tchaikovsky)
  • documentation (InternetCrackUrl API)
  • local repos (sometimes pirated, admittedly) with cyber security courses – with their PDFs and tools referencing many of these keywords (f.ex. CEH materials)
  • forensic tools (password crackers)
  • random cloned GitHub repos (easily hitting any keyword, really)
  • e-books (crack <insert topic>, cracking <insert topic>, how to crack <insert topic>, etc.)
  • product names (Patchlink)
  • legitimate software patches (*patch*) including drivers, software updates, KBs, hot fixes, etc.
  • legitimate key generating tools f.ex. ssh-keygen
  • unexpected ‘false positives’ – matches on ‘wide’ keywords like ‘patch’ that catch things like ‘Dispatch’, ‘DispatcherNet’
  • etc.

When you run these naive (yet wide in scope) searches across your org, you quickly realize that this type of threat hunting exercise does not immediately produce critical/high-fidelity detections. Instead, it is a more advanced game of tuning the queries and eyeballing the resulting reports for anomalies, iteratively… The good news is that, in my experience, basic telemetry searches focused on the 5 keywords listed above yield a lot of very interesting results.
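One possible first tuning iteration, sketched in Python: only accept a keyword when it is not glued to other letters (this drops 'Dispatch', 'Nutcracker' and 'L0phtcrack'), and suppress lines matching an allowlist that grows while you eyeball the results. The allowlist entries are illustrative assumptions. The letter-boundary rule also costs some recall; for example, 'multikeygen' would no longer match.

```python
import re

# Keyword counts only when it is not preceded/followed by a letter, so
# '_', '-', '.', '\' and digits act as separators.
TUNED = re.compile(r"(?<![a-z])(keygen|crack|serial|warez|patch)(?![a-z])",
                   re.IGNORECASE)

# Hypothetical allowlist of known-benign patterns, grown iteratively.
ALLOWLIST = [
    re.compile(r"ssh-keygen", re.IGNORECASE),                 # legit key tool
    re.compile(r"(?<![a-z0-9])kb\d{6,7}(?!\d)", re.IGNORECASE),  # KB updates
]

def tuned_hits(lines):
    """Return (line, keyword) pairs that survive the allowlist and the
    letter-boundary rule."""
    hits = []
    for line in lines:
        if any(a.search(line) for a in ALLOWLIST):
            continue  # known benign, suppress
        m = TUNED.search(line)
        if m:
            hits.append((line, m.group(1).lower()))
    return hits
```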

Still, one may wonder: how can we make such analysis even more productive?

Systematic analysis of existing pirated software is a bit risky. After all, even downloading any of it can be considered a dubious endeavor. Luckily, there are many resources out there that we can leverage for our analysis without (hopefully) breaking any law. If you look at GitHub, you can find copies of old Pirate Bay database dumps that can be downloaded and analyzed.

Let’s clone this old repo and see where it takes us…

Running:

rg -i "(keygen|crack|serial|warez|patch)" piratebay_db_dump_2015_10_27T04_10_50_to_2019_09_14T22_09_31.csv | cut -d';' -f3 > test

gives us a ‘test’ file with many interesting file names, all hits on our basic 5-word search. When we look at the resulting file, we can immediately spot an opportunity: cracking/release group names. They are ‘all over the place’ and clearly delimited (often using [group] or {group} constructs incorporated into file names), and as such, easily extractable.
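`cut` knows nothing about CSV quoting, so a ';' inside a quoted torrent name would shift the columns. If that turns out to be a problem with this dump, a Python equivalent built on the `csv` module could look like this. It assumes, as the `cut -d';' -f3` call implies, that the dump is ';'-separated with the name in the third field, and that it is UTF-8 (undecodable bytes are replaced).

```python
import csv
import re

# Same 5 naive keywords as the rg call, matched anywhere in the row.
KEYWORDS = re.compile(r"keygen|crack|serial|warez|patch", re.IGNORECASE)

def names_matching(lines, column=2):
    """Yield the name field (0-based index `column`) of each ';'-separated
    row that contains one of the keywords. `lines` is any iterable of text."""
    for row in csv.reader(lines, delimiter=";"):
        if len(row) > column and KEYWORDS.search(";".join(row)):
            yield row[column]

def write_names(csv_path, out_path):
    """Rough equivalent of: rg -i ... dump.csv | cut -d';' -f3 > test"""
    with open(csv_path, newline="", encoding="utf-8", errors="replace") as src, \
         open(out_path, "w", encoding="utf-8") as dst:
        for name in names_matching(src):
            dst.write(name + "\n")
```

Usage would be along the lines of `write_names("piratebay_db_dump_2015_10_27T04_10_50_to_2019_09_14T22_09_31.csv", "test")`.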

Running:

rg -i -o "\[[a-z0-9]+\]|\{[a-z0-9]+\}" test > groups.txt

gives us a list of many such cracking/release groups that incorporate their names into their ‘release’ torrent file names. A quick histogram of these can be found here.
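The extraction and the histogram can also be done in one pass in Python. This sketch follows the same intent as the rg command (letters and digits wrapped in [] or {}), so tags containing '-', '_' or '.' are not captured. It lower-cases tags so that [GRP] and {grp} are counted together.

```python
import re
from collections import Counter

# A run of letters/digits wrapped in [] or {}; deliberately narrow.
GROUP_TAG = re.compile(r"\[([a-z0-9]+)\]|\{([a-z0-9]+)\}", re.IGNORECASE)

def group_histogram(names):
    """Count bracketed group tags across release names, case-insensitively."""
    counts = Counter()
    for name in names:
        # findall returns (square, curly) tuples; exactly one side is non-empty.
        for square, curly in GROUP_TAG.findall(name):
            counts[(square or curly).lower()] += 1
    return counts
```

`group_histogram(names).most_common(50)` then gives the top tags to eyeball.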

There are obvious FPs on this list, f.ex. [3840x2160], but most of the items seem to be good ‘search’ targets. In fact, looking for these group names in the file names we see in our ‘file creation’ telemetry may yield fairly accurate results.
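Before turning the group list into a detection, the obvious FPs can be filtered with a few heuristics. The filters and the minimum tag length of 3 below are assumptions to tune, not a vetted ruleset. The resulting matcher looks for any remaining group name wrapped in [] or {} inside a file path from file creation telemetry.

```python
import re

# Heuristic assumptions: pure numbers (years, episodes) and resolutions
# such as 3840x2160 are not group names.
NOT_A_GROUP = [re.compile(r"\d+"), re.compile(r"\d{3,4}x\d{3,4}", re.IGNORECASE)]

def clean_groups(tags, min_len=3):
    """Drop tags that look like numbers/resolutions or are too short."""
    return sorted({t.lower() for t in tags
                   if len(t) >= min_len
                   and not any(p.fullmatch(t) for p in NOT_A_GROUP)})

def group_matcher(groups):
    """One case-insensitive regex finding [group] or {group} in a path."""
    alternation = "|".join(re.escape(g) for g in groups)
    return re.compile(rf"[\[{{](?:{alternation})[\]}}]", re.IGNORECASE)
```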

Another observation we can make is that a lot of torrent file names incorporate our keywords (keygen|crack|serial|warez|patch) in a very peculiar way. We see many occurrences of these infixes:

  • With Crack
  • With Keygen
  • With Serial
  • With Patch
  • With Working
  • With Activation
  • With License
  • With Cheats
  • + crack
  • + patch
  • + keygen
  • + serial
  • + key
  • + cracked
  • + serials
  • + activator
  • + loader
  • + walkthrough
  • + license
  • + keymaker
  • + fix
  • + patcher
  • + working
  • + repack
  • + keyegn (yes, a typo!)
  • + preactivated
  • + multikeygen
  • + lisence (yes, another typo!)
  • + mod
  • + hotfix
  • + activated
  • [Full Installer]
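Infixes like these can be surfaced semi-automatically by counting the word that follows 'With' or '+' in each name. A rough Python sketch follows; it treats spaces, dots, underscores and hyphens as separators, which is an assumption about how the names are written. The output still needs eyeballing, since ordinary phrases like 'With Pictures' get counted too, and bracketed infixes such as [Full Installer] are not captured by this pattern.

```python
import re
from collections import Counter

# The word directly after 'with' (not part of a longer word) or after '+'.
INFIX = re.compile(r"(?:(?<![a-z])with[\s._-]+|\+[\s._-]*)([a-z]+)", re.IGNORECASE)

def infix_histogram(names):
    """Count (lower-cased) words that follow 'With' or '+' in release names."""
    return Counter(w.lower() for name in names for w in INFIX.findall(name))
```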

These are really good keywords! It would seem that with such a simple exercise we have immediately extended our original keyword list to cover a lot more cases!
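The extended list can be folded into a single regex. In this sketch the 'With'/'+' prefix is mandatory, because on their own, words like 'key', 'fix' or 'mod' would be far too noisy. The word list is transcribed from the infixes above (deliberate typos included), and the separator set is an assumption.

```python
import re

# Words transcribed from the infix list, including the typos seen in the wild.
INFIX_WORDS = [
    "crack", "cracked", "keygen", "keyegn", "multikeygen", "keymaker",
    "serial", "serials", "key", "patch", "patcher", "activator",
    "activation", "activated", "preactivated", "license", "lisence",
    "loader", "fix", "hotfix", "repack", "working", "walkthrough",
    "cheats", "mod",
]

# Prefix ('with' or '+') + one listed word not followed by another letter,
# or the literal '[Full Installer]' tag.
PIRATE_INFIX = re.compile(
    r"(?:(?<![a-z])with[\s._-]+|\+[\s._-]*)"
    r"(?:" + "|".join(map(re.escape, INFIX_WORDS)) + r")(?![a-z])"
    r"|\[full installer\]",
    re.IGNORECASE,
)
```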

When you analyze data sets of interest, it’s really important to focus on the end goal. Starting with 5 common, FP-prone ‘pirate’ words, we ended up generating a much longer and more accurate list of distinctive, piracy-specific, and actionable keywords…

Happy hunting!