Batch decompilation with IDA / Hex-Rays Decompiler

if you are very used to 32-bit IDA you may sometimes find yourself in a blind alley when you try to port your working solution to IDA 64-bit. This was the case with my old batch decompilation script.

The way it works is very simple – for every <file> in a folder, run IDA in its automation/batch mode mode, decompile the <file>, and finally save it in a <file>.c file – more or less like the below (I am omitting the loop):

c:\Ida\idaw.exe -A -Ohexrays:-new:%%k.c:ALL “%%k”

Nothing could be simpler.

Until you run it with the 64-bit idaw64.exe:

c:\Ida\idaw64.exe -A -Ohexrays:-new:%%k.c:ALL “%%k”

It doesn’t work. It loads idaw64 and just stays there.

The gotcha is in a plug-in name. The 64-bit decompiler’s plugin name is not hexrays, it’s not hexrays64 either. It is actually hexx64.dll.

So, you have to run this instead:

c:\Ida\idaw64.exe -A -Ohexx64:-new:%%k.c:ALL “%%k”

It’s ridiculously trivial, but it’s always the little things.

Also, interestingly, when you google hexx64.dll or hexx64.p64 you only get a few hits. As if not too many ppl ever came across the issue.

Another gotcha is that if you run it with too many files, your system’s performance will deteriorate quickly. I don’t know if it is memory fragmentation/leaks, or something else, but after running the script on a number of samples I observed my VM dying on me and requiring a restart due to low memory (despite no other process running on a 2G RAM guest). If you know what causes it I would be grateful if you could let me know.

The third gotcha is to rely on the text version of IDA for this task – it is faster than the GUI version. At least in my experience.

Finally, the last gotcha is to remove all the other plugins from the IDA’s Plugins directory, other than the one you are using e.g. hexrays. Why? This may look like nothing, but IDA enumerates and loads all of them _each_ time it starts.

It’s understood… that EU dudes… sell GDPRization…

I’ve been recently thinking of GDPR, and its influence on the non-EU websites… in particular, I was curious how the legislation affects the user experience for non-EU sites for visitors from EU. We hear about many websites in US simply denying the access e.g. LA Times:

but I was curious how many other web sites really do so…

I came up with a quick & dirty (and pretty simple) idea of checking how the popular web sites respond to the regulation… by visiting them and taking a screenshot.

Of course, manual check would be too labor-intensive, so I automated it.

First, I needed a list of top world web sites so I downloaded the Cisco Umbrella list. I know it’s biased, but don’t know any better source (since the  free Alexa top 1M is long gone, and others – I really don’t know how accurate they are).

I then created a simple script in perl to extract the first 10000 top unique domains from the list (and exclude all subdomains on the way):

use strict;
use warnings;
my %h;
my $cnt=0;
while (<>)
{
  if (/,([^\.]+\.[^\.]+$)/)
  {
    if (!defined($h{$1}))
    {
      print "$1";
      $h{$1}++;
      $cnt++;
      exit if $cnt >= 10000;
    }
  }
}

Next, I wrote a simple phantomjs script to grab a screenshot of these domains (all accessed via http and then rerunning for https for these that didn’t work):

system = require('system')
var page = require('webpage').create();
     page.viewportSize = { width: 1024, height: 768 };
     page.clipRect = { top: 0, left: 0, width: 1024, height: 768 };
address = system.args[1];
output  = system.args[2];
page.open(address, function() {
  page.render(output);
  phantom.exit();
});

And then I ran the phantomjs on domains from this data set… each page visited is saved as a png.

To my surprise, the experiment didn’t work as I anticipated.

Most of web sites visited didn’t really make any comment on GDPR and it was business as usual. Some offer an option to accept new privacy policies. In the end I only came up with a bunch of examples.

Still, it was worth trying…

Lessons learned…

  • Some web sites detect phantom JS as a bot – they will block your IP, or offer a captcha challenge
  • Lots of top domains don’t even host a web site; you can see default IIS, Nginx pages, errors (404, 403s ;))
  • Privacy banners, if they exist, are handled in many different ways – from simple OK, to more advanced settings with a multi-choice questionnaire; I include some example below
  • Many non-English web sites provide information about privacy in their native language; this is an interesting conundrum to solve in general – how a non-speaker can use the web site w/o an ability to understand the Privacy Policies? I provide some examples in French, Italian and Dutch (and of course, English)
  • Way too many advertising and marketing web sites, all united to promise you the best monetization ever; and yes, AI-based advertising is already here 🙂

I am wondering if the methodology I used was incorrect? Perhaps it would be faster to just query google for all the web sites that refer to GDPR? I couldn’t come up with a good google dork though. And searching still brings many of such geo-locked web sites and include them in ‘normal’ results. You only learn about GDPR stuff when you try to visit the actual page. Google cache is still available though in some cases. So… I guess this transitional stage will last for some time. If you have any idea on how to run a research like this better, please let me know.

And finally some screenshots

diynetwork.com

goodrx.com

chicago tribune

Collect and gather

Everquote

Fubo.tv

Gannet

Orlando Sentinel

Pandora (not sure if it is GDPR related though)

Myspace

Ebates

European Union page itself

Atlas Obscura

At Hoc

Cosmopolitan

My recipes

piwikpro

Simpsons World

Le Monde

Meteo IT

NOS

And finally NSFW, all the screenshots related to porn.