Random stats from 300k malicious samples – Entry Points

More stats from the sample set, this time the most common entry points. One can generate a histogram of the entry points of all executables on the whole system and potentially discover LFO (Least Frequency of Occurrence) anomalies. This is not that straightforward, though: even the system32 directory shows a lot of variety (see the bottom of this post). While most OS files are generated by the same compiler, system32 often accumulates many extra clean .exes over time, and these give funny results (thanks to Java, installers and lots of other 'goodness' offered by badly written apps).
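Building such a histogram starts with reading the bytes at each file's entry point. The sketch below is a minimal, assumption-laden PE parser (the function name `entry_point_bytes` is hypothetical, not a tool from this post). It reads `AddressOfEntryPoint`, maps that RVA to a file offset through the section table and returns the first `n` bytes. It deliberately ignores loader quirks such as file-alignment rounding, so treat it as illustrative rather than authoritative.

```python
import struct
from typing import Optional

def entry_point_bytes(data: bytes, n: int = 6) -> Optional[bytes]:
    """Return the first n bytes at a PE file's entry point, or None.

    Minimal sketch: reads only the fields it needs and ignores loader
    quirks such as file-alignment rounding of PointerToRawData.
    """
    if len(data) < 0x40 or data[:2] != b"MZ":
        return None
    try:
        e_lfanew = struct.unpack_from("<I", data, 0x3C)[0]
        if data[e_lfanew:e_lfanew + 4] != b"PE\0\0":
            return None
        coff = e_lfanew + 4                      # IMAGE_FILE_HEADER
        num_sections = struct.unpack_from("<H", data, coff + 2)[0]
        opt_size = struct.unpack_from("<H", data, coff + 16)[0]
        opt = coff + 20                          # optional header
        # AddressOfEntryPoint (+16) and SizeOfHeaders (+60) sit at the same
        # offsets in both PE32 and PE32+ optional headers.
        ep_rva = struct.unpack_from("<I", data, opt + 16)[0]
        size_of_headers = struct.unpack_from("<I", data, opt + 60)[0]
        if ep_rva < size_of_headers:
            # Headers map 1:1, so e.g. an entry point of 0 reads "MZ" itself.
            offset = ep_rva
        else:
            offset = None
            table = opt + opt_size               # section headers, 40 bytes each
            for i in range(num_sections):
                _, vsize, va, raw_size, raw_ptr = struct.unpack_from(
                    "<8sIIII", data, table + 40 * i)
                if va <= ep_rva < va + max(vsize, raw_size):
                    offset = raw_ptr + (ep_rva - va)
                    break
            if offset is None:
                return None
    except struct.error:                         # truncated or malformed header
        return None
    return data[offset:offset + n]
```

Calling it on `pathlib.Path(p).read_bytes()` for every file `p` in a directory gives the raw material for a histogram like the ones in this post. The RVA-below-SizeOfHeaders branch also shows why a "4D 5A" (MZ) entry point can appear at all: an entry point of 0 simply points at the file's own header.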

55 and 8B EC are very common because together they encode the classic stack-frame prologue:

  55      push ebp
  8B EC   mov ebp, esp
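The same `mov ebp, esp` has two valid encodings: 8B EC (MOV r32, r/m32) and 89 E5 (MOV r/m32, r32). That is why 55 89 E5 also appears in the tables. Separately, 8B FF is `mov edi, edi`, the two-byte no-op that Windows binaries place in front of the prologue for hot-patching. A small matcher can name these forms. The pattern table and the function `classify_prologue` are illustrative assumptions, not an exhaustive list of compiler prologues.

```python
# Common 32-bit x86 frame-setup sequences seen in the listings. This table is
# an illustrative assumption, not an exhaustive list of compiler prologues.
PROLOGUES = [
    (bytes.fromhex("8BFF558BEC"), "mov edi, edi / push ebp / mov ebp, esp"),
    (bytes.fromhex("558BEC"), "push ebp / mov ebp, esp (8B /r form)"),
    (bytes.fromhex("5589E5"), "push ebp / mov ebp, esp (89 /r form)"),
]

def classify_prologue(ep: bytes) -> str:
    """Name the known prologue that the entry-point bytes start with, if any."""
    for pattern, name in PROLOGUES:
        if ep.startswith(pattern):
            return name
    return "other"
```

For example, `classify_prologue(bytes.fromhex("558BEC83C4F0"))` returns the 8B /r form, and a pushad/call stub such as 60 E8 00 00 00 00 falls through to `"other"`.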

If the entry point does NOT start with either of these (look at the first byte only, and remember the sample set is biased), chances are high that it is a polymorphic / packed sample.
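The first-byte rule from the paragraph above fits in a few lines. This is a hypothetical helper (`maybe_packed` is not from the original post), and it is only as good as the rule: it will also flag clean files, for example the system32 executables whose entry point starts with E8 (call) in the listing at the end of this post.

```python
# First-byte heuristic: flag entry points that start with neither 55
# (push ebp) nor 8B (mov, as in 8B EC / 8B FF). Illustrative only.
COMMON_FIRST_BYTES = {0x55, 0x8B}

def maybe_packed(ep: bytes) -> bool:
    """True if the entry point is empty or starts with an uncommon byte."""
    return len(ep) == 0 or ep[0] not in COMMON_FIRST_BYTES
```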

Stats for the first 1, 2, 3, 4, 5 and 6 bytes are below (number of samples, then the bytes at the entry point):

 139672 55 <-- push ebp
  33107 68 <-- push imm32
  14183 4D <-- MZ header
  13208 E8 <-- call
  12936 60 <-- pushad
  10048 6A <-- push imm8
   5376 83 <-- various (e.g. cmp xx,yy)
   5363 E9 <-- long jump
   5222 EB <-- short jump
   4962 8B <-- mov xx,yy

 133124 55 8B <-- home work :)
  14173 4D 5A
   5037 60 E8
   4505 6A 60
   3870 55 89
   3145 83 7C
   3140 81 EC
   2377 6A 00
   1992 8B FF
   1826 64 A1

 132898 55 8B EC
   7731 4D 5A 90
   5786 4D 5A 50
   4492 6A 60 68
   3842 55 89 E5
   3191 60 E8 00
   3145 83 7C 24
   1821 81 EC 80
   1770 64 A1 00
   1584 8B FF 55

  53290 55 8B EC 83
  35846 55 8B EC 6A
  17214 55 8B EC 53
  12931 55 8B EC B9
   7712 4D 5A 90 00
   6536 55 8B EC 81
   5778 4D 5A 50 00
   3190 60 E8 00 00
   3137 83 7C 24 08
   2853 55 89 E5 83

  45104 55 8B EC 83 C4
  35729 55 8B EC 6A FF
  14775 55 8B EC 53 8B
   7711 4D 5A 90 00 03
   6638 55 8B EC 83 EC
   5775 4D 5A 50 00 02
   5258 55 8B EC 81 EC
   3190 60 E8 00 00 00
   3131 83 7C 24 08 01
   2801 55 89 E5 83 EC

  35498 55 8B EC 6A FF 68
  22712 55 8B EC 83 C4 F0
  14775 55 8B EC 53 8B 5D
   7711 4D 5A 90 00 03 00
   6959 55 8B EC 83 C4 C4
   5775 4D 5A 50 00 02 00
   3497 55 8B EC 83 C4 F4
   3190 60 E8 00 00 00 00
   3080 83 7C 24 08 01 75
   2152 55 8B EC 83 C4 B4
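Each block above is a top-10 table of entry-point prefixes of one length. A sketch of how such listings could be produced, assuming `eps` is simply a list of entry-point byte strings (the function names here are hypothetical):

```python
from collections import Counter

def top_prefixes(eps, max_len=6, top=10):
    """For each prefix length 1..max_len, return the `top` most common prefixes."""
    result = {}
    for n in range(1, max_len + 1):
        # Skip samples whose entry point yielded fewer than n bytes.
        counts = Counter(ep[:n] for ep in eps if len(ep) >= n)
        result[n] = counts.most_common(top)
    return result

def print_listing(result):
    """Print each table as 'count bytes', one blank line between lengths."""
    for n, rows in result.items():
        for prefix, count in rows:
            print(f"{count:7d} {prefix.hex(' ').upper()}")
        print()
```

Counting each prefix length separately (rather than only 6-byte sequences) is what makes the "55" line larger than any "55 8B ..." line: shorter prefixes absorb all their longer continuations.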

When I say the sample set is biased, I mean it 🙂

Running the same stats over the executables in the system32 directory, I got the following:

  1461 8B <-- mov xx,yy
   361 E8 <-- this is CALL
   329 4D <-- MZ header (not 'real' executable PE files)
    44 55 <-- push ebp = much lower than in the 300k malware sample set
    32 83 <-- various (e.g. cmp xx,yy)
    23 53 <-- push ebx
    15 6A <-- push imm8
    11 FF <-- various (can be CALL)
     2 E9 <-- long jump
     1 EB <-- short jump

and for 3 bytes:

  1447 8B FF 55 <-- mov edi, edi / push ebp
   329 4D 5A 90 <-- MZ header
    33 55 8B EC <-- push ebp / mov ebp, esp
    31 83 7C 24
    14 53 55 56
    11 8B 44 24
    10 55 89 E5
     9 E8 0A 00
     8 E8 DA 02
     8 6A 0C 68

To quickly convert between raw bytes and instruction mnemonics, you can use RTA.