More stats from the sampleset, this time covering the most common entry points. One can generate a histogram of the entry-point bytes of all executables on the whole system and potentially discover LFO (Least Frequency of Occurrence) anomalies. This is not that straightforward, though: even the system32 directory shows a lot of variety (see the bottom of this post). While most OS files are generated by the same compiler, system32 often holds many extra clean .exes that accumulate over time and give funny results (thanks to Java, installers, and lots of other ‘goodness’ offered by badly written apps).
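A minimal sketch of how such a histogram could be collected, using only the Python standard library. It assumes 32- or 64-bit PE files and a simplified RVA-to-file-offset mapping (no FileAlignment rounding, no loader quirks). `entry_point_bytes` and `entry_point_histogram` are hypothetical names, not the author's tooling. The least common keys in the resulting `Counter` are the LFO candidates.

```python
import struct
from collections import Counter
from pathlib import Path
from typing import Iterable, Optional


def entry_point_bytes(data: bytes, n: int = 6) -> Optional[bytes]:
    """Return the first n bytes at the PE entry point, or None if they can't be located."""
    try:
        if data[:2] != b"MZ":
            return None
        e_lfanew = struct.unpack_from("<I", data, 0x3C)[0]
        if data[e_lfanew:e_lfanew + 4] != b"PE\0\0":
            return None
        coff = e_lfanew + 4                                   # IMAGE_FILE_HEADER
        num_sections = struct.unpack_from("<H", data, coff + 2)[0]
        opt_size = struct.unpack_from("<H", data, coff + 16)[0]
        opt = coff + 20                                       # optional header
        ep_rva = struct.unpack_from("<I", data, opt + 16)[0]  # AddressOfEntryPoint
        sec_table = opt + opt_size
        for i in range(num_sections):
            # Section header: VirtualSize, VirtualAddress, SizeOfRawData, PointerToRawData
            vsize, vaddr, raw_size, raw_ptr = struct.unpack_from(
                "<IIII", data, sec_table + i * 40 + 8)
            if vaddr <= ep_rva < vaddr + max(vsize, raw_size):
                delta = ep_rva - vaddr
                if delta >= raw_size:
                    return None  # EP lands in zero-filled memory with no file backing
                off = raw_ptr + delta
                return data[off:off + n]
        # Not inside any section: assume the RVA points into the headers
        # (e.g. AddressOfEntryPoint == 0 reads back the "MZ" header itself).
        return data[ep_rva:ep_rva + n]
    except struct.error:
        return None  # truncated or malformed file


def entry_point_histogram(paths: Iterable, n: int = 1) -> Counter:
    """Count the first n entry-point bytes (as 'XX YY ...' strings) over many files."""
    counts: Counter = Counter()
    for p in paths:
        try:
            data = Path(p).read_bytes()
        except OSError:
            continue  # unreadable file (permissions, locks, ...)
        ep = entry_point_bytes(data, n)
        if ep:
            counts[ep.hex(" ").upper()] += 1
    return counts
```

For example, `entry_point_histogram(Path(r"C:\Windows\System32").glob("*.exe"), n=3).most_common()` returns the 3-byte histogram sorted by frequency; reading that list from the end gives the rarest prefixes first.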
55 and 8B EC are very common because they encode the standard function prologue:
55 push ebp
and
8B EC mov ebp, esp
If the entry point does NOT start with either of these (look at the first byte only, and remember the sampleset is biased), there is a high chance the sample is polymorphic or packed.
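The 1-byte rule of thumb above can be written as a tiny filter. This is a hypothetical helper, and the set of "expected" first bytes (0x55 and 0x8B) is an assumption taken from the stats in this post; as the system32 numbers further down show, it only makes sense against a comparable baseline and should be read as a hint, not a verdict.

```python
# Hypothetical 1-byte heuristic: 0x55 = push ebp, 0x8B = mov r32, r/m32 (as in 8B EC, 8B FF).
# The set of "expected" first bytes is an assumption based on the stats in this post.
EXPECTED_FIRST_BYTES = frozenset({0x55, 0x8B})


def looks_packed(ep: bytes) -> bool:
    """Flag an entry point whose first byte is not a typical compiler prologue byte.

    An empty byte string (entry point could not be read) is also flagged.
    """
    return len(ep) == 0 or ep[0] not in EXPECTED_FIRST_BYTES


for sample in ("558bec83c4f0", "8bff558bec", "60e800000000", "e9aabbccdd"):
    print(sample, looks_packed(bytes.fromhex(sample)))
```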
Stats for the first 1, 2, 3, 4, 5, and 6 bytes:
First byte:
 139672 55 <-- push ebp
  33107 68 <-- push imm32
  14183 4D <-- MZ header
  13208 E8 <-- call
  12936 60 <-- pushad
  10048 6A <-- push imm8
   5376 83 <-- various (e.g. cmp xx,yy)
   5363 E9 <-- long jump
   5222 EB <-- short jump
   4962 8B <-- mov xx,yy
First 2 bytes:
 133124 55 8B <-- home work :)
  14173 4D 5A
   5037 60 E8
   4505 6A 60
   3870 55 89
   3145 83 7C
   3140 81 EC
   2377 6A 00
   1992 8B FF
   1826 64 A1
First 3 bytes:
 132898 55 8B EC
   7731 4D 5A 90
   5786 4D 5A 50
   4492 6A 60 68
   3842 55 89 E5
   3191 60 E8 00
   3145 83 7C 24
   1821 81 EC 80
   1770 64 A1 00
   1584 8B FF 55
First 4 bytes:
  53290 55 8B EC 83
  35846 55 8B EC 6A
  17214 55 8B EC 53
  12931 55 8B EC B9
   7712 4D 5A 90 00
   6536 55 8B EC 81
   5778 4D 5A 50 00
   3190 60 E8 00 00
   3137 83 7C 24 08
   2853 55 89 E5 83
First 5 bytes:
  45104 55 8B EC 83 C4
  35729 55 8B EC 6A FF
  14775 55 8B EC 53 8B
   7711 4D 5A 90 00 03
   6638 55 8B EC 83 EC
   5775 4D 5A 50 00 02
   5258 55 8B EC 81 EC
   3190 60 E8 00 00 00
   3131 83 7C 24 08 01
   2801 55 89 E5 83 EC
First 6 bytes:
  35498 55 8B EC 6A FF 68
  22712 55 8B EC 83 C4 F0
  14775 55 8B EC 53 8B 5D
   7711 4D 5A 90 00 03 00
   6959 55 8B EC 83 C4 C4
   5775 4D 5A 50 00 02 00
   3497 55 8B EC 83 C4 F4
   3190 60 E8 00 00 00 00
   3080 83 7C 24 08 01 75
   2152 55 8B EC 83 C4 B4
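A table like the one above can be produced by counting prefixes of each length separately. A minimal sketch, assuming the entry-point bytes have already been extracted (one `bytes` object per file); `prefix_stats` and `print_table` are hypothetical names, and the output only imitates the count/hex layout used in this post.

```python
from collections import Counter


def prefix_stats(entry_points, max_len=6, top=10):
    """Map each prefix length 1..max_len to its `top` most common hex prefixes."""
    table = {}
    for n in range(1, max_len + 1):
        # Entry points shorter than n bytes are skipped for that prefix length.
        counts = Counter(ep[:n].hex(" ").upper() for ep in entry_points if len(ep) >= n)
        table[n] = counts.most_common(top)
    return table


def print_table(table):
    """Print rows as '<count> <hex bytes>', one prefix length after another."""
    for n, rows in table.items():
        print(f"First {n} byte(s):")
        for prefix, count in rows:
            print(f"{count:7d} {prefix}")


eps = [bytes.fromhex(h) for h in ("558bec83c4f0", "558bec6aff68", "60e800000000")]
print_table(prefix_stats(eps, max_len=3))
```

Counting each length independently (rather than building a trie) keeps the code short; with a few hundred thousand samples and six lengths this is still cheap.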
When I say the sampleset is biased, I mean it 🙂
Running the same stats over the executables in the system32 directory, I got the following:
 1461 8B <-- mov xx,yy
  361 E8 <-- this is CALL
  329 4D <-- MZ header (not 'real' executable PE files)
    44 55 <-- push ebp (much lower than in the 300K malware sampleset)
   32 83 <-- various (e.g. cmp xx,yy)
   23 53 <-- push ebx
   15 6A <-- push xx
   11 FF <-- various (can be CALL)
    2 E9 <-- long jump
    1 EB <-- short jump
And for the first 3 bytes:
 1447 8B FF 55 <-- mov edi, edi / push ebp (hot-patchable prologue)
  329 4D 5A 90 <-- MZ header
   33 55 8B EC <-- push ebp / mov ebp, esp
   31 83 7C 24
   14 53 55 56
   11 8B 44 24
   10 55 89 E5
    9 E8 0A 00
    8 E8 DA 02
    8 6A 0C 68
To quickly convert between bytes and opcodes, you can use RTA.
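For the handful of opcodes discussed in this post, the mapping can also be done by hand. The sketch below is a hypothetical toy decoder (it is not RTA) that covers only `push r32`, `pushad`, `push imm8`, register-to-register `mov` (8B /r and 89 /r with a register-direct ModRM byte) and `call rel32` in 32-bit mode. It shows why 8B EC and 89 E5 both decode to `mov ebp, esp` and why 8B FF is `mov edi, edi`; for anything beyond that, a real disassembler is the right tool.

```python
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_prologue(code: bytes):
    """Decode a few 32-bit x86 opcodes seen at entry points; stop at anything else."""
    out, i = [], 0
    while i < len(code):
        op = code[i]
        if 0x50 <= op <= 0x57:                        # 50+r: push r32
            out.append(f"push {REG32[op - 0x50]}")
            i += 1
        elif op == 0x60:                              # pushad
            out.append("pushad")
            i += 1
        elif op == 0x6A and i + 1 < len(code):        # push imm8 (CPU sign-extends; shown raw)
            out.append(f"push {code[i + 1]:#04x}")
            i += 2
        elif op in (0x8B, 0x89) and i + 1 < len(code) and code[i + 1] >> 6 == 0b11:
            modrm = code[i + 1]                       # mod == 11: register-direct operands
            reg, rm = REG32[(modrm >> 3) & 7], REG32[modrm & 7]
            # 8B /r is "mov reg, r/m"; 89 /r is "mov r/m, reg"
            out.append(f"mov {reg}, {rm}" if op == 0x8B else f"mov {rm}, {reg}")
            i += 2
        elif op == 0xE8 and i + 4 < len(code):        # call rel32, relative to the next instruction
            rel = int.from_bytes(code[i + 1:i + 5], "little", signed=True)
            out.append(f"call ${rel + 5:+d}")
            i += 5
        else:
            out.append(f"db {op:#04x}  ; not handled")
            break
    return out


print(decode_prologue(bytes.fromhex("8bff558bec")))
```

`60 E8 00 00 00 00` comes out as `pushad` followed by `call $+5`, i.e. a call to the very next instruction.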