Another set of stats from the sample set, this time the most common entry points. One can generate a histogram of the entry points of all executables on the whole system and potentially discover LFO (Least Frequency of Occurrence) anomalies. This is not that straightforward, though: even the system32 directory shows a lot of variety (see the stats at the bottom). While most OS files are generated by the same compiler, the system32 directory often holds many extra clean .exes that accumulate over time and give funny results (thanks to Java, installers and lots of other ‘goodness’ offered by badly written apps).
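A minimal sketch of the histogram idea in Python, assuming a standard PE layout (e_lfanew at offset 0x3C, AddressOfEntryPoint at offset 16 of the optional header, 40-byte section headers). The helper names `entry_point_bytes`, `histogram` and `least_frequent` are hypothetical, and the RVA-to-file-offset mapping is deliberately simplified (no file-alignment rounding, no special handling of malformed sections), so some packed samples may be mapped wrongly.

```python
import os
import struct
from collections import Counter


def entry_point_bytes(data: bytes, n: int = 6):
    """Return the first n bytes at the PE entry point, or None if data is not a PE."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return None
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    # Need the PE signature, the 20-byte COFF header and AddressOfEntryPoint
    if e_lfanew + 44 > len(data) or data[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        return None
    (num_sections,) = struct.unpack_from("<H", data, e_lfanew + 6)
    (opt_size,) = struct.unpack_from("<H", data, e_lfanew + 20)
    opt = e_lfanew + 24  # optional header follows the COFF header
    # AddressOfEntryPoint is at offset 16 for both PE32 and PE32+
    (ep_rva,) = struct.unpack_from("<I", data, opt + 16)
    # Fallback: an RVA outside every section is treated as a header offset
    # (e.g. RVA 0 lands on the "MZ" bytes)
    offset = ep_rva
    table = opt + opt_size  # section table follows the optional header
    for i in range(num_sections):
        sh = table + 40 * i
        if sh + 40 > len(data):
            break
        vsize, va, raw_size, raw_ptr = struct.unpack_from("<IIII", data, sh + 8)
        if va <= ep_rva < va + max(vsize, raw_size):
            offset = ep_rva - va + raw_ptr
            break
    return data[offset:offset + n] if offset < len(data) else None


def histogram(root: str, n: int = 1) -> Counter:
    """Walk root and count the first n entry-point bytes (as hex) of every PE file."""
    counts = Counter()
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            try:
                with open(os.path.join(dirpath, name), "rb") as f:
                    ep = entry_point_bytes(f.read(), n)  # whole file read: fine for a sketch
            except OSError:
                continue
            if ep:
                counts[ep.hex(" ").upper()] += 1
    return counts


def least_frequent(counts: Counter, k: int = 20):
    """LFO candidates: the k rarest entry-point prefixes, rarest first."""
    return counts.most_common()[:-k - 1:-1]
```

For example, `least_frequent(histogram(r"C:\Windows\System32", n=2))` would list the rarest 2-byte prefixes. Files whose entry point RVA is 0 fall back to the start of the file, so this sketch would count them under 4D 5A.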
55 and 8B EC are very common since they encode the classic function prologue:
55     push ebp
8B EC  mov ebp, esp
If the entry point does NOT start with any of these (look at the first byte only, and remember the sample set is biased), there is a high chance it is a polymorphic/packed sample.
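The first-byte rule of thumb fits in a one-line check. `looks_packed` is a hypothetical helper, not an existing API, and given the biased sample set a hit is best read as a triage hint rather than a verdict.

```python
PUSH_EBP = 0x55  # 55 = push ebp


def looks_packed(ep: bytes) -> bool:
    """Hypothetical heuristic: True when the entry point does not start with 55.

    Only the first byte is checked, as suggested above; tuned on a biased
    malware sample set, so expect false positives on clean files.
    """
    return not ep or ep[0] != PUSH_EBP
```

Note that the system32 stats show most clean OS binaries starting with 8B FF 55, so this check would flag them too; it only makes sense against a population that looks like the malware sample set.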
Stats for the first 1, 2, 3, 4, 5 and 6 bytes are below:
1 byte:
139672 55 <-- push ebp
33107 68 <-- push xx
14183 4D <-- MZ header
13208 E8 <-- call
12936 60 <-- pushad
10048 6A <-- push xx
5376 83 <-- various (e.g. cmp xx,yy)
5363 E9 <-- long jump
5222 EB <-- short jump
4962 8B <-- mov xx,yy

2 bytes:
133124 55 8B <-- homework :)
14173 4D 5A
5037 60 E8
4505 6A 60
3870 55 89
3145 83 7C
3140 81 EC
2377 6A 00
1992 8B FF
1826 64 A1

3 bytes:
132898 55 8B EC
7731 4D 5A 90
5786 4D 5A 50
4492 6A 60 68
3842 55 89 E5
3191 60 E8 00
3145 83 7C 24
1821 81 EC 80
1770 64 A1 00
1584 8B FF 55

4 bytes:
53290 55 8B EC 83
35846 55 8B EC 6A
17214 55 8B EC 53
12931 55 8B EC B9
7712 4D 5A 90 00
6536 55 8B EC 81
5778 4D 5A 50 00
3190 60 E8 00 00
3137 83 7C 24 08
2853 55 89 E5 83

5 bytes:
45104 55 8B EC 83 C4
35729 55 8B EC 6A FF
14775 55 8B EC 53 8B
7711 4D 5A 90 00 03
6638 55 8B EC 83 EC
5775 4D 5A 50 00 02
5258 55 8B EC 81 EC
3190 60 E8 00 00 00
3131 83 7C 24 08 01
2801 55 89 E5 83 EC

6 bytes:
35498 55 8B EC 6A FF 68
22712 55 8B EC 83 C4 F0
14775 55 8B EC 53 8B 5D
7711 4D 5A 90 00 03 00
6959 55 8B EC 83 C4 C4
5775 4D 5A 50 00 02 00
3497 55 8B EC 83 C4 F4
3190 60 E8 00 00 00 00
3080 83 7C 24 08 01 75
2152 55 8B EC 83 C4 B4
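Per-length tables like these can be produced by counting the first k bytes of every entry point for k = 1..6 and keeping the top 10 for each length. A sketch, assuming the entry-point bytes have already been extracted from the samples; `prefix_stats` is a hypothetical name.

```python
from collections import Counter


def prefix_stats(entry_points, max_len=6, top=10):
    """For k = 1..max_len, count the first k bytes of every entry point
    and return the `top` most common prefixes (hex strings) per k."""
    stats = {}
    for k in range(1, max_len + 1):
        counts = Counter(ep[:k].hex(" ").upper() for ep in entry_points if len(ep) >= k)
        stats[k] = counts.most_common(top)
    return stats
```

Printing each `(prefix, count)` pair per k gives the same "count bytes" layout as above. Entry points shorter than k bytes are skipped for that length, which is one reason the counts shrink as the prefixes get longer.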
When I say the sample set is biased, I mean it 🙂
Running the same stats over the executables in the system32 directory gave the following results (first byte):
1461 8B <-- mov xx,yy
361 E8 <-- call
329 4D <-- MZ header (not 'real' executable PE files)
44 55 <-- push ebp (a much lower count than in the 300K malware sample set)
32 83 <-- various (e.g. cmp xx,yy)
23 53 <-- push ebx
15 6A <-- push xx
11 FF <-- various (can be call)
2 E9 <-- long jump
1 EB <-- short jump
And for the first 3 bytes:
1447 8B FF 55 <-- mov edi, edi / push ebp
329 4D 5A 90 <-- MZ header
33 55 8B EC <-- push ebp / mov ebp, esp
31 83 7C 24
14 53 55 56
11 8B 44 24
10 55 89 E5
9 E8 0A 00
8 E8 DA 02
8 6A 0C 68
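To put the drop in 55 into proportion, the first-byte counts from the two tables can be turned into shares. The full totals are not listed, so this sketch (all names hypothetical) computes percentages of the top-10 rows shown, not of the whole sample sets.

```python
# First-byte counts copied from the malware and system32 tables (top 10 rows only;
# full totals are not listed, so shares are relative to these rows)
MALWARE = {"55": 139672, "68": 33107, "4D": 14183, "E8": 13208, "60": 12936,
           "6A": 10048, "83": 5376, "E9": 5363, "EB": 5222, "8B": 4962}
SYSTEM32 = {"8B": 1461, "E8": 361, "4D": 329, "55": 44, "83": 32,
            "53": 23, "6A": 15, "FF": 11, "E9": 2, "EB": 1}


def share(counts, key):
    """Percentage of the listed rows whose entry point starts with `key`."""
    return 100.0 * counts.get(key, 0) / sum(counts.values())


for byte in ("55", "8B", "E8"):
    print(f"{byte}: malware {share(MALWARE, byte):5.1f}%  system32 {share(SYSTEM32, byte):5.1f}%")
```

With these numbers 55 (push ebp) falls from about 57% of the listed malware rows to about 2% of the listed system32 rows, while 8B rises from about 2% to about 64%.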
If you want to quickly convert between raw bytes and assembly instructions, you can use RTA.