Another set of stats from the sampleset, this time the most common entry points. One can generate a histogram of the entry points of all executables on the whole system and potentially discover LFO (Least Frequency of Occurrence) anomalies. This is not that straightforward, though: even the system32 directory shows lots of variety (see the bottom of this post). While most OS files are generated by the same compiler, the system32 directory often holds many extra clean .exes that accumulate over time and give funny results (thanks to Java, installers and lots of other ‘goodness’ offered by badly written apps).
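A minimal Python sketch of the histogram idea, assuming plain, well-formed PE files. It reads e_lfanew, AddressOfEntryPoint and the section table, maps the entry point RVA to a file offset, and counts the first byte found there across all *.exe files in one directory, printing both the top and the bottom (LFO candidates) of the histogram. The helper names (entry_point_bytes, first_byte_histogram) are made up for illustration, and the RVA mapping ignores loader quirks such as FileAlignment rounding, so heavily malformed or packed files may be skipped or mapped wrongly.

```python
import struct
import sys
from collections import Counter
from pathlib import Path
from typing import Optional


def entry_point_bytes(data: bytes, n: int = 6) -> Optional[bytes]:
    """Return up to n bytes found at the PE entry point, or None if data is not a PE."""
    try:
        if data[:2] != b"MZ":
            return None
        (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
        if data[e_lfanew:e_lfanew + 4] != b"PE\0\0":
            return None
        # COFF header: NumberOfSections at +6, SizeOfOptionalHeader at +20
        num_sections, opt_size = struct.unpack_from("<H12xH", data, e_lfanew + 6)
        opt = e_lfanew + 24
        # AddressOfEntryPoint sits at offset 16 in both PE32 and PE32+ optional headers
        (ep_rva,) = struct.unpack_from("<I", data, opt + 16)
        offset = ep_rva  # an RVA inside the headers maps 1:1 to a file offset
        sec_table = opt + opt_size
        for i in range(num_sections):
            # VirtualSize, VirtualAddress, SizeOfRawData, PointerToRawData
            vsize, va, raw_size, raw_ptr = struct.unpack_from(
                "<4I", data, sec_table + 40 * i + 8)
            if va <= ep_rva < va + max(vsize, raw_size):
                offset = ep_rva - va + raw_ptr
                break
        return data[offset:offset + n]
    except struct.error:  # truncated or malformed header
        return None


def first_byte_histogram(paths) -> Counter:
    """Count the first entry point byte across all readable PE files in paths."""
    hist = Counter()
    for path in paths:
        try:
            ep = entry_point_bytes(path.read_bytes(), 1)
        except OSError:
            continue
        if ep:  # skips non-PE files and entry points past the end of the file
            hist[ep[0]] += 1
    return hist


if __name__ == "__main__":
    root = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    hist = first_byte_histogram(root.glob("*.exe"))
    print("most common:")
    for byte, count in hist.most_common(10):
        print(f"{count:8} {byte:02X}")
    print("least common (LFO candidates):")
    for byte, count in sorted(hist.items(), key=lambda kv: kv[1])[:10]:
        print(f"{count:8} {byte:02X}")
```

Run it with a directory as the only argument (e.g. the system32 path); swapping glob for rglob walks subdirectories too. Note that an entry point RVA of 0 lands in the header, which is one way a 4D (MZ) first byte can show up in stats like the ones below.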
55 and 8B EC are very common since they are the encodings of
55 push ebp
and
8B EC mov ebp, esp
If the entry point does NOT start with either of these (look at the first byte only, and remember the sampleset is biased), chances are high that it is a polymorphic/packed sample.
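The 1-byte rule of thumb above fits in a couple of lines. This is a hypothetical helper (likely_packed is a made-up name), nothing more than a triage hint tuned on a biased sampleset: the system32 numbers further down show plenty of clean files starting with E8 or 4D, and those get flagged too.

```python
# First bytes treated as "normal": 55 = push ebp, 8B = mov (as in 8B EC = mov ebp, esp)
COMMON_FIRST_BYTES = {0x55, 0x8B}


def likely_packed(ep: bytes) -> bool:
    """Return True when the entry point is empty or does not start with 55 or 8B."""
    return not ep or ep[0] not in COMMON_FIRST_BYTES


if __name__ == "__main__":
    for sample in ("55 8B EC 83 C4 F0", "8B FF 55 8B EC", "60 E8 00 00 00 00", "E8 0A 00"):
        print(f"{sample:20} flagged={likely_packed(bytes.fromhex(sample))}")
```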
Stats for the first 1, 2, 3, 4, 5 and 6 bytes are below:
139672 55 <-- push ebp
33107 68 <-- push xx
14183 4D <-- MZ header
13208 E8 <-- call
12936 60 <-- pushad
10048 6A <-- push xx
5376 83 <-- various (e.g. cmp xx,yy)
5363 E9 <-- long jump
5222 EB <-- short jump
4962 8B <-- mov xx,yy
133124 55 8B <-- home work :)
14173 4D 5A
5037 60 E8
4505 6A 60
3870 55 89
3145 83 7C
3140 81 EC
2377 6A 00
1992 8B FF
1826 64 A1
132898 55 8B EC
7731 4D 5A 90
5786 4D 5A 50
4492 6A 60 68
3842 55 89 E5
3191 60 E8 00
3145 83 7C 24
1821 81 EC 80
1770 64 A1 00
1584 8B FF 55
53290 55 8B EC 83
35846 55 8B EC 6A
17214 55 8B EC 53
12931 55 8B EC B9
7712 4D 5A 90 00
6536 55 8B EC 81
5778 4D 5A 50 00
3190 60 E8 00 00
3137 83 7C 24 08
2853 55 89 E5 83
45104 55 8B EC 83 C4
35729 55 8B EC 6A FF
14775 55 8B EC 53 8B
7711 4D 5A 90 00 03
6638 55 8B EC 83 EC
5775 4D 5A 50 00 02
5258 55 8B EC 81 EC
3190 60 E8 00 00 00
3131 83 7C 24 08 01
2801 55 89 E5 83 EC
35498 55 8B EC 6A FF 68
22712 55 8B EC 83 C4 F0
14775 55 8B EC 53 8B 5D
7711 4D 5A 90 00 03 00
6959 55 8B EC 83 C4 C4
5775 4D 5A 50 00 02 00
3497 55 8B EC 83 C4 F4
3190 60 E8 00 00 00 00
3080 83 7C 24 08 01 75
2152 55 8B EC 83 C4 B4
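Tables like the ones above can be produced by counting every n-byte prefix of the entry point bytes, for n = 1..6. A minimal sketch, assuming the entry point bytes were already extracted (the function name prefix_histograms and the sample data are hypothetical). Entry points shorter than n bytes are skipped for that length rather than counted as truncated prefixes, which keeps the usual property visible in the stats: an n-byte count never exceeds the count of its (n-1)-byte prefix (e.g. 139672 for 55 vs 133124 for 55 8B).

```python
from collections import Counter


def prefix_histograms(entry_points, max_len=6, top=10):
    """Map each prefix length n (1..max_len) to the `top` most common n-byte prefixes."""
    tables = {}
    for n in range(1, max_len + 1):
        counts = Counter(ep[:n] for ep in entry_points if len(ep) >= n)
        tables[n] = counts.most_common(top)
    return tables


if __name__ == "__main__":
    # Made-up entry point bytes, only to show the output format.
    sample = [bytes.fromhex(s) for s in (
        "55 8B EC 83 C4 F0", "55 8B EC 6A FF 68", "55 8B EC 6A FF 68",
        "60 E8 00 00 00 00", "4D 5A 90 00 03 00")]
    for n, rows in prefix_histograms(sample, max_len=3).items():
        for prefix, count in rows:
            print(f"{count:8} {prefix.hex(' ').upper()}")
```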
When I say the sampleset is biased, I mean it 🙂
Running stats over the executables within the system32 directory, I got the following numbers:
1461 8B <-- mov xx,yy
361 E8 <-- this is CALL
329 4D <-- MZ header (not 'real' executable PE files)
44 55 <-- push ebp = much lower value than for the 300K malware sampleset
32 83 <-- various (e.g. cmp xx,yy)
23 53 <-- push ebx
15 6A <-- push xx
11 FF <-- various (can be CALL)
2 E9 <-- long jump
1 EB <-- short jump
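Raw counts from a ~300K malware set and a single system32 directory can't be compared directly, so it helps to normalize each histogram into shares first. A rough sketch using the first-byte numbers from the two lists in this post; since only the listed entries are known, each share is a fraction of the listed total, not of all files, so treat the result as indicative only (shares and compare are hypothetical names).

```python
# First-byte counts copied from the two lists in this post (listed entries only).
MALWARE = {0x55: 139672, 0x68: 33107, 0x4D: 14183, 0xE8: 13208, 0x60: 12936,
           0x6A: 10048, 0x83: 5376, 0xE9: 5363, 0xEB: 5222, 0x8B: 4962}
SYSTEM32 = {0x8B: 1461, 0xE8: 361, 0x4D: 329, 0x55: 44, 0x83: 32,
            0x53: 23, 0x6A: 15, 0xFF: 11, 0xE9: 2, 0xEB: 1}


def shares(hist):
    """Turn raw counts into fractions of the total, so sets of different size compare."""
    total = sum(hist.values())
    return {byte: count / total for byte, count in hist.items()}


def compare(a, b):
    """Rows of (byte, share in a, share in b), biggest difference first."""
    sa, sb = shares(a), shares(b)
    rows = [(byte, sa.get(byte, 0.0), sb.get(byte, 0.0)) for byte in set(sa) | set(sb)]
    return sorted(rows, key=lambda r: abs(r[1] - r[2]), reverse=True)


if __name__ == "__main__":
    print("byte  malware  system32")
    for byte, m, s in compare(MALWARE, SYSTEM32):
        print(f"  {byte:02X}  {m:7.1%}  {s:8.1%}")
```

8B and 55 swap places at the top of the output, which is the bias described above in numbers.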
For the first 3 bytes:
1447 8B FF 55 <-- mov edi, edi / push ebp
329 4D 5A 90 <-- MZ header
33 55 8B EC <-- push ebp / mov ebp, esp
31 83 7C 24
14 53 55 56
11 8B 44 24
10 55 89 E5
9 E8 0A 00
8 E8 DA 02
8 6A 0C 68
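The 3-byte lists show several encodings of the same frame setup: 55 8B EC, 55 89 E5, and 8B FF 55 (mov edi, edi followed by push ebp). 8B EC and 89 E5 both decode to mov ebp, esp because x86 has two MOV opcodes for register-to-register moves: 8B is MOV r32, r/m32 (the ModRM reg field is the destination) and 89 is MOV r/m32, r32 (the r/m field is the destination). A tiny decoder covering only PUSH r32 (50-57) and register-to-register MOV (ModRM mod = 11) illustrates this; decode_prologue is a hypothetical helper, not a disassembler, and anything outside that scope (e.g. 8B 44 24, a memory operand) simply stops decoding.

```python
# x86 32-bit general-purpose registers, in ModRM/opcode encoding order
REGS32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_prologue(code: bytes) -> list:
    """Decode PUSH r32 and register-to-register MOV (89/8B) until something else appears."""
    out = []
    i = 0
    while i < len(code):
        op = code[i]
        if 0x50 <= op <= 0x57:  # PUSH r32: the register is encoded in the opcode itself
            out.append(f"push {REGS32[op - 0x50]}")
            i += 1
        elif op in (0x89, 0x8B) and i + 1 < len(code) and code[i + 1] >> 6 == 0b11:
            modrm = code[i + 1]
            reg = (modrm >> 3) & 0b111
            rm = modrm & 0b111
            # 8B: MOV reg, r/m (reg is the destination); 89: MOV r/m, reg (r/m is the destination)
            dst, src = (reg, rm) if op == 0x8B else (rm, reg)
            out.append(f"mov {REGS32[dst]}, {REGS32[src]}")
            i += 2
        else:
            break  # anything else is outside this sketch's scope
    return out


if __name__ == "__main__":
    for hexstr in ("55 8B EC", "55 89 E5", "8B FF 55 8B EC", "53 55 56"):
        print(f"{hexstr:15} -> {' / '.join(decode_prologue(bytes.fromhex(hexstr)))}")
```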
If you want to quickly convert between raw bytes and instruction mnemonics, you can use RTA.