File Formats ZOO

In 2009 my wife drew a lovely illustration for my upcoming book about malware analysis. Unfortunately, I couldn’t complete the book (for various reasons) and her work never made it to the printer. I really liked that illustration though, and I have always thought I would find a way to use it one day. Today is the day: I present it to you, together with short notes on some of the most popular file formats. File formats have been discussed so many times that the topic is hardly worth mentioning, yet I do hope that while skimming the notes below, you will still find something new. I have more interesting file signatures to come and will publish them once I complete the binary snapshots. The illustration will be there too 🙂

0x00 0x00 0x01 0x00

Windows Icon file (*.ico).

00 00 01 00 01 00 20 20 10 00 00 00 00 00 E8 02  ......  ........

00 00 16 00 00 00 28 00 00 00 20 00 00 00 40 00  ......(... ...@.

…

0x00 0x00 0x01

Mpg movie (*.mpg, *.mpe, *.mpeg).

00 00 01 BA 21 00 01 00 0F 80 0D F9 00 00 01 BB  ....!...........

00 0C 80 0D F9 07 E1 FF B8 C0 20 B9 E0 28 00 00  .......... ..(..

…

0x00 0x01 0x00 0x00 Standard Jet DB

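Microsoft Access database (*.mdb). Access 2007 and later databases (*.accdb) use the same layout but carry the string Standard ACE DB instead of Standard Jet DB.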

00 01 00 00 53 74 61 6E 64 61 72 64 20 4A 65 74  ....Standard Jet

20 44 42 00 00 00 00 00 B5 6E 03 62 60 09 C2 55   DB......n.b`..U

…

. . 0x0D 0x0A

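Compiled Python script (*.pyc). The first two bytes are a magic number that changes between Python versions; only the 0x0D 0x0A that follows is fixed.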

D1 F2 0D 0A 7E 74 F3 47 63 00 00 00 00 00 00 00  ....~t.Gc.......

00 0B 00 00 00 40 00 00 00 73 FD 00 00 00 64 00  .....@...s....d.

…
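The layout above can be checked with a few lines of Python. This is a minimal sketch that assumes Python 3; `pyc_magic` is a hypothetical helper name. It treats bytes 2 and 3 (CR LF) as the fixed part of the signature and decodes the first two bytes as the little-endian, version-specific magic number.

```python
import importlib.util
import struct
from typing import Optional


def pyc_magic(header: bytes) -> Optional[int]:
    """Return the version-specific magic number of a .pyc header, or None."""
    # Bytes 2..3 are always CR LF; bytes 0..1 are a little-endian number.
    if len(header) < 4 or header[2:4] != b"\r\n":
        return None
    return struct.unpack("<H", header[:2])[0]


# Header bytes taken from the dump above.
print(pyc_magic(bytes.fromhex("D1 F2 0D 0A 7E 74 F3 47")))  # 62161

# The running interpreter's own magic follows the same layout.
print(pyc_magic(importlib.util.MAGIC_NUMBER) is not None)  # True
```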

0x1F 0x8B

File compressed with gzip, very often a tar archive (*.gz, *.tgz, *.tar.gz).

1F 8B 08 00 03 83 74 3A 02 03 EC 3C FD 73 DB 36  ......t:...<.s.6

B2 FD D5 FC 2B 30 8E A6 B6 72 16 15 F9 2B 17 B9  ....+0...r...+..

…
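Since 0x1F 0x8B only says that the data is gzip-compressed, telling a *.tgz apart from any other gzip file requires looking inside. A minimal sketch, assuming a POSIX/GNU tar payload: decompress the first 512-byte tar header and look for the `ustar` magic at offset 257. Old pre-POSIX tar archives lack this magic, so a miss is not conclusive; `looks_like_tgz` is a hypothetical name.

```python
import gzip
import io
import tarfile


def looks_like_tgz(data: bytes) -> bool:
    """True if data is gzip-compressed and its payload starts with a ustar header."""
    if data[:2] != b"\x1f\x8b":
        return False
    with gzip.GzipFile(fileobj=io.BytesIO(data)) as gz:
        head = gz.read(512)  # one tar header block
    # POSIX and GNU tar write "ustar" at offset 257 of each header block.
    return head[257:262] == b"ustar"


# Build a tiny .tgz in memory to try it out.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:gz") as tar:
    payload = b"hello"
    info = tarfile.TarInfo("hello.txt")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))

print(looks_like_tgz(buf.getvalue()))           # True
print(looks_like_tgz(gzip.compress(b"plain")))  # False
```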

!<arch>

Library file (*.lib).

21 3C 61 72 63 68 3E 0A 2F 20 20 20 20 20 20 20  !<arch>./

20 20 20 20 20 20 20 20 31 31 32 36 39 34 35 34          11269454

…

!<arch>.debian-binary

Debian software package (*.deb).

21 3C 61 72 63 68 3E 0A 64 65 62 69 61 6E 2D 62  !<arch>.debian-b

69 6E 61 72 79 20 20 20 31 32 30 36 36 34 30 32  inary   12066402

…
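Both *.lib and *.deb files are `ar` archives: the 8-byte `!<arch>\n` header is followed by members, each starting with a fixed 60-byte ASCII header (name, modification time, owner, group, mode, size, and a two-byte terminator). A Debian package is an ar archive whose first member is `debian-binary`, which is what the second signature captures. A minimal sketch that reads the first member header; the function names are hypothetical, the sample values are synthetic, and the long-name extensions used by GNU and BSD ar are not handled.

```python
from typing import Tuple

AR_MAGIC = b"!<arch>\n"


def first_ar_member(data: bytes) -> Tuple[str, int, int]:
    """Return (name, mtime, size) of the first member of an ar archive."""
    if not data.startswith(AR_MAGIC):
        raise ValueError("not an ar archive")
    hdr = data[8:68]  # fixed 60-byte member header
    if len(hdr) < 60 or hdr[58:60] != b"`\n":
        raise ValueError("truncated or malformed member header")
    name = hdr[0:16].decode("ascii").rstrip()
    mtime = int(hdr[16:28].decode("ascii").strip() or 0)
    size = int(hdr[48:58].decode("ascii").strip())
    return name, mtime, size


def is_deb(data: bytes) -> bool:
    """A .deb is an ar archive whose first member is 'debian-binary'."""
    try:
        return first_ar_member(data)[0].rstrip("/") == "debian-binary"
    except ValueError:
        return False


sample = (
    AR_MAGIC
    + b"debian-binary   "  # name, 16 bytes
    + b"1234567890  "      # mtime, 12 bytes (synthetic value)
    + b"0     0     "      # owner and group ids, 6 bytes each
    + b"100644  "          # mode, 8 bytes
    + b"4         "        # size, 10 bytes
    + b"`\n"               # header terminator
    + b"2.0\n"             # member data
)
print(first_ar_member(sample))  # ('debian-binary', 1234567890, 4)
print(is_deb(sample))           # True
```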

%PDF

PDF document (*.pdf).

25 50 44 46 2D 31 2E 33 0D 25 E2 E3 CF D3 0D 0A  %PDF-1.3.%......

36 20 30 20 6F 62 6A 0D 3C 3C 20 0D 2F 4C 69 6E  6 0 obj.<< ./Lin

…

.RMF

RMVB movie (*.rm, *.rmvb).

2E 52 4D 46 00 00 00 12 00 01 00 00 00 00 00 00  .RMF............

00 07 50 52 4F 50 00 00 00 32 00 00 00 1C FD E0  ..PROP...2......

…

0& 0xB2 u

ASF or WMV movie (*.asf, *.wmv).

30 26 B2 75 8E 66 CF 11 A6 D9 00 AA 00 62 CE 6C  0&.u.f.......b.l

85 02 00 00 00 00 00 00 05 00 00 00 01 02 A1 DC  ................

…

7z

7-Zip archive (*.7z).

37 7A BC AF 27 1C 00 03 11 05 8F B2 13 00 00 00  7z..'...........

00 00 00 00 54 00 00 00 00 00 00 00 8F 51 A0 B5  ....T........Q..

…

?_

Old Windows Help format (*.hlp).

3F 5F 03 00 0C 01 00 00 FF FF FF FF 1B 39 00 00  ?_...........9..

FC 00 00 00 F3 00 00 00 00 6C 03 21 00 01 00 21  .........l.!...!

…

BM

Bitmap file (*.bmp).

42 4D 38 00 1B 00 00 00 00 00 36 00 00 00 28 00  BM8.......6...(.

00 00 00 03 00 00 40 02 00 00 01 00 20 00 00 00  ......@..... ...

…

BZh

Archive compressed using Bzip2 (*.bz, *.bz2, *.bzip2).

42 5A 68 39 31 41 59 26 53 59 B6 0D 89 62 00 8F  BZh91AY&SY…b..

C8 FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF  ................

…

CWS

Compressed Flash movie (*.swf).

43 57 53 08 AD C6 00 00 78 9C E4 BD 07 5C 13 CB  CWS.....x....\..

F7 28 3E 1B 12 B2 81 D0 41 50 83 62 07 41 11 EC  .(>.....AP.b.A..

…
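Only the first 8 bytes of a compressed SWF stay uncompressed: `CWS`, a one-byte SWF version, and the 32-bit little-endian length of the whole uncompressed file. The rest is a zlib stream, which is why `78 9C` (a zlib header) appears at offset 8 in the dump. A minimal sketch that turns a CWS file back into an uncompressed `FWS` one; `cws_to_fws` is a hypothetical name.

```python
import struct
import zlib


def cws_to_fws(data: bytes) -> bytes:
    """Decompress a zlib-compressed SWF (CWS) into an uncompressed one (FWS)."""
    if data[:3] != b"CWS":
        raise ValueError("not a zlib-compressed SWF")
    version = data[3]
    (total_len,) = struct.unpack("<I", data[4:8])  # size of the uncompressed file
    body = zlib.decompress(data[8:])
    out = b"FWS" + bytes([version]) + data[4:8] + body
    if len(out) != total_len:
        raise ValueError("length field does not match the decompressed size")
    return out


# Synthetic example: an 8-byte header plus 20 bytes of body.
body = b"\x00" * 20
cws = b"CWS" + bytes([8]) + struct.pack("<I", 8 + len(body)) + zlib.compress(body)
print(cws_to_fws(cws)[:4])  # b'FWS\x08'
```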

d8:announce

Torrent file (*.torrent).

64 38 3A 61 6E 6E 6F 75 6E 63 65 33 39 3A 68 74  d8:announce39:ht

74 70 3A 2F 2F 74 6F 72 72 65 6E 74 2E 75 62 75  tp://torrent.ubu

…
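A *.torrent file is a bencoded dictionary, which explains the signature: `d` opens a dictionary and `8:announce` is its first key, an 8-byte string. A minimal bencode decoder sketch, enough to pull out the announce URL; the function name and the sample data are hypothetical, and error handling is kept to a minimum.

```python
from typing import Any, Tuple


def bdecode(data: bytes, i: int = 0) -> Tuple[Any, int]:
    """Decode one bencoded value starting at offset i; return (value, next_offset)."""
    c = data[i:i + 1]
    if c == b"i":                      # integer: i<digits>e
        end = data.index(b"e", i)
        return int(data[i + 1:end]), end + 1
    if c == b"l":                      # list: l<values>e
        i += 1
        items = []
        while data[i:i + 1] != b"e":
            value, i = bdecode(data, i)
            items.append(value)
        return items, i + 1
    if c == b"d":                      # dictionary: d<key><value>...e
        i += 1
        result = {}
        while data[i:i + 1] != b"e":
            key, i = bdecode(data, i)
            result[key], i = bdecode(data, i)
        return result, i + 1
    if c.isdigit():                    # byte string: <length>:<bytes>
        colon = data.index(b":", i)
        start = colon + 1
        length = int(data[i:colon])
        return data[start:start + length], start + length
    raise ValueError(f"unexpected byte {c!r} at offset {i}")


sample = b"d8:announce31:http://tracker.invalid/announce4:infod4:name3:fooee"
meta, _ = bdecode(sample)
print(meta[b"announce"])  # b'http://tracker.invalid/announce'
```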

FLV

Flash Video file (*.flv).

46 4C 56 01 05 00 00 00 09 00 00 00 00 12 00 01  FLV.............

C2 00 00 00 00 00 00 00 02 00 0A 6F 6E 4D 65 74  ...........onMet

…

…ftyp

QuickTime movie (*.mov).

00 00 00 20 66 74 79 70 71 74 20 20 20 05 03 00  ... ftypqt   ...

71 74 20 20 00 00 00 00 00 00 00 00 00 00 00 00  qt  ............

…

From: <Saved by Windows Internet Explorer>

MIME HTML archive which may contain various files saved in a MIME format (*.mht).

46 72 6F 6D 3A 20 3C 53 61 76 65 64 20 62 79 20  From: <Saved by

57 69 6E 64 6F 77 73 20 49 6E 74 65 72 6E 65 74  Windows Internet

20 45 78 70 6C 6F 72 65 72 20 37 3E 0D 0A 53 75   Explorer 7>..Su

…

GIF87a

Picture saved in GIF 87a format (*.gif).

47 49 46 38 37 61 59 00 6D 00 F7 00 00 00 00 00  GIF87aY.m.......

00 00 40 00 00 80 00 00 FF 00 20 00 00 20 40 00  ..@....... .. @.

…

GIF89a

Picture saved in GIF 89a format (*.gif).

47 49 46 38 39 61 01 00 01 00 80 00 00 FF FF FF  GIF89a..........

00 00 00 21 F9 04 01 00 00 00 00 2C 00 00 00 00  ...!.......,....

…
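Right after the six-byte `GIF87a`/`GIF89a` signature, both GIF versions store the logical screen width and height as 16-bit little-endian values. Read that way, the first dump describes an 89 x 109 image (`59 00 6D 00`) and the second a 1 x 1 image (`01 00 01 00`). A minimal sketch; `gif_info` is a hypothetical name.

```python
import struct
from typing import Tuple


def gif_info(header: bytes) -> Tuple[str, int, int]:
    """Return (version, width, height) from the first 10 bytes of a GIF."""
    if header[:3] != b"GIF" or header[3:6] not in (b"87a", b"89a"):
        raise ValueError("not a GIF file")
    width, height = struct.unpack("<HH", header[6:10])
    return header[3:6].decode("ascii"), width, height


print(gif_info(bytes.fromhex("47 49 46 38 37 61 59 00 6D 00")))  # ('87a', 89, 109)
print(gif_info(bytes.fromhex("47 49 46 38 39 61 01 00 01 00")))  # ('89a', 1, 1)
```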

ID3

MP3 music file that starts with an ID3v2 tag (*.mp3).

49 44 33 03 00 00 00 00 06 46 54 45 4E 43 00 00  ID3......FTENC..

00 01 40 00 00 00 00 00 00 00 00 00 02 00 00 00  ..@.............

…
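Strictly speaking, `ID3` marks an ID3v2 tag rather than the MP3 audio itself. The tag header is 10 bytes long: `ID3`, a two-byte version, a flags byte, and a 4-byte "syncsafe" size in which only the low 7 bits of each byte count. A minimal sketch that computes where the audio data should begin; it adds 10 bytes when the footer flag (0x10) is set, and `id3v2_end` is a hypothetical name. For the dump above, the size bytes `00 00 06 46` give 838, so the tag ends at offset 848.

```python
def id3v2_end(header: bytes) -> int:
    """Return the offset just past an ID3v2 tag (0 if there is no tag)."""
    if header[:3] != b"ID3":
        return 0
    flags = header[5]
    size_bytes = header[6:10]
    if len(size_bytes) < 4 or any(b & 0x80 for b in size_bytes):
        raise ValueError("invalid syncsafe size")
    size = 0
    for b in size_bytes:
        size = (size << 7) | b  # 7 useful bits per byte
    footer = 10 if flags & 0x10 else 0
    return 10 + size + footer


print(id3v2_end(bytes.fromhex("49 44 33 03 00 00 00 00 06 46")))  # 848
```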

IDA1

The database of IDA Pro disassembler (*.ida).

49 44 41 31 00 00 3E 00 00 00 43 60 01 00 48 E0  IDA1..>...C`..H.

01 00 00 00 00 00 4D 20 02 00 DD CC BB AA 01 00  ......M ........

…

II

Image saved in TIFF (Intel) file format (*.tif, *.tiff).

49 49 2A 00 18 CA 34 00 2C 30 33 35 37 3B 34 35  II*...4.,0357;45

39 38 38 3D 38 37 3C 35 34 39 33 31 36 31 2F 34  988=87<5493161/4

…

ISc(

InstallShield Cabinet File (*.cab). It requires a separate installer, setup.exe.

49 53 63 28 0C 60 00 01 00 00 00 00 00 02 00 00  ISc(.`..........

00 00 00 00 00 02 00 00 00 00 00 00 00 00 00 00  ................

…

ITSF

Compiled HTML Help file (*.chm).

49 54 53 46 03 00 00 00 60 00 00 00 01 00 00 00  ITSF....`.......

40 62 C0 46 09 04 00 00 10 FD 01 7C AA 7B D0 11  @b.F.......|.{..

…

KGB_arch

Archive file created by KGB compression utility (*.kgb).

4B 47 42 5F 61 72 63 68 20 2D 33 0D 0A 32 35 30  KGB_arch -3..250

30 33 32 09 72 65 61 64 6D 65 2E 74 78 74 0D 0A  032.readme.txt..

…

L 0x00 0x00 0x00

Windows shortcut file (*.lnk).

4C 00 00 00 01 14 02 00 00 00 00 00 C0 00 00 00  L...............

00 00 00 46 CB 40 00 00 20 00 00 00 F4 AA 17 AE  ...F.@.. .......

…
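The leading `L` of a shortcut is really the 32-bit little-endian header size, 0x4C (76), and it is followed by the fixed LinkCLSID {00021401-0000-0000-C000-000000000046}, visible in the dump as `01 14 02 00 ... C0 ... 46`. Checking both fields is far stricter than matching the first four bytes. A minimal sketch; `is_shell_link` is a hypothetical name.

```python
import struct
import uuid

LINK_CLSID = uuid.UUID("00021401-0000-0000-C000-000000000046")


def is_shell_link(header: bytes) -> bool:
    """Check the HeaderSize (0x4C) and LinkCLSID fields of a .lnk header."""
    if len(header) < 20:
        return False
    (header_size,) = struct.unpack("<I", header[:4])
    # GUIDs are stored on disk in mixed-endian order, which bytes_le reproduces.
    return header_size == 0x4C and header[4:20] == LINK_CLSID.bytes_le


sample = bytes.fromhex(
    "4C 00 00 00 01 14 02 00 00 00 00 00 C0 00 00 00"
    " 00 00 00 46"
)
print(is_shell_link(sample))  # True
```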

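L 0x01

x86 object file in COFF format (*.obj). 0x4C 0x01 is the little-endian i386 machine type (0x014C); the next two bytes hold the number of sections (5 in this sample), so they differ from file to file.

4C 01 05 00 67 20 93 45 76 0A 00 00 3C 00 00 00  L...g .Ev...<...

00 00 00 00 2E 74 65 78 74 00 00 00 00 00 00 00  .....text.......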

…

MM

Image saved in TIFF (Motorola) file format (*.tif, *.tiff).

4D 4D 00 2A 00 00 0D 32 81 FF CD FF FB FF FF FE  MM.*...2........

01 FF FD FA FE 06 FF FE FE FF FE FE FF FF FE FD  ................

…
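The two TIFF variants differ only in byte order: `II` means little-endian and `MM` big-endian, and every later multi-byte value, starting with the magic number 42 and the offset of the first image file directory (IFD), uses that order. That is why 42 appears as `2A 00` in the Intel dump and as `00 2A` in the Motorola one. A minimal sketch; `tiff_header` is a hypothetical name.

```python
import struct
from typing import Tuple


def tiff_header(header: bytes) -> Tuple[str, int]:
    """Return (byte order, offset of the first IFD) from an 8-byte TIFF header."""
    order = {b"II": "<", b"MM": ">"}.get(header[:2])
    if order is None:
        raise ValueError("not a TIFF file")
    magic, first_ifd = struct.unpack(order + "HI", header[2:8])
    if magic != 42:
        raise ValueError("bad TIFF magic")
    return ("little" if order == "<" else "big"), first_ifd


print(tiff_header(bytes.fromhex("49 49 2A 00 18 CA 34 00")))  # ('little', 3459608)
print(tiff_header(bytes.fromhex("4D 4D 00 2A 00 00 0D 32")))  # ('big', 3378)
```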

…moov

QuickTime movie (*.mov).

00 00 41 DE 6D 6F 6F 76 00 00 00 6C 6D 76 68 64  ..A.moov...lmvhd

00 00 00 00 BD 38 15 59 BD 38 15 59 00 00 02 58  .....8.Y.8.Y...X

…
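The leading bytes in the `…ftyp` and `…moov` signatures are an atom size: a QuickTime file is a sequence of atoms, each starting with a 32-bit big-endian size (header included) and a four-character type. In the dumps, `00 00 00 20` is a 32-byte `ftyp` atom with major brand `qt  `, and `00 00 41 DE` is the size of a `moov` atom. A minimal sketch that walks the top-level atoms, including the 64-bit (size 1) and to-end-of-file (size 0) forms; `iter_atoms` is a hypothetical name and the sample atoms are synthetic.

```python
import struct
from typing import Iterator, Tuple


def iter_atoms(data: bytes) -> Iterator[Tuple[str, int, int]]:
    """Yield (type, offset, size) for each top-level QuickTime/MP4 atom."""
    pos = 0
    while pos + 8 <= len(data):
        size, kind = struct.unpack(">I4s", data[pos:pos + 8])
        if size == 1:          # 64-bit size stored right after the type
            if pos + 16 > len(data):
                break
            (size,) = struct.unpack(">Q", data[pos + 8:pos + 16])
        elif size == 0:        # atom extends to the end of the file
            size = len(data) - pos
        if size < 8:
            break              # malformed size; stop instead of looping forever
        yield kind.decode("latin-1"), pos, size
        pos += size


ftyp = struct.pack(">I4s4sI", 16, b"ftyp", b"qt  ", 0)
moov = struct.pack(">I4s", 8, b"moov")
print(list(iter_atoms(ftyp + moov)))  # [('ftyp', 0, 16), ('moov', 16, 8)]
```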

MP+

Musepack Audio File (*.mpc).

4D 50 2B 07 81 35 00 00 00 00 C0 5F 00 00 00 00  MP+..5....._....

00 00 00 00 00 00 C0 80 F7 07 02 73 5A 3B 8B 80  ...........sZ;..

…

MSCF

Microsoft Cabinet File (*.cab).

4D 53 43 46 00 00 00 00 8E 07 3E 00 00 00 00 00  MSCF......>.....

2C 00 00 00 00 00 00 00 03 01 01 00 01 00 00 00  ,...............

…

MZ

Windows/DOS executable (*.exe, *.dll, *.sys, *.cpl, *.ocx, and others).

4D 5A 90 00 03 00 00 00 04 00 00 00 FF FF 00 00  MZ..............

B8 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00  ........@.......

…

OggS

Audio file saved in the Ogg container format (*.ogg).

4F 67 67 53 00 02 00 00 00 00 00 00 00 00 67 0B  OggS..........g.

00 00 00 00 00 00 46 7D C7 F2 01 1E 01 76 6F 72  ......F}.....vor

…

PK

ZIP archive; the format is also used as a container by Java (e.g. JAR files) and Microsoft Office 2007 (*.zip, *.jar, *.docx, and others).

50 4B 03 04 14 00 02 00 00 00 F8 43 36 38 00 00  PK.........C68..

00 00 00 00 00 00 00 00 00 00 16 00 00 00 45 78  ..............Ex

…
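Because so many formats reuse the ZIP container, `PK` alone says little about the file type. A common heuristic, sketched below as an assumption rather than a complete classifier, is to look at the member names: Office Open XML documents (*.docx and friends) contain `[Content_Types].xml`, and JAR files usually contain `META-INF/MANIFEST.MF` (the manifest is optional, so its absence does not rule out a JAR). The function names and labels are hypothetical.

```python
import io
import zipfile


def classify_zip(data: bytes) -> str:
    """Rough guess at what kind of ZIP-based file this is, based on member names."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        names = set(zf.namelist())
    if "[Content_Types].xml" in names:
        return "ooxml"  # e.g. docx, xlsx, pptx
    if "META-INF/MANIFEST.MF" in names:
        return "jar"
    return "zip"


def make_zip(*names: str) -> bytes:
    """Build a small in-memory ZIP with empty members of the given names."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in names:
            zf.writestr(name, b"")
    return buf.getvalue()


print(classify_zip(make_zip("[Content_Types].xml", "word/document.xml")))  # ooxml
print(classify_zip(make_zip("META-INF/MANIFEST.MF")))                      # jar
print(classify_zip(make_zip("readme.txt")))                                # zip
```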

Rar!

Rar Archive (*.rar, *.r00, *.r01, …, part1.rar, part2.rar, …).

52 61 72 21 1A 07 00 CF 90 73 00 00 0D 00 00 00  Rar!.....s......

00 00 00 00 31 A3 74 C0 90 2E 00 3F F9 3B 00 00  ....1.t....?.;..

…

regf

Windows registry file (*.dat, *.<no extension>).

72 65 67 66 01 00 00 00 01 00 00 00 00 00 00 00  regf............

00 00 00 00 01 00 00 00 03 00 00 00 00 00 00 00  ................

…

RIFF…ACON

Animated cursor (*.ani).

52 49 46 46 50 3A 00 00 41 43 4F 4E 4C 49 53 54  RIFFP:..ACONLIST

46 00 00 00 49 4E 46 4F 49 4E 41 4D 0B 00 00 00  F...INFOINAM....

…

RIFF…AVI

AVI movie (*.avi).

52 49 46 46 88 51 5A 01 41 56 49 20 4C 49 53 54  RIFF.QZ.AVI LIST

46 01 00 00 68 64 72 6C 61 76 69 68 38 00 00 00  F...hdrlavih8...

…
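Both RIFF entries share one layout: `RIFF`, a 32-bit little-endian size of everything that follows these first 8 bytes, then a four-character form type, `ACON` for animated cursors and `AVI ` (with a trailing space) for AVI movies. The `…` in the two signatures stands for that size field. A minimal sketch; `riff_form` is a hypothetical name.

```python
import struct
from typing import Tuple


def riff_form(header: bytes) -> Tuple[str, int]:
    """Return (form type, declared size) from the first 12 bytes of a RIFF file."""
    if len(header) < 12 or header[:4] != b"RIFF":
        raise ValueError("not a RIFF file")
    (size,) = struct.unpack("<I", header[4:8])
    return header[8:12].decode("ascii"), size


print(riff_form(bytes.fromhex("52 49 46 46 50 3A 00 00 41 43 4F 4E")))  # ('ACON', 14928)
print(riff_form(bytes.fromhex("52 49 46 46 88 51 5A 01 41 56 49 20")))  # ('AVI ', 22696328)
```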

SZDD

A file compressed with the Microsoft compress.exe utility (*.??_, e.g. *.ex_ for a compressed *.exe).

53 5A 44 44 88 F0 27 33 41 65 00 74 00 00 FF 4D  SZDD..'3Ae.t...M

5A 90 00 03 00 00 00 7D 04 F5 F0 FF FF 00 00 B8  Z......}........

…
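The SZDD header is 14 bytes long: the 8-byte magic `SZDD 88 F0 27 33`, a compression mode byte (`A`), the last character of the original name (the one replaced by `_`), and the uncompressed size as a 32-bit little-endian integer. In the dump that character is `e` (*.ex_ becomes *.exe) and the original size is 0x7400 = 29,696 bytes; compressed data starts at offset 14, which is why the `MZ` of the packed executable is still visible. A minimal sketch that parses the header only (decompression is not shown); `parse_szdd` and the file name `setup.ex_` are hypothetical.

```python
import struct
from typing import Tuple

SZDD_MAGIC = b"SZDD\x88\xf0\x27\x33"


def parse_szdd(data: bytes, compressed_name: str) -> Tuple[str, int]:
    """Return (original file name, uncompressed size) from a 14-byte SZDD header."""
    if data[:8] != SZDD_MAGIC or data[8:9] != b"A":
        raise ValueError("not an SZDD file, or unknown compression mode")
    missing = data[9]                           # character replaced by '_'
    (size,) = struct.unpack("<I", data[10:14])  # uncompressed size
    name = compressed_name
    if name.endswith("_"):
        # Assumption: a zero byte means there is no character to restore.
        name = name[:-1] + (chr(missing) if missing else "")
    return name, size


header = bytes.fromhex("53 5A 44 44 88 F0 27 33 41 65 00 74 00 00")
print(parse_szdd(header, "setup.ex_"))  # ('setup.exe', 29696)
```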

0x60 0xEA

Arj archive (*.arj).

60 EA 2E 00 22 0B 01 0A 10 00 02 EB EB BC 86 3A  `..."..........:

EB BC 86 3A 00 00 00 00 00 00 00 00 00 00 00 00  ...:............

…

0x78 0x01

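DMG image for Mac (*.dmg). Note that 0x78 0x01 is a generic zlib stream header, so it only hints at a compressed DMG; any file that starts with raw zlib data will match it as well.

78 01 ED 9D 0B 80 1D 55 7D FF 67 E6 3E F7 BE 76  x......U}.g.>..v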

49 78 04 44 5C F3 8F 2B 41 B2 5D 48 08 81 50 59  Ix.D\..+A.]H..PY

…

{\rtf

Document saved in Rich Text Format (RTF) (*.rtf).

7B 5C 72 74 66 31 5C 61 64 65 66 6C 61 6E 67 31  {\rtf1\adeflang1

30 32 35 5C 61 6E 73 69 5C 61 6E 73 69 63 70 67  025\ansi\ansicpg

…

0x7F ELF

ELF executable or shared library, used by Linux and other Unix-like systems (usually no extension, or *.so).

7F 45 4C 46 01 01 01 00 00 00 00 00 00 00 00 00  .ELF............

02 00 03 00 01 00 00 00 00 81 04 08 34 00 00 00  ............4...

…
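The bytes after `0x7F ELF` are worth decoding when triaging a sample: byte 4 is the class (1 = 32-bit, 2 = 64-bit), byte 5 the data encoding (1 = little-endian, 2 = big-endian), and offset 0x10 holds the 16-bit object type followed by the machine type. For the dump above this gives a 32-bit little-endian executable for x86. A minimal sketch that maps only a few well-known values; `elf_summary` is a hypothetical name.

```python
import struct
from typing import Dict

ELF_TYPES = {1: "relocatable", 2: "executable", 3: "shared object", 4: "core"}
ELF_MACHINES = {3: "x86", 40: "ARM", 62: "x86-64", 183: "AArch64"}


def elf_summary(header: bytes) -> Dict[str, object]:
    """Decode the basic identification fields of an ELF header."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    bits = {1: 32, 2: 64}[header[4]]     # EI_CLASS (KeyError if invalid)
    order = {1: "<", 2: ">"}[header[5]]  # EI_DATA (KeyError if invalid)
    e_type, e_machine = struct.unpack(order + "HH", header[16:20])
    return {
        "bits": bits,
        "endian": "little" if order == "<" else "big",
        "type": ELF_TYPES.get(e_type, e_type),
        "machine": ELF_MACHINES.get(e_machine, e_machine),
    }


hdr = bytes.fromhex(
    "7F 45 4C 46 01 01 01 00 00 00 00 00 00 00 00 00"
    " 02 00 03 00 01 00 00 00"
)
print(elf_summary(hdr))
# {'bits': 32, 'endian': 'little', 'type': 'executable', 'machine': 'x86'}
```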

0x89 PNG

An image saved in PNG format (*.png).

89 50 4E 47 0D 0A 1A 0A 00 00 00 0D 49 48 44 52  .PNG........IHDR

00 00 03 D5 00 00 02 78 08 02 00 00 00 E4 DD 57  .......x.......W

…
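The full PNG signature is eight bytes; the CR LF, 0x1A and LF bytes are there so that transfers which mangle line endings or stop at a DOS end-of-file character are easy to detect. The first chunk must be `IHDR` (13 data bytes), starting with the width and height as 32-bit big-endian values, so the dump above describes a 981 x 632 image. A minimal sketch; `png_size` is a hypothetical name.

```python
import struct
from typing import Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(header: bytes) -> Tuple[int, int]:
    """Return (width, height) from the IHDR chunk that follows the PNG signature."""
    if header[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    length, chunk_type = struct.unpack(">I4s", header[8:16])
    if chunk_type != b"IHDR" or length != 13:
        raise ValueError("first chunk is not a valid IHDR")
    width, height = struct.unpack(">II", header[16:24])
    return width, height


hdr = bytes.fromhex(
    "89 50 4E 47 0D 0A 1A 0A 00 00 00 0D 49 48 44 52"
    " 00 00 03 D5 00 00 02 78"
)
print(png_size(hdr))  # (981, 632)
```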

0xCA 0xFE 0xBA 0xBE (CAFEBABE)

Compiled Java class file (*.class) or Mac Mach-O universal ("fat") binary, e.g. the executable inside an *.app bundle.

CA FE BA BE 00 00 00 32 00 C0 0A 00 30 00 6A 09  .......2....0.j.

00 2F 00 6B 07 00 6C 08 00 6D 0A 00 03 00 6E 09  ./.k..l..m....n.

…
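Because Java class files and Mach-O universal binaries share this magic, the next four bytes have to break the tie. In a universal binary they hold the number of architectures, which is small; in a class file they hold the minor and major version, and major versions start at 45. The dump above has major version 0x32 = 50, so it is a class file. The sketch below uses that gap as a heuristic; the cut-off of 45 is an assumption, and `cafebabe_kind` is a hypothetical name.

```python
import struct


def cafebabe_kind(header: bytes) -> str:
    """Guess whether a CAFEBABE file is a Java class or a Mach-O universal binary."""
    if len(header) < 8 or header[:4] != b"\xca\xfe\xba\xbe":
        raise ValueError("no CAFEBABE magic")
    (value,) = struct.unpack(">I", header[4:8])
    # Fat binaries store a small architecture count here; class files store
    # minor/major version, with major >= 45. Assumed cut-off: 45.
    if value < 45:
        return f"Mach-O universal binary ({value} architectures)"
    minor, major = struct.unpack(">HH", header[4:8])
    return f"Java class file (version {major}.{minor})"


print(cafebabe_kind(bytes.fromhex("CA FE BA BE 00 00 00 32")))  # Java class file (version 50.0)
print(cafebabe_kind(bytes.fromhex("CA FE BA BE 00 00 00 02")))  # Mach-O universal binary (2 architectures)
```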

0xD0 0xCF 0x11 0xE0 (D0CF11E0)

Microsoft OLE compound file (*.doc, *.xls, *.msi, and others).

D0 CF 11 E0 A1 B1 1A E1 00 00 00 00 00 00 00 00  ................

00 00 00 00 00 00 00 00 3E 00 03 00 FE FF 09 00  ........>.......

…

0xED 0xAB 0xEE 0xDB

Red Hat Package Manager File (*.rpm).

ED AB EE DB 03 00 00 00 00 01 74 75 78 70 61 69  ..........tuxpai

6E 74 2D 30 2E 39 2E 32 30 2D 31 2E 66 38 5F 66  nt-0.9.20-1.f8_f

…

0xEF 0xBB 0xBF

UTF-8 text starting with a byte order mark (*.txt, *.utf8, and others).

EF BB BF 54 68 69 73 20 69 73 20 61 20 73 69 6D  ...This is a sim

70 6C 65 20 74 65 78 74 20 66 69 6C 65 20 2E 2E  ple text file ..

…

0xFF 0xD8…JFIF

Picture saved in a JPEG format (*.jpg, *.jpe, *.jpeg).

FF D8 FF E0 00 10 4A 46 49 46 00 01 02 00 00 64  ......JFIF.....d

00 64 00 00 FF FE 00 12 41 64 6F 62 65 20 49 6D  .d......Adobe Im

…

0xFE 0xFF

UTF-16BE text starting with a byte order mark (*.txt, and others).

FE FF 00 54 00 68 00 69 00 73 00 20 00 69 00 73  ...T.h.i.s. .i.s

00 20 00 61 00 20 00 73 00 69 00 6D 00 70 00 6C  . .a. .s.i.m.p.l

…

0xFF 0xFE

UTF-16LE text starting with a byte order mark (*.txt, and others).

FF FE 54 00 68 00 69 00 73 00 20 00 69 00 73 00  ..T.h.i.s. .i.s.

20 00 61 00 20 00 73 00 69 00 6D 00 70 00 6C 00   .a. .s.i.m.p.l.

…
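Every entry above boils down to comparing bytes at fixed offsets with a known pattern. A minimal sketch of such a matcher covering a subset of this list; the table and `identify` are hypothetical, and some formats (ZIP-based documents, CAFEBABE files, OLE compound files) need further checks to pin down the exact type. Patterns with more fixed bytes are tried first, so `00 00 01 00` (icon) wins over `00 00 01` (MPEG) and a Debian package is not reported as a plain `ar` library.

```python
from typing import List, Optional, Tuple

# (label, [(offset, expected bytes), ...]); a subset of the signatures above.
SIGNATURES: List[Tuple[str, List[Tuple[int, bytes]]]] = [
    ("ico", [(0, b"\x00\x00\x01\x00")]),
    ("mpg", [(0, b"\x00\x00\x01")]),
    ("lib", [(0, b"!<arch>\n")]),
    ("deb", [(0, b"!<arch>\ndebian-binary")]),
    ("pdf", [(0, b"%PDF")]),
    ("7z", [(0, b"7z\xbc\xaf\x27\x1c")]),
    ("gz", [(0, b"\x1f\x8b")]),
    ("zip", [(0, b"PK\x03\x04")]),
    ("rar", [(0, b"Rar!\x1a\x07")]),
    ("exe", [(0, b"MZ")]),
    ("elf", [(0, b"\x7fELF")]),
    ("png", [(0, b"\x89PNG\r\n\x1a\n")]),
    ("ani", [(0, b"RIFF"), (8, b"ACON")]),
    ("avi", [(0, b"RIFF"), (8, b"AVI ")]),
    ("mov", [(4, b"ftyp")]),  # QuickTime (MP4 uses ftyp too)
    ("ole", [(0, b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1")]),
    ("utf-8 text", [(0, b"\xef\xbb\xbf")]),
    ("utf-16be text", [(0, b"\xfe\xff")]),
    ("utf-16le text", [(0, b"\xff\xfe")]),
]


def identify(data: bytes) -> Optional[str]:
    """Return the label of the most specific matching signature, or None."""
    # Try the patterns with the most fixed bytes first.
    ordered = sorted(SIGNATURES, key=lambda s: -sum(len(m) for _, m in s[1]))
    for label, parts in ordered:
        if all(data[off:off + len(m)] == m for off, m in parts):
            return label
    return None
```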


Update

It’s been a while since I wrote anything here. This is because I was on holiday and moved to a new place right after coming back. I have finally settled into a new apartment and am looking forward to playing with some new ideas.

So, here is a short update:

  • I fixed a silly bug in HAPI: I had mixed up the CR and LF characters in the output, which looked awkward to say the least, not to mention the potential parsing issues. Thanks to Pedro L. for spotting this and letting me know.
  • HAPI may occasionally print strings that do not look like APIs, e.g. ‘version’. This is not a bug, but a feature 😉 It turns out that one of the Microsoft DLLs exports an API with exactly that name. Since I don’t want to miss any API, I made a trade-off and include all of them. I still use a few simple heuristics to suppress most of these, but some will occasionally slip through, so please always verify the output manually. And for the curious: some Microsoft programmers decided to give certain APIs one- or two-character names. I dunno why anyone does stuff like this, but there are legitimate system DLLs exporting functions named ‘u’, ‘vo’, etc.
  • I recently discovered that Symantec’s VBN files can be encrypted not only with 0x5A, but also with 0xA5. DeXRAY still handles these files, since it relies on an XRAYS technique that searches for and extracts encrypted executables without needing to know the specific key; but if you parse VBN files yourself, knowing that 0xA5 is also used may save you some time.
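For those parsing such files themselves, one key-agnostic way to find an executable encrypted with a single-byte XOR key (0x5A, 0xA5 or any other) is sketched below as an assumption, not as DeXRAY's actual code. XORing two adjacent ciphertext bytes cancels the key, so an encrypted `MZ` shows up as a byte pair whose XOR equals `ord('M') ^ ord('Z')`, and the key is then the first byte XORed with `M`. A `PE\0\0` check at `e_lfanew` weeds out false hits; `find_xored_pe` and `max_hits` are hypothetical names.

```python
import struct
from typing import List, Tuple

MZ_XOR = ord("M") ^ ord("Z")


def find_xored_pe(data: bytes, max_hits: int = 10) -> List[Tuple[int, int]]:
    """Return (offset, key) pairs where a single-byte-XORed PE file seems to start."""
    hits = []
    for i in range(len(data) - 0x40):
        if data[i] ^ data[i + 1] != MZ_XOR:
            continue  # key-independent pre-filter
        key = data[i] ^ ord("M")
        dos = bytes(b ^ key for b in data[i:i + 0x40])  # decrypted DOS header
        (e_lfanew,) = struct.unpack("<I", dos[0x3C:0x40])
        sig = data[i + e_lfanew:i + e_lfanew + 4]
        if len(sig) == 4 and bytes(b ^ key for b in sig) == b"PE\x00\x00":
            hits.append((i, key))
            if len(hits) >= max_hits:
                break
    return hits


# Synthetic test data: a skeletal PE header, XORed with 0xA5 behind some junk.
pe = bytearray(0x100)
pe[0:2] = b"MZ"
pe[0x3C:0x40] = struct.pack("<I", 0x80)
pe[0x80:0x84] = b"PE\x00\x00"
blob = b"\x11" * 37 + bytes(b ^ 0xA5 for b in pe)
print(find_xored_pe(blob))  # [(37, 165)]
```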