Deconstructing Apple doc sets


I was working on a patch for appledoc when I stumbled across a very annoying bug: the icons in the navigation tree in Xcode did not match the type of the document. Whatever I tried, they just showed up as folders. I couldn’t find anything in the official documentation, and even a good Google session didn’t help. There were no more excuses. Time to investigate!

First, I should give you a little background information. Apple Doc Sets are what Microsoft calls compiled help files (.chm), but in a different format. I’ve spent some time with .chm files back in the day, and boy, are they tough. Apple’s format is actually much, much simpler than that, once the veil of magic is lifted. In this article, I will try to explain everything there is to know (and perhaps a little more).

Let me give you a summary of Apple’s elaborate description of the Documentation Sets:

  1. Every page in a documentation set is basically just an HTML page (or a bundle). The pages can be grouped into folders any way you like.
  2. These pages are listed in a tree structure called the navigation tree. This tree is displayed to the left. Every entry in this tree is called a Node. Every node can have Subnodes.
  3. In order to search in the doc set, you must provide a list of tokens that refer to the pages or to one (or more) of the nodes.
  4. All meta-information of the documentation set (author, publisher, version, etc.) is inside a plist.

These 4 parts are merged into a doc set bundle using the docsetutil command line tool. Mysteriously, it takes Apple more than 5 pages to explain this. But anyway, for a simple documentation set we have a directory structure like so:

com.companyname.docset  (Root Folder)
    Contents  (folder)
        Resources  (folder)
            Documents  (folder)
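The layout above can be sketched in a few lines of Python. This is only a minimal sketch: the function name is made up, and it assumes that the metadata plist lives at Contents/Info.plist and uses the standard bundle keys CFBundleIdentifier and CFBundleName.

```python
import plistlib
import tempfile
from pathlib import Path


def create_docset_skeleton(parent, bundle_id, name):
    """Create <bundle_id>.docset/Contents/Resources/Documents under parent."""
    root = Path(parent) / (bundle_id + ".docset")
    documents = root / "Contents" / "Resources" / "Documents"
    documents.mkdir(parents=True, exist_ok=True)

    # Assumption: the metadata plist is Contents/Info.plist and carries the
    # standard bundle keys; a real doc set will want more keys than these.
    info = {
        "CFBundleIdentifier": bundle_id,
        "CFBundleName": name,
    }
    with open(root / "Contents" / "Info.plist", "wb") as fh:
        plistlib.dump(info, fh)
    return root


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = create_docset_skeleton(tmp, "com.companyname", "My Docs")
        # Print the created tree relative to the temporary directory.
        for path in sorted(root.rglob("*")):
            print(path.relative_to(root.parent))
```

The HTML pages themselves would then be copied into the Documents folder.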

To get this directory tree properly displayed, the doc set viewer needs to know which nodes and topics are actually to be displayed. It cannot do this by enumerating the directory structure, because some of the files are not meant to be displayed. So a description of the contents is needed. This description is provided in the form of a Nodes.xml file. To be able to actually search the documentation set, you will need to provide one or more Tokens.xml files. We’ll talk about Nodes.xml first.


Node types

As said earlier, a documentation set has a tree structure that describes the different pages or content items available in the documentation.

As you can see on the left, the iOS 6.1 doc set consists of one root node and several nested sub-nodes. Some of the nodes point to the same document but refer to a different section of the same page, to make it easy to skip directly to the relevant section. Let’s call these nodes anchor nodes.

It is noteworthy that Apple’s own documentation correctly uses folder icons, ‘C’ icons (for reference documents), and file icons. Not seen in this screenshot are the bundle icons.

The documentation is quite descriptive on how to make these tree structures. Below is a sample Nodes.xml file; I think you will get the picture. The file should be placed in the Resources folder, next to the Documents folder.
<?xml version="1.0" encoding="UTF-8"?>
<DocSetNodes version="1.0">
    <Subnodes type="folder">
        <Name>How To</Name>

As these XML files can get rather big to parse every time, they must be compiled by calling the docsetutil command-line tool. docsetutil reads the Nodes.xml file (and the Tokens.xml files) and compiles them into a format that is more efficient to use. Note the type="folder" in the XML, which suggests the icon for the navigation tree. According to the documentation, the type can be one of “file”, “folder”, or “bundle”. Even though the documentation is very clear about it, whatever you set the value to, the icon never turns into a file icon. I tried everything, every combination and every casing. Nothing seemed to help. The bad thing is that you have to restart Xcode after every change to see it reflected in the navigation tree.
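The compile step can be scripted so it is easy to repeat after every change. Below is a minimal sketch in Python, assuming docsetutil is reachable on the PATH (on some Xcode installs it may have to be invoked through xcrun instead) and that its index subcommand takes the path to the doc set bundle. The function names are made up for this example.

```python
import shutil
import subprocess
from pathlib import Path


def build_index_command(docset):
    """Return the argument list for indexing a doc set bundle."""
    # Assumption: `docsetutil index <bundle>` compiles Nodes.xml and
    # Tokens.xml into the indexed form inside the bundle.
    return ["docsetutil", "index", str(Path(docset))]


def index_docset(docset):
    """Run the indexing step, failing loudly if the tool is missing."""
    if shutil.which("docsetutil") is None:
        raise RuntimeError("docsetutil not found on PATH")
    # check=True raises CalledProcessError on a non-zero exit status.
    subprocess.run(build_index_command(docset), check=True)
```

Calling index_docset("com.companyname.docset") would then run the tool against the bundle. Xcode still has to be restarted to pick up the result.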
Wrong icons

So I started digging. Step one was searching for the schema for the Nodes.xml file. The file itself doesn’t reference a schema, but there is a NodesSchema.rng file (found in <Xcode>/Library/PrivateFrameworks/DocSetAccess.framework). This file suggested to me that a node has an undocumented attribute named documentType that can take any of three values (generic, sample code and reference). I decided to just try it. Hey, it worked! I finally got the ‘C’ icon by using documentType="reference"!

But whatever I did, I still couldn’t get the file icon to show up. No combination of the file type and documentType made any difference. What on earth was going on, and how could Apple pull this off? I just had to find a way to see how they did it. Opening up an existing docset from Apple didn’t help; the Nodes.xml file wasn’t there. Docsetutil produces 3 files, among them docset.dsidx and docset.skidx. Opening the files in a hex editor didn’t help much. They are completely unreadable by hand.

After many, many, many, many Google searches, I found a page which informed me that a docset is actually a set of HTML files and folders with a SQLite database, and I thought “bingo”. It then took me just 3 hours and a lunch to find out I already had a command-line tool named sqlite3 that allowed me to peek into the databases. A little snooping around made it clear: Apple’s documentation was seriously lacking. Why? I will tell you.
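In hindsight, there is a quick way to tell whether an unreadable file is a SQLite database: every SQLite 3 file starts with the same 16-byte header. Here is a minimal Python sketch of that check. The function name and the example path in the usage note are made up.

```python
from pathlib import Path

# Every SQLite 3 database file begins with these 16 bytes.
SQLITE_MAGIC = b"SQLite format 3\x00"


def is_sqlite_file(path):
    """Return True if the file at path starts with the SQLite 3 header."""
    with open(Path(path), "rb") as fh:
        return fh.read(len(SQLITE_MAGIC)) == SQLITE_MAGIC
```

For example, is_sqlite_file("com.companyname.docset/Contents/Resources/docset.dsidx") would tell you whether the index can be opened with sqlite3, assuming the index file sits at that location inside the bundle.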

I opened up the iOS 6.1 docset and typed ‘.schema’ to list all the tables in the database. I quickly found a table named znode that contains all the information from the Nodes.xml. A quick query revealed that Apple isn’t using the node type file where you would expect it. It returned the following:
 sqlite> select count(*), zknodetype from znode group by zknodetype;
Only 9 ‘file’ nodes out of more than 33000 nodes. That can’t be right. And what is that ‘section’ type I’m seeing? That’s not in the schema or in the documentation. I decided to give it a go and modified my Nodes.xml to <Node type="section">. Phew and hurray. That did the trick.
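The same query can be run from a script with Python’s built-in sqlite3 module. The sketch below takes the table and column names (znode, zknodetype) from the query above; the function name is made up, and the demo rows are invented purely to show the call, they are not the real counts.

```python
import sqlite3


def count_node_types(conn):
    """Return (node type, count) pairs, most frequent type first."""
    rows = conn.execute(
        "SELECT zknodetype, COUNT(*) AS n FROM znode "
        "GROUP BY zknodetype ORDER BY n DESC"
    )
    return [(kind, count) for kind, count in rows]


if __name__ == "__main__":
    # Stand-in database with invented rows; for a real doc set you would
    # pass sqlite3.connect(<path to the .dsidx file>) instead.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE znode (zknodetype TEXT)")
    conn.executemany(
        "INSERT INTO znode (zknodetype) VALUES (?)",
        [("section",), ("section",), ("folder",), ("file",)],
    )
    for kind, count in count_node_types(conn):
        print(kind, count)
    conn.close()
```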

Proper icons

It took a little more work to fold my findings into the sources of appledoc. By the time you read this, these updates will probably have been merged into the main project for all to enjoy.
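To make the findings concrete, here is a rough Python sketch that writes a small Nodes.xml using type="section" and documentType="reference". It is not the appledoc code. It assumes the usual DocSetNodes / TOC / Node / Name / Path / Subnodes nesting from Apple’s guide, and the helper names, node names and paths are all made up.

```python
import xml.etree.ElementTree as ET


def make_node(name, path=None, node_type=None, document_type=None, children=None):
    """Build one <Node> element, optionally with nested <Subnodes>."""
    attrs = {}
    if node_type:
        attrs["type"] = node_type              # e.g. "folder" or "section"
    if document_type:
        attrs["documentType"] = document_type  # undocumented, e.g. "reference"
    node = ET.Element("Node", attrs)
    ET.SubElement(node, "Name").text = name
    if path:
        ET.SubElement(node, "Path").text = path
    if children:
        subnodes = ET.SubElement(node, "Subnodes")
        subnodes.extend(list(children))
    return node


def build_nodes_xml(root_node):
    """Wrap the root node in <DocSetNodes><TOC> and return the XML text."""
    doc = ET.Element("DocSetNodes", {"version": "1.0"})
    ET.SubElement(doc, "TOC").append(root_node)
    return ET.tostring(doc, encoding="unicode")


if __name__ == "__main__":
    root = make_node("My Library", "index.html", "folder", children=[
        # "section" is the type that finally produced a file icon.
        make_node("How To", "howto.html", "section"),
        # documentType="reference" is what produced the 'C' icon.
        make_node("MyClass", "Classes/MyClass.html", document_type="reference"),
    ])
    print(build_nodes_xml(root))
```

Leaving both attributes optional keeps the helper usable for plain nodes as well.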

But the nice navigation isn’t all. The search and quick help have to work as well. We’ll tackle that in the next section.


Being able to browse the documentation is cool, but most of the time you have a specific issue. One time you know the name of a method but not its use, and other times you have no idea what classes you can use. If you know the method or property, you can click or opt-click it in the source editor to bring up the online help. For the other kind there’s the search bar in the Documentation Organizer.

A searchable index is created by docsetutil, which covers the second kind. But the search can be improved by providing additional tokens. These tokens are also used by the quick help. Using the Tokens.xml file you can add a huge amount of metadata to the otherwise dumb symbols. This metadata includes availability, declaration, parameter descriptions, etc. In essence, it is all the information that is available in the HTML documentation, but tagged and enriched. The term used by Apple is API Lookup. Below is a sample of a token definition:
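As a rough illustration of the shape such a token takes, here is a Python sketch that emits a single token. The element names (Tokens, File, Token, TokenIdentifier, Abstract, Declaration, Anchor) and the apple_ref identifier style are assumptions based on Apple’s Documentation Set Guide and should be checked against it. The class, method, anchor and path are made up.

```python
import xml.etree.ElementTree as ET


def make_token(identifier, abstract, declaration=None, anchor=None):
    """Build one <Token> element describing a single symbol."""
    token = ET.Element("Token")
    ET.SubElement(token, "TokenIdentifier").text = identifier
    ET.SubElement(token, "Abstract").text = abstract
    if declaration:
        ET.SubElement(token, "Declaration").text = declaration
    if anchor:
        # Assumed: the named anchor inside the HTML page to jump to.
        ET.SubElement(token, "Anchor").text = anchor
    return token


def build_tokens_xml(html_path, tokens):
    """Group tokens under the HTML file that documents them."""
    root = ET.Element("Tokens", {"version": "1.0"})
    file_element = ET.SubElement(root, "File", {"path": html_path})
    file_element.extend(list(tokens))
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    token = make_token(
        "//apple_ref/occ/instm/MyClass/doSomething:",  # hypothetical symbol
        "Does something with the given object.",
        declaration="- (void)doSomething:(id)object",
        anchor="doSomething",
    )
    print(build_tokens_xml("Classes/MyClass.html", [token]))
```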


Closing up

That sure was a lot of technical stuff. It was worth it. Things have become much easier now that I know how it works. And for you it has become easier because you didn’t have to find it out yourself. These findings will have been added to the appledoc main branch by the time you read this. If not, you can always check the pull request.

Happy Documenting! And as always, leave any comments below or use the contact page.