Archaeology

URL Bookmarks and Security-scoping

Much of this discussion is based on reverse-engineering of file formats and frameworks, but we haven't bothered to pepper it with qualifiers. Since our reverse-engineering skills are not beyond reproach, and macOS is always changing, a grain of salt is advised. If you have corrections to any details, please do get in touch.

What Is A URL Bookmark?

As introduced in Mac OS X 10.6 (Snow Leopard), a URL bookmark is a serialization of a file: URL, together with additional data that improves the chances of that URL being usefully rebuilt later — even if the actual file has been renamed or moved in the interim. In addition to the path itself, a bookmark contains inode and volume information, for example.

In Mac OS X 10.7 (Lion), to support the App Sandbox, security-scoped URL bookmarks were introduced. But in order to understand these, we need to take a detour into security-scoped URLs, which requires another detour into sandbox extensions.

Before diving into this, note that security-scoped bookmarks and security-scoped URLs are not the same thing — they are related and you can make one from the other, but there are valid reasons to, say, make a non-security-scoped bookmark from a security-scoped URL. So don't let the overuse of the term security-scoping trip you up.

A Detour Into Sandbox Extensions

A sandboxed process has a detailed list of capabilities that it is allowed or denied, such as being able to open specific files for reading and/or writing. Broadly, sandboxed processes are allowed to read system files (e.g. the root-level System or Library folders) but not files under your home folder (except for being able to read and write files inside their own container).

Typically, a sandboxed app gains access to a specific user file by the user selecting it in a standard macOS Open dialog. The Open dialog is controlled by a macOS service (com.apple.appkit.xpc.openAndSavePanelService.xpc), which is not sandboxed and has full access to user files. In order to transfer that access to the requesting sandboxed app — for the selected user file only — it uses a sandbox extension.

More generally, any process that has the ability to read or write a specific file might need to transfer that ability to a related (sandboxed) process, such as an XPC service that it uses to process the file in some way. The original process might've acquired that ability in various ways — whether through the Open dialog, or simply by virtue of not being sandboxed itself — but as long as it can access the file, it can transfer that ability to another process using a sandbox extension.

Essentially, a sandbox extension is a token, vended by the kernel, which allows any process possessing it to acquire a specific capability — such as being able to open a specific path as read-only or read-write. (Actually, extensions can be limited to a specific process, but these are less interesting to this discussion.) The original process asks the kernel to issue an extension, and hands the resulting token to some other (sandboxed) process, which asks the kernel to consume the extension, granting it the encapsulated capability.

All of this happens underneath the public APIs for security-scoped URLs, as we'll discuss below. The private API involved here is mainly in libsystem_sandbox.dylib, which has functions like sandbox_extension_issue_file() and sandbox_extension_consume(). There are also extensions that are not file-related, such as sandbox_extension_issue_mach(), which extends the capability to look up a Mach service by name; but only file-related extensions are relevant to this discussion. These sandbox functions are basically shims around a system call, which goes into the kernel and gets handled by Sandbox.kext.

The sandbox extension token itself is actually formatted as a string, which might look something like this:

1bfe955dde5d40a9395dd9f9687c9aabff654f7f3cb99b71b24357557f1e3377;00;00000000;00000000;00000000;0000000000000020;com.apple.app-sandbox.read-write;01;01000005;0000000000c23f8e;23;/users/randy/desktop/todo.txt

You can see that it has a capability (com.apple.app-sandbox.read-write) and a (downcased) file path (/users/randy/desktop/todo.txt). The other semicolon-delimited fields contain various information about the extension and the specific file (such as the volume and inode of the file), but that's all beyond the scope of this discussion.
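Here is a minimal C sketch that pulls the capability and the path out of a token like the one above. The field positions (TOKEN_FIELD_CAPABILITY = 6, TOKEN_FIELD_PATH = 11) are our assumption. They are inferred from this one sample, not from any documented format. The helper names token_tail() and token_field() are hypothetical. The path is taken as everything after the eleventh semicolon, so a path that itself contains ';' is not cut short.

```c
#include <stddef.h>
#include <string.h>

/* Field positions inferred from a single sample token (assumption). */
#define TOKEN_FIELD_CAPABILITY 6
#define TOKEN_FIELD_PATH       11

/* Returns a pointer to the start of field `index` (0-based), or NULL if the
   token has fewer fields. The rest of the string follows, so a ';' inside
   the trailing path is kept intact. */
static const char *token_tail(const char *token, int index)
{
    const char *p = token;
    for (int i = 0; i < index; i++) {
        p = strchr(p, ';');
        if (p == NULL)
            return NULL;
        p++;                          /* step past the delimiter */
    }
    return p;
}

/* Like token_tail(), but also reports the length of just that one field. */
static const char *token_field(const char *token, int index, size_t *len)
{
    const char *start = token_tail(token, index);
    if (start == NULL)
        return NULL;
    const char *end = strchr(start, ';');
    *len = end ? (size_t)(end - start) : strlen(start);
    return start;
}
```

For the sample token, token_field() with TOKEN_FIELD_CAPABILITY yields com.apple.app-sandbox.read-write, and token_tail() with TOKEN_FIELD_PATH yields /users/randy/desktop/todo.txt.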

The first hex-encoded value is worth mentioning, though: this is a message authentication code that authenticates the extension as valid. Specifically, it is an HMAC-SHA256, which is calculated on the remainder of the token string (from the first semicolon), using a 64-byte secret key that is randomly chosen by Sandbox.kext after startup. Obviously, the kernel will refuse to consume an extension unless this HMAC is deemed correct, per the private-to-the-kernel secret key.
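The key never leaves the kernel, so a userland process can't verify the MAC itself. It can at least see which bytes are covered. This C sketch uses a hypothetical helper, token_split_mac(). It decodes the 64 hex digits of the first field into 32 bytes and returns a pointer to the authenticated remainder. The description doesn't make clear whether that remainder starts at the first semicolon or just after it. The sketch assumes the semicolon is included.

```c
#include <stdint.h>
#include <string.h>

#define TOKEN_MAC_HEX_LEN 64   /* 32-byte HMAC-SHA256, hex-encoded */

static int hex_nibble(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Splits a token into its 32-byte MAC and the message the MAC covers.
   Assumption: the message starts AT the first ';' (inclusive).
   Returns 0 on success, -1 if the first field is not 64 hex digits. */
static int token_split_mac(const char *token, uint8_t mac[32], const char **message)
{
    const char *semi = strchr(token, ';');
    if (semi == NULL || semi - token != TOKEN_MAC_HEX_LEN)
        return -1;
    for (int i = 0; i < 32; i++) {
        int hi = hex_nibble(token[2 * i]);
        int lo = hex_nibble(token[2 * i + 1]);
        if (hi < 0 || lo < 0)
            return -1;
        mac[i] = (uint8_t)((hi << 4) | lo);
    }
    *message = semi;
    return 0;
}
```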

The implication here is that sandbox extensions are transient: they survive the process that issued them (at least, extensions of the non-process-specific variety), but will be useless after system restart. Which is why security-scoped bookmarks are a thing...

What Is A Security-scoped URL?

Now that we understand sandbox extensions, we can say that a security-scoped URL is simply an NSURL (or CFURL) that also carries a sandbox extension token, which grants access (read-only or read-write) to the named file path.

The sandbox extension is carried as a URL resource property named _NSURLSecuritySandboxExtensionKey, which as you might guess from the leading underscore, is strictly private. But you can (if you're not worried about App Store review or other Apple validations) query it like this:

NSURL* theURL;   // assumed to be a security-scoped URL, e.g. one returned by the Open dialog
id value = nil;
[theURL getResourceValue:&value forKey:@"_NSURLSecuritySandboxExtensionKey" error:NULL];
// value will be an NSData, which is just a UTF-8 encoded string

The public API for a security-scoped URL consists of two methods that wrap actual access to the file, like so:

NSURL* theURL;
if ( [theURL startAccessingSecurityScopedResource] )
{
   // access the file here
   [theURL stopAccessingSecurityScopedResource];
}

Basically, -startAccessingSecurityScopedResource fetches the sandbox extension from the resource property, and asks the kernel to consume it. If that works, the process uses the granted capability to read or write the file. Then -stopAccessingSecurityScopedResource is used to relinquish the capability (which is tracked in the kernel and would otherwise cause a memory leak).

Returning to Security-scoped Bookmarks

So with all that backstory, what actually is a security-scoped bookmark? You might think it is simply a URL bookmark in which the sandbox extension is saved, but it is absolutely not that, because that would only be useful until the system is restarted, at which point the extension becomes useless.

A security-scoped bookmark is a way for an app that has access to a specific file (such as by virtue of a security-scoped URL) to save that access and regain it later, even if the app has been quit and reopened, or the system has been restarted, in the interim.

In order to create a security-scoped bookmark, an app starts with an NSURL that gives it access to the file of interest — either because that URL is security-scoped (and -startAccessingSecurityScopedResource has been sent), or because the app is not sandboxed at all. When the app asks Foundation to make a security-scoped bookmark for the URL (using the NSURLBookmarkCreationWithSecurityScope option), the app's access is first validated, and then the request is sent to ScopedBookmarkAgent, which is the (unsandboxed) macOS service that is responsible for creating and resolving these bookmarks.

The ScopedBookmarkAgent creates a normal bookmark for the URL, but also calculates a security scope cookie: a 32-byte HMAC-SHA256 value that identifies the “scope” in which the bookmark may later be resolved (thus granting access to the file). We'll return to what constitutes a scope momentarily.

Later, the app holding the bookmark data asks Foundation to resolve it back into an NSURL (using the NSURLBookmarkResolutionWithSecurityScope option). This request also gets sent over to ScopedBookmarkAgent, which validates that the security scope cookie in the bookmark matches the scope of the resolution request. If the scope is valid, the agent issues a new sandbox extension for the file, and adds that as a resource property in the new NSURL. This now security-scoped URL is sent back to the app, which can use it to access the file.

The above is vague about the nature of the security scope cookie, because there are actually two forms of scoping, although one is way more common than the other...

Security Scope Cookie for App-scoped Bookmarks

Almost every security-scoped bookmark we've ever seen is of the app-scope type. These are scoped to a specific app as run by a specific user. (An app run by user X might have access to a file on that user's desktop, but this doesn't give even the same app access to that file when run by user Y.)

For an app-scoped bookmark, ScopedBookmarkAgent first calculates a crypto key from two pieces of data:

  1. The code signing identifier of the requesting app. This is almost always the same as the app's bundle identifier, but is fetched from the code signature directly.
  2. A user-specific 32-byte secret key, which is randomly chosen by ScopedBookmarkAgent and stored in your keychain. (You can find this key in Keychain Access, by searching for an item named com.apple.scopedbookmarksagent.xpc. The key is chosen the first time that a scoped bookmark is created for the user, so is quite long-lasting.)

An HMAC-SHA256 is then computed over the code signing identifier, using the user-specific secret as the key; the resulting 32 bytes are the crypto key.

Then, the actual security scope cookie is calculated as an HMAC-SHA256 of the bookmark data, using the above crypto key. The resulting 32-byte value is the security scope cookie, which is added to the bookmark data. (These 32 bytes are always present in the bookmark data, but they are zeroed out before calculating the HMAC to avoid any circularity.)
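The two-step derivation can be sketched in C as follows. This illustrates the steps under some assumptions and is not ScopedBookmarkAgent's actual code. We assume that the signing identifier is MACed without a trailing NUL and that the entire bookmark buffer is MACed, with its cookie field zeroed. We also assume the cookie sits at byte offset 16, which is where the prolog described below places it. The HMAC primitive is passed in as a function pointer so the sketch stays portable. On macOS, a small wrapper around CommonCrypto's CCHmac(kCCHmacAlgSHA256, ...) could fill that role. app_scope_cookie() and toy_hmac() are hypothetical names. toy_hmac() is a non-cryptographic stand-in, included only so the code runs anywhere.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define COOKIE_LEN    32   /* CC_SHA256_DIGEST_LENGTH */
#define COOKIE_OFFSET 16   /* after _magic, _bookmarkLength, _version, _prologLength */

/* Shape of an HMAC-SHA256 primitive (key first, then data). */
typedef void (*hmac_sha256_fn)(const void *key, size_t keyLen,
                               const void *data, size_t dataLen,
                               uint8_t out[COOKIE_LEN]);

/* Sketch of the app-scoped cookie derivation. Returns 0 on success. */
static int app_scope_cookie(hmac_sha256_fn hmac, const uint8_t userSecret[32],
                            const char *signingID,
                            const uint8_t *bookmark, size_t bookmarkLen,
                            uint8_t cookie[COOKIE_LEN])
{
    if (bookmarkLen < COOKIE_OFFSET + COOKIE_LEN)
        return -1;

    /* Step 1: per-app crypto key = HMAC-SHA256(key: user secret, data: signing ID). */
    uint8_t appKey[COOKIE_LEN];
    hmac(userSecret, 32, signingID, strlen(signingID), appKey);

    /* Step 2: MAC a copy of the bookmark whose cookie field is zeroed,
       so any cookie already present doesn't feed into the result. */
    uint8_t *copy = malloc(bookmarkLen);
    if (copy == NULL)
        return -1;
    memcpy(copy, bookmark, bookmarkLen);
    memset(copy + COOKIE_OFFSET, 0, COOKIE_LEN);
    hmac(appKey, sizeof appKey, copy, bookmarkLen, cookie);
    free(copy);
    return 0;
}

/* Test stand-in only: FNV-1a mixing. NOT an HMAC and NOT secure. */
static void toy_hmac(const void *key, size_t keyLen, const void *data,
                     size_t dataLen, uint8_t out[COOKIE_LEN])
{
    const uint8_t *k = key, *d = data;
    uint64_t h = 14695981039346656037ull;
    for (size_t i = 0; i < keyLen; i++) { h ^= k[i]; h *= 1099511628211ull; }
    for (size_t i = 0; i < dataLen; i++) { h ^= d[i]; h *= 1099511628211ull; }
    for (int i = 0; i < COOKIE_LEN; i++) {
        h ^= (uint64_t)i;
        h *= 1099511628211ull;
        out[i] = (uint8_t)(h >> 56);
    }
}
```

Because the cookie field is zeroed before MACing, recomputing the cookie over a bookmark that already carries one gives the same value. That is what makes the cookie checkable at resolution time.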

Security Scope Cookie for Document-scoped Bookmarks

The other kind of security-scoped bookmark is the document-scope type. In theory, this is supposed to grant access to any process (and any user) that can access a specific document. For example, perhaps a document references some external media file, and you want that media file to be accessible to any user (and any app) that can access the document itself.

We're honestly not sure how or if this is actually used, but for completeness, we'll mention that, in this case, the equivalent crypto key is randomly chosen and attached to the document file as an extended attribute, with the name com.apple.security.private.scoped-bookmark-key. As above, this crypto key is used in an HMAC-SHA256 of the bookmark data, to yield the security scope cookie.

Assuming that the document can be shipped to another user, and that the extended attribute gets preserved, and that the referenced file can still be found by that user, the security-scoped bookmark can be resolved to provide access.

What About Non-security-scoped Bookmarks for Security-scoped URLs?

As mentioned above, a sandbox extension can be used to transfer a capability to another process, such as an XPC service. But how does one do this with the public API? This is where non-security-scoped bookmarks come in handy.

When you ask NSURL to create a non-security-scoped bookmark (i.e. omitting the NSURLBookmarkCreationWithSecurityScope option), the bookmark will contain a sandbox extension for the file, with whatever access the calling process has (i.e. read-write or read-only). This sandbox extension is newly issued at bookmark creation time, and is non-process-specific, regardless of whether the URL itself is security-scoped or if the calling process is simply not sandboxed.

When this bookmark data is sent to (say) an XPC service, and that process goes to resolve it, the sandbox extension will be preserved in the resulting NSURL, and the service can now use it like any other security-scoped URL. Of course, if the service needs to persist access past restart, it would need to make a new, security-scoped bookmark, but that's no different from any other sandboxed process.

Note that it won't work to simply use NSKeyedArchiver on the NSURL, because the sandbox extension resource property will not be preserved. Nor will it work to send a security-scoped bookmark, because the receiving process will have a different code signing identifier and thus won't be allowed to resolve the bookmark, even as the same user. A non-security-scoped bookmark for a security-scoped URL is the right way to do this, even though the overuse of the term “security-scoped” makes it sound dubious.

The Bookmark Binary Format

Based on our reverse-engineering, the bookmark binary format has the following structure.

We inferred this by examining bookmark files and by some amount of reversing of CoreFoundation, CoreServicesInternal and /System/Library/CoreServices/ScopedBookmarkAgent, mostly on macOS 10.15. The implementation may have changed since then, but as far as we know, this is still accurate.

The bookmark data starts with a fixed-length prolog in this form:

struct CFBookmarkProlog
{
    uint32_t    _magic;                                         // "book" as char[4] or 0x6b6f6f62 as Little Endian uint32
    uint32_t    _bookmarkLength;                                // total length of the bookmark data, including prolog
    uint32_t    _version;                                       // 0x10040000, at least as of macOS 10.15.7
    uint32_t    _prologLength;                                  // size of entire prolog, including cookie, currently 0x30
    uint8_t     _securityScopeCookie[ CC_SHA256_DIGEST_LENGTH ];
};

All of the integers here appear to be strictly Little Endian.

The _securityScopeCookie field is used as discussed above; if the bookmark is not security-scoped, this will be all zeroes.
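A minimal C sketch of reading the prolog follows. It decodes each field byte by byte as Little Endian, so it behaves the same on any host. It treats a non-zero cookie as the sign of a security-scoped bookmark. The sanity checks are our own choices: the magic, a minimum prolog size of 0x30, and lengths that fit the buffer. parse_prolog() and BookmarkProlog are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a Little Endian uint32 regardless of host byte order. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

typedef struct {
    uint32_t       bookmarkLength;
    uint32_t       version;
    uint32_t       prologLength;
    const uint8_t *cookie;          /* 32 bytes inside the caller's buffer */
    int            securityScoped;  /* 1 if any cookie byte is non-zero */
} BookmarkProlog;

/* Returns 0 if buf starts with a plausible bookmark prolog, -1 otherwise. */
static int parse_prolog(const uint8_t *buf, size_t len, BookmarkProlog *out)
{
    if (len < 0x30 || memcmp(buf, "book", 4) != 0)
        return -1;
    out->bookmarkLength = read_le32(buf + 4);
    out->version        = read_le32(buf + 8);
    out->prologLength   = read_le32(buf + 12);
    if (out->prologLength < 0x30 || out->bookmarkLength > len ||
        out->prologLength > out->bookmarkLength)
        return -1;
    out->cookie = buf + 16;
    out->securityScoped = 0;
    for (int i = 0; i < 32; i++) {
        if (out->cookie[i] != 0) {
            out->securityScoped = 1;
            break;
        }
    }
    return 0;
}
```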

The prolog is followed by an offset (in bytes from the end of the prolog) to the first CFBookmarkTOC. A number of other references are encoded as payload-relative offsets, which also means a number of bytes from the end of the prolog, so we call this point the CFBookmarkPayload:

struct CFBookmarkPayload
{
    uint32_t                _offsetOfFirstTOC;                  // payload-relative offset to first CFBookmarkTOC
};

Next come a variable number of CFBookmarkDataItems, each with a type and size:

struct CFBookmarkDataItem
{
    uint32_t                _dataSize;                          // i.e. byte length of _data[]
    CFBookmarkDataType      _dataType;                          // see below
    uint8_t                 _data[];                            // flexible array of _dataSize bytes (zero in size for some types)
    
} __attribute__( ( aligned( 4 ) ) ); // plus zero padding (not included in _dataSize) to dword-align the next data item

Note that the _dataSize can be zero for certain types. Each CFBookmarkDataItem is padded to 32-bit alignment, but the specified _dataSize does not include any such padding.
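Walking the data items comes down to two small calculations. First, convert a payload-relative offset to an absolute one by adding the prolog length. Second, skip to the next item by rounding _dataSize up to a multiple of 4. Here is a C sketch with the hypothetical names read_item() and next_item_offset(). It assumes items are packed back to back as described.

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

typedef struct {
    uint32_t       type;   /* a CFBookmarkDataType value */
    uint32_t       size;   /* _dataSize, excluding padding */
    const uint8_t *data;   /* points into the bookmark buffer */
} BookmarkItem;

/* Reads the CFBookmarkDataItem at a payload-relative offset.
   prologLength comes from the prolog (0x30 in practice). Returns 0 on success. */
static int read_item(const uint8_t *buf, size_t len, uint32_t prologLength,
                     uint32_t payloadOffset, BookmarkItem *out)
{
    uint64_t pos = (uint64_t)prologLength + payloadOffset;
    if (pos + 8 > len)
        return -1;
    out->size = read_le32(buf + pos);
    out->type = read_le32(buf + pos + 4);
    if (pos + 8 + out->size > len)
        return -1;
    out->data = buf + pos + 8;
    return 0;
}

/* Payload-relative offset of the following item: 8-byte header plus data
   rounded up to a 4-byte boundary. */
static uint32_t next_item_offset(uint32_t payloadOffset, uint32_t dataSize)
{
    return payloadOffset + 8 + ((dataSize + 3u) & ~3u);
}
```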

The _dataType will be one of the following, with the implied contents of _data for each shown below:

typedef enum : uint32_t
{                                            // CFBookmarkDataItem->_data will be:
    CFBookmarkDataTypeString        = 0x101, // UTF-8 string (not NULL-terminated but length is _dataSize)
    CFBookmarkDataTypeData          = 0x201, // simple data buffer, e.g. becomes a CFData
    CFBookmarkDataTypeNumber        = 0x300, // general numeric type, where subtype corresponds to the CFNumberGetType(), e.g.:
    CFBookmarkDataTypeUInt32        = 0x303, //    _data from CFNumberGetValue() with kCFNumberSInt32Type
    CFBookmarkDataTypeUInt64        = 0x304, //    _data from CFNumberGetValue() with kCFNumberSInt64Type
    CFBookmarkDataTypeDate          = 0x400, // CFDateGetAbsoluteTime(), swapped with CFConvertDoubleHostToSwapped()
    CFBookmarkDataTypeBoolFalse     = 0x500, // nothing (_dataSize==0)
    CFBookmarkDataTypeBoolTrue      = 0x501, // nothing (_dataSize==0)
    CFBookmarkDataTypeArray         = 0x601, // ( _dataSize / sizeof( uint32_t ) ) payload-relative offsets to CFBookmarkDataItems
    CFBookmarkDataTypeDictionary    = 0x701, // ( _dataSize / ( 2 * sizeof( uint32_t ) ) ) pairs of payload-relative offsets to
                                             // CFBookmarkDataItems, with keys and values alternating
    CFBookmarkDataTypeUUID          = 0x801, // bytes of a UUID (probably, not seen in practice)
    CFBookmarkDataTypeURL           = 0x901, // the URL as a UTF-8 string
    CFBookmarkDataTypeRelativeURL   = 0x902, // 2 payload-relative offsets to CFBookmarkDataItems, first a CFBookmarkDataTypeURL for the base URL,
                                             // second a CFBookmarkDataTypeString for the relative path (but not seen in practice)
} CFBookmarkDataType;

Some types are defined such that byte 1 is a primary type (e.g. number or URL) and byte 0 is a subtype (e.g. number type, absolute or relative URL).
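Splitting a type into its two bytes takes a pair of shifts and masks. The date type needs two more steps. First, CFConvertDoubleHostToSwapped() produces a big-endian representation, so the 8 bytes should be assembled most-significant first. This is unlike the Little Endian integers elsewhere. Second, CFAbsoluteTime counts seconds from 1 January 2001, so adding 978307200 (kCFAbsoluteTimeIntervalSince1970) gives Unix time. The C sketch below assumes the host's double is IEEE 754 binary64 with the same byte order as its uint64_t. The function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Byte 1 is the primary type (e.g. 0x03 number), byte 0 the subtype. */
static uint8_t item_primary_type(uint32_t type) { return (uint8_t)((type >> 8) & 0xff); }
static uint8_t item_subtype(uint32_t type)      { return (uint8_t)(type & 0xff); }

/* Seconds from the Unix epoch (1970) to the CF reference date (2001). */
#define CF_ABSOLUTE_TIME_INTERVAL_SINCE_1970 978307200.0

/* Decode the 8 data bytes of a CFBookmarkDataTypeDate item to Unix time. */
static double item_date_to_unix(const uint8_t data[8])
{
    uint64_t bits = 0;
    for (int i = 0; i < 8; i++)
        bits = (bits << 8) | data[i];      /* big-endian: most significant first */
    double absoluteTime;
    memcpy(&absoluteTime, &bits, sizeof absoluteTime);
    return absoluteTime + CF_ABSOLUTE_TIME_INTERVAL_SINCE_1970;
}
```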

These CFBookmarkDataItems constitute the values of the bookmark data. These are then referenced by a table of contents (or possibly multiple TOCs). The TOC is essentially a set of key-value pairs, with the values (and possibly some keys) being defined in terms of CFBookmarkDataItems.

The first CFBookmarkTOC is found via CFBookmarkPayload->_offsetOfFirstTOC, as noted above. Each TOC starts with this header:

struct CFBookmarkTOC
{
    uint32_t                _unknown1;
    uint32_t                _sentinel;                          // always 0xfffffffe
    uint32_t                _unknown2;
    uint32_t                _offsetOfNextTOC;                   // payload-relative offset of next TOC, or zero if none
    uint32_t                _tocItemCount;                      // number of CFBookmarkTOCItems that follow
};

The CFBookmarkTOC is followed by _tocItemCount of CFBookmarkTOCItems, each of which is basically a key-value pair:

struct CFBookmarkTOCItem
{
    uint32_t                _itemKey;                           // see below
    uint32_t                _itemValueOffset;                   // payload-relative offset to the CFBookmarkDataItem for this value
    uint32_t                _unknown;                           // possibly flags? generally zero
};
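Looking up an enumerated key takes three steps. Follow _offsetOfFirstTOC, scan each TOC's 12-byte items, and then move on via _offsetOfNextTOC. Here is a C sketch with the hypothetical name toc_find(). It also checks the 0xfffffffe sentinel. The cap of 64 TOCs is our own arbitrary guard against a malformed, cyclic chain, not something we've seen in the format.

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Searches every TOC for an item whose _itemKey equals `key`. On success,
   stores the payload-relative value offset in *valueOffset and returns 0. */
static int toc_find(const uint8_t *buf, size_t len, uint32_t prologLength,
                    uint32_t key, uint32_t *valueOffset)
{
    if ((uint64_t)prologLength + 4 > len)
        return -1;
    uint32_t toc = read_le32(buf + prologLength);       /* _offsetOfFirstTOC */
    for (int guard = 0; toc != 0 && guard < 64; guard++) {
        uint64_t pos = (uint64_t)prologLength + toc;
        if (pos + 20 > len || read_le32(buf + pos + 4) != 0xfffffffeu)
            return -1;                                  /* header missing or bad sentinel */
        uint32_t next  = read_le32(buf + pos + 12);     /* _offsetOfNextTOC */
        uint32_t count = read_le32(buf + pos + 16);     /* _tocItemCount */
        if (pos + 20 + (uint64_t)count * 12 > len)
            return -1;
        for (uint32_t i = 0; i < count; i++) {
            const uint8_t *item = buf + pos + 20 + (uint64_t)i * 12;
            if (read_le32(item) == key) {
                *valueOffset = read_le32(item + 4);     /* _itemValueOffset */
                return 0;
            }
        }
        toc = next;
    }
    return -1;
}
```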

The _itemKey here can take one of two forms. If the high bit is clear, the key is an enumerated value: we've deduced a subset of these values as the CFBookmarkTOCItemType below.

Alternatively, if the high bit is set, ( _itemKey & 0x7fffffff ) is a payload-relative offset to a CFBookmarkDataItem of type CFBookmarkDataTypeString. This string key form seems to be used for arbitrary CFURL properties of one kind or another.
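In code, telling the two key forms apart takes a single bit test. Here is a C sketch with a hypothetical helper name.

```c
#include <stdint.h>

/* Returns 1 for a string key and stores the payload-relative offset of its
   CFBookmarkDataTypeString item; returns 0 for an enumerated key. */
static int toc_key_is_string(uint32_t itemKey, uint32_t *stringOffset)
{
    if (itemKey & 0x80000000u) {
        *stringOffset = itemKey & 0x7fffffffu;
        return 1;
    }
    return 0;
}
```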

Finally, here is an undoubtedly incomplete sample of enumerated _itemKey values:

typedef enum : uint32_t
{
    // Attributes of the referenced file itself
    CFBookmarkTOCItemTypePathComponents     = 0x1004,   // array of strings for each component of the URL path
    CFBookmarkTOCItemTypeInodeComponents    = 0x1005,   // array of integers for the inodes corresponding to each path component
    CFBookmarkTOCItemTypePropFlags          = 0x1010,   // data from _CFURLGetResourcePropertyFlags()
    CFBookmarkTOCItemTypeCreateDate         = 0x1040,   // date file at URL created
    
    // Attributes of the volume that the file was on at bookmark creation time
    CFBookmarkTOCItemTypeVolumePath         = 0x2002,   // path from CFURLCopyFileSystemPath() on kCFURLVolumeURLKey, e.g. "/"
    CFBookmarkTOCItemTypeVolumeURL          = 0x2005,   // kCFURLVolumeURLKey
    CFBookmarkTOCItemTypeVolumeName         = 0x2010,   // kCFURLVolumeNameKey (the visible one, not the APFS Data volume name)
    CFBookmarkTOCItemTypeVolumeUUID         = 0x2011,   // kCFURLVolumeUUIDStringKey (but as a string, *not* as a UUID type)
    CFBookmarkTOCItemTypeVolumeCapacity     = 0x2012,   // kCFURLVolumeTotalCapacityKey as integer
    CFBookmarkTOCItemTypeVolumeCreateDate   = 0x2013,   // creation date of the *volume*, kCFURLCreationDateKey
    CFBookmarkTOCItemTypeVolumePropFlags    = 0x2020,   // data from _CFURLGetVolumePropertyFlags()
    CFBookmarkTOCItemTypeVolumeStartup      = 0x2030,   // true if boot volume (at least at bookmark creation time)

    // Attributes of the user for whom the bookmark was created
    CFBookmarkTOCItemTypeUserHomeDepth      = 0xc001,   // count of path components under home directory
    CFBookmarkTOCItemTypeUserName           = 0xc011,   // CFCopyUserName()
    CFBookmarkTOCItemTypeUserID             = 0xc012,   // _CFGetEUID(), so really the euid, but mostly the same
    
    // Attributes of bookmark creation itself
    CFBookmarkTOCItemTypeCreateOptions      = 0xd010,   // the original CFURLBookmarkCreationOptions
    
    // Other attributes
    CFBookmarkTOCItemTypeRWSandboxExtension = 0xf080,   // a re-issued non-pid-specific com.apple.app-sandbox.read-write
    CFBookmarkTOCItemTypeROSandboxExtension = 0xf081,   // a re-issued non-pid-specific com.apple.app-sandbox.read

} CFBookmarkTOCItemType;
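Putting the pieces together, the path of the referenced file can be rebuilt from the CFBookmarkTOCItemTypePathComponents entry. Its value is an array item whose offsets point at one string item per component. This C sketch uses a hypothetical function, bookmark_path(). For brevity, it scans only the first TOC; a fuller version would also follow _offsetOfNextTOC. It relies on the enumerated values listed above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Resolve a payload-relative offset to an item's data; NULL if out of range. */
static const uint8_t *item_at(const uint8_t *buf, size_t len, uint32_t off,
                              uint32_t *type, uint32_t *size)
{
    uint64_t pos = (uint64_t)le32(buf + 12) + off;     /* add _prologLength */
    if (pos + 8 > len)
        return NULL;
    *size = le32(buf + pos);
    *type = le32(buf + pos + 4);
    if (pos + 8 + *size > len)
        return NULL;
    return buf + pos + 8;
}

/* Rebuild "/a/b/c" from the 0x1004 TOC entry. Returns 0 on success. */
static int bookmark_path(const uint8_t *buf, size_t len, char *out, size_t outLen)
{
    if (len < 52 || outLen == 0 || memcmp(buf, "book", 4) != 0)
        return -1;
    uint32_t prolog = le32(buf + 12);
    if ((uint64_t)prolog + 4 > len)
        return -1;
    uint64_t toc = (uint64_t)prolog + le32(buf + prolog);   /* first TOC only */
    if (toc + 20 > len)
        return -1;
    uint32_t count = le32(buf + toc + 16);
    if (toc + 20 + (uint64_t)count * 12 > len)
        return -1;
    for (uint32_t i = 0; i < count; i++) {
        const uint8_t *entry = buf + toc + 20 + (uint64_t)i * 12;
        if (le32(entry) != 0x1004)                      /* PathComponents */
            continue;
        uint32_t type, size;
        const uint8_t *arr = item_at(buf, len, le32(entry + 4), &type, &size);
        if (arr == NULL || type != 0x601)               /* CFBookmarkDataTypeArray */
            return -1;
        size_t used = 0;
        out[0] = '\0';
        for (uint32_t j = 0; j + 4 <= size; j += 4) {
            uint32_t ctype, csize;
            const uint8_t *s = item_at(buf, len, le32(arr + j), &ctype, &csize);
            if (s == NULL || ctype != 0x101)            /* CFBookmarkDataTypeString */
                return -1;
            if (used + 1 + csize + 1 > outLen)
                return -1;                              /* output buffer too small */
            out[used++] = '/';
            memcpy(out + used, s, csize);
            used += csize;
            out[used] = '\0';
        }
        return 0;
    }
    return -1;
}
```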

What About macOS Alias Files?

As far as we know, macOS aliases are not a form of URL bookmark. They also have book as their first 4 bytes, but the rest of the prolog doesn't match (though, oddly, the third group of 4 bytes is mark). Perhaps there is some relationship here, but we haven't found it, and it definitely doesn't match the above binary format.

Of course, you can use a bookmark to create an alias: use the NSURLBookmarkCreationSuitableForBookmarkFile option to create a bookmark, and then feed that into +[NSURL writeBookmarkData:toURL:options:error:].