WWDC 2013: Building Efficient OS X Apps
Instead of focusing on speed/latency optimization, will focus on resource optimization (i.e. resource efficiency)
Macs obviously have more resources available than iOS devices, but those resources are also shared amongst multiple apps (unlike iOS)
Memory
Memory pressure causes disk cache pages to be evicted (so system can reclaim that memory), but this causes performance drop in apps that read from disk
Use Instruments - "Allocations" template to profile objects, and "Leaks" template to find leaks and retain cycles - see WWDC 2013: Fixing Memory Issues
Ideally your memory testing is automated
- Look for: (1) unexpected memory usage increases and (2) leaks (and always prioritize fixing leaks)
- `heap MyLeakyApp` to view allocations and compare memory usage
- `leaks MyLeakyApp` - but make sure to enable `MallocStackLogging=1` in Xcode or set the env var
Use `stringdups` to find duplicate objects (C strings, NSString, NSDate, etc.)
- `stringdups -nostacks <pid>` (note that it will include duplicate objects from frameworks - ignore those)
- Then use `stringdups -callTrees <pid>` to find more information about a specific allocation
Memory Pressure (visible in Activity Monitor) is a gauge for how difficult it is for the system to provide memory to an application when it requests an allocation
`NSCache` - thread-safe dictionary that will evict contents during memory pressure
Purgeable Memory (`NSPurgeableData`) - system can evict it from memory without interacting with your app
- When you actually need to use the memory, surround the access with `-beginContentAccess` and `-endContentAccess` so the system won't evict between those two calls
- If `-beginContentAccess` returns NO you have to reload the data
System will prefer reclaiming the above two instead of swapping
`NSCache` also behaves well when you put `NSPurgeableData` objects in it
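A minimal Objective-C sketch of that pattern (the `ThumbnailStore` class, its `loader` block, and the key are hypothetical names, not from the session): purgeable bytes live in an `NSCache`, every use is bracketed by `-beginContentAccess`/`-endContentAccess`, and a NO return triggers a reload.

```objc
#import <Foundation/Foundation.h>

// Hypothetical wrapper: caches purgeable data and reloads it when the system has purged it.
@interface ThumbnailStore : NSObject
@property (nonatomic, strong) NSCache *cache;
@end

@implementation ThumbnailStore

- (instancetype)init {
    if ((self = [super init])) {
        _cache = [[NSCache alloc] init];
    }
    return self;
}

// Returns data with its content access already begun; the caller must balance it
// with -endContentAccess when finished so the system may purge the bytes again.
- (NSPurgeableData *)dataForKey:(NSString *)key loader:(NSData *(^)(void))loader {
    NSPurgeableData *data = [self.cache objectForKey:key];
    if (data && [data beginContentAccess]) {
        return data;   // still resident, pinned until -endContentAccess
    }
    // Never cached, or the system evicted it: reload and re-cache.
    // A freshly created NSPurgeableData starts with an access count of 1.
    NSPurgeableData *fresh = [[NSPurgeableData alloc] initWithData:loader()];
    [self.cache setObject:fresh forKey:key];
    return fresh;
}

@end
```

Returning the data already "pinned" keeps the purge window out of the caller's critical section; the caller just ends access when it is done with the bytes.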
VM pages are grouped into memory regions:
- Anonymous regions can be named (e.g. one for ImageIO decoded images, one for CALayer rasterizations, different ones for different malloc sizes)
- File-backed regions - also read lazily from disk on first page fault (may have only part of a region in memory)
VM Tracker in Instruments tracks memory used by each region and how much of that region is "dirty"
Try to lower usage of dirty memory since dirty means it has to be written to disk before being evicted
`sudo footprint -proc MyLeakyApp -swapped -categories` to get a single (approximate) number for how much memory your app is using
`sudo footprint -proc App -proc WindowServer` to view shared memory usage (also useful if you want to get combined memory usage of your app and its XPC service)
What happens during memory pressure?
- NSCache and NSPurgeableData contents are reclaimed
- Dirty memory pages are written to disk in background to clean them / prepare for fast eviction in future
- File-backed memory written to disk
- Swap anonymous memory
New in Mavericks: before swapping, will now do memory compression
In Activity Monitor:
- App Memory = anonymous and heap regions
- Wired Memory = memory wired by OS (can't be easily reclaimed)
- Compressed = memory used to store other compressed anonymous pages
For more detail than Activity Monitor: `vm_stat 1` (the `1` gets data every 1 second)
Use time profiling tools and look for `vm_fault` in the call stack - this indicates the kernel had to process a page fault, so if you see this frame more than a few times, your app is imposing memory pressure on the system (make sure to enable user & kernel call stacks in Instruments)
`sudo sysdiagnose <AppName>`
- Capture one because memory pressure depends a lot on what else is happening in the system
- Archives output in `/var/tmp/sysdiagnose_TIMESTAMP.tar.gz`
- Includes `spindump`, `heap`, `leaks`, `footprint`, `vm_stat`, `fs_usage`
- Shift-Control-Option-Command-Period keyboard shortcut also triggers it (but will collect less information)
Disk
System-wide I/O contention can create huge performance cliff for your app, especially during app launch and document opening
Two main entrypoints into kernel storage layer: memory mapped I/O or VFS system calls
Make sure to test your app on both spinning hard drives and SSDs
Use Dispatch I/O (part of Grand Central Dispatch) for declarative file access and it will encapsulate many best practices for you
- Example: Reading a large file sequentially and doing processing concurrently
  - Use `dispatch_io_create_with_path()`, `dispatch_io_set_high_water()` to provide the block size, and pass your processing code to `dispatch_io_read()`
  - Dispatch I/O will handle all the details for you like how to parallelize I/O and compute, how often to read, etc.
- Example: Reading a large number of files in parallel
  - Use `dispatch_get_global_queue()` and `dispatch_io_set_low_water()`
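A rough sketch of the first example (the path, the 1 MB chunk size, and the `processChunk` placeholder are assumptions, not from the talk), assuming ARC so the channel can be captured by the blocks:

```objc
#import <Foundation/Foundation.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>

static void readLargeFileSequentially(const char *path) {
    dispatch_queue_t queue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);

    dispatch_io_t channel = dispatch_io_create_with_path(DISPATCH_IO_STREAM, path, O_RDONLY, 0,
                                                         queue, ^(int error) {
        // Cleanup handler: runs once the channel is closed.
        if (error) fprintf(stderr, "channel closed with error %d\n", error);
    });
    if (!channel) return;

    // Deliver data to the read handler in chunks of at most 1 MB.
    dispatch_io_set_high_water(channel, 1024 * 1024);

    // Read from offset 0 to EOF; the handler runs on the global queue per chunk.
    dispatch_io_read(channel, 0, SIZE_MAX, queue, ^(bool done, dispatch_data_t data, int error) {
        if (data) {
            // processChunk(data);   // placeholder for your own per-chunk processing
        }
        if (done) {
            dispatch_io_close(channel, 0);
        }
    });
}
```

Dispatch I/O decides how much to read ahead and keeps the processing off the thread doing the I/O, which is the point of the "parallelize I/O and compute" note above.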
One file system anti-pattern is to store a large number of small files - instead, use SQLite or Core Data for storing large number of small items
By default, `write()` calls only flush to disk when `close()` is called on the file handle
- To force a flush, use `close()`/`fsync()` (on VFS) or `msync()` (for memory mapped I/O)
- Write buffering is helpful to coalesce multiple `write()` calls into a single disk write
- If you are trying to use these to achieve data consistency, you probably want to use SQLite or Core Data instead of trying to roll your own consistency
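A tiny C-level sketch of forcing a buffered write out to disk (the path and payload are placeholders):

```objc
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

static void writeAndFlush(const char *path, const char *bytes) {
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return;
    write(fd, bytes, strlen(bytes));  // data lands in the buffer cache, not yet on disk
    fsync(fd);                        // ask the kernel to push the buffered data to the drive
    close(fd);
}
```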
By default, reading data from disk keeps it in the file cache, which competes for memory
If you know data won't be needed again, can use non-cached I/O: pass `NSDataReadingUncached` to `NSData` or set `fcntl(fd, F_NOCACHE, 1)` on the file handle
Main advantage of memory mapped I/O is it avoids the extra copy from file cache to your process's memory with VFS system calls
Pass `NSDataReadingMappedIfSafe` to read `NSData` with memory mapped I/O
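Both flags in one hedged sketch (the file paths are placeholders):

```objc
#import <Foundation/Foundation.h>

// One-shot read: keep the bytes out of the file cache since we won't need them again.
static NSData *readOnce(NSString *path) {
    NSError *error = nil;
    return [NSData dataWithContentsOfFile:path
                                  options:NSDataReadingUncached
                                    error:&error];
}

// Large file we will touch repeatedly: map it instead of copying it through the
// file cache into our heap (only done when the volume makes mapping safe).
static NSData *readMapped(NSString *path) {
    NSError *error = nil;
    return [NSData dataWithContentsOfFile:path
                                  options:NSDataReadingMappedIfSafe
                                    error:&error];
}
```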
Golden rule of I/O: don't do I/O on the main thread
Use the `fs_usage` command to profile your app's disk accesses
- Filter by type of events with `-f <mode>`
  - `filesys` - file system events
  - `diskio` - I/Os that access disk (note that this will not show requests handled entirely by the file cache)
- Use `-w` to force wide output when redirecting to a file
You should profile your app launch for both "cold" launches (where the file cache is empty) and "warm" launches (where the file cache already contains the files needed by the app)
Use the `purge` command to evict caches to force cold launches
Summary of disk I/O best practices:
- Use dispatch I/O
- Profile disk accesses under different file cache warmth states
- Use non-cached I/O if only accessing data once
- Watch out when data is flushed
- Don't do I/O on the main thread
Background Work
Examples of background work: refreshing data, syncing, indexing, backing up, etc.
Backgrounding provides hints to system so it can lower CPU scheduling priority and throttle disk IO access
Get background queue: `dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_BACKGROUND, 0)`
Anything you dispatch to background queue will be backgrounded
But do not acquire locks needed by the UI since background work can be delayed, causing priority inversion
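A minimal sketch of dispatching maintenance work to the background queue (`refreshCachedData` is a placeholder for your own work, not an API from the session):

```objc
#import <Foundation/Foundation.h>

void scheduleBackgroundRefresh(void) {
    dispatch_queue_t bg = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_BACKGROUND, 0);
    dispatch_async(bg, ^{
        // Runs with lowered CPU priority and throttled disk I/O.
        // Avoid taking locks here that the main thread also needs (priority inversion).
        // refreshCachedData();
    });
}
```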
Can also use XPC activities and Adaptive Daemons to let the system pick best time to perform a task
See WWDC 2013 Efficient Design with XPC
Add launchd key: `<key>ProcessType</key><string>Background</string>`
Background a specific process or thread: `setpriority(PRIO_DARWIN_PROCESS, 0, PRIO_DARWIN_BG)`
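A small sketch of backgrounding the current process with `setpriority()` (passing 0 as the target means the current process; setting the priority back to 0 should clear the background state, which is my assumption rather than something stated in the session):

```objc
#include <sys/resource.h>

// Put the calling process into the background band: lower CPU priority, throttled I/O.
void enterBackgroundBand(void) {
    setpriority(PRIO_DARWIN_PROCESS, 0, PRIO_DARWIN_BG);
}

// Restore normal (non-background) scheduling for the calling process.
void leaveBackgroundBand(void) {
    setpriority(PRIO_DARWIN_PROCESS, 0, 0);
}
```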
Running `ps -aMx` will show the priority of each process/thread
Anything with priority 4 or less is running as backgrounded (i.e. lower priority)
`spindump` - look for the `throttle_lowpri_io` frame
`taskpolicy -b <your command>` - similar to the UNIX `nice` command; runs the process as backgrounded