WWDC 2013: Building Efficient OS X Apps
Instead of focusing on speed/latency optimization, will focus on resource optimization (i.e. resource efficiency)
Macs obviously have more resources available than iOS devices, but those resources are also shared amongst multiple apps (unlike iOS)
Memory
Memory pressure causes disk cache pages to be evicted (so system can reclaim that memory), but this causes performance drop in apps that read from disk
Use Instruments - "Allocations" template to profile objects, and "Leaks" template to find leaks and retain cycles - see WWDC 2013: Fixing Memory Issues
Ideally your memory testing is automated
- Look for: (1) unexpected memory usage increases and (2) leaks (and always prioritize fixing leaks)
- `heap MyLeakyApp` to view allocations and compare memory usage
- `leaks MyLeakyApp` - but make sure to enable `MallocStackLogging=1` in Xcode or set the env var
Use `stringdups` to find duplicate objects (C strings, NSString, NSDate, etc.)
- `stringdups -nostacks <pid>` (note that it will include duplicate objects from frameworks - ignore those)
- Then use `stringdups -callTrees <pid>` to find more information about a specific allocation
Memory Pressure (visible in Activity Monitor) is a gauge for how difficult it is for the system to provide memory to an application when it requests an allocation
`NSCache` - thread-safe dictionary that will evict contents during memory pressure
Purgeable Memory (`NSPurgeableData`) - system can evict it from memory without interacting with your app
- When you actually need to use the memory, surround the access with `-beginContentAccess` and `-endContentAccess` so the system won't evict between those two calls
- If `-beginContentAccess` returns NO you have to reload the data
System will prefer reclaiming the above two instead of swapping
`NSCache` also behaves well when you put `NSPurgeableData` objects in it
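A minimal Objective-C sketch of that pattern (the `ThumbnailStore` class, its `loader` block, and the key are hypothetical names, not from the session): purgeable bytes live in an `NSCache`, every use is bracketed by `-beginContentAccess`/`-endContentAccess`, and a NO return triggers a reload.

```objc
#import <Foundation/Foundation.h>

// Hypothetical wrapper: caches purgeable data and reloads it when the system has purged it.
@interface ThumbnailStore : NSObject
@property (nonatomic, strong) NSCache *cache;
@end

@implementation ThumbnailStore

- (instancetype)init {
    if ((self = [super init])) {
        _cache = [[NSCache alloc] init];
    }
    return self;
}

// Returns data with its content access already begun; the caller must balance it
// with -endContentAccess when finished so the system may purge the bytes again.
- (NSPurgeableData *)dataForKey:(NSString *)key loader:(NSData *(^)(void))loader {
    NSPurgeableData *data = [self.cache objectForKey:key];
    if (data && [data beginContentAccess]) {
        return data;   // still resident, pinned until -endContentAccess
    }
    // Never cached, or the system evicted it: reload and re-cache.
    // A freshly created NSPurgeableData starts with an access count of 1.
    NSPurgeableData *fresh = [[NSPurgeableData alloc] initWithData:loader()];
    [self.cache setObject:fresh forKey:key];
    return fresh;
}

@end
```

Returning the data already "pinned" keeps the purge window out of the caller's critical section; the caller just ends access when it is done with the bytes.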
VM pages are grouped into memory regions:
- Anonymous regions can be named (e.g. one for ImageIO decoded images, one for CALayer rasterizations, different ones for different malloc sizes)
- File-backed regions - also read lazily from disk on first page fault (may have only part of a region in memory)
VM Tracker in Instruments tracks memory used by each region and how much of that region is "dirty"
Try to lower usage of dirty memory since dirty means it has to be written to disk before being evicted
`sudo footprint -proc MyLeakyApp -swapped -categories` to get a single (approximate) number for how much memory your app is using
`sudo footprint -proc App -proc WindowServer` to view shared memory usage (also useful if you want to get combined memory usage of your app and its XPC service)
What happens during memory pressure?
- NSCache and NSPurgeableData contents are reclaimed
- Dirty memory pages are written to disk in background to clean them / prepare for fast eviction in future
- File-backed memory written to disk
- Swap anonymous memory
New in Mavericks: before swapping, will now do memory compression
In Activity Monitor:
- App Memory = anonymous and heap regions
- Wired Memory = memory wired by OS (can't be easily reclaimed)
- Compressed = memory used to store other compressed anonymous pages
For more detail than Activity Monitor: `vm_stat 1` (the `1` gets data every 1 second)
Use time profiling tools and look for `vm_fault` in the call stack - this indicates the kernel had to process a page fault, so if you see this frame more than a few times, your app is imposing memory pressure on the system (make sure to enable user & kernel call stacks in Instruments)
`sudo sysdiagnose <AppName>`
- Capture one because memory pressure depends a lot on what else is happening in the system
- Archives output in `/var/tmp/sysdiagnose_TIMESTAMP.tar.gz`
- Includes `spindump`, `heap`, `leaks`, `footprint`, `vm_stat`, `fs_usage`
- Shift-Control-Option-Command-Period keyboard shortcut also triggers it (but will collect less information)
Disk
System-wide I/O contention can create huge performance cliff for your app, especially during app launch and document opening
Two main entrypoints into kernel storage layer: memory mapped I/O or VFS system calls
Make sure to test your app on both spinning hard drives and SSDs
Use Dispatch I/O (part of Grand Central Dispatch) for declarative file access and it will encapsulate many best practices for you
- Example: Reading a large file sequentially and doing processing concurrently
  - Use `dispatch_io_create_with_path()`, `dispatch_io_set_high_water()` to provide the block size, and pass your processing code to `dispatch_io_read()`
  - Dispatch I/O will handle all the details for you like how to parallelize I/O and compute, how often to read, etc.
- Example: Reading a large number of files in parallel
  - Use `dispatch_get_global_queue()` and `dispatch_io_set_low_water()`
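A rough sketch of the first example (the path, the 1 MB chunk size, and the `processChunk` placeholder are assumptions, not from the talk), assuming ARC so the channel can be captured by the blocks:

```objc
#import <Foundation/Foundation.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>

static void readLargeFileSequentially(const char *path) {
    dispatch_queue_t queue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);

    dispatch_io_t channel = dispatch_io_create_with_path(DISPATCH_IO_STREAM, path, O_RDONLY, 0,
                                                         queue, ^(int error) {
        // Cleanup handler: runs once the channel is closed.
        if (error) fprintf(stderr, "channel closed with error %d\n", error);
    });
    if (!channel) return;

    // Deliver data to the read handler in chunks of at most 1 MB.
    dispatch_io_set_high_water(channel, 1024 * 1024);

    // Read from offset 0 to EOF; the handler runs on the global queue per chunk.
    dispatch_io_read(channel, 0, SIZE_MAX, queue, ^(bool done, dispatch_data_t data, int error) {
        if (data) {
            // processChunk(data);   // placeholder for your own per-chunk processing
        }
        if (done) {
            dispatch_io_close(channel, 0);
        }
    });
}
```

Dispatch I/O decides how much to read ahead and keeps the processing off the thread doing the I/O, which is the point of the "parallelize I/O and compute" note above.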
One file system anti-pattern is to store a large number of small files - instead, use SQLite or Core Data for storing large number of small items
By default, `write()` calls only flush to disk when `close()` is called on the file handle
- To force a flush, use `close()`/`fsync()` (on VFS) or `msync()` (for memory mapped I/O)
- Write buffering is helpful to coalesce multiple `write()` calls into a single disk write
- If you are trying to use these to achieve data consistency, you probably want to use SQLite or Core Data instead of trying to roll your own consistency
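A tiny C-level sketch of forcing a buffered write out to disk (the path and payload are placeholders):

```objc
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

static void writeAndFlush(const char *path, const char *bytes) {
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return;
    write(fd, bytes, strlen(bytes));  // data lands in the buffer cache, not yet on disk
    fsync(fd);                        // ask the kernel to push the buffered data to the drive
    close(fd);
}
```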
By default, reading data from disk keeps it in the file cache, which competes for memory
If you know data won't be needed again, can use non-cached I/O: pass `NSDataReadingUncached` to `NSData` or set `fcntl(fd, F_NOCACHE, 1)` on the file handle
Main advantage of memory mapped I/O is it avoids the extra copy from file cache to your process's memory with VFS system calls
Pass `NSDataReadingMappedIfSafe` to read `NSData` with memory mapped I/O
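Both flags in one hedged sketch (the file paths are placeholders):

```objc
#import <Foundation/Foundation.h>

// One-shot read: keep the bytes out of the file cache since we won't need them again.
static NSData *readOnce(NSString *path) {
    NSError *error = nil;
    return [NSData dataWithContentsOfFile:path
                                  options:NSDataReadingUncached
                                    error:&error];
}

// Large file we will touch repeatedly: map it instead of copying it through the
// file cache into our heap (only done when the volume makes mapping safe).
static NSData *readMapped(NSString *path) {
    NSError *error = nil;
    return [NSData dataWithContentsOfFile:path
                                  options:NSDataReadingMappedIfSafe
                                    error:&error];
}
```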
Golden rule of I/O: don't do I/O on the main thread
Use the `fs_usage` command to profile your app's disk accesses
- Filter by type of events with `-f <mode>`
  - `filesys` - file system events
  - `diskio` - I/Os that access disk (note that this will not show requests handled entirely by the file cache)
- Use `-w` to force wide output when redirecting to a file
You should profile your app launch for both "cold" launches (where the file cache is empty) and "warm" launches (where the file cache already contains the files needed by the app)
Use the `purge` command to evict caches to force cold launches
Summary of disk I/O best practices:
- Use dispatch I/O
- Profile disk accesses under different file cache warmth states
- Use non-cached I/O if only accessing data once
- Watch out when data is flushed
- Don't do I/O on the main thread
Background Work
Examples of background work: refreshing data, syncing, indexing, backing up, etc.
Backgrounding provides hints to system so it can lower CPU scheduling priority and throttle disk IO access
Get background queue: `dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_BACKGROUND, 0)`
Anything you dispatch to background queue will be backgrounded
But do not acquire locks needed by the UI since background work can be delayed, causing priority inversion
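A minimal sketch of dispatching maintenance work to the background queue (`refreshCachedData` is a placeholder for your own work, not an API from the session):

```objc
#import <Foundation/Foundation.h>

void scheduleBackgroundRefresh(void) {
    dispatch_queue_t bg = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_BACKGROUND, 0);
    dispatch_async(bg, ^{
        // Runs with lowered CPU priority and throttled disk I/O.
        // Avoid taking locks here that the main thread also needs (priority inversion).
        // refreshCachedData();
    });
}
```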
Can also use XPC activities and Adaptive Daemons to let the system pick best time to perform a task
See WWDC 2013 Efficient Design with XPC
Add launchd key: `<key>ProcessType</key><string>Background</string>`
Background a specific process or thread: `setpriority(PRIO_DARWIN_PROCESS, 0, PRIO_DARWIN_BG)`
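A small sketch of backgrounding the current process with `setpriority()` (passing 0 as the target means the current process; setting the priority back to 0 should clear the background state, which is my assumption rather than something stated in the session):

```objc
#include <sys/resource.h>

// Put the calling process into the background band: lower CPU priority, throttled I/O.
void enterBackgroundBand(void) {
    setpriority(PRIO_DARWIN_PROCESS, 0, PRIO_DARWIN_BG);
}

// Restore normal (non-background) scheduling for the calling process.
void leaveBackgroundBand(void) {
    setpriority(PRIO_DARWIN_PROCESS, 0, 0);
}
```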
Running `ps -aMx` will show the priority of each process/thread
Anything with priority 4 or less is running as backgrounded (i.e. lower priority)
`spindump` - look for the `throttle_lowpri_io` frame
`taskpolicy -b <your command>` - similar to the UNIX `nice` command; runs the process as backgrounded