DFS Replication: What’s new in Windows Server™ 2008
SYSVOL Replication – Now on DFSR
One of the coolest features in Windows Server™ 2008 is that DFSR can now be used to replicate the SYSVOL share between domain controllers. However, this feature is limited to domains running at the Windows Server 2008 domain functional level. All domain controllers (including new Windows Server™ 2008 domain controllers) in domains operating below this functional level will continue to use NTFRS for SYSVOL replication.
Replication Partners | Replication Engine Used
Windows Server 2003 Domain Functional Level
Windows Server 2003 <-> Windows Server 2003 | NT File Replication Service
Windows Server 2003 <-> Windows Server 2008 | NT File Replication Service
Windows Server 2008 <-> Windows Server 2008 | NT File Replication Service
Windows Server 2008 Domain Functional Level
Windows Server 2008 <-> Windows Server 2008 | Distributed File System Replication Service
Figure 1: SYSVOL Replication on Windows Server.
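The rule in Figure 1 can be summed up in a few lines. This is a hypothetical sketch, not DFSR code: versions are plain years, and `sysvol_engine` is an illustrative name. It mirrors the table only. In a domain whose functional level was raised to Windows Server 2008 (rather than created at that level), SYSVOL is moved to DFSR with dfsrmig.exe, as described below.

```python
def sysvol_engine(domain_functional_level: int, os_a: int, os_b: int) -> str:
    """Return the SYSVOL replication engine for a pair of DCs, per Figure 1.

    Hypothetical helper: versions are plain years (2003 or 2008) for illustration.
    """
    if domain_functional_level not in (2003, 2008):
        raise ValueError("only the 2003 and 2008 functional levels are modeled")
    if min(os_a, os_b) < domain_functional_level:
        # Every DC must run an OS at least as new as the domain functional level.
        raise ValueError("a DC's OS is older than the domain functional level")
    if domain_functional_level == 2008:
        return "Distributed File System Replication Service"
    # Below the 2008 level, even two Windows Server 2008 DCs use NTFRS.
    return "NT File Replication Service"


print(sysvol_engine(2003, 2008, 2008))  # NT File Replication Service
print(sysvol_engine(2008, 2008, 2008))  # Distributed File System Replication Service
```

The point the sketch makes explicit is that the domain functional level, not the operating system of an individual domain controller, decides which engine replicates SYSVOL.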
DFSR also supports Read Only Domain Controllers for SYSVOL replication. On a Read Only Domain Controller, the DFS Replication service will roll back any changes that have been made locally and will not propagate these changes out to other domain controllers.
NOTE: Read-only support does not extend to replicated folders other than SYSVOL. Read-only domain controllers are supported only as leaf nodes, with no outbound connections from the read-only domain controller. Active Directory automatically configures Read Only Domain Controllers this way, so administrators do not need to take any explicit action to meet this requirement.
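To make the leaf-node rule concrete, here is a minimal sketch that flags any connection on which a Read Only Domain Controller would send updates. `Connection`, `rodc_outbound_violations` and the server names are hypothetical; Active Directory enforces this configuration on its own, and the code only illustrates what "no outbound connections" means.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Connection:
    sending_member: str
    receiving_member: str


def rodc_outbound_violations(connections: list[Connection], rodcs: set[str]) -> list[Connection]:
    """Return connections on which an RODC would send updates to a partner.

    An RODC must be a leaf node: it may receive SYSVOL updates but must
    have no outbound connections.
    """
    return [c for c in connections if c.sending_member in rodcs]


topology = [
    Connection("HUB-DC1", "BRANCH-RODC1"),  # inbound to the RODC: allowed
    Connection("BRANCH-RODC1", "HUB-DC1"),  # outbound from the RODC: not allowed
]
print(rodc_outbound_violations(topology, {"BRANCH-RODC1"}))
```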
DfsrMig.exe – migrate SYSVOL replication from NTFRS to DFSR
Windows Server™ 2008 also ships with a command-line tool called dfsrmig.exe, which administrators can use to migrate SYSVOL replication from NTFRS to DFSR. The migration can be performed once the domain functional level has been raised to Windows Server 2008.
The SYSVOL migration process:
a) Is simple and requires minimal admin intervention.
b) Is designed to be low risk and requires minimal downtime of the SYSVOL share.
c) Provides granular control to administrators so that they can monitor the status of migration and commit to DFSR replication of SYSVOL when everything is working smoothly.
d) Is reversible and allows the migration process to be rolled back prior to the final commit stage, thus allowing administrators to fall back on NTFRS replication of SYSVOL if desired.
e) Is robust, fault tolerant and latency resilient, which makes it suitable for migrating domain controllers located in branch offices as well.
f) Provides built-in monitoring mechanisms that can be used to track the status of migration.
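Property d) is easiest to see as a small state machine. The sketch below is a hypothetical model, not dfsrmig itself; the state names and numbers follow those dfsrmig uses for its global migration states (Start, Prepared, Redirected, Eliminated), while `SysvolMigration` and `set_global_state` are illustrative names. It captures only the rule stated above: any earlier state is a valid rollback target until the final commit.

```python
from enum import IntEnum


class MigrationState(IntEnum):
    START = 0       # NTFRS replicates SYSVOL
    PREPARED = 1    # DFSR also replicates a copy of SYSVOL; the share still uses the NTFRS copy
    REDIRECTED = 2  # the SYSVOL share now points at the DFSR-replicated copy
    ELIMINATED = 3  # NTFRS replication of SYSVOL is retired: the final commit


class SysvolMigration:
    """Hypothetical model of the global migration state (not dfsrmig itself)."""

    def __init__(self) -> None:
        self.state = MigrationState.START

    def set_global_state(self, target: MigrationState) -> None:
        if self.state is MigrationState.ELIMINATED:
            raise RuntimeError("migration is committed; it can no longer be rolled back")
        # Before the commit, moving to an earlier state is a rollback.
        # (This sketch does not model any ordering rules for moving forward.)
        self.state = target


m = SysvolMigration()
m.set_global_state(MigrationState.PREPARED)
m.set_global_state(MigrationState.REDIRECTED)
m.set_global_state(MigrationState.START)  # fall back to NTFRS
print(m.state.name)  # START
```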
A more detailed blog post covering the steps involved in SYSVOL migration will follow.
Performance gets a boost
Some of the key enhancements to the DFS Replication service in Windows Server™ 2008 are on the performance front. Over the course of working with customers on Windows Server™ 2003 R2 based DFSR deployments, we noticed that there was scope for improving performance when replicating workloads as diverse as ‘hundreds of small files’ and ‘large files’. The targeted performance work in Windows Server™ 2008 should benefit DFSR deployments on heavily loaded hub servers.
1. “Over-the-wire” enhancements: RPC Asynchronous Pipes
DFSR in Windows Server™ 2008 builds on some wonderful work done in the Windows RPC subsystem to support RPC asynchronous pipes. This has enabled DFSR to optimize at the replication protocol level and implement RPC asynchronous pipe support for replication. DFSR uses RPC asynchronous pipes for files larger than 256 KB. Using asynchronous RPC pipes brings several benefits:
· Multiple outstanding calls from a replication partner: In Windows Server™ 2003 R2, DFSR on a member receiving updates (for example, a hub server) is blocked in a remote procedure call until the call returns. This prevents the thread servicing that request from having multiple outstanding calls, or from performing other work while waiting for the RPC call to return with data. On Windows Server™ 2008, the implementation based on asynchronous RPC pipes lets the thread servicing requests continue to service other outstanding requests from the replication partner while waiting for already issued RPC calls to return with data.
· Slow or delayed partners: If a replication partner is slow to produce data (for instance a remote DFSR server over a slow link) the DFSR thread servicing that partner doesn’t need to block until data is available. Thus slow replication partners do not end up throttling replication performance.
· Replication of large amounts of data: Transferring large amounts of data between replication partners, especially over slow links, ties up worker threads at both ends for the duration of the transfer. With RPC asynchronous pipes, this data transfer can take place incrementally, and without blocking either end from performing other tasks.
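The difference between blocking calls and outstanding asynchronous calls can be illustrated with Python's asyncio. This is an analogy only: coroutines and `asyncio.sleep` stand in for RPC asynchronous pipes and network latency, and the partner names and latencies are made up. All calls are issued up front, and a slow partner does not hold up results from fast ones.

```python
import asyncio
import time


async def rpc_call(partner: str, latency: float) -> str:
    # Stand-in for an outstanding RPC call. Awaiting yields control to the
    # event loop instead of blocking the servicing thread.
    await asyncio.sleep(latency)
    return partner


async def service_partners(latencies: dict[str, float]) -> list[str]:
    """Issue every call up front; collect results in completion order."""
    tasks = [asyncio.create_task(rpc_call(p, s)) for p, s in latencies.items()]
    return [await done for done in asyncio.as_completed(tasks)]


start = time.perf_counter()
order = asyncio.run(service_partners({"BRANCH-SLOW": 0.3, "HUB-A": 0.05, "HUB-B": 0.1}))
elapsed = time.perf_counter() - start
# The fast partners finish first, and the total is roughly the slowest
# latency (about 0.3 s) rather than the sum of all three (about 0.45 s).
print(order, round(elapsed, 2))
```

With blocking calls issued one after another, the same three requests would take the sum of their latencies and the fast partners would wait behind the slow one.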
NOTE: This feature requires both ends to be running Windows Server™ 2008. In mixed deployments where Windows Server™ 2008 and Windows Server™ 2003 R2 servers co-exist, the DFS Replication service defaults to not using RPC asynchronous pipes for replication activity.
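Taken together, the size threshold and the version requirement amount to a simple rule. The sketch below is hypothetical (`uses_rpc_async_pipes` is not a real API) and assumes "256K" means 256 KiB.

```python
ASYNC_PIPE_THRESHOLD = 256 * 1024  # "256K", assumed here to mean 256 KiB


def uses_rpc_async_pipes(file_size: int, local_is_ws2008: bool, partner_is_ws2008: bool) -> bool:
    """Hypothetical rule: async pipes only for large files between two Windows Server 2008 members."""
    if not (local_is_ws2008 and partner_is_ws2008):
        # Mixed Windows Server 2003 R2 / 2008 pair: no RPC asynchronous pipes.
        return False
    return file_size > ASYNC_PIPE_THRESHOLD


print(uses_rpc_async_pipes(1_000_000, True, True))   # True
print(uses_rpc_async_pipes(1_000_000, True, False))  # False
print(uses_rpc_async_pipes(100_000, True, True))     # False
```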
2. Server-side performance tweaks:
a) Un-buffered I/O: The DFS Replication service on Windows Server™ 2008 now performs un-buffered local I/O, which increases throughput because far fewer data copy operations take place during regular replication activity. Not having to copy data buffers multiple times means that much more juice for a heavily loaded hub server that is replicating with multiple replication partners.
b) Asynchronous Low Priority I/Os: In Windows Server™ 2008, the DFS Replication service can take advantage of a new feature called low priority I/O. This feature lets background processes such as DFSR, which runs as a Windows service, access the hard disk at a lower priority than other programs. Applications that are compatible with low-priority I/O can run on a server without slowing down higher-priority processes, which helps keep the server responsive even while it is handling a heavy replication load.
NOTE: As a result of these two performance tweaks, the Windows Cache Manager’s buffers are not polluted with replication-related data. This drastically reduces the performance impact on the server during heavy replication activity. Thus, running the DFS Replication service on a hub server that handles a large replication workload will not slow the server down to a crawl.
c) 16 concurrent file downloads: In Windows Server™ 2008, owing to the use of asynchronous low priority local I/Os and RPC asynchronous pipes, the number of concurrent file downloads has been raised to 16, compared with the limit of 4 on Windows Server™ 2003 R2. This allows hub servers running Windows Server™ 2008 to download more updates from their replication partners concurrently.
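A cap on concurrent downloads behaves like a counting semaphore. In the hedged sketch below, `asyncio.Semaphore` plays that role and `asyncio.sleep` stands in for a file transfer; `download` and `download_all` are hypothetical names, and only the limits of 4 and 16 come from this post. The function reports the peak number of downloads that were in flight at once.

```python
import asyncio

WS2003_R2_LIMIT = 4
WS2008_LIMIT = 16


async def download(gate: asyncio.Semaphore, stats: dict[str, int]) -> None:
    async with gate:  # wait for one of the download slots
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0.01)  # stand-in for transferring one file
        stats["active"] -= 1


async def download_all(updates: int, limit: int) -> int:
    """Download `updates` files with at most `limit` in flight; return the peak."""
    gate = asyncio.Semaphore(limit)
    stats = {"active": 0, "peak": 0}
    await asyncio.gather(*(download(gate, stats) for _ in range(updates)))
    return stats["peak"]


print(asyncio.run(download_all(40, WS2003_R2_LIMIT)))  # 4
print(asyncio.run(download_all(40, WS2008_LIMIT)))     # 16
```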
Windows Server™ 2003 R2 | Windows Server 2008
Multiple RPC calls | RPC Asynchronous Pipes
Synchronous I/Os | Asynchronous I/Os
Buffered I/Os | Un-buffered I/Os
Normal priority I/Os | Low Priority I/Os
4 concurrent downloads | 16 concurrent downloads
Figure 2: The performance dashboard
3. Algorithmic Enhancements:
Based on feedback from some customers who have been using DFSR to replicate data between a central datacenter and multiple remote sites (some of these on slow links), we have enhanced the DFSR service to perform better under these conditions. In experiments conducted to simulate these conditions, it was found that replication member servers on slow links were often throttling the rate at which a datacenter (hub) server was able to collect updates from its replication partners.
Therefore, the algorithm employed to schedule the download of updates has been reworked to distribute the right to send updates more fairly amongst replication partners.
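This post does not describe the reworked scheduling algorithm, so the sketch below swaps in plain round-robin as a generic example of distributing the right to send updates fairly: each partner with pending updates gets one turn per round, so no single partner monopolizes the hub. The function name, partner names and update IDs are hypothetical.

```python
from collections import deque


def round_robin_schedule(pending: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Grant each partner with pending updates one turn per round."""
    queues = {partner: deque(updates) for partner, updates in pending.items()}
    order: list[tuple[str, str]] = []
    while any(queues.values()):
        for partner, queue in queues.items():
            if queue:
                order.append((partner, queue.popleft()))
    return order


schedule = round_robin_schedule({
    "BRANCH-SLOW": ["s1", "s2", "s3"],
    "HUB-B": ["b1"],
    "SITE-C": ["c1", "c2"],
})
print(schedule)
```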
Improved Dirty Shutdown Recovery
In Windows Server™ 2008, recovery from dirty shutdowns is greatly improved, making the DFS Replication service more resilient.
The DFS Replication service maintains state information about the contents of the folders it replicates in a database on the volume hosting the replicated folder. In this database, it keeps a record of file versions and other metadata, which is what enables it to function as a multi-master file replication engine and to resolve conflicts automatically. The DFS Replication service is essentially a consumer of the NTFS USN (Update Sequence Number) journal – a journal of updates to files and folders maintained by NTFS. Entries in this journal notify the DFS Replication service about changes to the contents of a replicated folder and trigger replication activity. Every unique change on the file system (in a folder replicated by DFSR) also triggers the creation or update of a record in the DFSR database.
Sometimes the database and the file system can go out of sync. Examples include an abrupt power loss on the server or abnormal termination of the DFSR service for any reason. Another example is when the volume hosting a DFSR replicated folder loses power, gets disconnected or is forcibly dismounted. These events result in a ‘Dirty Shutdown’ of DFSR and can cause inconsistencies between the database and the file system. DFSR is designed to recover automatically from these situations and validates the contents of the replicated folder against the entries stored in the database. This is achieved through USN check-pointing, which keeps track of which USN changes have been committed to the database.
If a dirty shutdown is detected on Windows Server™ 2003 R2, the DFS Replication service triggers a time-consuming rebuild of the database, so recovery takes longer. In Windows Server™ 2008, more sophisticated validation algorithms minimize the need for expensive database rebuilds. As a result, the service recovers from dirty shutdown conditions automatically and much faster than on Windows Server™ 2003 R2.
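USN check-pointing can be sketched as follows. This is a simplified, hypothetical model (not DFSR's database format or its actual validation algorithms): the database records the highest USN whose change it has committed, so after a dirty shutdown only journal entries past that checkpoint need to be replayed, instead of rebuilding everything from scratch. The journal is assumed to be in USN order.

```python
from dataclasses import dataclass, field


@dataclass
class ReplicationDatabase:
    """Hypothetical stand-in for the per-volume DFSR database."""
    versions: dict[str, int] = field(default_factory=dict)  # path -> USN of last recorded change
    checkpoint_usn: int = 0  # highest USN whose change is known to be committed

    def commit(self, usn: int, path: str) -> None:
        self.versions[path] = usn
        self.checkpoint_usn = usn


def recover(db: ReplicationDatabase, journal: list[tuple[int, str]]) -> int:
    """Replay only journal entries past the checkpoint; return how many were replayed."""
    replayed = 0
    for usn, path in journal:
        if usn > db.checkpoint_usn:
            db.commit(usn, path)
            replayed += 1
    return replayed


journal = [(100, "a.txt"), (101, "b.txt"), (102, "a.txt"), (103, "c.txt")]
db = ReplicationDatabase()
for usn, path in journal[:2]:  # only the first two changes were committed...
    db.commit(usn, path)
# ...before a dirty shutdown. Recovery resumes after USN 101, not from scratch.
print(recover(db, journal), db.versions)
```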
And … an old ‘formula’ is retired!
In Windows Server™ 2003 R2, there was a scalability limit defined by the formula below, which is explained in detail in a separate blog post.
On each server, the result of the following formula should be kept to 1024 or fewer:
(number of replicated folders in replication group X * number of simultaneously replicating connections in replication group X) + (number of replicated folders in replication group Y * number of simultaneously replicating connections in replication group Y) + … + (number of replicated folders in replication group N * number of simultaneously replicating connections in replication group N)
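A quick worked check of the formula on a hypothetical server (the group sizes below are made up): with 10 replicated folders and 50 connections in one group and 25 of each in another, the total is 10 * 50 + 25 * 25 = 1125, which exceeds the Windows Server 2003 R2 limit.

```python
R2_PERF_COUNTER_LIMIT = 1024  # applies to Windows Server 2003 R2 only


def r2_formula(groups: list[tuple[int, int]]) -> int:
    """Sum of (replicated folders * simultaneously replicating connections) per group."""
    return sum(folders * connections for folders, connections in groups)


def within_r2_limit(groups: list[tuple[int, int]]) -> bool:
    return r2_formula(groups) <= R2_PERF_COUNTER_LIMIT


# Hypothetical server: group X has 10 folders and 50 connections, group Y 25 and 25.
groups = [(10, 50), (25, 25)]
print(r2_formula(groups), within_r2_limit(groups))  # 1125 False
```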
This formula came into existence on R2 because the DFS Replication service internally used a library to maintain performance counters. This library had a hard limit on the number of performance counter objects that could be created – yes, 1024. Since DFSR uses performance counters extensively for each replicated folder and to track state information for every connection with a replication partner, this limit of 1024 affected scalability.
With Windows Server™ 2008, the DFS Replication service has been upgraded to use a new and improved version of this library. This version does not have the limit of 1024 performance counter objects (the number of performance counter objects is not capped), so the above formula no longer applies on Windows Server™ 2008.