Sunday, December 16, 2012

DFS-R Replication problems: Event ID 6002

Hi,

This morning, while applying Windows updates to one of my DFS-R and DFS-N servers, I found a strange event ID, 6002:


The DFS Replication service detected invalid msDFSR-Subscriber object data while polling for configuration information.

Additional Information:
Object DN: CN=<removed>,CN=DFSR-LocalSettings,CN=<removed>,OU=<removed>,OU=<removed>,OU=<removed>,OU=<removed>,DC=<domain name>,DC=com
Attribute Name: msDFSR-MemberReference
Domain Controller: <DC hostname>
Polling Cycle: 60 minutes
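
When the same event shows up on several servers, it helps to pull the fields out of the message text so they can be compared. Below is a minimal Python sketch, assuming the message was exported as plain text in the layout shown above. The function name `parse_event_6002` and all values in the sample are hypothetical placeholders, not data from my environment.

```python
# Field labels exactly as they appear in the event text above.
FIELDS = ("Object DN", "Attribute Name", "Domain Controller", "Polling Cycle")

def parse_event_6002(message):
    """Return a dict of the 'Label: value' lines found in an event 6002 message."""
    result = {}
    for line in message.splitlines():
        label, sep, value = line.partition(":")
        # Keep only the known labels; other lines are ignored.
        if sep and label.strip() in FIELDS:
            result[label.strip()] = value.strip()
    return result

# Hypothetical sample; real DNs and hostnames will differ.
sample = """The DFS Replication service detected invalid msDFSR-Subscriber object data while polling for configuration information.

Additional Information:
Object DN: CN=example-guid,CN=DFSR-LocalSettings,CN=FS01,OU=Servers,DC=example,DC=com
Attribute Name: msDFSR-MemberReference
Domain Controller: dc01.example.com
Polling Cycle: 60 minutes
"""

info = parse_event_6002(sample)
print(info["Object DN"])
print(info["Attribute Name"])
```

The Object DN is the value needed later, when looking up the entry in the directory.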

After seeing this error I filtered my alerts and found that it had been logged for 3 days, just a few times each day. So, let's do what every SysAdmin does: google it!

I had a feeling that this was related to conflicting or orphaned entries in the LDAP DB that holds the entire DFS configuration, and that feeling turned out to be right.

My domain is based on Windows Server 2008 R2, at the 2008 R2 functional level.

I found some interesting entries:

http://www.chicagotech.net/netforums/viewtopic.php?p=4675&sid=4533d81e7d24afa689686ff5d5ffbdb3
http://www.eventid.net/display-eventid-6002-source-DFSR-eventno-10483-phase-1.htm
http://social.technet.microsoft.com/wiki/contents/articles/1158.dfsr-event-6002-dfs-replication.aspx (the fix there applies only to 2003 R2, which is a different problem)

And here are the details about the ms-DFSR-MemberReference attribute (Windows):
http://msdn.microsoft.com/en-us/library/windows/desktop/ms677157(v=vs.85).aspx#win_2008_r2

But, after the links and the intro, what was the solution?

Before going to the solution, I need to add one more thing: this error stopped replication for ALL folders and groups, so it is a very critical problem. I created some diagnostic reports and some test files to see whether they would replicate, but nothing happened for hours.

Steps taken:

1. Collect the errors. Look on every server involved in your DFS-R farm: if you replicate folders from A to B, filter the Event Viewer on both A and B for 6002 errors.

2. Go to your Domain Controller (I am assuming you have DFS deployed in a domain) and open the ADSIEdit.msc console. This tool lets you browse your domain's LDAP structure, including all attributes of the objects, with read/write access. Be careful when editing anything here; it is only needed for advanced procedures.

3. In the ADSIEdit console, connect to your DC/domain and navigate to the object named in the error (the Object DN). Open the details of that object; there you will find the replicated folder name.

4. Do the same for the other servers, and verify the replicated folder name on each of them.

5. Do not remove the entry yet. Go back to your DFS Management snap-in and delete the affected replication group.

6. Wait for the changes to propagate. To speed this up, run the following in an elevated prompt: dfsrdiag PollAD /Member:<your DC hostname>

7. Wait up to 1 hour for the changes to propagate. For me, after deleting the conflicting replication group in DFS Management (deleting the RG completely, not just removing it from the view), DFS replication started to work properly again (yes!!!). But the conflicting entry still existed in ADSIEdit.

8. To stop the 6002 errors, I deleted the entries from the LDAP DB and restarted the DFS-R service. After that everything was OK.
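
Two of the steps above can be sketched in Python. The first helper turns the Object DN from the event into the top-down order in which the containers are expanded in ADSIEdit (step 3). The second builds the `dfsrdiag PollAD` command line from step 6. This is only an illustration: the helper names `dn_to_tree_path` and `pollad_command`, the sample DN and the hostname are hypothetical, and the split assumes that commas inside a name are escaped with a backslash.

```python
import re

def dn_to_tree_path(dn):
    """Split a DN into its components and list them top-down,
    i.e. from the domain components down to the object itself."""
    # Split on commas that are not escaped with a backslash.
    parts = [p.strip() for p in re.split(r"(?<!\\),", dn)]
    return list(reversed(parts))

def pollad_command(member):
    """Build the argument list for the dfsrdiag call from step 6."""
    return ["dfsrdiag", "PollAD", "/Member:" + member]

# Hypothetical DN; use the Object DN reported in your own 6002 event.
dn = "CN=example-guid,CN=DFSR-LocalSettings,CN=FS01,OU=Servers,DC=example,DC=com"
for depth, node in enumerate(dn_to_tree_path(dn)):
    # Indent each level to mimic the tree view.
    print("  " * depth + node)

print(" ".join(pollad_command("server01")))
```

Note that ADSIEdit shows the DC= components together as a single root node, so the navigation really starts at the first OU or CN below it. The command is only built here, not executed; on a Windows server it could be passed to `subprocess.run` from an elevated session.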

So, one more problem solved and documented, so that you can benefit from it too. Please comment.