Error Creating a Publication in SQL 2000
As luck would have it, and as is all too common in a DBA’s life, I got a call right before I had planned to leave for the day: “There is data missing from a replicated table.” I immediately checked Replication Monitor and saw no errors or latency reported. I have seen this before, where everything looks good yet no transactions are getting picked up. I tried restarting the agents, dropping and recreating the subscription, and generating a new snapshot. Nothing. So I resorted to the tried-and-true fix of dropping and recreating the publication and subscriptions, which has always worked in the past.
Everything dropped OK, yet when I went to create the publication I got a weird message about not being able to obtain information about a user (a local SQL user). I checked the user and login; both were valid and the user was not orphaned. I didn’t have a clue. A quick Google search on the error turned up a post about incompatibilities between a SQL 2000 Publisher and a SQL 2005 Distributor. In checking the logs, it turned out that the distributor had been patched (service packs) since the original replication was established. My assumption is that the newer SQL 2005 build introduced the incompatibility, and as soon as the publication was dropped, that was all she wrote. That may also have been why replication broke in the first place. I tried recreating the publication logged in as “sa”, and also tried recreating it from the SQL 2000 management tools; in both cases I received the same error.
I disabled replication on the server, then re-established it and configured the publisher with a local distributor. I recreated all the publications (with immediate snapshots) and all the subscriptions. After that, everything was running fine.
Or so I thought. The agents were running and the subscription was running, but when I checked the replicated data it was out of sync. I checked Replication Monitor and, to my surprise, the snapshot and log reader agents were in a retry status and the subscription showed “Never Started”. I immediately checked the jobs and, sure enough, the agents were in a perpetual retry loop. Between retries the job history simply said the step failed, without any error message. Finally I ran the agent command manually from a command prompt and received an error about a missing or corrupted DLL, ATL71.DLL. To my knowledge nothing had changed on the system, so this was very odd, and it could ultimately have been the root cause of this whole incident. I did a search and found the DLL in the Backup Exec bin folder. I then searched the registry but found no other references to ATL71.DLL. I took a shot in the dark, copied it to the system32 folder and reran my snapshot agent command. Hurray! It ran successfully.
I restarted the agents and the snapshot ran fine, but the log reader was still having issues. Upon examining the job log I was now getting the following: “The process could not execute ‘sp_repldone/sp_replcounters’”. After some more digging, this pointed to more than one log reader running against the replicated database. I restarted SQL Server Agent but was still getting the error. I then ran sp_who and saw a process from the old distribution server connected to my replicated database. I killed the spid and all was as it should be.
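
For anyone who would rather not eyeball sp_who output, that last check can be sketched in a few lines of Python. This is only an illustration, not what I actually ran: it assumes you have already pulled the session list into rows with spid, hostname, program_name and dbname fields (the kind of thing sp_who or master.dbo.sysprocesses gives you), and the server names, database name and program name in the sample are made up. It only lists candidates; deciding whether to KILL a spid is still up to you.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Session:
    """One row of session info, shaped loosely like sp_who / sysprocesses output."""
    spid: int
    hostname: str
    program_name: str
    dbname: str


def sessions_from_host(sessions: List[Session], dbname: str, host: str) -> List[Session]:
    """Return sessions connected to `dbname` that originate from `host`.

    Host and database names are compared case-insensitively, and padding is
    stripped because fixed-width columns often come back space-padded.
    """
    wanted_db = dbname.strip().lower()
    wanted_host = host.strip().lower()
    return [
        s for s in sessions
        if s.dbname.strip().lower() == wanted_db
        and s.hostname.strip().lower() == wanted_host
    ]


if __name__ == "__main__":
    # Hypothetical sample data: OLDDIST is the retired distributor,
    # NEWPUB is the publisher now acting as its own distributor.
    sample = [
        Session(51, "NEWPUB", "Repl-LogReader", "SalesDB"),
        Session(57, "OLDDIST   ", "Repl-LogReader", "SalesDB"),
        Session(60, "OLDDIST", "SQLAgent", "master"),
    ]
    for s in sessions_from_host(sample, "SalesDB", "OLDDIST"):
        # Print a suggestion only; nothing is killed here.
        print(f"spid {s.spid} from {s.hostname.strip()} ({s.program_name}): candidate for KILL {s.spid}")
```

The filter is keyed on the old distributor's host name rather than on "anything that is not the new distributor", since ordinary application connections to the replicated database are expected and should be left alone.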