SCOM Error – The Microsoft Exchange Mailbox Replication Service isn’t scanning mailbox database queues for jobs

Recently, one of our Exchange servers was frequently raising this alert in SCOM.

I ran the command below to check the health of the affected Exchange server:

Get-ServerHealth -Server ServerName

The MailboxMigration health set was reported as Unhealthy, while all the other health sets were healthy.
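To narrow the output down to the failing health set, the monitors can be filtered by name. This is a minimal sketch, assuming the Exchange Management Shell; `ServerName` is a placeholder and the server name is passed positionally here.

```powershell
# List only the monitors of the MailboxMigration health set that are not healthy.
# "ServerName" is a placeholder for the affected Exchange server.
Get-ServerHealth ServerName |
    Where-Object { $_.HealthSetName -eq 'MailboxMigration' -and $_.AlertValue -ne 'Healthy' } |
    Format-Table Name, TargetResource, AlertValue -AutoSize
```

The TargetResource column is worth a look, since it can hint at which database a monitor is complaining about.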

The SCOM alert reported the same, together with the dump directory and stack trace:

Dump Directory:
C:\Program Files\Microsoft\Exchange Server\V15\Diagnostics\MigrationResponderDumps
at Microsoft.Exchange.Monitoring.ActiveMonitoring.Migration.Probes.MRSQueueScanProbe.DoWork(CancellationToken cancellationToken)
at System.Threading.Tasks.Task.Execute()
--- End of stack trace from previous location where exception was thrown ---

At the end, the alert gave the same pointer for troubleshooting:

Note: Data may be stale. To get current data, run: Get-ServerHealth

 

As part of normal troubleshooting, I restarted the Mailbox Replication service, but the issue persisted.

I then looked into the event logs and found the event below:

 

MRS.png

 

The service was trying to process jobs in a recovery database that an admin had created for a restore job and forgotten to remove afterwards.

I dismounted this recovery database and removed it, which solved the issue. The alert has not reappeared since.
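The cleanup can be sketched as follows, assuming the Exchange Management Shell. `RDB1` is a hypothetical database name; substitute the recovery database that the first command returns.

```powershell
# Find recovery databases (Recovery = True) that were left behind after a restore.
Get-MailboxDatabase -Status |
    Where-Object { $_.Recovery } |
    Format-Table Name, Server, Mounted -AutoSize

# "RDB1" is a hypothetical name; use the database found above.
Dismount-Database -Identity 'RDB1' -Confirm:$false
Remove-MailboxDatabase -Identity 'RDB1'
```

Remove-MailboxDatabase only removes the database object from the configuration. The database and log files stay on disk and have to be deleted manually once they are no longer needed.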

Thanks & Regards

Sathish Veerapandian

Extend the Symantec Enterprise Vault to DR site for HA

In this article we will look at extending Enterprise Vault (EV) to a DR site. This configuration helps when the main site is completely down.

In most cases the Enterprise Vault configuration looks like this:
1) Active/Passive configuration on the primary site.
2) An HA failover option is present in the primary site.
3) EV is served entirely from the primary site.

In most cases (around 99 percent) Enterprise Vault is configured on a Microsoft cluster because of the good stability of Windows clustering.

Normal Active/Passive setup with HA option in Main site :

EV1.png

Implications without EV DR :

  1. Archived items will not be available when the main site is unavailable.
  2. Items stored in EV storage will not be available.

In a normal scenario, where the main site is operational and available, the DR server is not in use and remains on standby.

A typical DR solution requires primary and secondary sites, and clusters within those sites, for EV to function.

There are 2 options available for an EV DR setup :

1) Use the Update Service Locations (USL) option that ships with the Symantec software. This requires more manual operations, such as:

a) Failing SQL over to DR with the SQL native tools.
b) Mounting the volumes of the EV stores appropriately.
c) Running the EV native Update Service Locations (USL).

On top of the above, we cannot be sure whether the EV storage and SQL data replicated to DR are healthy.

2) Use EV-aware DR application software. (Recommended)

There are a few EV-aware products available in the market. They can fully automate the failover and failback between the sites, so it is better to go with this option.

Below are the EV-aware products that are available :

  1. Enterprise Vault with InfoScale Enterprise.
  2. EV Near Sync.

Below is one example  of high level design of EV DR setup:

EV2png.png

Below is the summary:

1) Have a separate EV cluster on the secondary site.
2) Replicate SQL and the EV storage to the DR site regularly.
3) Have EV-aware software that performs the automatic failover and failback in case of a disaster. After the initial configuration, this software does the rest of the work, such as updating entries in the SQL database and activating the replicated Vault storage groups in DR.
4) Change the DNS alias to point from production to DR when DR is activated.

Storage Requirements:

1) The EV storage groups need to be replicated to the DR site. This can be done through SAN replication, which most storage vendors offer.
2) Replication needs to be synchronous from the main site to the DR site.
3) Replication needs to be scheduled from the storage every day for incremental updates.
4) Replication should be performed after the daily archiving schedule, while the vault stores are in backup mode.
5) Indexes, databases, and files should be synced from the primary NAS to DR on a daily schedule.

SQL Replication Requirements:

1) As a best practice, Symantec recommends configuring SQL Server for disaster recovery before configuring Enterprise Vault for disaster recovery.
2) A SQL Server instance must be present on the DR site for SQL replication.
3) SQL Server log shipping must be configured for replication to DR.
4) SQL Server database replication must be configured for replication to DR.
5) SQL data needs to be replicated to the DR site on a daily schedule.

EV server requirements:

1) A new DR site needs to be defined in the EV topology in the Vault Admin Console.
2) Two new EV nodes with different names need to be introduced in this site.
3) Volume replication needs to be scheduled once the storage is ready in DR.
4) SQL replication needs to be scheduled once the DR instance is set up.
5) It is better to use a well-known EV-aware replication product such as InfoScale or EVNEARSYNC, which have a good presence in the market, because these applications provide RTO and RPO in minutes compared to the native EV failover scenario.

Network Requirements:

1) SQL replication runs from the main site to the DR site, so the required ports need to be open.
2) Since SAN replication is already in place, verify these datastores for replication and check that the current network bandwidth to the DR site is sufficient for the daily incremental replication.
3) Reserve one standby IP for the EV URL in the DR site. The URL needs to be pointed to this IP during a DR scenario.

High Level – How DR Works :

1) The EV DR servers are always turned off in a normal scenario.
2) During a DR scenario the EV DR servers need to be turned on.
3) Present the replicated healthy storage (indexing & partition) to the DR server (achieved through EV cmdlets).
4) Present the replicated healthy SQL database to the DR server (achieved through EV cmdlets).
5) Perform the failover by changing the production alias to the DR server (achieved through EV cmdlets).
6) Change the DNS alias of the archive URL to point from production to the DR EV server, and then run USL (Update Service Locations).

All of the above steps can be reduced and performed automatically by an EV-aware application like EVNEARSYNC or InfoScale Enterprise.

Note:

1) The SAN storage replication needs to be planned with the current storage vendor, following their recommendations.
2) Make sure the Exchange DR setup is already in place, with the databases replicated to the DR site, and that Exchange DR activation can also be performed, to achieve the best SLA for email.

Thanks & Regards
Sathish Veerapandian

Integrate Cisco TelePresence Management Suite Extension for Exchange with Exchange 2016

This article explains how to integrate Cisco TelePresence Management Suite with Exchange Server 2016. Before that, let's have a brief look at the components involved.

Cisco TelePresence Management Suite (Core Component of Video Collaboration):

This component of the Cisco IPT infrastructure provides on-premises video collaboration. With it we can configure, deploy, manage, schedule, analyze and track telepresence utilization within an organization.

Cisco TMS helps in the following:

1) Helps admins with the daily operations, configuration and maintenance of the telepresence network.
2) Helps consumers use the telepresence network according to their needs, for example telepresence deployment as a service: setting up meeting rooms with multiple monitors, multiple microphones and multi-channel speaker systems, which gives a lifelike audio and video experience.
3) Helps in monitoring and analyzing telepresence utilization.

What is Cisco TelePresence Management Suite Extension for Microsoft Exchange ?

Cisco TelePresence Management Suite Extension for Microsoft Exchange (Cisco TMSXE) is an extension for Cisco TelePresence Management Suite that enables videoconference scheduling via Microsoft Outlook, and replicates Cisco TMS conferences to Outlook room calendars.

How it helps us in Scheduling the Meeting :

1) It enables videoconference scheduling via Microsoft Outlook.
2) It replicates Cisco TMS conference settings to Outlook room calendars.
3) It lets end users book audio/video conferences from Outlook based on meeting room availability.

Cisco TMSXE Installation:

The Cisco TMSXE server runs on Windows Server. The Cisco TMSXE component is installed on this server with the booking service option chosen.
It uses IIS as its web server. Enable HTTPS on the Default Web Site after the installation.

All the other Cisco-side configuration required for this integration, such as integrating with CUCM and CMS, must be done on the Cisco TMSXE and Cisco TMS servers. More configuration on the TMS and TMSXE components needs to be completed before integrating with Exchange Server.

In a small deployment, Cisco TMS and its extensions can be co-located on the same server.
In large-scale deployments, Cisco TMSXE is installed separately and a remote SQL instance is required, while Cisco TMS and Cisco TMSPE are always co-resident.

DNS Requirements:

The Cisco TMSXE server must be placed in the same server VLAN as the AD and Exchange servers.
The communication is authenticated using the Cisco TMSXE Exchange service user account.

EWS and Autodiscover must be reachable from the TMS and TMSXE servers for them to function.

Licensing:

Each telepresence endpoint to be booked through Cisco TMSXE must be licensed for general Cisco TMS usage.

In our case, from an Exchange perspective, only the meeting rooms where telepresence needs to be enabled must have the license.

Supported Exchange Server Versions:

  1. Office 365 (Active Directory Federation Services and the Windows Azure Active Directory Sync tool are required)
  2. Exchange Server 2016 CU1 (latest CUs preferred)
  3. Exchange Server 2013 SP1 (latest CUs preferred)
  4. Exchange Server 2010 SP3 (latest roll-ups preferred)
  5. Exchange Server 2007 (latest roll-ups preferred)

Exchange Requirements:

  1. TMSXE depends entirely on the Exchange Autodiscover and EWS components to show the availability of the configured resource mailboxes.
  2. Room mailboxes added to Cisco TMSXE must have the settings below configured:
    a) Delete the subject
    b) Add the organizer’s name to the subject
    c) Remove the private flag on an accepted meeting
  3. A Cisco TMSXE service account with a mailbox is required. This service account is used for the connections between Cisco TMSXE, Exchange and Cisco TMS.
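The three room mailbox settings above map to parameters of Set-CalendarProcessing. The sketch below assumes the intent is that every room handles these three properties the same way. `TP-Room1` is a hypothetical room mailbox, and the `$false` values are only an example, not a recommendation taken from the Cisco documentation.

```powershell
# "TP-Room1" is a hypothetical room mailbox; the $false values are only an example.
# The assumption here is that every room added to Cisco TMSXE uses the same values.
$settings = @{
    Identity              = 'TP-Room1'
    DeleteSubject         = $false   # a) Delete the subject
    AddOrganizerToSubject = $false   # b) Add the organizer's name to the subject
    RemovePrivateProperty = $false   # c) Remove the private flag on an accepted meeting
}
Set-CalendarProcessing @settings

# Review the three settings across all room mailboxes to spot inconsistencies.
Get-Mailbox -RecipientTypeDetails RoomMailbox -ResultSize Unlimited |
    Get-CalendarProcessing |
    Format-Table Identity, DeleteSubject, AddOrganizerToSubject, RemovePrivateProperty -AutoSize
```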

Enable impersonation for the service user in Exchange to prevent throttling issues.

To enable impersonation, run the command below:
New-ManagementRoleAssignment -Name:impersonationAssignmentName -Role:ApplicationImpersonation -User:[ServiceUser]
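The assignment above grants impersonation over every mailbox in the organization. If that is broader than needed, it can be limited with a management scope. This is a sketch only: the scope name, assignment name and the `svc-tmsxe` account are hypothetical, and restricting the scope to room mailboxes is an assumption about what the integration needs.

```powershell
# Hypothetical scope name; limits impersonation to room mailboxes only.
New-ManagementScope -Name 'TMSXE-Rooms' -RecipientRestrictionFilter "RecipientTypeDetails -eq 'RoomMailbox'"

# 'svc-tmsxe' is a placeholder for the Cisco TMSXE service account.
$assignment = @{
    Name                      = 'TMSXE-Impersonation'
    Role                      = 'ApplicationImpersonation'
    User                      = 'svc-tmsxe'
    CustomRecipientWriteScope = 'TMSXE-Rooms'
}
New-ManagementRoleAssignment @assignment

# Confirm the assignment and its scope.
Get-ManagementRoleAssignment -Role ApplicationImpersonation -RoleAssignee 'svc-tmsxe'
```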

Certificate Requirements:

HTTPS is the default protocol for communicating with Cisco TMS and with Exchange Web Services.

The certificate can be issued by a trusted CA. Since this is only server-to-server communication between the Exchange CAS services (EWS/Autodiscover) and the TMSXE services, no public SSL certificate is required.

The TMSXE server certificate issued by the trusted CA should meet the following:

  1. It should contain the host name of the TMSXE server.
  2. It should contain the host names of the Exchange servers for secure communication with the EWS and Autodiscover services.

To verify that we have certificates that are valid and working:
1. Launch Internet Explorer on the Cisco TMSXE server.
2. Enter the URL for the Exchange CAS and verify that the URL field turns green.
3. Enter the URL for the Cisco TMS server and verify that the URL field turns green.
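The same check can be scripted from the Cisco TMSXE server instead of using the browser. The function below is a hypothetical helper, not part of Exchange or Cisco TMSXE. It opens a TLS connection with the .NET `SslStream` class and relies on the default certificate validation, which fails when the chain is not trusted or the name does not match. The host names are placeholders.

```powershell
function Test-TlsCertificate {
    param(
        [Parameter(Mandatory = $true)][string]$HostName,
        [int]$Port = 443
    )
    $client = New-Object System.Net.Sockets.TcpClient
    try {
        $client.Connect($HostName, $Port)
        $ssl = New-Object System.Net.Security.SslStream -ArgumentList $client.GetStream()
        # AuthenticateAsClient throws when the chain is untrusted or the name does not match.
        $ssl.AuthenticateAsClient($HostName)
        $cert = New-Object System.Security.Cryptography.X509Certificates.X509Certificate2 -ArgumentList $ssl.RemoteCertificate
        [pscustomobject]@{ Host = $HostName; Valid = $true; Detail = "Expires $($cert.NotAfter)" }
    }
    catch {
        # Connection failures and validation failures both end up here; Detail says which.
        [pscustomobject]@{ Host = $HostName; Valid = $false; Detail = $_.Exception.Message }
    }
    finally {
        $client.Close()
    }
}

# Hypothetical host names; replace with the Exchange CAS and Cisco TMS names.
'mail.contoso.com', 'tms.contoso.com' | ForEach-Object { Test-TlsCertificate -HostName $_ }
```

The handshake uses the default protocol versions of the installed .NET Framework, so a failure on an older server may point to a TLS version mismatch rather than a certificate problem. The Detail column shows the underlying message.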

Below is the workflow :

Cisco TMS

TP.png

  1. The end user books a meeting through the Outlook add-in.
  2. Exchange checks the availability of the resource mailboxes, books the meeting and sends the initial confirmation.
  3. Cisco TMSXE communicates with Exchange and passes the booking on to Cisco TMS.
  4. Cisco TMS checks system and WebEx availability and attempts to book routing resources for the telepresence.

Additional Tips:

  1. Cisco TMS depends only on the resource calendars that are configured for this telepresence feature.
  2. Cisco TMSXE does not have permission to modify the calendars of personal mailboxes.
  3. All other configuration required for this integration must be done on the Cisco TMSXE and Cisco TMS servers.

Thanks & Regards
Sathish Veerapandian

Exchange 2016 CU rollup readiness check fails – MSCORSVW(3404) has open files

During an Exchange CU update we were getting the message below:

NGEN

Before this, all the Exchange servers had been fully patched, including the latest .NET updates, since this was a CU5 upgrade.

Looking at Task Manager, we could see this process running and consuming a lot of CPU. It is a .NET process that compiles assemblies in priority order, handling high-priority assemblies before low-priority ones.

What is MSCORSVW.exe?

The .NET Framework has a technology called Native Image Generator (NGEN), which precompiles .NET assemblies into native images so that .NET apps start faster. It runs only periodically, purely to improve the performance of the machine.

MSCORSVW.exe is the process NGEN uses to do this compilation. So after a Windows update, especially a .NET patch, we can expect to see this process running and consuming more CPU for a while.

Solution for this problem:

  1. Solution 1: Wait for the .NET compilation job to complete, which will probably take 5 to 10 minutes. Once it has completed, re-running the setup should go fine.
  2. Solution 2: By default, NGEN uses only one CPU core for this operation. It can be told to use up to 6 cores when needed, which makes the compilation job finish sooner.

Open CMD in elevated mode and run this command:

c:\Windows\Microsoft.NET\Framework\v4.0.30319\ngen.exe executeQueuedItems

Untitlesd

Running the above executes the queued compilation jobs with extra CPU cores, which makes them finish faster. Wait for the process to precompile all the assemblies. It should complete after a couple of minutes.

An ngen log is also generated in the same location where we executed this command, which we can review after the job completes.
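On a 64-bit server there is a second copy of ngen.exe under the Framework64 folder with its own queue, so it can be worth draining both. A minimal sketch for an elevated PowerShell prompt, assuming the default .NET 4 installation paths:

```powershell
# Run from an elevated PowerShell prompt.
# Both paths assume a default .NET Framework 4.x installation.
$ngenPaths = @(
    "$env:WINDIR\Microsoft.NET\Framework\v4.0.30319\ngen.exe",
    "$env:WINDIR\Microsoft.NET\Framework64\v4.0.30319\ngen.exe"
)

foreach ($ngen in $ngenPaths) {
    if (Test-Path $ngen) {
        # Execute all queued native image compilation jobs for this framework.
        & $ngen executeQueuedItems
    }
}

# Lists any mscorsvw.exe process that is still running; no output means the jobs are done.
Get-Process -Name mscorsvw -ErrorAction SilentlyContinue
```

Once the last command returns nothing, the readiness check can be run again.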

References:

https://msdn.microsoft.com/en-us/library/6t9t5wcf(v=vs.110).aspx
https://blogs.msdn.microsoft.com/dotnet/2013/08/06/wondering-why-mscorsvw-exe-has-high-cpu-usage-you-can-speed-it-up/

Thanks & Regards
Sathish Veerapandian

Failed to store data in the Data Warehouse – SCOM Reports – Exchange Microsoft.Exchange.15.MailboxStatsSubscription

Recently, when we tried to generate the top mailbox statistics report with the option below in SCOM reports, we weren’t able to generate it.

SCOMd

The report came up empty, without any values.

Along with that, a few other Exchange-only reports, such as database I/O reads/writes, were also empty.

Looking into the Operations Manager log, we saw the event below.

Log Name:      Operations Manager
Source:        Health Service Modules
Date:          20.04.2017 09:36:58
Event ID:      31551
Task Category: Data Warehouse
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      SCOM1.exchangequery.com
Description:
Failed to store data in the Data Warehouse. The operation will be retried.
Exception ‘InvalidOperationException’: The given value of type String from the data source cannot be converted to type nvarchar of the specified target column.
One or more workflows were affected by this.
Workflow name: Microsoft.Exchange.15.MailboxStatsSubscription.Rule
Instance name: SCOM1.exchangequery.com
Instance ID: {466DF86F-CC39-046A-932D-00660D652716}
Management group: ExchangeQuery

From the above error we can see that the mailbox statistics subscription rule has a problem, and hence the reports were not generated.

The 2 rules below need to be enabled to generate this report:

1) Exchange 2013: Mailbox Statistics Subscription.
2) Exchange 2013: Mailbox Statistics Collection.

SCOMd2

From the above event we can see that SCOM is having trouble writing the data from the staging table into the target tables in the data warehouse. SCOM first writes the collected data to a staging table in the operational database, which then inserts it in bulk into the target data warehouse. SQL bulk insert is used because of the amount of data that has to be moved from the staging table.

During the bulk insert, each value is compared with the maximum length allowed for its column (the NVARCHAR length defined for each column). If any value is longer than the allowed limit, we run into this problem.

These lengths can be seen on the staging table in the Operations Manager database – Tables – Exchange2013mailboxstatsstaging – Columns

Here we can see the nvarchar lengths for each mailbox property used to generate the mailbox statistics report from SCOM 2012.

SCOMd1

So if any of the values needed for the report exceeds the allowed nvarchar length, inserting the data into the data warehouse fails. For example, the default allowed length for Mailbox_EmailAddress is 1024.

If, say, one system mailbox has so many SMTP addresses that together they exceed this character limit, the entire mailbox stats report will fail.

SCOM uses the nvarchar data type for the Exchange data mainly to support Unicode for multiple languages. More details can be found in the SQL Server data types documentation.

In our case we had a service account mailbox with multiple SMTP addresses that together exceeded the allowed limit.

If anyone runs into this issue, here is a simple command to identify mailboxes whose email addresses total 1024 characters or more.

Get-Mailbox -ResultSize Unlimited | Where-Object { $_.EmailAddresses.ProxyAddressString.ToCharArray().Length -ge 1024 } | ForEach-Object { Write-Host "$_" }
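When several mailboxes are close to the limit, it helps to see the actual totals rather than only the names. The two functions below are hypothetical helpers that sketch this. They assume each proxy address renders as its full string (for example `SMTP:user@contoso.com`) and count characters the same way as the one-liner above, which may differ slightly from how SCOM builds the stored value.

```powershell
function Get-ProxyAddressLength {
    param([string[]]$Addresses)
    # Total number of characters across all addresses (no separator counted).
    return (-join $Addresses).Length
}

function Find-LongProxyAddressMailbox {
    param([int]$Limit = 1024)
    # Requires the Exchange Management Shell for Get-Mailbox.
    Get-Mailbox -ResultSize Unlimited | ForEach-Object {
        $mailbox = $_
        # Render every proxy address as text, e.g. "SMTP:user@contoso.com".
        $addresses = @($mailbox.EmailAddresses | ForEach-Object { $_.ToString() })
        [pscustomobject]@{
            Mailbox      = $mailbox.Name
            AddressCount = $addresses.Count
            TotalLength  = Get-ProxyAddressLength $addresses
        }
    } | Where-Object { $_.TotalLength -ge $Limit } |
        Sort-Object TotalLength -Descending
}
```

Calling `Find-LongProxyAddressMailbox` with no arguments lists the mailboxes at or above 1024 characters, longest first. A lower `-Limit` value shows mailboxes that are approaching the limit.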

Once we find that mailbox, we can remove the additional SMTP addresses to bring the total below 1024 characters. After this, the reports generate without any issues.

Another solution : ( Not Recommended)

Extend the nvarchar lengths on the staging table as well as the target table (Exchange2013.MailboxProperties_) in the data warehouse, which allows the data to be processed and the reports to be generated even when the values are long.

It is better not to change the default values, as that might put the deployment in an unsupported state. Modifying the mailbox to reduce the character count keeps everything in place without any customization.

Thanks & Regards
Sathish Veerapandian

Start-DatabaseAvailabilityGroup – Error: The network path was not found

During a DR activation, the activation itself went fine. But when we tried to restore the main site after the DR tests were complete, we got the error below.

Below was the state in the DR site before the restoration to the main site :

Version of Exchange – Exchange 2016 CU3 with no coexistence

1) The main site was in a stopped state for the DAG, and all main site Exchange servers were in the stopped mailbox servers list.
2) The DR site was activated for the DAG, and only the DR site Exchange servers were in the started mailbox servers and operational servers lists.
3) All the DR copies were mounted, and users were connected.

After the DR tests were completed, starting the main site with the command below returned the following error :

Start-DatabaseAvailabilityGroup -ActiveDirectorySite "MainSite"

A server-side database availability group administrative operation failed. Error The operation failed. CreateCluster errors may result from incorrectly configured static addresses. Error: An error occurred while attempting a cluster operation. Error: Cluster API ‘”AddClusterNode() (MaxPercentage=12) failed with 0x35. Error: The network path was not found”‘

The DAG task logs showed the same message :

Error: A server-side database availability group administrative operation failed. Error The operation failed. Create Cluster errors may result from incorrectly configured static addresses. Error: An error occurred while attempting a cluster operation. Error: Cluster API failed: “AddClusterNode() (MaxPercentage=12) failed with 0x35. Error: The network path was not found”.

Additionally, the DAG task logs contained this message:

WriteError! Exception = Microsoft.Exchange.Management.Tasks.FailedToStartNodeException: Start-DatabaseAvailabilityGroup failed to start server

Solution :

Following the steps in the blog post below fixed it :

https://amagsmb.wordpress.com/2015/09/16/problem-adding-a-second-server-to-dag-error-cluster-api-addclusternode-maxpercentage12/

The Remote Registry service should have its startup type set to Automatic and should be started.
An SP, Windows update or RU installation can put the service in a disabled state, and it may stay that way after the update. In my case the OS on the main site servers had been patched the week before, and the service had probably been left disabled after the patches. Strangely, stopping and evicting these nodes while activating the DR site went smoothly without any issues.

The real issue appeared only when we tried to activate the main site by re-adding these servers to the DAG.

Reason:

Some Exchange EMS/PowerShell functions, such as managing diagnostic logging, require the Remote Registry service to be enabled. Exchange needs this service on the remote servers to add them as cluster nodes. If the service is not started, the servers will not join the DAG.
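The fix can be sketched as below. `MBX01` and `MBX02` are hypothetical names for the main site DAG members, and the sketch assumes PowerShell remoting (WinRM) is available on them. The same change can be made by hand in services.msc on each server.

```powershell
# Hypothetical names of the main site DAG members.
$mainSiteServers = 'MBX01', 'MBX02'

Invoke-Command -ComputerName $mainSiteServers -ScriptBlock {
    # Re-enable the service, start it, and report its state.
    Set-Service -Name RemoteRegistry -StartupType Automatic
    Start-Service -Name RemoteRegistry
    Get-Service -Name RemoteRegistry | Select-Object Name, Status
}

# Then retry from the Exchange Management Shell.
Start-DatabaseAvailabilityGroup -ActiveDirectorySite 'MainSite'
```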

Thanks & Regards
Sathish Veerapandian

IMAP connection error – UID corruption detected

Recently we received complaints from one of the IMAP applications about accessing emails via IMAP.

So we enabled IMAP protocol logging to see the results:

Set-ImapSettings -Server "MBXservname" -ProtocolLogEnabled $true
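After enabling logging, the IMAP4 services need a restart before the change takes effect. A short sketch, run locally on the mailbox server: the service names are those of a multi-role Exchange 2013/2016 server, and `$logDir` is an assumed default path that should be replaced with the LogFileLocation value the first command returns.

```powershell
# Show whether logging is on and where the log files are written.
Get-ImapSettings -Server 'MBXservname' | Format-List ProtocolLogEnabled, LogFileLocation

# Changes to the IMAP settings take effect after the IMAP4 services are restarted.
Restart-Service MSExchangeIMAP4
Restart-Service MSExchangeIMAP4BE

# Assumed default location; replace with the LogFileLocation shown above.
$logDir = 'C:\Program Files\Microsoft\Exchange Server\V15\Logging\Imap4'
Select-String -Path (Join-Path $logDir '*.log') -Pattern 'UidCorruptionDetected'
```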

After going through the logs, we found the error message below: UidCorruptionDetected

imap

Reason for this error:
Do not access a mailbox with Outlook Web Access while Outlook has an open connection to that mailbox using the IMAP protocol.
If you leave Outlook connected to a mailbox over IMAP and access that mailbox from an alternate client, you might have some UID errors to accept when you get back to Outlook.
For example, when an application requires a POP or IMAP connection to retrieve emails from Exchange, it is better to choose one connection type and not to access the mailbox from multiple locations with different protocols.

Another reason: if the IMAP account is configured in an application that receives thousands of emails daily, the client app will try to access the entire set of emails every time a connection is established. This can make the account exceed its IMAP connection limits and lead to logical UID corruption in the mailbox. So if any application accesses a mailbox over IMAP, make sure an automated job on the client side purges the older emails, which avoids the problem.

Solution :

Since this is mailbox corruption, repairing the ImapId corruption type will solve the problem.

New-MailboxRepairRequest -Mailbox "mailboxname" -CorruptionType ImapId
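Before repairing, the request can be run in report-only mode, and its progress can be followed afterwards. A minimal sketch, assuming Exchange 2013 or later for Get-MailboxRepairRequest; `mailboxname` is the same placeholder as above.

```powershell
# Report-only pass first: -DetectOnly records corruption without repairing it.
New-MailboxRepairRequest -Mailbox 'mailboxname' -CorruptionType ImapId -DetectOnly

# Follow the state of the repair requests for this mailbox.
Get-MailboxRepairRequest -Mailbox 'mailboxname' |
    Format-List JobState, Progress, CorruptionsDetected, CorruptionsFixed
```

Running the repair without -DetectOnly, as in the command above, then fixes what the first pass reported.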

Thanks & Regards
Sathish Veerapandian
