Category: Troubleshooting

Azure AD Conditional Access policies troubleshooting – Device State: Unregistered

With Azure AD Conditional Access (CA) policies you can ensure that only managed devices can access resources protected by Azure AD – https://docs.microsoft.com/en-us/azure/active-directory/conditional-access/require-managed-devices#managed-devices

As mentioned in the article above, you can require that the device the sign-in is coming from is hybrid Azure AD joined.

As explained in this blog post – https://jairocadena.com/2016/11/08/how-sso-works-in-windows-10-devices/ – the Azure AD Primary Refresh Token (Azure AD PRT) is used during Azure AD CA policy evaluation to get the registration state of a Windows 10 device.

The blog post explains that the Azure AD PRT is initially obtained when the user signs in to the station. In simple words, the Azure AD PRT is issued to the user if two things succeed: the Cloud AP plug-in authenticates on behalf of the user (UPN and password, or Windows Hello for Business PIN) to get an Azure AD access token, and the device authenticates to Azure AD using its registration state (the MS-Organization-Access certificate).

If either of these two parts (user or device) doesn’t pass the authentication step, no Azure AD PRT is issued.

So when you see an Azure AD Conditional Access error stating that the device is NOT registered, it doesn’t necessarily mean that hybrid Azure AD join is not working in your environment. It might mean that a valid Azure AD PRT was not presented to Azure AD.

To check whether an Azure AD PRT is present for the user signed in to a Windows 10 device, you can use the “dsregcmd /status” command. On Windows 10 version 1809, the Azure AD PRT info is shown in the SSO State section:

+----------------------------------------------------------------------+
| SSO State                                                           |
+----------------------------------------------------------------------+
     AzureAdPrt : YES
     AzureAdPrtUpdateTime : 2019-04-03 17:25:24.000 UTC
     AzureAdPrtExpiryTime : 2019-04-17 21:25:54.000 UTC
     AzureAdPrtAuthority : https://login.microsoftonline.com/tenantID

By the way, you can use the usual /? switch to get help for the “dsregcmd” command (Windows 10 version 1809 and newer).

Keep in mind that the Azure AD PRT is a per-user token, so you might see AzureAdPrt : NO if you run “dsregcmd /status” as a local user or as a user that is not synchronized (the on-premises AD user UPN doesn’t match the Azure AD UPN).
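If you need to check the PRT state on many stations, the output can be parsed by a script. Below is a minimal Python sketch that assumes the “Key : Value” layout shown in the SSO State sample above; the function names are my own, and other Windows builds may format the output differently.

```python
import re

# Sample copied from the SSO State section shown above.
SAMPLE = """
+----------------------------------------------------------------------+
| SSO State                                                           |
+----------------------------------------------------------------------+
     AzureAdPrt : YES
     AzureAdPrtUpdateTime : 2019-04-03 17:25:24.000 UTC
     AzureAdPrtExpiryTime : 2019-04-17 21:25:54.000 UTC
     AzureAdPrtAuthority : https://login.microsoftonline.com/tenantID
"""

def parse_dsregcmd_status(text: str) -> dict:
    """Collect 'Key : Value' pairs from dsregcmd /status output."""
    fields = {}
    for line in text.splitlines():
        # The key is a single word; everything after the first " : " is the
        # value, so URLs and timestamps that contain colons stay intact.
        match = re.match(r"^\s*(\w+)\s+:\s+(.*\S)\s*$", line)
        if match:
            fields[match.group(1)] = match.group(2)
    return fields

def has_prt(fields: dict) -> bool:
    """True when the output reports an Azure AD PRT for the current user."""
    return fields.get("AzureAdPrt", "NO").upper() == "YES"

if __name__ == "__main__":
    info = parse_dsregcmd_status(SAMPLE)
    print(has_prt(info), info.get("AzureAdPrtExpiryTime"))
```

Remember that the result is only meaningful when the command was run in the context of the affected (synchronized) user.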

In my experience, these are possible root causes of the Azure AD PRT being absent for the user (I will update the list as I discover more):

  1. The device is indeed not hybrid Azure AD joined;
  2. The local registration state of the computer doesn’t match the records in Azure AD:
    • The Azure AD computer object was deleted by a Global Admin via the portal or PowerShell;
    • The computer was moved out of the Azure AD Connect sync scope and was removed from Azure AD by Azure AD Connect;
    • Some service modified the Azure AD computer object and deleted its AlternativeSecurityIds attribute;
  3. The Cloud AP plug-in is not able to authenticate on behalf of the user to get an Azure AD access token:
    • If the user is federated, the on-premises STS is not reachable or the STS does not have the WS-Trust endpoint enabled. Yes, WS-Trust is still required for the Azure AD PRT flow, and it is optional for the registration flow on Windows 10 version 1803 and newer. For AD FS the WS-Trust endpoint is adfs/services/trust/13/usernamemixed;
    • The user has recently changed the UPN, is using Windows 10 version 1709 or older, and can’t get a new Azure AD PRT or refresh an expired one (this issue was resolved in version 1803 and newer).

Here are the recommended troubleshooting steps for the scenarios mentioned above:

  1. To troubleshoot why the computer can’t perform hybrid Azure AD join, refer to the following post – https://s4erka.wordpress.com/2018/03/06/azure-ad-device-registration-error-codes/;
  2. To better understand whether there is a discrepancy between the local registration state and the Azure AD records, collect and review the following info:
    1. The “dsregcmd /status” output on the affected computer. Make a note of the following fields: AzureAdJoined, DeviceCertificateValidity, AzureAdPrt, AzureAdPrtUpdateTime, AzureAdPrtExpiryTime;
    2. Check the Azure AD Portal – Devices blade to see if the station is present in Azure AD and has a timestamp listed in the Registered column, and compare it with the time in DeviceCertificateValidity from the previous step. If there is no timestamp in the Registered column, the AlternativeSecurityIds attribute was removed and the station needs to be re-joined to Azure AD. This attribute contains the thumbprint of the MS-Organization-Access certificate, which is the certificate that was saved to the station during the registration process;
    3. You can check whether the station has the AlternativeSecurityIds attribute by using the Get-MsolDevice Azure AD PowerShell cmdlet;
    4. Check whether the computer object is in the sync scope of Azure AD Connect;
  3. To get more clues about the user portion of the Azure AD PRT flow, it’s recommended to review the following Windows 10 logs – Application and Services Logs – Microsoft – Windows – AAD. These contain Operational and Analytic logs. Analytic logs are the equivalent of Debug logs and are disabled by default. Usually you should be able to get the info just by looking at the AAD Operational logs. In the AAD Operational logs, look for the events generated by the AadCloudAPPlugin Operation. By reading these logs you should get an idea of whether the STS is not reachable because of network or protocol issues, or the Cloud AP plug-in is not able to authenticate on behalf of the user due to incorrect credentials or access policies configured on the STS that block the authentication attempt for this user.

You can also use the “Get-WinEvent” PowerShell cmdlet to quickly pull the latest AAD log entries related to the Azure AD Cloud AP plug-in:

 
Get-WinEvent -LogName "Microsoft-Windows-AAD/Operational" -MaxEvents 20 |
where {$_.TaskDisplayName -like "*AadCloudAPPlugin*"} |
ft TimeCreated,id,KeyWordsDisplayNames,Message -wrap -autosize

Keep in mind that Windows down-level devices do not have an Azure AD PRT. They prove to Azure AD CA that they are registered by establishing a TLS authentication channel using the MS-Organization-Access certificate saved in the User certificate store during device registration. So if a successfully registered down-level Windows device is treated by an Azure AD CA policy as not registered, most likely something (a firewall or proxy) is interfering with that device authentication attempt.

And a final thought. In case you need to re-join a Windows current device, follow the steps below in this order to make sure the station is really disjoined and will attempt a clean join. Also keep in mind that since the computer object is recreated, the BitLocker recovery keys that you might be saving in Azure AD for this station will be deleted and you will need to save them again.

  1. Open an elevated CMD (as local Admin) and run “dsregcmd /leave”. The elevated CMD is important: during the leave flow the registration service tries to contact Azure AD and delete the computer object, and it also tries to delete the MS-Organization-Access certificate from the Computer certificate store, which definitely requires elevated privileges;
  2. Open a new CMD window and confirm that the local registration state is cleaned and the station is not Azure AD joined by running “dsregcmd /status”;
  3. Using the Azure AD devices portal, confirm the computer object is gone; if not, delete it manually;
  4. If you are in a Managed environment, run a delta Azure AD Connect sync to pre-sync the AD computer object to Azure AD;
  5. Restart the station.

AD FS 2016 Extranet Smart Lockout eventIDs 1203 and 1210 clarification

Continuing my journey of learning the great AD FS Extranet Smart Lockout (ESL) feature.

As mentioned in my other post, enhancements were made to AD FS 2016 auditing: event ID 1203 is logged in the Security log by AD FS auditing when validation of user credentials against Active Directory fails.

Assume the AD FS Extranet Smart Lockout feature is enabled in either log or enforce mode, AD FS Security auditing is enabled, and the user starts with the AD FS ESL bad password counts set to zero. You will see event ID 1203 for each bad password attempt. As soon as the external bad password attempt count reaches the value specified in ExtranetLockoutThreshold, the account is locked out on AD FS for the duration specified in ExtranetObservationWindow, event ID 1210 is logged in the Security event log, and password validation attempts are no longer sent to Active Directory.

As mentioned in AD FS ESL public documentation:

AD FS will write extranet lockout events to the security audit log:

  • When a user is locked out (reaches the lockout threshold for unsuccessful login attempts)
  • When AD FS receives a login attempt for a user who is already in lockout state

While the account is locked out, no event ID 1203 is logged, since no password validation against Active Directory takes place.

Only after the extranet observation window expires are password attempts forwarded to AD again, and if the password validation fails, event ID 1203 is logged.

Please note that the CurrentBadPasswordCount value in event ID 1210 increases only when password validation happens against AD, and at the moment the account gets locked on AD FS.

Also keep in mind that when the AD FS ESL extranet observation window expires, the AD FS ESL bad password count is not cleared until a good password is provided. So a single 1203 event from the same bad IP location, with the bad password count not cleared, will put the account into the ESL lockout state again for the time specified in ExtranetObservationWindow.
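To make the sequence of events easier to follow, here is a toy Python model of the behavior described in this post. It is only my simplified reading of the observations above, not the actual AD FS implementation: the class name is hypothetical, it tracks a single user from a single unfamiliar location, and the threshold and window values are arbitrary.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class EslAccountModel:
    """Toy model of one account as seen from one unfamiliar location."""
    threshold: int                      # stands in for ExtranetLockoutThreshold
    window_seconds: int                 # stands in for ExtranetObservationWindow
    bad_count: int = 0
    locked_until: Optional[float] = None

    def attempt(self, now: float, password_ok: bool) -> List[int]:
        """Return the event IDs this model would expect for one sign-in attempt."""
        # Locked out: the attempt is not forwarded to AD, so only 1210 appears.
        if self.locked_until is not None and now < self.locked_until:
            return [1210]
        # Window expired (or never locked): the password is validated against AD.
        if password_ok:
            self.bad_count = 0          # only a good password clears the count
            self.locked_until = None
            return []
        self.bad_count += 1
        events = [1203]                 # failed validation against AD
        if self.bad_count >= self.threshold:
            self.locked_until = now + self.window_seconds
            events.append(1210)         # account (re)enters the lockout state
        return events

if __name__ == "__main__":
    account = EslAccountModel(threshold=3, window_seconds=60)
    for t, ok in [(0, False), (1, False), (2, False), (10, False), (100, False)]:
        print(t, account.attempt(t, ok))
```

The last attempt in the usage example happens after the window has expired: because the count was never cleared, that single bad password locks the account again.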

Hope this information will be helpful for you.

AD FS Extranet Smart Lockout user management via remote PowerShell

I recently ran into an issue when trying to execute an AD FS Extranet Smart Lockout user management cmdlet via remote PowerShell.

Invoke-Command -ComputerName Win2016-ADFS01 -scriptBlock {Get-AdfsAccountActivity -Identity user@domain.com}

Error in PowerShell:

Exception of type
‘Microsoft.IdentityServer.User.UserActivityRestServiceException’ was thrown.
+ CategoryInfo         : NotSpecified: (:) [Get-AdfsAccountActivity], User
ActivityRestServiceException
+ FullyQualifiedErrorId : Microsoft.IdentityServer.User.UserActivityRestSer
viceException,Microsoft.IdentityServer.Management.Commands.GetAdfsAccountAc
tivity
+ PSComputerName       : Win2016-ADFS01

In the AD FS Admin logs on the Win2016-ADFS01 server I saw the following error:

Log Name:     AD FS/Admin
Source:       AD FS
Date:         10/29/2018 5:20:39 PM
Event ID:     1100
Task Category: None
Level:         Error
Keywords:     AD FS
User:         domain\adfs_service_account
Computer:     Win2016-ADFS01
Description:
The Federation Service could not authorize a request to one of the REST endpoints.
Additional Data
Exception details:
Microsoft.IdentityServer.WebHost.Rest.RestRequestAuthorizationFailedException: Only AD FS Service can access this endpoint. The client was authenticated as NT AUTHORITY\ANONYMOUS LOGON.
at Microsoft.IdentityServer.Web.UserActivity.UserStoreAuthenticationVerificationMethod.VerifyTrustedRequest(WrappedHttpListenerContext context, String& auditInformation)
at Microsoft.IdentityServer.Web.Rest.RestRequestHandler.OnGetContext(WrappedHttpListenerContext context)

The solution was to enable CredSSP on the management machine and on the Win2016-ADFS01 server and to use the following commands:

$cred = Get-Credential
Invoke-Command -ComputerName Win2016-ADFS01 -Authentication Credssp -credential $cred -ScriptBlock {Get-AdfsAccountActivity user@domain.com}

You can read more about managing the second hop in PowerShell remoting, and about the considerations when enabling CredSSP, in this article – https://docs.microsoft.com/en-us/powershell/scripting/setup/ps-remoting-second-hop?view=powershell-6

Update 2-14-2019: Microsoft has updated the documentation on how to delegate ADFS PowerShell access to non-admin users – https://docs.microsoft.com/en-us/windows-server/identity/ad-fs/operations/delegate-ad-fs-pshell-access

 

Internal application published via Azure AD Application Proxy access issues troubleshooting

I was recently troubleshooting an issue where an internal application portal page did not load completely (part of the portal was not loaded at all) when accessed via Azure AD Application Proxy (AAD AP). The application in question was the Dell Storage Manager web console, but the troubleshooting steps described below are applicable to any application.

First I checked the Azure AD application settings related to AAD AP: Azure AD pre-authentication was used, no custom domain, header and application body translation enabled, so the setup looked pretty standard.

As the next step, I captured Fiddler traces when accessing the internal application directly and via AAD AP.

In the trace for the AAD AP access, I saw one of the pages fail to load with this error message:

Azure AD Application Proxy
Root cause: The connector did not respond within the timeout period.
Status code:  GatewayTimeout
Url:  https://xxx/messages
TransactionID:  XXX
ConnectorGroupId:  XXX
Timestamp:  9/4/2018 6:50:00 PM

At the same time, the “messages” page loads successfully when the application is accessed directly from the corporate network.

Looking closer at the request and response in both Fiddler traces, I saw the following. In the good trace (direct access):

Request (redacted):

GET https://IntenalHostName/messages HTTP/1.1
Origin: https://IntenalHostName
Sec-WebSocket-Key: =
Connection: Upgrade
Upgrade: Websocket
Sec-WebSocket-Version: 13
User-Agent: Mozilla/4.0
Host: IntenalHostName

Response (redacted):

HTTP/1.1 101 Switching Protocols
Expires: 0
Cache-Control: no-cache, no-store, must-revalidate
X-Powered-By: Undertow/1
Server: WildFly/8
Pragma: no-cache
Origin: https://IntenalHostName
Upgrade: WebSocket
Sec-WebSocket-Accept: pDsDNKGWwSG8=
Date: Tue, 04 Sep 2018 GMT
Connection: Upgrade
Sec-WebSocket-Location: wss://IntenalHostName/messages
Content-Length: 0

In the bad Fiddler trace (access via AAD AP) I saw the following:

Request (redacted):

GET https://ExternalHostName.msappproxy.net/messages HTTP/1.1
Origin: https://ExternalHostName.msappproxy.net
Sec-WebSocket-Key: nl/CD3hakpNw==
Connection: Upgrade
Upgrade: Websocket
Sec-WebSocket-Version: 13
User-Agent: Mozilla/5.0
Host: ExternalHostName.msappproxy.net
DNT: 1
Cache-Control: no-cache
Cookie: dsmUsername=; JSESSIONID=ZEfQJAHRszfZGXql33h06aRw.vdellem01; AzureAppProxyUserSessionCookie

Response:

HTTP/1.1 504 Gateway Timeout

So the issue seems to happen when the client requests an upgrade to WebSocket.
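When a trace contains hundreds of sessions, it helps to pick out the WebSocket upgrade requests automatically. The following is a rough Python sketch that works on raw request and response text copied from Fiddler; the helper names are my own and it only looks at the Connection and Upgrade headers shown in the traces above.

```python
def parse_headers(raw: str) -> dict:
    """Map lower-cased header names to values, skipping the first line."""
    headers = {}
    for line in raw.strip().splitlines()[1:]:
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    return headers

def is_websocket_upgrade(raw_request: str) -> bool:
    """True when the request asks to switch the connection to WebSocket."""
    headers = parse_headers(raw_request)
    # Connection can carry several comma-separated tokens.
    tokens = [t.strip().lower() for t in headers.get("connection", "").split(",")]
    return "upgrade" in tokens and headers.get("upgrade", "").lower() == "websocket"

def status_code(raw_response: str) -> int:
    """Status code from a line such as 'HTTP/1.1 504 Gateway Timeout'."""
    return int(raw_response.split()[1])

if __name__ == "__main__":
    request = """GET https://ExternalHostName.msappproxy.net/messages HTTP/1.1
Connection: Upgrade
Upgrade: Websocket
Sec-WebSocket-Version: 13
Host: ExternalHostName.msappproxy.net"""
    response = "HTTP/1.1 504 Gateway Timeout"
    if is_websocket_upgrade(request) and status_code(response) != 101:
        print("WebSocket upgrade failed with status", status_code(response))
```

A successful upgrade is answered with 101 Switching Protocols, as in the good trace; anything else on an upgrade request is worth a closer look.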

WebSocket support in Azure AD App Proxy is currently in preview, and it was recommended to collect additional logs to see if the issue can be fixed in the current case.

To enable verbose Connector logs, it was recommended to make these changes:

  1. Locate the installation directory of the connector (should be C:\Program Files\Microsoft AAD App Proxy Connector)
  2. Open the file ApplicationProxyConnectorService.exe.config in Notepad for editing
  3. Add the following section right after appSettings:

<system.diagnostics>
  <trace autoflush="true" indentsize="4">
    <listeners>
      <add name="consoleListener" type="System.Diagnostics.ConsoleTraceListener" />
      <add name="textWriterListener" type="System.Diagnostics.TextWriterTraceListener" initializeData="<PATH_WITH_WRITE_PERMISSIONS>\ConnectorTrace.log" />
      <remove name="Default" />
    </listeners>
  </trace>
</system.diagnostics>

  4. Make sure to replace <PATH_WITH_WRITE_PERMISSIONS> with a path the connector has write permissions to
  5. Restart the connector service and reproduce the issue from your PC while capturing the Fiddler trace.
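If you have to do this on several connector servers, the config edit can be scripted. The Python sketch below builds the same system.diagnostics section and inserts it right after appSettings. It is only an illustration: the function name is my own, the log path is passed in by the caller, and ElementTree drops XML comments and reformats the file, so always keep a backup of the original config.

```python
import xml.etree.ElementTree as ET

def add_trace_listeners(config_xml: str, log_path: str) -> str:
    """Return the config text with the verbose trace section added."""
    root = ET.fromstring(config_xml)
    # Do nothing if a system.diagnostics section is already present.
    if any(child.tag == "system.diagnostics" for child in root):
        return config_xml

    diag = ET.Element("system.diagnostics")
    trace = ET.SubElement(diag, "trace", {"autoflush": "true", "indentsize": "4"})
    listeners = ET.SubElement(trace, "listeners")
    ET.SubElement(listeners, "add", {
        "name": "consoleListener",
        "type": "System.Diagnostics.ConsoleTraceListener",
    })
    ET.SubElement(listeners, "add", {
        "name": "textWriterListener",
        "type": "System.Diagnostics.TextWriterTraceListener",
        "initializeData": log_path,
    })
    ET.SubElement(listeners, "remove", {"name": "Default"})

    # Insert right after appSettings; fall back to the end of the file.
    children = list(root)
    index = next((i for i, child in enumerate(children)
                  if child.tag == "appSettings"), len(children) - 1)
    root.insert(index + 1, diag)
    return ET.tostring(root, encoding="unicode")

if __name__ == "__main__":
    sample = "<configuration><appSettings /><runtime /></configuration>"
    print(add_trace_listeners(sample, r"C:\Temp\ConnectorTrace.log"))
```

The C:\Temp path in the usage example is just a placeholder for a folder the connector service account can write to.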

Looking at the logs, I found this exception entry:

System.Net.WebSockets.WebSocketException (0x80004005): Unable to connect to the remote server ---> System.Net.WebException: The underlying connection was closed: Could not establish trust relationship for the SSL/TLS secure channel. ---> System.Security.Authentication.AuthenticationException: The remote certificate is invalid according to the validation procedure.

As the next step, I tried to access the application directly from the Connector server using Internet Explorer and, sure enough, the browser complained about an SSL error.

Looking closer, I noticed that the SSL certificate protecting the internal application was issued to the name of the host running the application, which did not match the internal URL configured for the application.

As soon as the internal application URL on the AAD App Proxy side was changed to the server host name, the issue was resolved.
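The check that failed here is essentially “does the host name in the URL match one of the names in the certificate”. Here is a simplified Python sketch of that comparison; the host names are made up for illustration, and real TLS validation also covers the chain, validity dates and revocation, which this sketch ignores.

```python
from typing import Iterable

def name_matches(cert_name: str, host: str) -> bool:
    """Compare one certificate DNS name with a host name, label by label."""
    cert_labels = cert_name.lower().split(".")
    host_labels = host.lower().split(".")
    if len(cert_labels) != len(host_labels):
        return False
    # A wildcard is only honored as the complete left-most label.
    if cert_labels[0] == "*":
        return cert_labels[1:] == host_labels[1:]
    return cert_labels == host_labels

def cert_covers(cert_names: Iterable[str], url_host: str) -> bool:
    """True when any certificate name matches the host used in the URL."""
    return any(name_matches(name, url_host) for name in cert_names)

if __name__ == "__main__":
    # Hypothetical names: the certificate was issued to the server host name.
    names = ["dsm01.corp.example"]
    print(cert_covers(names, "dsm01.corp.example"))    # URL uses the host name
    print(cert_covers(names, "storage.corp.example"))  # URL uses an alias
```

If the second kind of URL is configured as the internal URL, the connector rejects the certificate even though the application itself is healthy.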

AD FS Relying Party certificates errors troubleshooting (EventID 317)

The customer configured a new Relying Party Trust by using the Relying Party Trust Wizard and importing the data from a file that had been downloaded earlier to the management computer.

When testing the Relying Party sign-on, the application returned the error:

“An error SAML response status was received. urn:oasis:names:tc:SAML:2.0:status:Responder”

Per the following article – https://msdn.microsoft.com/en-us/library/hh269642.aspx – this means “The request could not be performed due to an error on the part of the SAML responder or SAML authority.”

Looking at the AD FS event logs, I located the following self-explanatory error corresponding to the unsuccessful sign-in attempt.

Log Name:      AD FS/Admin
Source:        AD FS
Date:          7/3/2018 9:55:33 AM
Event ID:      317
Task Category: None
Level:         Error
Keywords:      AD FS
User:          xxx
Computer:      XXX
Description:
An error occurred during an attempt to build the certificate chain for the relying party trust ‘microsoft:identityserver:XXX’ certificate identified by thumbprint ‘xxx’. Possible causes are that the certificate has been revoked, the certificate chain could not be verified as specified by the relying party trust’s encryption certificate revocation settings or certificate is not within its validity period.
You can use Windows PowerShell commands for AD FS to configure the revocation settings for the relying party encryption certificate.
Relying party trust’s encryption certificate revocation settings: CheckChainExcludeRoot
The following errors occurred while building the certificate chain: 
A certificate chain could not be built to a trusted root authority.
The revocation function was unable to check revocation for the certificate.
The revocation function was unable to check revocation because the revocation server was offline.
User Action:
Ensure that the relying party trust’s encryption certificate is valid and has not been revoked.
Ensure that AD FS can access the certificate revocation list if the revocation setting does not specify “none” or a “cache only” setting.
Verify your proxy server setting. For more information about how to verify your proxy server setting, see the AD FS Troubleshooting Guide (http://go.microsoft.com/fwlink/?LinkId=182180).

Per the environment’s security requirements, the AD FS server had no Internet access, which is why the Certificate Revocation List checks for the Relying Party encryption and signing certificates were failing.

Please note that turning off revocation checking is not recommended, so you might instead review your firewall policy for outbound connections to the Internet from AD FS and WAP (https://social.technet.microsoft.com/wiki/contents/articles/964.certificate-revocation-list-crl-verification-an-application-choice.aspx)

While the security team was reviewing the option of allowing outbound connections from ADFS to some public Certificate Authority CRL URLs, we used the following parameters of the Set-ADFSRelyingPartyTrust PowerShell command to disable the Relying Party certificate CRL checks by setting their values to None:

-EncryptionCertificateRevocationCheck and -SigningCertificateRevocationCheck

https://docs.microsoft.com/en-us/powershell/module/adfs/set-adfsrelyingpartytrust?view=win10-ps

Per this article these are the acceptable values:

–          None
–          CheckEndCert
–          CheckEndCertCacheOnly
–          CheckChain
–          CheckChainCacheOnly
–          CheckChainExcludeRoot
–          CheckChainExcludeRootCacheOnly

AD FS 2.0 and Safari (iOS 9 and iOS 10) sign in issue (authentication cookies size issue)

I was recently working on an issue where users on iOS 9 and iOS 10 were not able to complete authentication from outside of the corporate network (via AD FS Proxy). The users were getting the error “There was a problem accessing the site. Try to browse to the site again. Reference number: XXX”.

The same users had no issues signing in from iOS 11 via the same AD FS farm.

The users were members of 150-200 on-premises AD security groups.

It took me some time to go through basic troubleshooting and to make sure the AD FS servers had all the needed patches installed – https://docs.microsoft.com/en-us/windows-server/identity/ad-fs/deployment/updates-for-active-directory-federation-services-ad-fs – but the issue persisted.

It turned out to be a known issue: older Safari versions have a cookie size limitation.

Looking at the captured Fiddler trace, we saw that AD FS was issuing 5 MSISAuth cookies (total size around 9 KB), but when Safari was redirected to ADFS to get the access token, only 4 MSISAuth cookies (around 8 KB) were sent to ADFS. In this case Safari was dropping one of the cookies.

To work around this issue, the customer decided not to pass all 200 security groups in the claims.

To achieve this, the default “Pass through all Group SID claims” rule was replaced with the following claim rule, which passes only the groups the customer was using in his Authorization claim rules:

c:[Type == "http://schemas.microsoft.com/ws/2008/06/identity/claims/groupsid", Value =~ "S-1-5-21-XXXXXXXX-XXXXXXXXXX-XXXXXXXXX-XXX840|S-1-5-21-XXXXXXXXX-XXXXXXXXX-XXXXXXXXXX-124XXXX", Issuer == "AD AUTHORITY"] => issue(claim = c);
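The effect of the regular expression in the rule can be tried out offline before touching the claim rules. The Python sketch below filters a list of made-up group SIDs with the same kind of alternation. It assumes that the =~ operator behaves like an unanchored regular expression search (which is how Python’s re.search works); under that assumption a pattern without ^ and $ also lets through SIDs that merely contain one of the listed values.

```python
import re
from typing import Iterable, List

def filter_group_sids(sids: Iterable[str], pattern: str) -> List[str]:
    """Keep only the SIDs that the pattern matches anywhere in the value."""
    regex = re.compile(pattern)
    return [sid for sid in sids if regex.search(sid)]

# Hypothetical SIDs, for illustration only.
USER_GROUP_SIDS = [
    "S-1-5-21-111-222-333-1840",
    "S-1-5-21-111-222-333-1241",
    "S-1-5-21-111-222-333-513",
    "S-1-5-21-111-222-333-18401",
]

LOOSE = "S-1-5-21-111-222-333-1840|S-1-5-21-111-222-333-1241"
ANCHORED = "^(S-1-5-21-111-222-333-1840|S-1-5-21-111-222-333-1241)$"

if __name__ == "__main__":
    print(filter_group_sids(USER_GROUP_SIDS, LOOSE))
    print(filter_group_sids(USER_GROUP_SIDS, ANCHORED))
```

With the loose pattern the SID ending in 18401 is passed as well, so anchoring the alternation is the safer choice when group RIDs share prefixes.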

Azure Multi-Factor Authentication Server not sending emails out for new users

I was recently troubleshooting an issue where no email was sent to new MFA Server users even though all the configuration seemed to be correct. See the official documentation for more details.

Because the Administrator was able to send the Update email to the end user, we ruled out an improper SMTP server configuration.

Per MFA server Help file: New Users – An email is sent to a user added that is enabled and complete (phone specified, mobile app activated, or OATH token secret key specified), or to an updated user that was either disabled or incomplete and is now enabled and complete.

Note: Emails are only sent when Send email to users is checked and the user’s email address is specified or their username is in email address format.

I confirmed that the new user had the “Send email” check box selected on the User profile General tab and that the email address was correct.

[Screenshot: MFAnewUser]

Also, by going to MFA UI – Email – Email Content, I confirmed that all the New User templates had the correct email address specified in the From field.

I checked the SMTP server logs and didn’t see any attempt by the MFA server to send the New User email, only connections for the Update email.

Time to check the MFA Server logs!

To make sure you are looking at the latest logs, go to MFA UI – Logging – View Log Files.

Looking at MultiFactorAuthAdSyncSvc.log, I saw the following error correlating with the time when the new user was added:

2018-03-07T18:20:20.006280Z|e|2960|4036|pfadssvc|***** ERROR ***** Error sending email to NewUser@domain.com: Access to the path ‘\\FileShare\public\MFA-Instructions\MFA-Guide.pdf’ is denied.

So the Administrator had used the Attachment option to send additional instructions to new users, but the file share had access restrictions that prevented the MFA Server Local System account from reading this document.

[Screenshot: MFAnewUser2]

The solution was to either move the MFA instruction files to the MFA server or adjust the file share permissions to allow Everyone to read the files.
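Scanning these log files by eye gets tedious, so here is a small Python sketch that pulls the error lines out. The field layout (timestamp, level, two numeric IDs, component, message) is my guess based on the single line shown above, and the level value “i” in the usage example is hypothetical.

```python
from typing import Iterable, List, NamedTuple, Optional

class LogEntry(NamedTuple):
    timestamp: str
    level: str
    component: str
    message: str

def parse_line(line: str) -> Optional[LogEntry]:
    """Split one pipe-delimited log line; return None if it does not fit."""
    # Split at most 5 times so pipes inside the message are preserved.
    parts = line.rstrip("\r\n").split("|", 5)
    if len(parts) != 6:
        return None
    timestamp, level, _id1, _id2, component, message = parts
    return LogEntry(timestamp, level, component, message)

def error_entries(lines: Iterable[str]) -> List[LogEntry]:
    """Return only the entries whose level field is 'e'."""
    entries = (parse_line(line) for line in lines)
    return [entry for entry in entries if entry and entry.level == "e"]

if __name__ == "__main__":
    sample = [
        # Hypothetical informational line, for contrast.
        "2018-03-07T18:20:19.000000Z|i|2960|4036|pfadssvc|Sync started",
        r"2018-03-07T18:20:20.006280Z|e|2960|4036|pfadssvc|***** ERROR ***** "
        r"Error sending email to NewUser@domain.com: Access to the path "
        r"'\\FileShare\public\MFA-Instructions\MFA-Guide.pdf' is denied.",
    ]
    for entry in error_entries(sample):
        print(entry.timestamp, entry.component, entry.message)
```

To use it on a real file, pass the open file object (for example open(path, encoding="utf-8", errors="replace")) instead of the sample list.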

Moral of the story: read the manuals and logs, those are written by smart people 😊

Azure Multi-Factor Authentication Server with ADFS – EventID 105 troubleshooting. Part 2

You might have already checked the EventID 105 error solution in my previous post.

This time the issue was similar: we followed the official instructions – https://docs.microsoft.com/en-us/azure/multi-factor-authentication/multi-factor-authentication-get-started-adfs-w2k12 – and when restarting the AD FS service we got the EventID 105.

Looking at the ADFS Debug logs, I saw a new error:
Log Name:      AD FS Tracing/Debug
Source:        AD FS Tracing
Date:          3/6/2018 3:03:41 PM
Event ID:      183
Task Category: None
Level:         Error
Keywords:      ExternalAuthentication
User:          XXX
Computer:      XXX
Description:
OnAuthenticationPipelineLoad() exception: System.Exception: Error connecting to Multi-Factor Authentication service. ---> System.Runtime.InteropServices.SEHException: External component has thrown an exception.
   at native.construct(construct_ret_t* , __MIDL_pfAgent_idl_0009 )
   at PfSvcClientClr.PfSvcClient.construct(ConstructTarget target, ConstructResult& result)
   at pfadfs.AuthenticationAdapter.ConnectToService(ConstructTarget constructTarget, Int32 lcid)
   --- End of inner exception stack trace ---
   at pfadfs.AuthenticationAdapter.ConnectToService(ConstructTarget constructTarget, Int32 lcid)
   at pfadfs.AuthenticationAdapter.OnAuthenticationPipelineLoad(IAuthenticationMethodConfigData configData)
   at Microsoft.IdentityServer.Web.Authentication.External.ExternalAuthenticationHandlerBase.

Looking closer at the MultiFactorAuthenticationAdfsAdapter.config file, I noticed that the value of UseWebServiceSdk was True (capitalized), so I changed it to lowercase true and re-ran the Registration script. After that there were no errors on AD FS service restart.

How to replace the host name in captured Fiddler trace

For the previous post I had to find out how to replace the host name in a captured Fiddler trace.

To do this, open Fiddler and go to Rules – Customize Rules. The Fiddler ScriptEditor will open.

Use the following code to create a new column named MaskedHostName with the new host names. You can put it right after the commented (green) part of the script.

public static BindUIColumn("MaskedHostName", 60)
function FillNameColumn(oS: Session): String {
    // Check the longer host name first: "OtherHost1..." also ends with "Host1...".
    if (oS.hostname.EndsWith("OtherHost1.OldDomain.com")) return "OtherHost1.NewDomain.com";
    if (oS.hostname.EndsWith("Host1.OldDomain.com")) return "Host1.NewDomain.com";
    return oS.hostname;
}

If some of your host names look similar (like “OtherHost1” and “Host1”), make sure you put the longer host name at the top of the list so it’s evaluated first; otherwise both host names would be replaced by “Host1.NewDomain.com”.
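The ordering rule is easy to get wrong when the list of host names grows. As an illustration only (this is Python, not FiddlerScript, and the function name is my own), the sketch below sorts the old names by length so that the longest suffix is always tried first. Unlike the script above, it compares host names case-insensitively.

```python
from typing import Dict

def mask_host(hostname: str, replacements: Dict[str, str]) -> str:
    """Return the masked name for hostname, or hostname itself if none applies."""
    # Longest old name first, so "Host1..." never shadows "OtherHost1...".
    for old in sorted(replacements, key=len, reverse=True):
        if hostname.lower().endswith(old.lower()):
            return replacements[old]
    return hostname

REPLACEMENTS = {
    # Deliberately listed in the "wrong" order; sorting fixes it.
    "Host1.OldDomain.com": "Host1.NewDomain.com",
    "OtherHost1.OldDomain.com": "OtherHost1.NewDomain.com",
}

if __name__ == "__main__":
    for name in ("OtherHost1.OldDomain.com", "Host1.OldDomain.com", "www.example.org"):
        print(name, "->", mask_host(name, REPLACEMENTS))
```

The same idea can be applied by hand in the FiddlerScript: keep the if statements ordered from the longest host name to the shortest.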

Federated applications (CRM and IIS) ADFS Single Sign-On (SSO) troubleshooting with Fiddler

I recently had a very interesting issue to troubleshoot. This (long 😊) troubleshooting description should help many to understand the ADFS Single Sign-On (SSO) flow and how to read Fiddler traces.

Environment: ADFS 3.0, CRM 2013, IIS 8.5 running a site. Both the CRM and the IIS site are federated with the ADFS.

The CRM and the IIS site were accessed from outside of the corporate network, so only Form Based Authentication took place when the user was redirected to the ADFS.

Problem: If the user accesses the IIS site first, completes authentication to the ADFS, and then browses to the CRM site (using the same browser), the ADFS SSO takes place and the user does not have to authenticate a second time (enter user name and password) via ADFS to access the CRM.

But if the user accesses the CRM first, completes authentication to the ADFS and then browses to the IIS site, the ADFS SSO doesn’t take place and the user is presented with the ADFS Form Based Authentication (FBA) page.

Another variable added to the puzzle was the fact that the CRM and the IIS belonged to one Active Directory domain (let’s call it EXTERNAL) and the ADFS belonged to another (call it PUBLIC). A two-way trust was configured between these domains. As troubleshooting continued, the issue was reproduced with all three services (ADFS, CRM, IIS) placed in the same AD, so the issue was NOT about the on-premises AD location or which domain the services belonged to (more details are explained below).

Troubleshooting: As always in such cases, a Fiddler trace was captured to get a better understanding of the browser redirections and sign-in processes.

Here is a non-working SSO attempt. (Note: in some screenshots I’m not able to show all the details, but I will do my best to provide a good description.)

[Screenshot: AuthFlowAllDomainA]

Frame #2 – the user accesses the CRM and is redirected to the ADFS;

Frames #3-8 – the user completes the ADFS FBA (providing the correct username/password) and the browser gets the ADFS SSO cookie – MSISAuth=AAEAAJo…;

[Screenshot: ADFSSSOcookie]

Frame #9 – the ADFS redirects the browser with the ADFS SSO cookie to itself, where the ADFS SSO cookie is exchanged for the ADFS access token (MSISAuthenticated) that will be presented to the application as proof that the user was authenticated;

[Screenshot: ADFSappCookie]

Frame #10 – the browser is redirected to the CRM WS-Fed endpoint configured in the ADFS CRM Relying Party, where the MSISAuthenticated cookie is exchanged for two application session cookies (MSISAuth=77uj… and MSISAuth=VWJ0…). These two cookies will always be presented to the CRM by the browser as proof that this is an “authenticated” session;

[Screenshot: Frame10BadSignIn]

Frame #13 – the user opens a new tab in the browser and goes to the IIS site. The browser presents the two CRM session cookies to the IIS site; obviously, the IIS site doesn’t recognize them and redirects the browser to the ADFS for authentication;

[Screenshot: IISredirection]

Frame #14 (where all the fun begins) – the browser presents 3 cookies to the ADFS: the ADFS SSO cookie we got in Frame #8 plus the 2 CRM application cookies;

[Screenshot: ADFSFBAloop]

It looks like, even though the correct ADFS SSO cookie was presented (MSISAuth=AAEAAJo…), it was not accepted by the ADFS and the Form Based Authentication sign-in page was returned. There were no errors in the ADFS Admin logs.

In the ADFS Debug logs I saw the following error:

Log Name:      AD FS Tracing/Debug
Source:        AD FS Tracing
Date:          2/6/2018 1:52:20 PM
Event ID:      67
Task Category: None
Level:         Error
Keywords:      ADFSProtocol
User:          S-1-5-21-1337953637-3591879799-2366245552-4020
Computer:      XXX
Description:
Ignore corrupted SSO cookie.

I confirmed that it is expected ADFS 3.0 behavior to parse all cookies through its code pipeline. Since the browser was presenting cookies with the same name (MSISAuth – one set by the ADFS, the other by the CRM), only the last one in the pipeline was treated as the ADFS SSO cookie. But as we see from the previous screenshot, the last cookie in the pipeline was MSISAuth=77uj…, which was set by the CRM and is certainly not a valid ADFS SSO cookie.

In the trace where we access the IIS site first and the CRM second, the issue with non-working SSO is NOT present, because the IIS site sets cookies named FedAuth, and that cookie name does not cause a problem during the ADFS cookie evaluation.

[Screenshot: IIScookies]

I explained the above flow to the owner of the environment and said that the solution is to see whether we can make sure the CRM does not use the MSISAuth name for its session cookie.

After that, the owner of the environment added another variable to this puzzle. He said he has another CRM and IIS site federated with the same ADFS 3.0 farm, and the issue we are troubleshooting is NOT present there!

I captured a new Fiddler trace for the working scenario.

The following screenshot is for the working sign-in.

[Screenshot: AuthFlowDomainA-B]

The authentication flow is the same: the user accesses the CRM (#2), the browser is redirected to the ADFS and successfully authenticates (#3-8), the CRM WS-Fed endpoint sets the session cookies named MSISAuth (#9-10), the user opens a new tab to the IIS site (#12), and the site redirects to the ADFS (#13).

But looking at Frame #13 we see that the browser is sending the ADFS only one MSISAuth cookie (the ADFS SSO cookie), which the ADFS accepts and then issues the MSISAuthenticated cookie for the IIS site (SSO to IIS takes place).

[Screenshot: Frame13Good signin]

So now the question was: why does the browser present 3 cookies (ADFS SSO + 2 app session cookies) in one scenario, and only the one ADFS SSO cookie in the other?

You might have already noticed the difference between the scenarios. As mentioned at the beginning, it was NOT about the local AD and which Active Directory domains each of the three services belonged to.

The explanation was found in Frame #10, where the CRM was setting the application cookies. Going to the Raw tab in the response window and viewing the frame in Notepad gave the answer.

[Screenshot: DomainInTheCookie]

The CRM was setting the domain scope for the cookie.

Since in the non-working scenario all three services (ADFS, CRM, IIS) were located in the same domain name space (domainA.com), the browser was presenting the CRM cookies to the ADFS as well when it was redirected from the IIS site.

In the second scenario the CRM was specifying domainB.com in the session cookies, and because the ADFS belonged to domainA.com, the browser was not presenting the CRM cookies along with the ADFS SSO cookie when it was redirected from the IIS to the ADFS for authentication.

[Screenshot: DomainInTheCookie2]
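The browser behavior in both scenarios follows from the standard cookie domain-matching rule, which can be sketched in a few lines of Python. The host names and cookie values below are hypothetical stand-ins for the redacted ones, and I assume the ADFS SSO cookie is scoped to the ADFS host only. Path, Secure and expiry checks are left out.

```python
from typing import List, NamedTuple

class Cookie(NamedTuple):
    name: str
    value: str
    domain: str       # host that set it, or the Domain attribute value
    host_only: bool   # True when no Domain attribute was specified

def domain_match(host: str, cookie: Cookie) -> bool:
    """True when the browser would send this cookie to the given host."""
    host = host.lower()
    domain = cookie.domain.lower().lstrip(".")
    if cookie.host_only:
        return host == domain
    # A Domain attribute also covers every sub-domain of that domain.
    return host == domain or host.endswith("." + domain)

def cookies_sent_to(host: str, jar: List[Cookie]) -> List[Cookie]:
    return [cookie for cookie in jar if domain_match(host, cookie)]

ADFS_SSO = Cookie("MSISAuth", "AAEAAJo...", "adfs.domainA.com", True)

# Non-working scenario: the CRM scopes its cookies to domainA.com.
JAR_SAME_DOMAIN = [
    ADFS_SSO,
    Cookie("MSISAuth", "77uj...", "domainA.com", False),
    Cookie("MSISAuth", "VWJ0...", "domainA.com", False),
]
# Working scenario: the CRM scopes its cookies to domainB.com.
JAR_OTHER_DOMAIN = [
    ADFS_SSO,
    Cookie("MSISAuth", "77uj...", "domainB.com", False),
    Cookie("MSISAuth", "VWJ0...", "domainB.com", False),
]

if __name__ == "__main__":
    print(len(cookies_sent_to("adfs.domainA.com", JAR_SAME_DOMAIN)))
    print(len(cookies_sent_to("adfs.domainA.com", JAR_OTHER_DOMAIN)))
```

In the first case the ADFS receives three cookies named MSISAuth, in the second only its own SSO cookie, which matches what the two Fiddler traces showed.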

To resolve the issue, the owner of the described environment decided to move the production ADFS host name to a different domain name space than the CRM and IIS.

So far I have not been able to confirm with the CRM team whether it’s possible to change the name of the session cookies or to make sure the domain name is not specified (though I think the latter is not a valid option at all).