Azure AD Conditional Access policies troubleshooting – Device State: Unregistered

With Azure AD Conditional Access (CA) policies you can control that only managed devices can access resources protected by Azure AD – https://docs.microsoft.com/en-us/azure/active-directory/conditional-access/require-managed-devices#managed-devices

As mentioned in the article above, you might require the devices the sign in is taking place from to be hybrid Azure AD joined.

As explained in this blog – https://jairocadena.com/2016/11/08/how-sso-works-in-windows-10-devices/ the Azure AD Primary Refresh Token (Azure AD PRT) is used during Azure AD CA policies evaluation to get the information about Windows 10 device registration state.

The mentioned blog explains that the Azure AD PRT is initially obtained during user sign into the station. In simple words, if the Cloud AP plugin is able to authenticate on behalf of the user (UPN and password or Windows Hello for Business PIN) to get the Azure AD access token and device is able to authenticate to Azure AD using the device registration state (MS-Organization-Access certificate) the Azure AD PRT will be issued to the user.

If any of these two parts (user or device) didn’t pass the authentication step, no Azure AD PRT will be issued.

So when you see an Azure AD Conditional Access error stating that the device is NOT registered, it doesn’t necessary mean that the hybrid Azure AD join is not working in your environment, but might mean that the valid Azure AD PRT was not presented to Azure AD.

To check if the Azure AD PRT is present for the signed into Windows 10 device user, you can use the “dsregcmd /status” command. Windows 10 OS version 1809 the Azure AD PRT info is stored in the SSO State section:

+----------------------------------------------------------------------+
| SSO State                                                           |
+----------------------------------------------------------------------+
     AzureAdPrt : YES
     AzureAdPrtUpdateTime : 2019-04-03 17:25:24.000 UTC
     AzureAdPrtExpiryTime : 2019-04-17 21:25:54.000 UTC
     AzureAdPrtAuthority : https://login.microsoftonline.com/tenantID

By the way you can use usual /? Switch to get help for the “dsregcmd” command (Windows 1809 and newer versions).

Keep in mind that the Azure AD PRT is a per user token, so you might see AzureAdPrt:NO if you are running the “dsregcmd /state” as local or not synchronized (on-premises AD user UPN doesn’t match the Azure AD UPN) user.

Per my experience, here are examples of what might be the root of Azure AD PRT being absent for the user (will be updating the list as discover more possible root causes):

  1. Device indeed is not hybrid Azure AD joined;
  2. Local registration state of the computer doesn’t match the records in Azure AD:
    • Azure AD computer object was deleted by Global Admin via portal or PowerShell;
    • Computer was moved out of Azure AD Connect sync scope and was removed from Azure AD by Azure AD Connect;
    • Some services modified the Azure AD computer object and deleted the AlternativeSecurityIds attribute from Azure AD Computer object);
  3. CloudAP plugging is not able to authenticate on behalf of the user to get Azure AD access token:
    • If the user is federated, the on premises STS is not reachable or STS do not have WS-Trust endpoint enabled (yes, WS-Trust is still required for Azure AD PRT flow and optional for Windows 1803 and newer registration flow) (for AD FS the WS-Trust endpoint is – adfs/services/trust/13/usernamemixed)
    • The user has recently changed the UPN and is using Windows 1709 or older OS version and can’t get new or refresh expired Azure AD PRT – this issue was resolved in 1803 and newer);

Here are the recommended troubleshooting steps for mentioned above scenarios:

  1. To troubleshoot why the computer can’t perform hybrid Azure AD join refer to the following post – https://s4erka.wordpress.com/2018/03/06/azure-ad-device-registration-error-codes/;
  2. To better understand if there is a discrepancy between local registration state and Azure AD records, collect and review following info:
    1. “Dsregcmd /status” output on the effected computer, make the notes of the following fields: AzureAdJoined, DeviceCertificateValidity, AzureAdPrt, AzureAdPrtUpdateTime, AzureAdPrtExpiryTime;
    2. Check the Azure AD Portal – Devices blade, see if the station is present in Azure AD and has a timestamp listed in the Registered column, compare with the time in the DeviceCertificateValidity from the previous step. If there is no time stamp in the Registered column, that means that the AlternativeSecurityIds attribute (contains the MS-Organization-Access certificate thumbprint. This is the certificate that was saved to the station during registration process) was removed and the station needs to be re-joined to Azure AD;
    3. You can check if the station has the AlternativeSecurityIds attribute by using the Get-MsolDevice Azure AD PowerShell cmdlet;
    4. Check if the computer object is in the sync scope of Azure AD Connect;
  3. To get more clues about user portion of the Azure AD PRT receive process, its recommended to review the following Windows 10 logs – Application and Services Logs – Microsoft – Windows – AAD. These logs contain Operational and Analytic logs. Analytic logs are the equivalent of the Debug logs and are disabled by default. Usually you should be able to get info just by looking at the AAD Operational logs. In the AAD Operation logs look for the events generated by AadCloudAPPlugin Operation. By readying these logs you should get an idea either the STS is not reachable because of the network or protocol issues or Cloud AP is not able to authenticate on behalf of the user due to incorrect credentials or access policies configured on STS that block the authentication attempt for this user;

You can also use the “Get-WinEvent” PowerShell cmdlet to quickly pull latest AAD logs related to Azure AD Cloud AP plugin:

 
Get-WinEvent -LogName "Microsoft-Windows-AAD/Operational" -MaxEvents 20 |
where {$_.TaskDisplayName -like "*AadCloudAPPlugin*"} |
ft TimeCreated,id,KeyWordsDisplayNames,Message -wrap -autosize

Keep in mind that Windows down-level devices do not have Azure AD PRT and they “proof” to Azure AD CA that they are registered by establishing TLS authentication channel using the MS-Organization-Access certificate saved in the User certificate store during device registration. So if the successfully registered down-level Windows device is treated by Azure AD CA policy as not registered, most likely something (firewall/proxy) is messing up with that attempt of the device authentication.

And final thought. In case you need to re-join the Windows current device, make sure to follow the steps in this order to make sure the station really disjoined and will try the clean join process. Also keep in mind that since the computer object is recreated, the Bitlocker recovery keys that you might be saving in Azure AD for this station will be deleted and you will need to re-save them .

  1. Open elevated CMD (as local Admin) and issue “dsregcmd /leave”. Elevated CMD is important part, since during the leave flow, the registration service is trying to contact Azure AD and delete the computer object and also it tries to delete the MS-Organization-Access certificate from Computer certificate store, that definitely requires elevated privileges;
  2. Open new CMD window and confirm that the local registration state is cleaned and the station is not Azure AD joined by issuing “dsregcmd /status”;
  3. Using Azure AD devices portal confirm the computer object is gone, if not, delete it manually;
  4. In case you are in Managed environment, you need to run delta Azure AD Connect sync to pre-sync the AD computer object to Azure AD;
  5. Restart the station.
Advertisements

RSA SecurID Access SAML Configuration for Microsoft Office 365 issue – “AADSTS50008: Unable to verify token signature. The signing key identifier does not match any valid registered keys”

There recently have been couple cases when the customers who has configured the Azure AD federation with RSA SecureID by following these instructions https://community.rsa.com/docs/DOC-1019 were randomly experiencing the error during users sign in: “AADSTS50008: Unable to verify token signature. The signing key identifier does not match any valid registered keys”.

aadsts50008

The issue temporary goes away when the domain is converted to managed and back to federated.
Looking at the mentioned RSA instructions, noticed that the RSA recommends to set up Issuer Entity ID for the SAML response as “urn:uri:o365”.
Since multiple customers most likely are selecting the same Issuer ID (most likely by copying the example), the confusion takes place on Azure AD side what tenant the SAML response is intended to and the token signing certificate validation fails.
The recommended fix is to change the Issued Entity ID on RSA and AAD side to “urn:uri:your_domain” (ex: urn:uri:contoso) to make it unique.

Troubleshooting NPS extension for Azure Multi-Factor Authentication

I’m sure you are familiar with following official documentation how to use your existing NPS infrastructure with Azure Multi-Factor Authentication. 

And the following one is proving detailed steps to troubleshoot error messages from the NPS extension for Azure MFA

Here are the recommended troubleshooting steps in case you see the following combination of errors in the NPS Security and Microsoft-AzureMfa-AuthZ.

Log Name:     Security
Source:       Microsoft-Windows-Security-Auditing
Date:         1/22/2019 12:32:30 PM
Event ID:     6274
Task Category: Network Policy Server
Level:         Information
Keywords:     Audit Failure
User:         N/A
Computer:     XXX
Description:
Network Policy Server discarded the request for a user.
Contact the Network Policy Server administrator for more information.

User:
                    Security ID:                                                        XXX
                    Account Name:                                                XXX
                    Account Domain:                         XXX
                    Fully Qualified Account Name:                 XXX
Client Machine:
                    Security ID:                                                        NULL SID
                    Account Name:                                                –
                    Fully Qualified Account Name:                 –
                    Called Station Identifier:                             –
                    Calling Station Identifier:                            –
NAS:
                    NAS IPv4 Address:                      –
                    NAS IPv6 Address:                      –
                    NAS Identifier:                                                 –
                    NAS Port-Type:                                                –
                    NAS Port:                                                            –
RADIUS Client:
                    Client Friendly Name:                                   –
                    Client IP Address:                                            –

Authentication Details:
                    Connection Request Policy Name:          Use Windows authentication for all users
                    Network Policy Name:                                  VPN-
                    Authentication Provider:                             Windows
                    Authentication Server:                                 xxx
                    Authentication Type:                                    PAP
                    EAP Type:                                                           –
                    Account Session Identifier:                        –
                    Reason Code:                                                    9
                    Reason:                                                                The request was discarded by a third-party extension DLL file.

Log Name:     AuthZAdminCh
Source:       Microsoft-AzureMfa-AuthZ
Date:         1/22/2019 12:32:30 PM
Event ID:     3
Task Category: None
Level:         Critical
Keywords:    
User:         NETWORK SERVICE
Computer:     XXX
Description:
The following information was included with the event:
CID: xxx :Exception in Authentication Ext for User XXX :: ErrorCode:: CID :xxx ESTS_TOKEN_ERROR Msg:: Verify the client certificate is property enrolled in Azure against your tenant and the server can access URL in Registry STS_URL. Error authenticating to eSTS: ErrorCode:: ESTS_TOKEN_ERROR Msg:: Error in retrieving token details from request handle: -895352831 Enter ERROR_CODE @ https://go.microsoft.com/fwlink/?linkid=846827 for detailed TroubleShooting steps. Enter ERROR_CODE @ https://go.microsoft.com/fwlink/?linkid=846827 for detailed TroubleShooting steps.

In case you have verified that the certificate generated during NPS configuration was correctly associated with Azure MFA Client SPN and there are no network connectivity issues, I would recommend checking if Azure MFA Client and Connector SPN are enabled in your tenant.

You can do this either via Azure AD portal – go to Enterprise Applications – Change the Application Type to All, search for Azure Multi-Factor Auth Connector and Azure Multi-Factor Auth Client and make sure they are enabled.

Or you can use Azure AD PowerShell. Connect to MSOLServicies and issue following commands (first checks Client, second Connector):

Get-MsolServicePrincipal -AppPrincipalId "981f26a1-7f43-403b-a875-f8b09b8cd720" | fl *
Get-MsolServicePrincipal -AppPrincipalId "1f5530b3-261a-47a9-b357-ded261e17918" | fl *

 

If the AccountEnabled attribute is set to False, you can enable it with this PowerShell command:


Set-MsolServicePrincipal -AppPrincipalId "xx" -AccountEnabled $True

 

I will also highly recommend to have a look at the following Azure MFA NPS Extension Health Check Script – http://azuredummies.com/2018/09/11/azure-mfa-nps-extension-health-check-script-v1/

Microsoft Company Portal temporary unavailable error troubleshooting

Recently was assisting the Intune team to troubleshoot “Company Portal Temporary Unavailable” error for the iOS devices.

The Azure AD was federated with AD FS.

Looking at the Company Portal logs from mobile device the following detailed error message was discovered (extracted the part we are interested in):

InAppProcess : {context = “Failed to fetch aad service token! Error: Optional(Error with code: -1005 Domain: NSURLErrorDomain ProtocolCode:(null) Details:The network connection was lost.. Inner error details: Error domain: NSURLErrorDomain\nCode: -1005\nDescription: The network connection was lost.\nUser info: {\n   NSErrorFailingURLKey = \”https://sts.domain.com/adfs/ls/?

This error pointed us to Network/SSL issues, not Authentication.

As always used the great SSL test portal https://www.ssllabs.com/ssltest/ to check SSL settings for the AD FS public host name.

In the Handshake Simulation section we saw the following:

TLSciphers2

Looking at the supported TLS Cipher Suites saw this:

TLSciphers

Looking at the following Apple Developer site we get following information about one of the requirements for Connecting Using ATS:

The connection must use either the AES-128 or AES-256 symmetric cipher. The negotiated TLS connection cipher suite must support perfect forward secrecy (PFS) through Elliptic Curve Diffie-Hellman Ephemeral (ECDHE) key exchange, and must be one of the following:

  • TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
  • TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256
  • TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA384
  • TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA
  • TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256
  • TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA
  • TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
  • TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
  • TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384
  • TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256
  • TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA

In this environment the Load Balancer was installed in front of Web Application Proxy (WAP) and SSL offloading was configured for AD FS farm host name.

Adjusting the supported cipher suit settings to the recommended above values has addressed the issue.

There is another great blog post about this issue troubleshooting.

And this is Intune What’s New page describing new ATS requirement.

AD FS 2016 Access Control Policies troubleshooting

The Access Control (AC) policies were introduced in AD FS 2016.

See this official documentation to get familiar with AD FS Access Control policies concept and settings.

Recently had to troubleshoot the following scenario.

Customer has Hybrid Exchange environment with email boxes located in on premises Exchange 2010 and archives located in Exchange Online.

Exchange on premises did not have Modern Authentication enabled.

Office 365 domain federated with AD FS 2016.

AD FS was configured to use Azure MFA.

The goal was to require MFA for all external users using Outlook 2016 and accessing their mailboxes and archives and skip MFA if the user is located inside corporate network.

At the beginning the following AD FS AC policy was configured for Office 365 Relying Party Trust (RPT).

{
Permit users
	and when authentication includes MFA
except
	from 'intranet' location;

Permit everyone
}

The issue with this policy is the “Permit everyone” at the second part, since this rule will allow anybody who didn’t meet requirement in first policy, to authenticate.

So the AD FS AC policy was modified the following way and applied to O365 RPT.

{
Permit users
	and when authentication includes MFA
except
	from 'intranet' location
}

And that’s where the unexpected authentication pop-up windows and MFA prompts started to happen when users inside corporate network were opening Outlook to access their email.

User continuously was receiving the Windows Security pop-up window that didn’t except correct user credentials and there was a second pop-up asking to authenticate again and that was triggering the MFA call to the user’s phone.

Analyzing the authentication flow based on the Exchange layout mentioned above, it was confirmed that its expected to have following authentication events to happen:

  1. User accessing email box located in on premises Exchange must authenticate via AD FS using legacy authentication from Intranet.
  2. When accessing email archive, Exchange Online has to authenticate user as well and since the legacy authentication was used, Exchange Online was authenticating on behalf of the user from outside of the corporate network.

By the way, due to the fact of legacy authentication flow we were not able to use Microsoft Claims X-Ray service.

To reduce the impact to production users, it was recommended to change the Access Control policy to contain these settings:
1. Permit users from the security group with MFA and exclude Intranet
2. Permit users from the security group with MFA and exclude Internet if the client IP (public IP of the office) matches the regex.
3. Permit all.

AD FS Access Control policy now looked like this.

{
Permit users
	from 'TEST' group
	and when authentication includes MFA
except
	from 'intranet' location;

Permit users
	from 'TEST' group
	and when authentication includes MFA
except
	from 'extranet' location
	and with 'http://schemas.microsoft.com/2012/01/requestcontext/claims/x-ms-client-ip' claim regex matches 'regex_for_public_IP_of_the_office' in the request;

Permit everyone
}

Applied this AD FS AC policy to Office 365 RPT and still there were two prompts when the Outlook 2016 is opened – basic authentication window when accessing on premises email box and Modern Authentication prompt with MFA when accessing EXO archive.

In the AD FS Admin logs we saw the error: “MSIS5007: The caller authorization failed for caller identity XXX for relying party trust urn:federation:MicrosoftOnline.”

And in the AD FS Debug logs see the MFA is still required regardless the fact that the authentication attempt is coming from Intranet.

After multiple additional tests and getting better understanding how the Access Control policies are applied the following AD FS AC Policy was crafted that has addressed the issue completely.

{
Permit users 
	from 'extranet' location
	and from ‘TEST’' group
	and with 'http://schemas.microsoft.com/2012/01/requestcontext/claims/x-ms-forwarded-client-ip' claim regex matches 'regex_for_public_IP_of_the_office' in the request;

Permit users 
	from 'extranet' location
	and from 'TEST' group
	and when authentication includes MFA
except 
	with 'http://schemas.microsoft.com/2012/01/requestcontext/claims/x-ms-forwarded-client-ip' claim regex matches 'regex_for_public_IP_of_the_office' in the request;

Permit users 
	from 'intranet' location
	and from 'TEST' group;

Permit everyone
} 

The AD FS Access Control polices above work the following way:
1. Permits authentication coming from Internet if its EXO legacy authentication protocol on behalf of the user in the test security group coming from office public IP address;
2. Permits any other authentication for the test group coming from Internet if the MFA completed, but excludes office public IP from this rule (to make sure the first rule works as expected);
3. Permits authentication from the Intranet with no MFA for the test group (authentication to on premises Exchange)
4. And the last one catch Permit All group to make sure prod users are not affected during testing.

ADFS Access Control Policy

When the AD FS AC policy was successfully tested with the users added to the TEST group, the policy was applied to the rest of the production users by removing the TEST group condition from the first three policies and removing “Permit everyone”.

PowerShell script to collect ADFS Extranet Smart Lockout events sequence

Below is slightly modified script from here to collect the sequence of the EventIDs 1203 and 1210 on single AD FS server that might help you understanding and troubleshooting the AD FS Extranet Smart Lockout (ESL) behavior.

You can read more about AD FS ESL behavior here and here.

$events = Get-WinEvent -MaxEvents 2000 -FilterHashtable @{Logname='Security';Id=1203,1210}
$events2 = ($events | select ID, Message,TimeCreated -ExpandProperty Message)
$info = @()

$events2 | foreach {

$IpAddresses = $null
$UserId = $null
$BadCount = $null

    $IpStart = $_.Message.IndexOf("<IpAddress>")
    $IpEnd = $_.Message.IndexOf("</IpAddress>")
    $IpAddresses = $_.Message.Substring($IpStart+11,($IpEnd-$IpStart-11))

    $UserIdStart = $_.Message.IndexOf("<UserId>")
    $UserIdEnd = $_.Message.IndexOf("</UserId>")
    $UserId = $_.Message.Substring($UserIdStart+8,($UserIdEnd-$UserIdStart-8))
    
 if ($_.Id -like 1210) 
    {
    $BadCountStart = $_.Message.IndexOf("<CurrentBadPasswordCount>")
    $BadCountEnd = $_.Message.IndexOf("</CurrentBadPasswordCount>")
    $BadCount = $_.Message.Substring($BadCountStart+25,($BadCountEnd-$BadCountStart-25))
    }
    else {$BadCount = $null}

$Fail = New-object -TypeName PSObject
add-member -inputobject $Fail -membertype noteproperty -name "EventID" -value $_.Id
add-member -inputobject $Fail -membertype noteproperty -name "TimeStamp" -value $_.TimeCreated
add-member -inputobject $Fail -membertype noteproperty -name "IPaddress" -value $IpAddresses
add-member -inputobject $Fail -membertype noteproperty -name "User ID" -value $UserId
add-member -inputobject $Fail -membertype noteproperty -name "BadCount" -value $BadCount

$info +=$Fail
}
$info | ft

AD FS 2016 Extranet Smart Lockout eventIDs 1203 and 1210 clarification

Continuing my journey of learning the great AD FS Extranet Smart Lockout (ESL) feature.

As mentioned in my other post, the enhancement were made in AD FS 2016 auditing and there will be Event ID 1203 logged in the ADFS Security log by ADFS Auditing in case there was a failure to validate user credentials against Active Directory.

When you have enabled ADFS Extranet Smart Lockout feature in either log or enforce mode and AD FS Security auditing was enabled (the user has AD FS ESL bad password counts set to zero), as soon as the external bad password attempt count reaches the value specified in the ExtranetLockoutThreshold (you will see event ID 1203 for each bad password attempt), the account will be locked out on AD FS for a duration specified in the ExtranetObservationWindow, the event ID 1210 will be logged in Security event log and password validation attempts will not be sent to Active directory.

As mentioned in AD FS ESL public documentation:

AD FS will write extranet lockout events to the security audit log:

  • When a user is locked out (reaches the lockout threshold for unsuccessful login attempts)
  • When AD FS receives a login attempt for a user who is already in lockout state

At the same time, no event ID 1203 will be logged, since no password validation against Active Directory is taking place.

Only after the extranet observation window expires, the password attempts will be forwarded to AD and if the password validation fails, the event ID 1203 is logged.

Please note, that the CurrentBadPasswordCount value in event ID 1210 only increases when the password validation happens against AD and at the time the account is locked on AD FS.

Also keep in mind, that when the AD FS ESL extranet observation window expires, it doesn’t clear the AD FS ESL bad password count until good password was provided, so one single 1203 event from the same bad IP location with no bad password counts cleared will put account in ESL state again for the time specified in the ExtranetObservationWindow.

Hope this information will be helpful for you.

AD FS Extranet Smart Lockout user management via remote PowerShell

Recently had experienced issue when trying to execute AD FS Extranet Smart Lockout user management cmdlet via remote PowerShell.

Invoke-Command -ComputerName Win2016-ADFS01 -scriptBlock {Get-AdfsAccountActivity -Identity user@domain.com}

Error in PowerShell:

Exception of type
‘Microsoft.IdentityServer.User.UserActivityRestServiceException’ was thrown.
+ CategoryInfo         : NotSpecified: (:) [Get-AdfsAccountActivity], User
ActivityRestServiceException
+ FullyQualifiedErrorId : Microsoft.IdentityServer.User.UserActivityRestSer
viceException,Microsoft.IdentityServer.Management.Commands.GetAdfsAccountAc
tivity
+ PSComputerName       : Win2016-ADFS01

In AD FS Admin logs on Win2016-ADFS01 server saw following error:

Log Name:     AD FS/Admin
Source:       AD FS
Date:         10/29/2018 5:20:39 PM
Event ID:     1100
Task Category: None
Level:         Error
Keywords:     AD FS
User:         domain\adfs_service_account
Computer:     Win2016-ADFS01
Description:
The Federation Service could not authorize a request to one of the REST endpoints.
Additional Data
Exception details:
Microsoft.IdentityServer.WebHost.Rest.RestRequestAuthorizationFailedException: Only AD FS Service can access this endpoint. The client was authenticated as NT AUTHORITY\ANONYMOUS LOGON.
at Microsoft.IdentityServer.Web.UserActivity.UserStoreAuthenticationVerificationMethod.VerifyTrustedRequest(WrappedHttpListenerContext context, String& auditInformation)
at Microsoft.IdentityServer.Web.Rest.RestRequestHandler.OnGetContext(WrappedHttpListenerContext context)

Solution was to enable CredSSP on management machine and Win2016-ADFS01 server and use following commands:

$cred = Get-Credential
Invoke-Command -ComputerName Win2016-ADFS01 -Authentication Credssp -credential $cred -ScriptBlock {Get-AdfsAccountActivity user@domain.com}

You can read more about managing the second hop in PowerShell remoting and consideration when enabling CredSSP in this article – https://docs.microsoft.com/en-us/powershell/scripting/setup/ps-remoting-second-hop?view=powershell-6

Update 2-14-2019: Microsoft has updated the documentation how to delegate ADFS PowerShell access to non-admin users – https://docs.microsoft.com/en-us/windows-server/identity/ad-fs/operations/delegate-ad-fs-pshell-access

 

AD FS 2016 Extranet Smart Lockout behavior

I’m sure you are familiar with the following articles discussing the Federated account lockouts and AD FS Extranet Smart Lockout (ESL) feature and set up recommendations.

https://blogs.technet.microsoft.com/tspring/2017/01/20/federated-to-microsoft-cloud-and-account-lockouts/

https://docs.microsoft.com/en-us/windows-server/identity/ad-fs/operations/configure-ad-fs-extranet-smart-lockout-protection

https://samilamppu.com/2018/07/09/w2016-adfs-smart-lockout/

Recently was helping the customer whose environment was experiencing high volume of on-premises AD accounts lockouts due to the external bad passwords attempts via AD FS 2016 farm.

As per second article, Microsoft recommends enabling the AD FS ESL in the log only mode. It is recommended to run AD FS ESL in such mode for 5-7 days to build the list of familiar locations per user.

Since the impact of the AD lockouts was high to the customer, they decided to switch from log to enforce mode after 24 hours of enabling the ESL, but ran into following issue.

It was not enough time to build the familiar IPs list, and some users accounts still experiencing heavy bad passwords attempts were locked out on AD FS side due to empty familiar IP addresses list.

This happened because in case in the output of Get-AdfsAccountActivity PowerShell cmdlet you see the familiar IP list is empty and the UnknownLockout = True, the user will not be able to sign in with correct password until the observationWindows time elapses or ADFS admin resets the count. In such scenario AD FS ESL works in AD FS Extranet Lockout mode introduced in AD FS 3.0.

This issue was addressed in AD FS 2019 where you can enable audit mode for smart lockout while continuing to enforce the soft lockout behavior (ADPasswordCounter)

Internal application published via Azure AD Application Proxy access issues troubleshooting

Recently was troubleshooting the issue when the internal application portal page was not loaded (part of the portal was not loaded at all) when accessed via Azure AD Application Proxy (AAD AP). The application in question was Dell Storage Manager web console, but the troubleshooting steps described below are applicable to any application.

First thing checked the Azure AD application settings related to AAD AP – Azure AD pre authentication was used, no custom domain, headers and application body translation enabled, so setup looked pretty standard.

As next step captured the Fiddler trace when accessing the internal application directly and via AAD AP.

In the trace for the AAD AP access see one of the pages fail to load and this error message:

Azure AD Application Proxy
Root cause: The connector did not respond within the timeout period.
Status code:  GatewayTimeout
Url:  https://xxx/messages
TransactionID:  XXX
ConnectorGroupId:  XXX
Timestamp:  9/4/2018 6:50:00 PM

At the same time, the “messages” page is successfully loaded when the application is accessed directly from the corporate network.

Looking closer at the request and response in both Fiddler traces see next.

Request (redacted):

GET https://IntenalHostName/messages HTTP/1.1
Origin: https://IntenalHostName
Sec-WebSocket-Key: =
Connection: Upgrade
Upgrade: Websocket
Sec-WebSocket-Version: 13
User-Agent: Mozilla/4.0
Host: IntenalHostName

Response (redacted):

HTTP/1.1 101 Switching Protocols
Expires: 0
Cache-Control: no-cache, no-store, must-revalidate
X-Powered-By: Undertow/1
Server: WildFly/8
Pragma: no-cache
Origin: https://IntenalHostName
Upgrade: WebSocket
Sec-WebSocket-Accept: pDsDNKGWwSG8=
Date: Tue, 04 Sep 2018 GMT
Connection: Upgrade
Sec-WebSocket-Location: wss://IntenalHostName/messages
Content-Length: 0

In the bad Fiddler see following:

Request (redacted):

GET https://ExternalHostName.msappproxy.net/messages HTTP/1.1
Origin: https://ExternalHostName.msappproxy.net
Sec-WebSocket-Key: nl/CD3hakpNw==
Connection: Upgrade
Upgrade: Websocket
Sec-WebSocket-Version: 13
User-Agent: Mozilla/5.0
Host: ExternalHostName.msappproxy.net
DNT: 1
Cache-Control: no-cache
Cookie: dsmUsername=; JSESSIONID=ZEfQJAHRszfZGXql33h06aRw.vdellem01; AzureAppProxyUserSessionCookie

Response:

HTTP/1.1 504 Gateway Timeout

So the issue seems to be happening when there is a request to upgrade to Websocket.

The Websocket support by Azure AD App Proxy is currently in preview and it was recommended to collect additional logs to see if it can be fixed in the current case.

To enable the verbose Connector logs, it was recommended to make these changes:

  1. Locate the installation directory of the connector (should be C:\Program Files\Microsoft AAD App Proxy Connector)
  2. Open the file ApplicationProxyConnectorService.exe.config in notepad for edit
  3. Add the following section right after appSettings:

<system.diagnostics>
  <trace autoflush=”true” indentsize=”4″>
    <listeners>
      <add name=”consoleListener” type=”System.Diagnostics.ConsoleTraceListener” />
      <add name=”textWriterListener” type=”System.Diagnostics.TextWriterTraceListener” initializeData=”<PATH_WITH_WRITE_PERMISSIONS> \ConnectorTrace.log” />
      <remove name=”Default” />
    </listeners>
  </trace>
</system.diagnostics>

  • Make sure to change to a path with write permissions
  1. Restart the connector service and reproduce the issue from your PC while capturing the Fiddler trace.

Looking at the logs, found this exception entry:

System.Net.WebSockets.WebSocketException (0x80004005): Unable to connect to the remote server —> System.Net.WebException: The underlying connection was closed: Could not establish trust relationship for the SSL/TLS secure channel. —> System.Security.Authentication.AuthenticationException: The remote certificate is invalid according to the validation procedure.

As the next step, tried to access the application directly from the Connector server using the Internet Explorer browser and sure thing the browser complained about SSL error.

Looking closer, noticed that the internal application URL was protected by SSL certificate issued to the host running the application.

As soon as the application URL was changed to server host name on the AAD App Proxy side, the issue was resolved.

AD FS Relying Party certificates errors troubleshooting (EventID 317)

Customer has configured the new Relying Party Trust by using the Relying Party Trust Wizard and importing the data from the file that was downloaded earlier on the management computer.

When testing the Relying Party sign-on, the application was returning the error

“An error SAML response status was received. urn:oasis:names:tc:SAML:2.0:status:Responder”

Per following article https://msdn.microsoft.com/en-us/library/hh269642.aspx this means “The request could not be performed due to an error on the part of the SAML responder or SAML authority.”

Looking at the AD FS event logs, located the following self-explanatory error corresponding to unsuccessful sign in attempt.

Log Name:      AD FS/Admin
Source:        AD FS
Date:          7/3/2018 9:55:33 AM
Event ID:      317
Task Category: None
Level:         Error
Keywords:      AD FS
User:          xxx
Computer:      XXX
Description:
An error occurred during an attempt to build the certificate chain for the relying party trust ‘microsoft:identityserver:XXX’ certificate identified by thumbprint ‘xxx’. Possible causes are that the certificate has been revoked, the certificate chain could not be verified as specified by the relying party trust’s encryption certificate revocation settings or certificate is not within its validity period.
You can use Windows PowerShell commands for AD FS to configure the revocation settings for the relying party encryption certificate.
Relying party trust’s encryption certificate revocation settings: CheckChainExcludeRoot
The following errors occurred while building the certificate chain: 
A certificate chain could not be built to a trusted root authority.
The revocation function was unable to check revocation for the certificate.
The revocation function was unable to check revocation because the revocation server was offline.
User Action:
Ensure that the relying party trust’s encryption certificate is valid and has not been revoked.
Ensure that AD FS can access the certificate revocation list if the revocation setting does not specify “none” or a “cache only” setting.
Verify your proxy server setting. For more information about how to verify your proxy server setting, see the AD FS Troubleshooting Guide (http://go.microsoft.com/fwlink/?LinkId=182180).

Per environment’s security requirements, the AD FS server had no Internet access, that is why the Certificate Revocation List checks for the Relying Party Encryption and Signing certificates were failing.

Please note, that this is not recommended to turn of the revocation checking, that is why you might review your firewall policy for external connections to the Internet for AD FS and WAP (https://social.technet.microsoft.com/wiki/contents/articles/964.certificate-revocation-list-crl-verification-an-application-choice.aspx)

While the security team was reviewing the option allowing outbound connections from ADFS to some public Certificate Authority CRL URLs, we have used following switches in the Set-ADFSRelyingPartyTrust PowerShell command, to disable Relying Party certificates CRL check by setting the values to None.

-EncryptionCertificateRevocationCheck and – SigninCertificateRevocationCheck

https://docs.microsoft.com/en-us/powershell/module/adfs/set-adfsrelyingpartytrust?view=win10-ps

Per this article these are the acceptable values:

–          None (this is default value)

–          CheckEndCert

–          CheckEndCertCacheOnly

–          CheckChain

–          CheckChainCacheOnly

–          CheckChainExcludingRoot

–          CheckChainExcludingRootCacheOnly

AD FS 2.0 and Safari (iOS 9 and iOS 10) sign in issue (authentication cookies size issue)

Recently have been working on the issue when the users on iOS 9 and 10 were not able to complete the authentication from outside of corporate network (via AD FS Proxy). The users were getting the error – “There was a problem accessing the site. Try to browse to the site again. Reference number: XXX”.

The same users from iOS 11 had no issues signing in via the same AD FS farm.

The users were the members of 150-200 on premises AD security groups.

It took me some time to go over basic troubleshooting and making sure the AD FS servers have all needed patches installed – https://docs.microsoft.com/en-us/windows-server/identity/ad-fs/deployment/updates-for-active-directory-federation-services-ad-fs, but the issue persisted.

Turned out to be a known issue, when the older Safari versions have cookies size limitation.

Looking at the captured Fiddler trace we saw the AD FS was issuing 5 MSISAuth cookies (total size around 9 Kb) and when Safari was redirected to ADFS to get the access token, only 4 MSISAuth cookies were posted to ADFS (around 8 Kb). In this case Safari was dropping the cookies.

In order to work around this issue, the customer decided to do not pass all 200 security groups in the claims.

In order to achieve this, the default “Pass through all Group SID claims” was replaced with following claim rule since the customer was using these groups in his Authorization claim rules:

c:[Type == “http://schemas.microsoft.com/ws/2008/06/identity/claims/groupsid“, Value =~ “S-1-5-21-XXXXXXXX-XXXXXXXXXX-XXXXXXXXX-XXX840|S-1-5-21-XXXXXXXXX-XXXXXXXXX-XXXXXXXXXX-124XXXX”, Issuer == “AD AUTHORITY”] => issue(claim = c);

Azure Multi-Factor Authentication Server not sending emails out for new users

Recently was troubleshooting the issue when no email is sent to the new MFA server users regardless all the configurations seems to be correct. See following official documentation for more details. 

Because Administrator was able to send the Update email to the end user, we excluded the improper SMTP server configuration.

Per MFA server Help file: New Users – An email is sent to a user added that is enabled and complete (phone specified, mobile app activated, or OATH token secret key specified), or to an updated user that was either disabled or incomplete and is now enabled and complete.

Note: Emails are only sent when Send email to users is checked and the user’s email address is specified or their username is in email address format.

Confirmed that the New user has “Send email” check box selected on the User profile General Tab and the email address is correct.

MFAnewUser

Also, by going to MFA UI – Email – Email Context confirmed all the New User templates have correct email address specified in the From field.

Checked the SMTP server logs and don’t see any email send attempt from MFA server for New User email, only connections for Update email to be send.

Time to check the MFA Server logs!

To make sure you are looking at the latest logs, go to MFA UI – Logging – View Log Files.

Looking at the MultiFactorAuthAdSyncSvc.log see following error correlating to the time when the new user was added:

2018-03-07T18:20:20.006280Z|e|2960|4036|pfadssvc|***** ERROR ***** Error sending email to NewUser@domain.com: Access to the path ‘\\FileShare\public\MFA-Instructions\MFA-Guide.pdf’ is denied.

So the Administrator used the Attachment option to send additional instructions to new users, but the File share had access restrictions preventing the MFA server Local System account reading this document.

MFAnewUser2

Solution was to either move the MFA instructions files to the MFA server or adjust the file share access permissions to allow Everyone to read the files.

Morale of the story: Read the manuals and logs, those are written by smart people 😊

Azure AD device registration error codes

Even when you followed the Hybrid Azure AD join instructions to set up your environment (https://docs.microsoft.com/en-us/azure/active-directory/device-management-hybrid-azuread-joined-devices-setup ), you still might experience some issues with the computers not registering with Azure AD.

If you are looking for troubleshooting guide for the issue when Azure AD Conditional Access policy is treating your successfully joined station as Unregistered, see my other recent post – https://s4erka.wordpress.com/2019/04/05/azure-ad-conditional-access-policies-troubleshooting-device-state-unregistered/

To check if the device was joined to Azure AD run “dsregcmd /status” command in command prompt and look at AzureAdJoined value. For the Azure AD registered devices, it should be set to YES.

If the AzureAdJoined says NO, next step will be to collect information from the Application and Services – Microsoft – Windows – User Device Registration – Admin logs.

First thing, try to locate and read the text description in the error to see if it gives any clue.

Below are some examples of the errors and possible solutions to try.

User Device Registration Admin log – EventID 304 or 305adalResponseCode: 0xcaa1000e – recommended step is to check the AD FS claim rules per mentioned above article. It is important to have the AD FS claim rules in the described order and if you have multiple verified domains, do not forget remove any existing IssuerID rule that might have been created by Azure AD Connect or other means.

User Device Registration Admin log –wmain: Unable to retrieve access token 0x80004005 – recommended step is to check the AD FS claim rules.

User Device Registration Admin log – EventID 305AdalErrorCode: 0xcaa90006 – make sure the computer is able to reach and authenticate to specified in the error text description Identity Provider endpoint.

User Device Registration Admin log – EventID 204 – Error code: 0x801c03f2 (“The device object by the given id (xxx) is not found.”) – make sure the on-premises computer object is synchronized to Azure AD. Run the Delta Azure AD Connect sync.

User Device Registration Admin log – EventID 304 (309, 201 and 233 coming before it) – Error code: 0x801c0021 (Error code: 0x80072efe in EventID 201) – most likely the network or proxy didn’t allow the connection to Azure AD device registration endpoints or IdP to complete authentication. 

User Device Registration Admin log – 0xCAA90022 Could not discover endpoint for Integrate Windows Authentication. Check your ADFS settings. It should support Integrate Widows Authentication for WS-Trust 1.3. (error message is self explanatory). In case your IdP is not AD FS consult your IdP documentation.

User Device Registration Admin log – 0xCAA9002b with this error from ADAL – ADALUseWindowsAuthenticationTenant failed, unable to perform integrated auth. Check your STS settings. It should support Integrate Widows Authentication for WS-Trust 1.3. (error message is self explanatory). In case your IdP is not AD FS consult your IdP documentation.

User Device Registration Admin log – 0x801c001d. Failed to lookup the registration service information from Active Directory. Recommended to check the Service Connection Point settings in on-premises Active Directory.

Sometimes the error description of the User Device Registration Admin log event is not providing enough information and you have to enable the User Device Registration Debug log to get more information.

To enable debug logs open Event Viewer – check “Show Analytic and Debug Logs” and browse to Application and Services – Microsoft – Windows – User Device Registration – right click on Debug log and select Enable log.

To trigger the device join attempt you have to open Command prompt as System account (you can use Sysinternals PsExec – psexec -i -s cmd.exe) and issue “dsregcmd /debug /join” command. After that disable the Debug log, check the Admin logs and if still the error description is not informative go to Debug logs.

Example 1:

Log Name:      Microsoft-Windows-User Device Registration/Admin
Source:        Microsoft-Windows-User Device Registration
Date:          2/9/2018 10:17:49 AM
Event ID:      304
Task Category: None
Level:         Error
Keywords:     
User:          SYSTEM
Computer:      XXX
Description:
Automatic registration failed at join phase.  Exit code: An unexpected internal error has occurred in the Platform Crypto Provider.

User Device Registration Debug log –

Log Name:      Microsoft-Windows-User Device Registration/Debug
Source:        Microsoft-Windows-User Device Registration
Date:          2/9/2018 10:23:30 AM
Event ID:      500
Task Category: None
Level:         Information
Keywords:     
User:          SYSTEM
Computer:      XXX
Description:
wmain: failed with error code 0x80290407.

Most likely this error is an indication that the TPM is not supporting Azure AD join requirements (https://docs.microsoft.com/en-us/windows/security/hardware-protection/tpm/tpm-recommendations ).

Next steps for this particular issue I would recommend for these stations are:

  • Ensure the TPM is in 2.0 mode. You will find this setting in the BIOS.
  • As a last resort, disable TPM in the BIOS, so Azure AD Join process uses software-based keys.

Example 2:

Log Name:      Microsoft-Windows-User Device Registration/Admin
Source:        Microsoft-Windows-User Device Registration
Date:          4/17/2018 12:44:10 PM
Event ID:      304
Task Category: None
Level:         Error
Keywords:     
User:          SYSTEM
Computer:      XXX
Description:
Automatic registration failed at join phase.  Exit code: Keyset does not exist. Server error: empty.

After running dsregcmd /debug /join see following in the output:

DsrDeviceAutoJoinFederated failed with -2146893802
wmain: failed with error code 0x80090016.

Most likely this error indicates that the machine was imaged from the already Azure AD registered computer. Also it might indicate the TPM issues (see the TMP troubleshooting steps mentioned above).

If the fist is true, try renaming the “C:\ProgramData\Microsoft\Crypto\Keys” folder and re-running the dsregcmd /debug /join.

Example 3:

Log Name:      Microsoft-Windows-User Device Registration/Admin
Source:        Microsoft-Windows-User Device Registration
Date:          5/16/2018 8:44:03 AM
Event ID:      305
Task Category: None
Level:         Error
Keywords:     
User:          SYSTEM
Computer:     XXX
Description:
Automatic registration failed at authentication phase.  Unable to acquire access token.  Exit code: Unspecified error. Server error: AdalMessage: ADALUseWindowsAuthenticationTenant failed,  unable to preform integrated auth
AdalErrorCode: 0xcaa9002c

This error usually indicates an issue with connecting to AD FS farm. Check if Windows Integrated Authentication is enabled for Intranet, is working correctly for Intranet and WSTrust windows endpoints are enabled in AD FS.