I’ve just posted a new Greatest Hits article on the ILM forum on the subject of how ILM (or the FIM Sync Service) can be used to clean up the mess of existing accounts, before you can actually get on to the more interesting tasks of provisioning and updating. With the way FIM codeless sync works, needing an existing attribute to match on, and only allowing simple matching rules, it will be more important than ever to start from a position of tidy directories with correctly identified existsing accounts. Here’s the article…
Phase One Joins and Data Matching
The tedious truth is that most IdM projects must begin with a phase of data matching and cleaning. Before you can start to automate management of identities, you need a predictable
data set around which to base your rules. Many organizations today, whether due to changes in IT personnel, company mergers, variable naming conventions or lack of guidance on handling resignations, have existing user account bases that can only be described as a mess.
This document covers some of the methods you can use with ILM to get through that first project phase. Unfortunately there are no magic bullets here – eye-straining, brain-numbing trawls through long lists of unmatched accounts cannot usually be avoided. What you can do is try to extract quality data, and construct your lists as helpfully as possible, so they can be targeted at the people whose eyeballs and brains are most likely to give the best response.
It must be added that this is a very big topic, and the methods you choose will depend on the data you are faced with, and the aims of your project.
Really, it’s all about joins.
Once existing accounts are joined to their correct data source (such as matching an HR record to an AD account) you can begin to flow updates.
Once you have a clear idea who does and does not have an account in a target directory, you can begin to make provisioning and deprovisioning decisions.
Your eventual aim is a simple Join Rule
When you first start data matching, you will use a lot of different join rules, and you will make joins manually and with CSV files (more on that below). But keep
this in mind:
Any join made manually, or with extra effort, should be considered temporary.
There are various situations in ILM which can only be reliably rectified by a clear-out and re-import of a connector space.
Always plan for this.
Ideally, when you re-import a connector space, you will have a single, direct join rule which effortlessly re-joins all your objects. And to achieve this we use…
Once the join is verified you should export a uniquely identifying attribute, such as an employee number, to the target directory.
After that a simple “employeeID = employeeID” type join rule is all you will need.
Sometimes you are faced with an import-only system. For either technical or political reasons you are not able to export the breadcrumb attribute.
There are a couple of things you can do:
- Import the DN or other identifying attribute from the target directory into an attribute on the Metaverse object.But be warned, you could still lose these joins if you had to repopulate your Metaverse.
- To be on the safe side you should also “save” these joins by exporting an identifying pair somewhere else – for example using a database or text file
MA, as pictured below.
Understanding Join Rules
There are some key points to note about Join Rules:
- Join rules operate on the CS object. It takes one CS object and attempts to find a match among all Metaverse objects of the correct type, even if they are already joined.
- Components within one rule are AND-ed together – they must all match.
- Multiple rules are evaluated top down, so put your strongest rules at the top.
- While you can offer variations on a CS objects attribute using an Advanced Join Rule, you have to find an exact match with an attribute already on a Metaverse
object. There are no “StartsWith” or “Contains” comparisons.
At the bottom of the Join Rule configuration you will see a check box “Use rules extension to resolve”.
Here you can link to code you write under the ResolveJoinSearch subroutine in the MA extension.
The Resolve rule is used when ILM finds multiple possible matches in the Metaverve (including already-joined objects).
All the possible matches into the collection rgmventry and your job, in the code, is to check through them, looking for the best possible match.
If your code finds an ideal match you return the index number of the object in imventry, and set the value of ResolveJoinSearch to true.
The following example only joins if a single, unjoined Metaverse object was found.
Public Function ResolveJoinSearch(ByVal joinCriteriaName As String, ByVal csentry As CSEntry, ByVal rgmventry() As MVEntry, ByRef imventry As Integer, ByRef MVObjectType As String) As Boolean Implements IMASynchronization.ResolveJoinSearch Select Case joinCriteriaName Case "Resolve_NotYetJoined" If rgmventry.Length = 1 AndAlso _ rgmventry(0).ConnectedMAs("AD").Connectors.Count = 0 Then imventry = 0 Return True Else Return False End If End Select End Function
Advanced Join Rules
A simple join rule directly matches a connector space attribute to a Metaverse attribute:
|givenName = Kathryn||FirstName = Kathryn|
|sn = Bigalow||Lastname = Bigalow|
With an Advanced Join Rule you construct a list of possible values with which to find an exact match in the Metaverse:
|givenName = KathryngivenName = KategivenName = Kathy||FirstName = Kate|
|sn = Bigalow||Lastname = Bigalow|
Sometimes you can use code rules to make your list of possible matches (eg., presenting a phone number in different formats); other times you have to use long
look-up lists of possible variations (try genealogy websites for name-variation lists).
The following example uses a lookup file of aliases, where each line has the possible variations on a name.
If a match to the first name is found on the connector space object, the whole line is added to the possible values to search for in the Metaverse.
Public Class MAExtensionObject Implements IMASynchronization Dim fileAliases As System.IO.StreamReader Dim arrAliases As String() Dim i As Integer Public Sub Initialize() Implements IMASynchronization.Initialize fileAliases = New System.IO.StreamReader("C:\aliases.txt", System.Text.Encoding.Default) i = 0 While Not fileAliases.EndOfStream ReDim Preserve arrAliases(i) arrAliases(i) = fileAliases.ReadLine i = i + 1 End While fileAliases.Close() End Sub Public Sub MapAttributesForJoin(ByVal FlowRuleName As String, ByVal csentry As CSEntry, ByRef values As ValueCollection) Implements IMASynchronization.MapAttributesForJoin Select FlowRuleName Case "Join_aliases" Dim aliasList As String Dim value As String For Each aliasList In arrAliases If aliasList.Contains(csentry("givenName").Value) Then For Each value In aliasList.Split(",".ToCharArray) values.Add(value) Next End If Next End Select End Sub End Class
Using CSV Files
A lot of the difficult account matching will probably be done outside ILM.
It is therefore useful to be able to “export” lists of possible matches, and later, “import” the joins from a CSV file.
Be careful when writing to files from extension code – the DLL doesn’t unload for five minutes after the MA run completes, which means you may have to wait for it
to finish writing to the file. If you’re in a hurry, recompiling the code will force the DLL to finish writing to the file.
Export Possible Matches to a CSV file
You can use the Resolve rule to export possible matches to a CSV file. The following example resolves the join rule “sn | Direct | lastname”. As a match on the last
name alone is too weak for an immediate join, we just write the possible matches to the text file.
Public Class MAExtensionObject Implements IMASynchronization Dim fileMatches As System.IO.StreamWriter Public Sub Initialize() Implements IMASynchronization.Initialize fileMatches = New System.IO.StreamWriter("C:\possible matches.txt", System.Text.Encoding.Default) End Sub Public Sub Terminate() Implements IMASynchronization.Terminate fileMatches.Close() End Sub Public Function ResolveJoinSearch(ByVal joinCriteriaName As String, ByVal csentry As CSEntry, ByVal rgmventry() As MVEntry, ByRef imventry As Integer, ByRef MVObjectType As String) As Boolean Implements IMASynchronization.ResolveJoinSearch Select Case joinCriteriaName Case "Resolve_Lastname" Dim MAName As String = csentry.MA.Name Dim mvobject As MVEntry Dim cFirstname, mFirstname As String If csentry("givenName").IsPresent Then cFirstname = csentry("givenName").StringValue Else cFirstname = "UNKNOWN" End If If mvobject("firstname").IsPresent Then mFirstname = mvobject("firstname").StringValue Else mFirstname = "UNKNOWN" End If For Each mvobject In rgmventry If mvobject.ConnectedMAs(MAName).Connectors.Count = 0 Then fileMatches.WriteLine(csentry("sn").StringValue & ";" _ & cFirstname & ";" _ & mvobject("lastname").StringValue & ";" _ & mFirstname) End If Next Return False End Select End Function End Class
You will need to adapt this code of course. Firstly, you probably want to export a lot more identifying information in your CSV file – department, email address,
dn … whatever helps. Next, it can really help to supplement your possible matches with a probability score. This is where you do a series of tests and add points-
the more points, the higher the chance of the match. For example:
- Names similar* +1
- Department the same +1
- City the same
*Some tips for testing if names are similar:
- Strip out all spaces, dashes and hyphens then compare;
- Check if one string is contained in the other (so that “Sally-Anne” gets
a point for “Sally”);
- Use a function which compares string similarity (search “Soundex” and “Levenshtein
Join from CSV
You can use an Advanced Join Rule to “import” joins from a CSV file.
Firstly, our CSV file must be constructed like this:
For example, if we are trying to match an AD account against Metaverse objects imported from the HR system, we populate the CSV with the AD DN and the employeeID:
Next we create the Advanced Join Rule which will look up the csobject’s DN in the text file, but use the employeeID to search the Metaverse.
Note that you can’t actually use the DN in the join rule – but that’s ok, just use any attribute that definitely exists. Eg.,
sAMAccountName | Rules extension – Join_CSV | employeeID
And now for the code :
Imports Microsoft.MetadirectoryServices Public Class MAExtensionObject Implements IMASynchronization Dim joins As String() Dim i As Integer Public Sub Initialize() Implements IMASynchronization.Initialize Dim fileJoins As System.IO.StreamReader Dim strLine As String 'Open the csv file and read into an array fileJoins = New System.IO.StreamReader("C:\joins.csv", System.Text.Encoding.Default) i = 0 While Not fileJoins.EndOfStream ReDim Preserve joins(i) joins(i) = fileJoins.ReadLine i = i + 1 End While fileJoins.Close() End Sub Public Sub MapAttributesForJoin(ByVal FlowRuleName As String, ByVal csentry As CSEntry, ByRef values As ValueCollection) Implements IMASynchronization.MapAttributesForJoin Select FlowRuleName Case "Join_CSV" 'If the csentry DN is found in the joins array, then 'use the paired employeeID to search the Metaverse. For i = 0 To joins.Length - 1 If joins(i).Contains(csentry.DN.ToString) Then values.Add(joins(i).Split(";")(1)) End If Next End Select End Sub
People will ask you questions like “How many people have you joined in system X but not in Y?”, “How sure are you that the joins are correct?”, “Which department
has the most unidentified accounts?” It’s best to be prepared for these sorts of questions.
Querying the Metaverse
Once data is in the Metaverse it is a simple matter to access it for reporting, either by exporting it into a reporting table, or by directly querying the underlying
tables (as long as you’re careful to do it when ILM is idle, or else use NOLOCK).
So consider this: During the data cleaning phase, import all identifying attributes into the Metaverse from all sources.
For example: You’ve made a join between a user in AD and an HR record.
Under normal operations you would consider HR as the master source for the name, and you would only flow it from there.
You wouldn’t bother importing the name attributes from AD – in fact you’re more likely to be overwriting them with export flow rules.
|Lastname = Powells-Brown||->||lastname = Powells-Brown||->||sn = Powells-Brown|
|Firstname = Joanna||->||firstname = Joanna||->||givenName = Joanna|
However, during the data matching phase, you’re probably not ready to start overwriting attributes, and the information about current values can be very important in your
verification and reporting.
|Lastname = Powells-Brown||->||HR_lastname = Powells-Brown||->||sn = Powells-Brown|
|Firstname = Joanna||->||HR_firstname = Joanna
AD_lastname = Powells
AD_firstname = Jo
|givenName = Joanna
givenName = Jo
Now, if you have a look at the mms_metaverse table in the ILM database, you will see how simple it is to query the progress of your joins, and also to judge
on what criteria the joins were made.
Some example queries…
/* HR person with no join to AD */
select HR_lastname, HR_firstname, HR_employeeid from mms_metaverse
where AD_dn is null
/* HR person with join to AD */
select HR_lastname, HR_firstname, HR_employeeid, AD_lastname,AD_firstname,AD_dn from mms_metaverse
where AD_dn is not null
|As mentioned above you need to be
careful when directly querying the Metaverse tables.If your system is already in production, and you happen to be adding in
a new data source, then you may be better off employing a SQL MA to export
the data you’re interested in to another table, where you can query it as
much as you like without risk of locking errors.
Querying the Connector Space
Unfortunately it is not so simple to query the connector space to, for example, report on that state of your disconnected objects.
It’s a great pity that you can’t just save results from the Joins page in the Identity Manager GUI, so your options are:
SQL query – The CS table holds data differently to the Metaverse tables.
It is possible to query for disconnectors in this way, however you will only be able to retrieve the CN of objects – which may not be sufficient to identify them.
select cs.rdn from dbo.mms_connectorspace cs join dbo.mms_management_agent ma on cs.ma_id = ma.ma_id left outer join dbo.mms_csmv_link mv on mv.cs_object_id = cs.object_id where ma.ma_name = 'My MA' and mv.mv_object_id is null and cs.connector_state = 0
CSExport – The command-line utility csexport.exe, found in the <ILM program>\bin folder, will allow you to dump connector space objects to an XML file.
Report directly from the data source – Once an object in the data source has been correctly identified you will ideally export a unique attribute out to it. It may then be possible to identify the non-joined objects as those which don’t possess this attribute.
Project to a different object type – ILM processes joins before projections, so it is fairly simple to project all non-joined objects to a different Metaverse object type – for example, one called “disconnectors”. This may help in reporting on an overall status direct from the Metaverse tables.
Note however that to make these objects once again available for joins they will have to be disconnected from the MVExtension provisioning code.
Advanced Tips and Tricks
Join to Multi-value Attribute
Sometimes you may need to join to a value in a multi-value attribute. An example is searching through all the proxyAddresses for a match against
a single email address.
When the multi-valued attribute exists in the connector space, and the single valued attribute is in the Metaverse, this is very easily accomplished. Just use an Advanced Join Rule to break the multi-valued attribute down into the values list used by the join rule.
Public Sub MapAttributesForJoin(ByVal FlowRuleName As String, ByVal csentry As CSEntry, ByRef values As ValueCollection) Implements IMASynchronization.MapAttributesForJoin Select FlowRuleName Case "Join_proxyAddresses" Dim alias As String For Each alias In csentry("givenName").Values values.Add(alias) Next End Select End Sub
However, when the multi-valued attribute has already been imported into the Metaverse you will have a problem. While you can join on a multi-valued attribute, you have to join on the whole thing. There is no way that you can match one value out of a multi-valued Metaverse attribute against a single-valued connector space attribute.
Some possible options:
- Import the multi-valued attribute into a series of single-valued attributes in the Metaverse, eg., proxy1, proxy2, proxy3 …
- Use a different Metaverse object type to do the project and join the other way around.
Multiple Possibilities in the Connector Space
All the joining techniques so far have been based around a single connector space object, with one or more possible matches in the Metaverse. But what do you do if you want to work the other way around, where there are multiple possible matches in the connector space for a single Metaverse object, and you want to pick the best one? Unfortunately this is not straight-forward. There is no way to offer a selection of connector space objects in a join rule, as joins always work on a single connector space object at a time. You can judge the merits of the current CS object, but you can’t tell if there’s a better one coming up. One way around this is to do everything with CSV files.
- Use the ResolveJoinSearch to write possible joins to a CSV, but don’t actually join anything,
- Do your matching outside ILM, then
- Use a CSV with an Advanced Join Rule to make the joins.
Too much data
If you have an enormous number of accounts, and different data sources to trawl through, you may be best off doing your data matching outside of ILM, and then just using the CSV join rule above to make the joins. Check out the Fuzzy Lookup Transformation from the Enterprise version of SQL SSIS for help here.
- Complex join rules are a means to an end – not the end itself.
- Breadcrumbing is essential for automation.
- When matching on weak rules (eg., surname only) then verify the match another way.
- You can’t do it all in ILM. CSV, Excel and fuzzy lookup algorithms will also help, but an element of by-hand matching is inevitable.
- Get the matching and breadcrumbing sorted out before you start flowing and provisioning.This will make for a happier project, stake holders, users and YOU!
About the Author
Carol Wapshere is an ILM MVP and contributor of a few of these documents now. She has always believed that putting the work in up-front will save you lots of headaches in the long run. It’s a self-defense strategy really – there’s nothing worse than having to go over and over (and over) the same ground again and again, especially when you’d already moved on to something new and far more interesting. Keep it neat, and keep it simple, and things should work just fine.
Thanks to Markus Vilcinskas and Paul Loonen for their help with this document.