If you work in the computing industry, the chances are good that you've heard of LDAP by now. Wondering what all the excitement is about? Want to know a little more about the underlying technology? You've come to the right place. This introduction - the first in a series of articles describing how to design, implement, and integrate an LDAP environment at your company - will familiarize you with the concepts behind LDAP while leaving the really hardcore details for later. Here, we'll touch on the following topics:
But wait, you say. You're already using an Oracle, Sybase, Informix, or Microsoft SQL database to store much of that same data. How is LDAP different? What makes it better? Read on.
The Lightweight Directory Access Protocol, better known as LDAP, is based on the X.500 standard, but significantly simpler and more readily adapted to meet custom needs. Unlike X.500, LDAP supports TCP/IP, which is necessary for Internet access. The core LDAP specifications are all defined in RFCs -- a complete list of LDAP-related RFCs may be found at the LDAPman RFC page.
Using "LDAP" in a sentence
In everyday conversation, you'll hear well-intentioned people say things like, "Should we be storing that in LDAP?" or "Just get that data from the LDAP database," or "How do we go about tying LDAP into an RDB?" Strictly speaking, though, LDAP isn't a database at all, but a protocol used to access information stored in an information directory (also known as an LDAP directory). A more precise formulation might look something like this: "Using LDAP, data will be retrieved from (or stored in) the correct location within our information directory." But you won't find me correcting anyone on this point: either way, you get the idea across, and that's what counts.
Is an LDAP information directory a database?
Just as a Database Management System (DBMS) from Sybase, Oracle, Informix, or Microsoft is used to process queries and updates to a relational database, an LDAP server is used to process queries and updates to an LDAP information directory. In other words, an LDAP information directory is a type of database, but it's not a relational database. And unlike databases that are designed for processing hundreds or thousands of changes per minute - such as the Online Transaction Processing (OLTP) systems often used in e-commerce - LDAP directories are heavily optimized for read performance.
The advantages of LDAP directories
Now that we've straightened that out, what are the advantages of LDAP directories? The current popularity of LDAP is the culmination of a number of factors. I'll give you a few basic reasons, provided you keep in mind that it's just part of the story.
Perhaps the biggest plus for LDAP is that your company can access the LDAP directory from almost any computing platform, from any one of the increasing number of readily available, LDAP-aware applications. It's also easy to customize your company's internal applications to add LDAP support.
The LDAP protocol is both cross-platform and standards-based, so applications needn't worry about the type of server hosting the directory. In fact, LDAP is finding much wider industry acceptance because of its status as an Internet standard. Vendors are more willing to write LDAP integration into their products because they don't have to worry about what's at the other end. Your LDAP server could be any one of a number of open-source or commercial LDAP directory servers (or perhaps even a DBMS server with an LDAP interface), since interacting with any true LDAP server involves the same protocol, client connection package, and query commands. By contrast, vendors looking to integrate directly with a DBMS usually must tailor their product to work with each database server vendor individually.
Unlike many relational databases, LDAP does not require you to pay for client connection software or for licensing.
Most LDAP servers are simple to install, easily maintained, and easily optimized.
LDAP servers can replicate either some or all of their data via push or pull methods, allowing you to push data to remote offices, to increase security, and so on. The replication technology is built-in and easy to configure. By contrast, many of the big DBMS vendors charge extra for this feature, and it's far more difficult to manage.
LDAP allows you to securely delegate read and modification authority based on your specific needs using ACIs (access control instances; collectively, an ACL, or Access Control List). For example, your facilities group might be given access to change an employee's location, cube, or office number, but not be allowed to modify any other fields. ACIs can control access depending on who is asking for the data, what data is being asked for, where the data is stored, and other aspects of the record being modified. This is all done through the LDAP directory directly, so you needn't worry about making security checks at the user application level.
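To make the facilities-group example concrete, here is a minimal Python sketch of attribute-level write control. It is a toy model, not the ACI syntax of any real directory server: the Aci class, the may_modify function, the "facilities" group name, and the choice of writable attributes are all assumptions made for illustration, and the DN handling is deliberately naive (it ignores escaped commas).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Aci:
    """One hypothetical rule: who may write which attributes under which subtree."""
    group: str            # group the requester must belong to
    subtree: str          # DN suffix the target entry must fall under
    writable: frozenset   # lowercase attribute names the group may modify


def normalize_dn(dn):
    """Lowercase a DN and drop spaces around commas (naive: ignores escapes)."""
    return ",".join(part.strip().lower() for part in dn.split(","))


def may_modify(acis, requester_groups, target_dn, attribute):
    """Return True if any rule grants the requester write access; deny otherwise."""
    dn = normalize_dn(target_dn)
    for aci in acis:
        suffix = normalize_dn(aci.subtree)
        in_subtree = dn == suffix or dn.endswith("," + suffix)
        if (aci.group in requester_groups
                and in_subtree
                and attribute.lower() in aci.writable):
            return True
    return False  # default deny


# Assumed rule: facilities may change location-related attributes only.
FACILITIES = Aci(
    group="facilities",
    subtree="ou=employees,dc=foobar,dc=com",
    writable=frozenset({"roomnumber", "l"}),
)

fran = "uid=fsmith, ou=employees, dc=foobar, dc=com"
print(may_modify([FACILITIES], {"facilities"}, fran, "roomNumber"))
print(may_modify([FACILITIES], {"facilities"}, fran, "userPassword"))
```

The point of the sketch is that the decision depends on who is asking, which entry is targeted, and which attribute is being changed, and that all of it is evaluated on the server side rather than in each client application.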
LDAP is particularly useful for storing information that you wish to read from many locations, but update infrequently. For example, your company could store all of the following very efficiently in an LDAP directory:
Most LDAP servers are heavily optimized for read-intensive operations. Because of this, one can typically see an order of magnitude difference when reading data from an LDAP directory versus obtaining the same data from a relational database server optimized for OLTP. Because of this optimization, however, most LDAP directories are not well suited for storing data where changes are frequent. For instance, an LDAP directory server is great for storing your company's internal telephone directory, but don't even think of using it as a database back end for your high-volume e-commerce site.
If the answer to each of the following questions is Yes, then storing your data in LDAP is a good idea.
The structure of an LDAP directory tree
LDAP directory servers store their data hierarchically. If you've seen the top-down representations of DNS trees or UNIX file directories, an LDAP directory structure will be familiar ground. As with DNS host names, an LDAP directory record's Distinguished Name (DN for short) is read from the individual entry, backwards through the tree, up to the top level. More on this point later.
Why break things up into a hierarchy? There are a number of reasons. Here are a few possible scenarios:
o="FooBar, Inc.", c=US (base DN in X.500 format)
In this example, o=FooBar, Inc. refers to the organization, which in this context should be treated as synonymous with the company name. c=US indicates that the company headquarters is in the US. Once upon a time, this was the preferred method of specifying your base DN. Times and fashions change, though; these days, most companies are (or plan to be) on the Internet. And what with Internet globalization, using a country code in the base DN probably made things more confusing in the end. In time, the X.500 format evolved into the other formats listed below.
o=foobar.com (base DN derived from the company's Internet presence)
This format is fairly straightforward, using the company's Internet domain name as the base. Once you get past the o= portion (which stands for organization=), everyone at your company should know where the rest came from. This was, until recently, probably the most common of the currently used formats.
dc=foobar, dc=com (base DN derived from the company's DNS domain components)
As with the previous format, this uses the DNS domain name as its basis. But where the other format leaves the domain name intact (and thus human-readable), this format is split into domain components: foobar.com becomes dc=foobar, dc=com. In theory, this could be slightly more versatile, though it's a little harder for end users to remember. By way of illustration, consider foobar.com. When foobar.com merges with gizmo.com, you simply start thinking of "dc=com" as the base DN. Place the new records into your existing directory under dc=gizmo, dc=com, and you're ready to go. (Of course, this approach doesn't help if foobar.com merges with wocket.edu.) This is the format I'd recommend for any new installations. Oh, and if you're planning to use Active Directory, Microsoft has already decided for you that this is the format you wanted.
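The mapping from a DNS domain name to a domain-component base DN is mechanical, so it can be sketched in a few lines of Python. The function name domain_to_base_dn is my own invention for this illustration, and lowercasing the labels is a convenience I have assumed rather than a requirement.

```python
def domain_to_base_dn(domain):
    """Turn a DNS domain such as 'foobar.com' into 'dc=foobar,dc=com'."""
    # Drop surrounding whitespace and any trailing dot, then split into labels.
    labels = [label for label in domain.strip().strip(".").split(".") if label]
    if not labels:
        raise ValueError("empty domain name")
    return ",".join("dc=" + label.lower() for label in labels)


print(domain_to_base_dn("foobar.com"))
print(domain_to_base_dn("gizmo.com"))
```

Both results end in dc=com, which is why the foobar.com and gizmo.com trees from the merger example can sit side by side under a common dc=com base.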
Time to branch out: How to organize your data in your directory tree
In a UNIX file system, the top level is the root. Beneath the root you have numerous files and directories. As mentioned above, LDAP directories are set up in much the same manner.
Underneath your directory's base, you'll want to create containers that logically separate your data. For historical (X.500) reasons, most LDAP directories set these logical separations up as OU entries. OU stands for "Organizational Unit," which in X.500 was used to indicate the functional organization within a company: sales, finance, et cetera. Current LDAP implementations have kept the ou= naming convention, but break things apart by broad categories like ou=people, ou=groups, ou=devices, and so on. Lower level OUs are sometimes used to break categories down further. For example, an LDAP directory tree (not including individual entries) might look like this:
dc=foobar, dc=com
    ou=customers
        ou=asia
        ou=europe
        ou=usa
    ou=employees
    ou=rooms
    ou=groups
    ou=assets-mgmt
    ou=nisgroups
    ou=recipes

Individual LDAP records
What's in a name? The DN of an LDAP entry
All entries stored in an LDAP directory have a unique "Distinguished Name," or DN. The DN for each LDAP entry is composed of two parts: the Relative Distinguished Name (RDN) and the location within the LDAP directory where the record resides.
The RDN is the portion of your DN that is not related to the directory tree structure. Most items that you'll store in an LDAP directory will have a name, and the name is frequently stored in the cn (Common Name) attribute. Since nearly everything has a name, most objects you'll store in LDAP will use their cn value as the basis for their RDN. If I'm storing a record for my favorite oatmeal recipe, I'll be using cn=Oatmeal Deluxe as the RDN of my entry.
cn=Oatmeal Deluxe,ou=recipes,dc=foobar,dc=com
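The two-part structure of a DN (RDN plus location) can be sketched in Python as follows. The helper names join_dn, split_dn, and ancestors are hypothetical, and the splitter is a simplification: it honors backslash-escaped commas but not the quoted form used in the X.500-style example (o="FooBar, Inc.").

```python
def join_dn(rdn, parent_dn):
    """Compose a full DN from an RDN and the DN of the container it lives in."""
    return rdn + "," + parent_dn if parent_dn else rdn


def split_dn(dn):
    """Split a DN into its components on commas not preceded by a backslash."""
    parts, current, escaped = [], [], False
    for ch in dn:
        if escaped:
            current.append(ch)
            escaped = False
        elif ch == "\\":
            current.append(ch)
            escaped = True
        elif ch == ",":
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return parts


def ancestors(dn):
    """List the containers above an entry, nearest first, up to the top level."""
    parts = split_dn(dn)
    return [",".join(parts[i:]) for i in range(1, len(parts))]


dn = join_dn("cn=Oatmeal Deluxe", "ou=recipes,dc=foobar,dc=com")
print(dn)
for container in ancestors(dn):
    print("  under", container)
```

Reading the output from top to bottom walks from the entry backwards through the tree, which is the same direction in which a DN itself is read.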
People are always more trouble than inanimate objects
Now it's time to tackle the DN of a company employee. For user accounts, you'll typically see a DN based either on the cn or on the uid (User ID). For example, the DN for FooBar's employee Fran Smith (login name: fsmith) might look like either of these two formats:
uid=fsmith,ou=employees,dc=foobar,dc=com (login-based)
LDAP (and X.500) use uid to mean "User ID", not to be confused with the UNIX uid number. Most companies try to give everyone a unique login name, so this approach makes good sense for storing information about employees. You don't have to worry about what you'll do when you hire the next Fran Smith, and if Fran changes her name (marriage? divorce? religious experience?), you won't have to change the DN of the LDAP entry.
cn=Fran Smith,ou=employees,dc=foobar,dc=com (name-based)
Here we see the Common Name (CN) entry used. In the case of an LDAP record for a person, think of the common name as their full name. One can easily see the downside to this approach: if the name changes, the LDAP record has to "move" from one DN to another. As indicated above, you want to avoid changing the DN of an entry whenever possible.
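A small Python sketch shows why the login-based format is the more stable of the two. The entry_dn helper and the new surname "Jones" are made up for this illustration; nothing here talks to a real directory.

```python
PARENT = "ou=employees,dc=foobar,dc=com"


def entry_dn(attrs, rdn_attr, parent_dn):
    """Build a DN from the first value of the chosen naming attribute."""
    return rdn_attr + "=" + attrs[rdn_attr][0] + "," + parent_dn


before = {"uid": ["fsmith"], "cn": ["Fran Smith"]}
after = {"uid": ["fsmith"], "cn": ["Fran Jones"]}  # hypothetical name change

for naming_attr in ("uid", "cn"):
    old = entry_dn(before, naming_attr, PARENT)
    new = entry_dn(after, naming_attr, PARENT)
    status = "unchanged" if old == new else "entry must move"
    print(naming_attr, "->", status)
```

With uid as the naming attribute the DN survives the name change; with cn it does not, so anything that refers to the old DN would have to be updated as well.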
Customizing your directory's object classes
You can use LDAP to store data on almost any type of object, as long as that object can be described in terms of various attributes. Here are a few examples of information you might store:
LDAP directories store all the information in a given record as a series of attribute pairs, each one consisting of an attribute type and an attribute value. (This is completely different from the way relational database servers store data, in columns and rows.) Consider this portion of my recipe record, as stored in an LDAP directory:
dn: cn=Oatmeal Deluxe, ou=recipes, dc=foobar, dc=com
cn: Instant Oatmeal Deluxe
recipeCuisine: breakfast
recipeIngredient: 1 packet instant oatmeal
recipeIngredient: 1 cup water
recipeIngredient: 1 pinch salt
recipeIngredient: 1 tsp brown sugar
recipeIngredient: 1/4 apple, any type

Note that in this case, each ingredient is listed as a value of attribute type recipeIngredient. LDAP directories are designed to store multiple values of a single type in this fashion, rather than storing the entire list in a single database field with some sort of delimiter to distinguish the individual values.
Because the data is stored in this way, the shape of the database can be completely fluid - you don't need to recreate a database table (and all its indexes) to start tracking a new piece of data. Even more important, LDAP directories use no memory or storage to handle "empty" fields - in fact, having unused optional fields costs you nothing at all.
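One way to picture the attribute-pair model is as a mapping from attribute type to a list of values. The Python sketch below reads the recipe record into such a mapping. It is a deliberately simplified reader, not a full LDIF parser: it assumes one "type: value" pair per line and ignores continuation lines and base64-encoded values. The parse_entry name and the recipeYield attribute used in the last line are my own.

```python
from collections import defaultdict

RECIPE = """\
dn: cn=Oatmeal Deluxe, ou=recipes, dc=foobar, dc=com
cn: Instant Oatmeal Deluxe
recipeCuisine: breakfast
recipeIngredient: 1 packet instant oatmeal
recipeIngredient: 1 cup water
recipeIngredient: 1 pinch salt
recipeIngredient: 1 tsp brown sugar
recipeIngredient: 1/4 apple, any type
"""


def parse_entry(text):
    """Collect 'type: value' lines into {type: [values, ...]} (types lowercased)."""
    attrs = defaultdict(list)
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        name, _, value = line.partition(":")
        attrs[name.strip().lower()].append(value.strip())
    return dict(attrs)


entry = parse_entry(RECIPE)
print(len(entry["recipeingredient"]), "ingredients")
print("recipeyield" in entry)  # an unused optional attribute is simply absent
```

Each ingredient is its own value in the list, with no delimiter games, and an attribute that was never set takes up no slot at all.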
An example of an individual LDAP entry
Let's look at an example. We'll use the LDAP record of Fran Smith, our friendly employee from Foobar, Inc. The format of this entry is LDIF, the format used when exporting and importing LDAP directory entries.
dn: uid=fsmith, ou=employees, dc=foobar, dc=com
objectclass: person
objectclass: organizationalPerson
objectclass: inetOrgPerson
objectclass: foobarPerson
uid: fsmith
givenname: Fran
sn: Smith
cn: Fran Smith
cn: Frances Smith
telephonenumber: 510-555-1234
roomnumber: 122G
o: Foobar, Inc.
mailRoutingAddress: fsmith@foobar.com
mailhost: mail.foobar.com
userpassword: {crypt}3x1231v76T89N
uidnumber: 1234
gidnumber: 1200
homedirectory: /home/fsmith
loginshell: /usr/local/bin/bash

To start with, attribute values are stored with case intact, but searches against them are case-insensitive by default. Certain attributes (like password) are case-sensitive when searching.
Let's break this entry down and look at it piece by piece.
dn: uid=fsmith, ou=employees, dc=foobar, dc=com

This is the full DN of Fran's LDAP entry, including the whole path to the entry in the directory tree. LDAP (and X.500) use uid to mean "User ID," not to be confused with the UNIX uid number.
objectclass: person
objectclass: organizationalPerson
objectclass: inetOrgPerson
objectclass: foobarPerson

One can assign as many object classes as are applicable to any given type of object. The person object class requires that the cn (common name) and sn (surname) fields have values. It also allows a few optional fields, such as telephonenumber. The object class organizationalPerson adds more optional fields to those from person, and inetOrgPerson adds still more (including givenname and email information). Finally, foobarPerson is Foobar's customized object class that adds all the custom attributes they wish to track at their company.
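The "required fields" idea can be sketched as a simple check in Python. The REQUIRED table below is a tiny hand-written stand-in for a real schema: it records only that person requires cn and sn, as described above, and it assumes that the other three classes (including Foobar's own foobarPerson) add no required attributes of their own. The missing_required function is hypothetical; a real directory server performs this kind of validation itself when an entry is added or modified.

```python
# Required attributes per object class (a simplified, partial stand-in for a schema).
REQUIRED = {
    "person": {"cn", "sn"},
    "organizationalperson": set(),
    "inetorgperson": set(),
    "foobarperson": set(),  # Foobar's custom class; contents assumed
}


def missing_required(entry):
    """Return the sorted names of required attributes the entry lacks."""
    present = {name.lower() for name, values in entry.items() if values}
    missing = set()
    for object_class in entry.get("objectclass", []):
        missing |= REQUIRED.get(object_class.lower(), set()) - present
    return sorted(missing)


fran = {
    "objectclass": ["person", "organizationalPerson", "inetOrgPerson", "foobarPerson"],
    "uid": ["fsmith"],
    "sn": ["Smith"],
    "cn": ["Fran Smith", "Frances Smith"],
}
print(missing_required(fran))

no_surname = {k: v for k, v in fran.items() if k != "sn"}
print(missing_required(no_surname))
```

Optional attributes never show up in this check, which matches the earlier point that leaving them unset costs nothing.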
uid: fsmith
givenname: Fran
sn: Smith
cn: Fran Smith
cn: Frances Smith
telephonenumber: 510-555-1234
roomnumber: 122G
o: Foobar, Inc.

As mentioned before, uid stands for User ID. Just translate it in your head to "login" whenever you see it.
Note that there are multiple entries for the CN. As mentioned above, LDAP allows some attributes to have multiple values, with the number of values being arbitrary. When would you want this? Let's say you're searching the company LDAP directory for Fran's phone number. While you might know her as Fran (having heard her spill her guts over lunchtime margaritas on more than one occasion), the people in HR may refer to her (somewhat more formally) as Frances. Because both versions of her name are stored, either search will successfully look up Fran's telephone number, email, cube number, and so on.
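Here is a rough Python sketch of how a search can succeed against either name. It models an equality match that ignores case and tries every value of a multi-valued attribute. The matches and search functions are illustrative only; real servers apply matching rules defined per attribute in the schema, and clients would normally express this as an LDAP search rather than filtering in application code.

```python
def matches(entry, attr, wanted):
    """True if any value of attr equals wanted, ignoring case."""
    return any(value.casefold() == wanted.casefold()
               for value in entry.get(attr, []))


def search(entries, attr, wanted):
    """Return every entry that has at least one matching value for attr."""
    return [entry for entry in entries if matches(entry, attr, wanted)]


directory = [
    {
        "uid": ["fsmith"],
        "cn": ["Fran Smith", "Frances Smith"],
        "telephonenumber": ["510-555-1234"],
    },
]

for name in ("fran smith", "FRANCES SMITH"):
    hits = search(directory, "cn", name)
    print(name, "->", hits[0]["telephonenumber"][0])
```

Values keep the case they were stored with, while the comparison ignores it, so both spellings find the same entry.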
mailRoutingAddress: fsmith@foobar.com
mailhost: mail.foobar.com

Like most companies on the Internet, Foobar uses Sendmail for internal mail delivery and routing. Foobar stores all users' mail routing information in LDAP, which is fully supported by recent versions of Sendmail.
userpassword: {crypt}3x1231v76T89N
uidnumber: 1234
gidnumber: 1200
gecos: Frances Smith
homedirectory: /home/fsmith
loginshell: /usr/local/bin/bash

Note that Foobar's systems administrators store all the NIS password map information in LDAP as well. At Foobar, the foobarPerson object class adds this capability. Note that the user password is stored in UNIX crypt format. The UNIX uid is stored here as uidnumber. Mind you, there's a whole RFC on storing NIS information in LDAP. I'll talk about NIS integration in a future article.
LDAP servers can be set to replicate some or all of their data, on a push or a pull basis, using simple authentication or certificate-based authentication.
For example, Foobar has a "public" LDAP server running on ldap.foobar.com, port 389. This server is used by Netscape Communicator's pinpoint email addressing feature, the "ph" command from UNIX, and other locations where a user would want to query for the phone number of an employee or customer contact. The company's master LDAP server is running on the same system, but on port 1389 instead.
You wouldn't necessarily want employees searching the directory to query against asset management or recipe data, nor would it be desirable to see IT accounts (like "root") showing up on the company directory. To accommodate these unpleasant realities, Foobar replicates selected directory subtrees from its master LDAP server to its "public" server. The replication excludes subtrees containing data they wish to hide. To keep things current at all times, the master directory server is set to do immediate push-based synchronization. Note that this approach is designed for convenience, not security: the idea is to allow power users to simply query the other LDAP port if they want to search all available data.
Let's say Foobar is managing its customer contact information via LDAP, over a low bandwidth connection between Oakland and Europe. They might set up replication from ldap.foobar.com:1389 to munich-ldap.foobar.com:389 as follows:
periodic pull: ou=asia,ou=customers,dc=foobar,dc=com
periodic pull: ou=usa,ou=customers,dc=foobar,dc=com
immediate push: ou=europe,ou=customers,dc=foobar,dc=com

The pull connections would keep things in sync every 15 minutes, which would probably be just fine in this scenario. The push connection would guarantee that any change made to the European contact information would be pushed out to Munich immediately.
Given this replication scheme, where would users connect to access their data? Users in Munich could simply connect to their local server. If they were making changes to the data, the local LDAP server would refer those changes to the master LDAP server, which would then push all the changes back to the local LDAP server to keep it in sync. This is of tremendous benefit to the local user: all their LDAP queries (mostly reads) are against their local server, which is substantially faster. When it's time to make a change to their information, end users needn't worry about reconfiguring their client software, because the LDAP directory servers handle the data exchange for them.
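The read-locally, write-to-the-master flow can be sketched as a toy model in Python. Everything here is an assumption made for illustration: the Master and Replica classes, the in-memory dictionaries standing in for directory data, and the customer entry for "Hans Muster". A real replica would typically hand the client a referral to the master rather than forwarding the change itself, and real replication involves authentication, scheduling, and conflict handling that this sketch leaves out. The subtree DN uses the dc=foobar,dc=com base from the directory tree shown earlier.

```python
class Master:
    """Holds the authoritative data and pushes changes to interested replicas."""

    def __init__(self):
        self.data = {}
        self.replicas = []

    def write(self, dn, attrs):
        self.data[dn] = dict(attrs)
        for replica in self.replicas:
            if replica.wants(dn):
                replica.data[dn] = dict(attrs)  # immediate push


class Replica:
    """Serves reads locally; sends writes on to the master."""

    def __init__(self, master, subtrees):
        self.master = master
        self.subtrees = [s.lower() for s in subtrees]
        self.data = {}
        master.replicas.append(self)

    def wants(self, dn):
        dn = dn.lower()
        return any(dn == s or dn.endswith("," + s) for s in self.subtrees)

    def read(self, dn):
        return self.data.get(dn)

    def write(self, dn, attrs):
        # Stand-in for a referral: the change is applied at the master,
        # which then pushes it back to this replica.
        self.master.write(dn, attrs)


master = Master()
munich = Replica(master, ["ou=europe,ou=customers,dc=foobar,dc=com"])

contact = "cn=Hans Muster,ou=europe,ou=customers,dc=foobar,dc=com"
munich.write(contact, {"telephonenumber": ["555-0100"]})
print(munich.read(contact))

master.write("uid=root,ou=employees,dc=foobar,dc=com", {"uid": ["root"]})
print(munich.read("uid=root,ou=employees,dc=foobar,dc=com"))
```

The Munich user only ever talks to the local server, yet the change ends up on the master and comes back through replication, while data outside the replicated subtree never reaches Munich.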
LDAP provides fine-grained access control through access control instances, or ACIs. Because access is controlled on the server side, it's much more secure than methods that rely on client software to protect the data.
With LDAP ACIs, you can do things like:
That's it for now. I hope you've found this article useful. If you have comments or questions, send email to donnelly@ldapman.org.
28 April 2000