An Introduction to LDAP

If you work in the computing industry, the chances are good that you've heard of LDAP by now. Wondering what all the excitement is about? Want to know a little more about the underlying technology? You've come to the right place. This introduction - the first in a series of articles describing how to design, implement, and integrate an LDAP environment at your company - will familiarize you with the concepts behind LDAP while leaving the really hardcore details for later. Here, we'll touch on the following topics:

What is LDAP, anyway?
When should you use LDAP to store your data?
The structure of an LDAP directory tree
Individual LDAP records
Customizing your directory's object classes
An example of an individual LDAP entry
LDAP replication
Security and access control

To start with, what's happening with LDAP today is exciting. A company-wide LDAP implementation can enable almost any application, running on almost any computer platform, to obtain information from your LDAP directory. And that directory can be used to store a broad range of data: email address and mail routing information, HR data, public security keys, contact lists, and much more. By making an LDAP directory a focal point in your systems integration, you're providing one-stop shopping whenever people go looking for information within your company - even if the primary source of the data lives elsewhere.

But wait, you say. You're already using an Oracle, Sybase, Informix, or Microsoft SQL database to store much of that same data. How is LDAP different? What makes it better? Read on.

What is LDAP, anyway?

The Lightweight Directory Access Protocol, better known as LDAP, is based on the X.500 standard, but significantly simpler and more readily adapted to meet custom needs. Unlike X.500, LDAP supports TCP/IP, which is necessary for Internet access. The core LDAP specifications are all defined in RFCs -- a complete list of LDAP-related RFCs may be found at the LDAPman RFC page.

Using "LDAP" in a sentence
In everyday conversation, you'll hear well-intentioned people say things like, "Should we be storing that in LDAP?" or "Just get that data from the LDAP database," or "How do we go about tying LDAP into an RDB?" Strictly speaking, though, LDAP isn't a database at all, but a protocol used to access information stored in an information directory (also known as an LDAP directory). A more precise formulation might look something like this: "Using LDAP, data will be retrieved from (or stored in) the correct location within our information directory." But you won't find me correcting anyone on this point: either way, you get the idea across, and that's what counts.

Is an LDAP information directory a database?
Just as a Database Management System (DBMS) from Sybase, Oracle, Informix, or Microsoft is used to process queries and updates to a relational database, an LDAP server is used to process queries and updates to an LDAP information directory. In other words, an LDAP information directory is a type of database, but it's not a relational database. And unlike databases that are designed for processing hundreds or thousands of changes per minute - such as the Online Transaction Processing (OLTP) systems often used in e-commerce - LDAP directories are heavily optimized for read performance.

The advantages of LDAP directories
Now that we've straightened that out, what are the advantages of LDAP directories? The current popularity of LDAP is the culmination of a number of factors. I'll give you a few basic reasons, provided you keep in mind that it's just part of the story.

Perhaps the biggest plus for LDAP is that your company can access the LDAP directory from almost any computing platform, from any one of the increasing number of readily available, LDAP-aware applications. It's also easy to customize your company's internal applications to add LDAP support.

The LDAP protocol is both cross-platform and standards-based, so applications needn't worry about the type of server hosting the directory. In fact, LDAP is finding much wider industry acceptance because of its status as an Internet standard. Vendors are more willing to write LDAP integration into their products because they don't have to worry about what's at the other end. Your LDAP server could be any one of a number of open-source or commercial LDAP directory servers (or perhaps even a DBMS server with an LDAP interface), since interacting with any true LDAP server involves the same protocol, client connection package, and query commands. By contrast, vendors looking to integrate directly with a DBMS usually must tailor their product to work with each database server vendor individually.

Unlike many relational databases, you do not have to pay for either client connection software or for licensing.

Most LDAP servers are simple to install, easily maintained, and easily optimized.

LDAP servers can replicate either some or all of their data via push or pull methods, allowing you to push data to remote offices, to increase security, and so on. The replication technology is built-in and easy to configure. By contrast, many of the big DBMS vendors charge extra for this feature, and it's far more difficult to manage.

LDAP allows you to securely delegate read and modification authority based on your specific needs using ACIs (collectively, an ACL, or Access Control List). For example, your facilities group might be given access to change an employee's location, cube, or office number, but not be allowed to modify entries for any other fields. ACIs can control access depending on who is asking for the data, what data is being asked for, where the data is stored, and other aspects of the record being modified. This is all done through the LDAP directory directly, so you needn't worry about making security checks at the user application level.

LDAP is particularly useful for storing information that you wish to read from many locations, but update infrequently. For example, your company could store all of the following very efficiently in an LDAP directory:

The company employee phone book and organizational chart
External customer contact information
Infrastructure services information, including NIS maps, email aliases, and so on
Configuration information for distributed software packages
Public certificates and security keys

When should you use LDAP to store your data?

Most LDAP servers are heavily optimized for read-intensive operations. Because of this, one can typically see an order of magnitude difference when reading data from an LDAP directory versus obtaining the same data from a relational database server optimized for OLTP. Because of this optimization, however, most LDAP directories are not well suited for storing data where changes are frequent. For instance, an LDAP directory server is great for storing your company's internal telephone directory, but don't even think of using it as a database back end for your high-volume e-commerce site.

If the answer to each of the following questions is Yes, then storing your data in LDAP is a good idea.

Would you like your data to be available cross-platform?
Do you need to access this data from a number of computers or applications?
Do the individual records you're storing change a few times a day or less, on average?
Does it make sense to store this type of data in a flat database instead of a relational database? That is, could you effectively store all the data for a given item in a single record?

This final question often gives people pause, because it's very common to access a flat record to obtain data that's relational in nature. For example, a record for a company employee might include the login name of that employee's manager. It's fine to use LDAP to store this kind of information. Rule of thumb: If you can imagine storing your data in a large electronic Rolodex, you can store it easily in an LDAP directory.

The structure of an LDAP directory tree

LDAP directory servers store their data hierarchically. If you've seen the top-down representations of DNS trees or UNIX file directories, an LDAP directory structure will be familiar ground. As with DNS host names, an LDAP directory record's Distinguished Name (DN for short) is read from the individual entry, backwards through the tree, up to the top level. More on this point later.

Why break things up into a hierarchy? There are a number of reasons. Here are a few possible scenarios:

You may wish to push all your US-based customer contact information to an LDAP server in the Seattle office (which is devoted to sales), whereas you probably don't need to push the company's asset management information there.
You may wish to grant permissions to a group of individuals based on the directory structure. In the example listed below, the company's asset management team might need full access to the asset-mgmt section, but not to other areas.
Combined with replication, you can tailor the layout of your directory structure to minimize WAN bandwidth utilization. Your sales office in Seattle might need up-to-the minute updates for US sales contacts, but only hourly updates for European sales information.

Getting to the root of the matter: Your base DN and you
The top level of the LDAP directory tree is the base, referred to as the "base DN." A base DN usually takes one of the three forms listed here. Let's assume I work at a US electronic commerce company called FooBar, Inc., which is on the Internet at foobar.com.

o="FooBar, Inc.", c=US
(base DN in X.500 format)
In this example, o=FooBar, Inc. refers to the organization, which in this context should be treated as synonymous with the company name. c=US indicates that the company headquarters is in the US. Once upon a time, this was the preferred method of specifying your base DN. Times and fashions change, though; these days, most companies are (or plan to be) on the Internet. And what with Internet globalization, using a country code in the base DN probably made things more confusing in the end. In time, the X.500 format evolved into the other formats listed below.

o=foobar.com
(base DN derived from the company's Internet presence)
This format is fairly straightforward, using the company's Internet domain name as the base. Once you get past the o= portion (which stands for organization=), everyone at your company should know where the rest came from. This was, until recently, probably the most common of the currently used formats.

dc=foobar, dc=com
(base DN derived from the company's DNS domain components)
As with the previous format, this uses the DNS domain name as its basis. But where the other format leaves the domain name intact (and thus human-readable), this format is split into domain components: foobar.com becomes dc=foobar, dc=com. In theory, this could be slightly more versatile, though it's a little harder for end users to remember. By way of illustration, consider foobar.com. When foobar.com merges with gizmo.com, you simply start thinking of "dc=com" as the base DN. Place the new records into your existing directory under dc=gizmo, dc=com, and you're ready to go. (Of course, this approach doesn't help if foobar.com merges with wocket.edu.) This is the format I'd recommend for any new installations. Oh, and if you're planning to use Active Directory, Microsoft has already decided for you that this is the format you wanted.

Time to branch out: How to organize your data in your directory tree
In a UNIX file system, the top level is the root. Beneath the root you have numerous files and directories. As mentioned above, LDAP directories are set up in much the same manner.

Underneath your directory's base, you'll want to create containers that logically separate your data. For historical (X.500) reasons, most LDAP directories set these logical separations up as OU entries. OU stands for "Organizational Unit," which in X.500 was used to indicate the functional organization within a company: sales, finance, et cetera. Current LDAP implementations have kept the ou= naming convention, but break things apart by broad categories like ou=people, ou=groups, ou=devices, and so on. Lower level OUs are sometimes used to break categories down further. For example, an LDAP directory tree (not including individual entries) might look like this:

    dc=foobar, dc=com 
        ou=customers 
            ou=asia 
            ou=europe 
            ou=usa 
        ou=employees 
        ou=rooms 
        ou=groups 
        ou=assets-mgmt 
        ou=nisgroups 
        ou=recipes

Individual LDAP records

What's in a name? The DN of an LDAP entry
All entries stored in an LDAP directory have a unique "Distinguished Name," or DN. The DN for each LDAP entry is composed of two parts: the Relative Distinguished Name (RDN) and the location within the LDAP directory where the record resides.

The RDN is the portion of your DN that is not related to the directory tree structure. Most items that you'll store in an LDAP directory will have a name, and the name is frequently stored in the cn (Common Name) attribute. Since nearly everything has a name, most objects you'll store in LDAP will use their cn value as the basis for their RDN. If I'm storing a record for my favorite oatmeal recipe, I'll be using cn=Oatmeal Deluxe as the RDN of my entry.

My directory's base DN is dc=foobar,dc=com
I'm storing all the LDAP records for my recipes in ou=recipes
The RDN of my LDAP record is cn=Oatmeal Deluxe

Given all this, what's the full DN of the LDAP record for this oatmeal recipe? Remember, it reads backwards - just like a host name in DNS.

cn=Oatmeal Deluxe,ou=recipes,dc=foobar,dc=com

People are always more trouble than inanimate objects
Now it's time to tackle the DN of a company employee. For user accounts, you'll typically see a DN based either on the cn or on the uid (User ID). For example, the DN for FooBar's employee Fran Smith (login name: fsmith) might look like either of these two formats:

uid=fsmith,ou=employees,dc=foobar,dc=com
(login-based)
LDAP (and X.500) use uid to mean "User ID", not to be confused with the UNIX uid number. Most companies try to give everyone a unique login name, so this approach makes good sense for storing information about employees. You don't have to worry about what you'll do when you hire the next Fran Smith, and if Fran changes her name (marriage? divorce? religious experience?), you won't have to change the DN of the LDAP entry.

cn=Fran Smith,ou=employees,dc=foobar,dc=com
(name-based)
Here we see the Common Name (CN) entry used. In the case of an LDAP record for a person, think of the common name as their full name. One can easily see the downside to this approach: if the name changes, the LDAP record has to "move" from one DN to another. As indicated above, you want to avoid changing the DN of an entry whenever possible.

Customizing your directory's object classes

You can use LDAP to store data on almost any type of object, as long as that object can be described in terms of various attributes. Here are a few examples of information you might store:

Employees: What's the employee's full name, login name, password, employee number, manager's login, mail server?
Asset tracking: What's the computer name, IP address, asset tag, make and model information, physical location?
Customer contact lists: What's the customer's company name? The primary contact's phone, fax, and email information?
Meeting room information: What's the room name, location, seating capacity, telephone number? Is there wheelchair access? Is there an overhead projector?
Recipe information: Give the name of the dish, the list of ingredients, the type of cuisine, and instructions for preparing it.

Because your LDAP directory can be customized to store any type of text or binary data, what you store is really up to you. LDAP directories use the concept of object classes to define which attributes are allowed for objects of any given type. In almost every LDAP implementation, you'll want to extend the basic functionality of your LDAP directory to meet your specific needs, either by creating new object classes or by extending existing ones.

LDAP directories store all information for a given record's entries as a series of attribute pairs, each one consisting of an attribute type and an attribute value. (This is completely different from the way relational database servers store data, in columns and rows.) Consider this portion of my recipe record, as stored in an LDAP directory:

  dn: cn=Oatmeal Deluxe, ou=recipes, dc=foobar, dc=com 
  cn: Instant Oatmeal Deluxe 
  recipeCuisine: breakfast 
  recipeIngredient: 1 packet instant oatmeal 
  recipeIngredient: 1 cup water 
  recipeIngredient: 1 pinch salt 
  recipeIngredient: 1 tsp brown sugar 
  recipeIngredient: 1/4 apple, any type

Note that in this case, each ingredient is listed as a value of attribute type recipeIngredient. LDAP directories are designed to store multiple values of a single type in this fashion, rather than storing the entire list in a single database field with some sort of delimiter to distinguish the individual values.

Because the data is stored in this way, the shape of the database can be completely fluid - you don't need to recreate a database table (and all its indexes) to start tracking a new piece of data. Even more important, LDAP directories use no memory or storage to handle "empty" fields - in fact, having unused optional fields costs you nothing at all.

An example of an individual LDAP entry

Let's look at an example. We'll use the LDAP record of Fran Smith, our friendly employee from Foobar, Inc. The format of this entry is LDIF, the format used when exporting and importing LDAP directory entries.

  dn: uid=fsmith, ou=employees, dc=foobar, dc=com
  objectclass: person
  objectclass: organizationalPerson
  objectclass: inetOrgPerson
  objectclass: foobarPerson
  uid: fsmith
  givenname: Fran
  sn: Smith
  cn: Fran Smith
  cn: Frances Smith
  telephonenumber: 510-555-1234
  roomnumber: 122G
  o: Foobar, Inc.
  mailRoutingAddress: fsmith@foobar.com
  mailhost: mail.foobar.com
  userpassword: {crypt}3x1231v76T89N
  uidnumber: 1234
  gidnumber: 1200
  homedirectory: /home/fsmith
  loginshell: /usr/local/bin/bash

To start with, attribute values are stored with case intact, but searches against them are case-insensitive by default. Certain attributes (like password) are case-sensitive when searching.

Let's break this entry down and look at it piece by piece.

  dn: uid=fsmith, ou=employees, dc=foobar, dc=com

This is the full DN of Fran's LDAP entry, including the whole path to the entry in the directory tree. LDAP (and X.500) use uid to mean "User ID," not to be confused with the UNIX uid number.

  objectclass: person 
  objectclass: organizationalPerson 
  objectclass: inetOrgPerson 
  objectclass: foobarPerson

One can assign as many object classes as are applicable to any given type of object. The person object class requires that the cn (common name) and sn (surname) fields have values. Object Class person also allows other optional fields, including givenname, telephonenumber, and so on. The object class organizationalPerson adds more options to the values from person, and inetOrgPerson adds still more options to that (including email information). Finally, foobarPerson is Foobar's customized object class that adds all the custom attributes they wish to track at their company.

  uid: fsmith 
  givenname: Fran 
  sn: Smith 
  cn: Fran Smith 
  cn: Frances Smith 
  telephonenumber: 510-555-1234 
  roomnumber: 122G 
  o: Foobar, Inc.

As mentioned before, uid stands for User ID. Just translate it in your head to "login" whenever you see it.

Note that there are multiple entries for the CN. As mentioned above, LDAP allows some attributes to have multiple values, with the number of values being arbitrary. When would you want this? Let's say you're searching the company LDAP directory for Fran's phone number. While you might know her as Fran (having heard her spill her guts over lunchtime margaritas on more than one occasion), the people in HR may refer to her (somewhat more formally) as Frances. Because both versions of her name are stored, either search will successfully look up Fran's telephone number, email, cube number, and so on.

  mailRoutingAddress: fsmith@foobar.com 
  mailhost: mail.foobar.com

Like most companies on the Internet, Foobar uses Sendmail for internal mail delivery and routing. Foobar stores all users' mail routing information in LDAP, which is fully supported by recent versions of Sendmail.

  userpassword: {crypt}3x1231v76T89N 
  uidnumber: 1234 
  gidnumber: 1200 
  gecos: Frances Smith 
  homedirectory: /home/fsmith 
  loginshell: /usr/local/bin/bash

Note that Foobar's systems administrators store all the NIS password map information in LDAP as well. At Foobar, the foobarPerson object class adds this capability. Note that the user password is stored in UNIX crypt format. The UNIX uid is stored here as uidnumber. Mind you, there's a whole RFC on storing NIS information in LDAP. I'll talk about NIS integration in a future article.

LDAP replication

LDAP servers can be set to replicate some or all of their data, on a push or a pull basis, using simple authentication or certificate-based authentication.

For example, Foobar has a "public" LDAP server running on ldap.foobar.com, port 389. This server is used by Netscape Communicator's pinpoint email addressing feature, the "ph" command from UNIX, and other locations where a user would want to query for the phone number of an employee or customer contact. The company's master LDAP server is running on the same system, but on port 1389 instead.

You wouldn't necessarily want employees searching the directory to query against asset management or recipe data, nor would it be desirable to see IT accounts (like "root") showing up on the company directory. To accomodate these unpleasant realities, Foobar replicates selected directory subtrees from its master LDAP server to its "public" server. The replication excludes subtrees containing data they wish to hide. To keep things current at all times, the master directory server is set to do immediate push-based synchronization. Note that this approach is designed for convenience, not security: the idea is to allow power users to simply query the other LDAP port if they want to search all available data.

Let's say Foobar is managing its customer contact information via LDAP, over a low bandwidth connection between Oakland and Europe. They might set up replication from ldap.foobar.com:1389 to munich-ldap.foobar.com:389 as follows:

  periodic pull: ou=asia,ou=customers,o=sendmail.com
  periodic pull: ou=us,ou=customers,o=sendmail.com
  immediate push: ou=europe,ou=customers,o=sendmail.com

The pull connections would keep things in sync every 15 minutes, which would probably be just fine in this scenario. The push connection would guarantee that any change made to the European contact information would be pushed out to Munich immediately.

Given this replication scheme, where would users connect to access their data? Users in Munich could simply connect to their local server. If they were making changes to the data, the local LDAP server would refer those changes to the master LDAP server, which would then push all the changes back to the local LDAP server to keep it in sync. This is of tremendous benefit to the local user: all their LDAP queries (mostly reads) are against their local server, which is substantially faster. When it's time to make a change to their information, end users needn't worry about reconfiguring their client software, because the LDAP directory servers handle the data exchange for them.

Security and access control

LDAP provides for a complex level of access control instances, or ACIs. Because the access can be controlled on the server side, it's much more secure than security methods that work by securing data through client software.

With LDAP ACIs, you can do things like:

Grant users the ability to change their home phone number and home address, while restricting them to read-only access for other data types (such as job title or manager's login).
Grant anyone in the group "HR-admins" the ability to modify any user's information for the following fields: manager, job title, employee ID number, department name, and department number. There would be no write permission to other fields.
Deny read access to anyone attempting to query LDAP for a user's password, while still allowing a user to change his or her own password.
Grant managers read-only permission for the home phone numbers of their direct reports, while denying this privilege to anyone else.
Grant anyone in the group "host-admins" to create, delete, and edit all aspects of host information stored in LDAP.
Via a Web page, allow people in "foobar-sales" to selectively grant or deny themselves read access to subsets of the customer contact database. This would, in turn, allow these individuals to download the customer contact information to their local laptops or to a PDA. (This will be most useful if your sales force automation tool is LDAP-aware.)
Via a Web page, allow any group owner to add or remove any entries from groups they own. For example, this would allow sales managers to grant or remove access for salespeople to modify Web pages. This would allow owners of mail aliases to add and remove users without having to contact IT. Mailing lists designated as "public" could allow users to add or remove themselves (but only themselves) to or from those mail aliases. Restrictions can also be based on IP address or hostname. For example, fields can be made readable only if the user's IP address begins with 192.168.200.*, or if the user's reverse DNS hostname maps to *.foobar.com.

This will give you an idea what's possible using access control with LDAP directories, but be aware that a correct implementation requires much more information than is given here. I'll discuss access control in greater detail in a future article.

That's it for now. I hope you've found this article useful. If you have comments or questions, send email to donnelly@ldapman.org.

28 april 2000

http://www.ldapman.org/articles/intro_to_ldap.html