Shared User Identity, Plan A

Posted Thursday, February 17 2011 by jonathan

I have a project I’m working on that involves a group of federated web sites that share functionality provided by a “hub” web server. The hub maintains a centralized user database with various profile data attached to it, and each web site accesses and updates a subset of that profile data. It is necessary to ensure the user’s singular identity is correctly tracked across all the web sites.

Normally this would be an easy problem — I’d run an OpenID service on the hub and use it for authentication across the web sites. There are two problems:

  1. The sites don’t use authentication. Everything is tracked with durable session cookies, and
  2. The sites don’t share domain names so cookies don’t pass between sites or hub.

For the purposes of this exercise we’re assuming a few things:

  • The web sites are “trusted” by the Hub, meaning that if a web site passes a user identifier to the hub API the hub believes that the identifier has been verified.
  • Each web site shares a distinct shared secret with the hub, and each site and the hub have a public-private keypair for signing data.
  • The hub API is accessed by web sites over SSL and has strong authentication and authorization.

Plan A

  1. Client browser A connects to web site B.
  2. If A presents a session cookie, B looks up the session and attempts to retrieve A’s user identifier U.
  3. If U is found in B’s session store S, B fetches U’s profile from hub C via secure API and, upon success, provides the requested web page to A.

In the event U is not found or C does not return a valid profile (new visitor, deleted cookies, ancient and expired user data, etc.):

  1. B redirects A to a public web page on C with a cryptographically signed return URL.
  2. A connects to C; C validates the signature against the public key assigned to the return URL’s domain.
  3. C checks for a session cookie from A, retrieves user identifier V from C’s session store T, and verifies that V’s profile exists.
  4. If not successful, C generates a new user identifier V and creates a profile and session record in T.
  5. C returns a redirect to B’s provided return URL along with a signed copy of V and a cookie that contains C’s session token.
  6. A connects to B; B detects and validates V, creates a session for A in S, retrieves V’s profile from C, assigns V to the value of U in S and provides the requested web page along with a cookie that contains B’s session token.
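
The first two steps of this second handshake can be sketched in Python. The hub URL, parameter names, and shared secret below are all illustrative, and an HMAC stands in for the public-key signature the hub would actually verify against the signer’s domain:

```python
import hashlib
import hmac
from urllib.parse import urlencode

# Hypothetical shared secret between web site B and hub C. In the design
# above the hub verifies a public-key signature; an HMAC is used here
# only to keep the sketch self-contained.
SITE_SECRET = b"site-b-shared-secret"

def sign_return_url(return_url: str) -> str:
    """Step 1: B builds the redirect URL that sends browser A to hub C."""
    sig = hmac.new(SITE_SECRET, return_url.encode(), hashlib.sha256).hexdigest()
    return "https://hub.example/handshake?" + urlencode(
        {"return_url": return_url, "sig": sig})

def verify_return_url(return_url: str, sig: str) -> bool:
    """Step 2: C checks the signature against the key for the URL's signer."""
    expected = hmac.new(SITE_SECRET, return_url.encode(),
                        hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, sig)
```

`compare_digest` is used rather than `==` so the signature check runs in constant time.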

This scheme has a few advantages:

  • User identities are assigned by the hub,
  • All web sites rely on the hub and the hub’s relationship with the web browser to track persistent identity, and
  • User identities are confirmed and validated on the very first page load.

There are always problems. Especially with anything dealing with authentication.

Identifier Leakage

In this case I’m not sure it matters, but I always try to avoid exposing the internal user identifier to the end user. We don’t, as a general rule, put the primary key from our user database table into a cookie. We generate a random session code for the cookie and associate that code with the user id in a secured cache on the server.

The Site to Hub Redirect

In the above handshakes, passing the return URL in cleartext is safe because that URL is public knowledge.
The request is informational — the web site is not passing any user-supplied data to the hub. The hub will only respond with a cookie and signed user identifier if the return URL is correctly signed and returns to the domain of the signer.

I haven’t encrypted the return URL for two reasons:

  1. It’s not necessary to hide it from the client, as it’s known data.
  2. Encrypting it would hand the client a known-plaintext attack, potentially exposing the web server’s or the hub’s private key or shared secret.

To limit replay attacks, we could add a unique id to the parameters signed by the web site and have the hub verify that the id has never been used. I don’t see replaying the request as problematic, though, because the response can only return the user to a trusted site.
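
If we did want the replay guard, the hub-side check is small. The in-memory set is illustrative; in practice this would be a shared store with some expiry policy:

```python
import secrets

# Nonces the hub has already accepted. A signed request whose nonce
# appears here is a replay and is refused.
_seen_nonces: set[str] = set()

def new_nonce() -> str:
    """Web site side: mint the unique id included in the signed parameters."""
    return secrets.token_hex(16)

def accept_nonce(nonce: str) -> bool:
    """Hub side: True the first time a nonce is presented, False thereafter."""
    if nonce in _seen_nonces:
        return False
    _seen_nonces.add(nonce)
    return True
```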

The Hub to Site Redirect

This one is a bit trickier. We’ll assume for the moment that the hub properly identifies the browser and is not susceptible to hijacking attacks of various types.

The redirect from hub back to web site sets a durable cookie with the hub’s session identifier for the browser. The next time the browser hits the hub we’ll be able to locate the correct profile.

The redirect also passes the signed user identifier back to the web site. The web site can verify the signature, so we don’t worry about spoofing attacks here. However, we’re leaking the user identifier, which I’d prefer to keep private. I can think of two solutions for this:

  1. Encrypt the identifier with the web site’s shared secret and include the encrypted value in the redirect, or
  2. Provide a signed one-time random identifier in the redirect instead of the user identifier and have the web site access a hub API to retrieve the user identifier associated with that random id when processing the redirected client request.

My initial instinct is to go with the one-time code and have the web site fetch the user id from the hub after the client redirects back to the web site. This means we don’t have to ever expose the user identifier, even in encrypted form. Since the web server will be accessing the hub API to fetch the user profile anyway, we can modify the API to accept a one-time id and we get the same API performance.

The downside is that I’m not sure I trust triangles. Triangles imply saved state and extra gears. It’s not a lot of work, but the hub will now need to issue, track, and revoke one-time-use identifiers associated with the requesting web site and the user identifier.

I’ll be pondering this until next time, when we’ll attempt to get rid of the ugly redirect.

Your Thoughts?