Using Integer Document IDs in RavenDB Indexes

At work, we recently moved our database from MongoDB to RavenDB. In this context, we set up a couple of indexes for all frequent queries to optimize reading documents. We also adjusted quite a few data access methods in our application to query the indexes instead of directly loading documents by their ID.

The Issue: Indexing Troubles

When we wrote the aforementioned indexes, we ran into a problem with integer document IDs. All of our entity POCOs use an ID property of type int. The document IDs are standard RavenDB document names when using integer IDs: They're composed of the POCO class name and the value of the ID property, thus making them human-readable (e.g. comments/1337).

Before we start, let me give you a quick overview over one of the indexes we're using in our application.

Our Scenario: Indexing Comments by Topic

In our application, we have a pretty simple Comment class, which looks as follows. Note that it actually has a couple more properties, which I omitted here for the sake of brevity.

public class Comment
{
    public int ID { get; set; }
    public int TopicID { get; set; }
    public string Author { get; set; }
    public string Text { get; set; }
}

The TopicID holds information about the topic that was commented. Since our application requires comments to be queried by topic, we created an index which, well, indexes the TopicID property:

public class Comments_ByTopic
    : AbstractIndexCreationTask<Comment, Comments_ByTopic.QueryResult>
{
    public class QueryResult
    {
        public int ID { get; set; }
        public int TopicID { get; set; }
        public string Author { get; set; }
        public string Text { get; set; }

        // More properties (omitted)
    }

    public Comments_ByTopic()
    {
        Map = comments => from comment in comments
                          select new QueryResult
                          {
                              ID = comment.ID,
                              Author = comment.Author,
                              TopicID = comment.TopicID,
                              Text = comment.Text.Value,

                              // More stuff happening here
                              // (loading documents, ...)
                          };

        Index(x => x.TopicID, FieldIndexing.NotAnalyzed);

        StoreAllFields(FieldStorage.Yes);
    }
}

Actually, our index does a little more than shown here. We don't store the author as a string, for example, but instead an ID referencing the corresponding user document. The index then makes use of RavenDB's LoadDocument<T> feature to pull in the author document for each comment. However, I left out this part since this post isn't about LoadDocument<T>.

With the index defined as above, there was one problem, though: The ID property was never part of the indexed fields and, consequently, was always 0 when queried. So, what do you do? Let's have a look at the workaround that solved the issue for us.

Our Solution: Two ID properties

We tried different things to make the index work correctly. In the end, we created a separate DocumentID property of type string and told RavenDB to treat it as the document ID:

var documentStore = new DocumentStore
{
    ConnectionStringName = "RavenDB",
    Conventions =
    {
        FindIdentityProperty = prop => prop.Name == "DocumentID"
    }
};

This DocumentID property contains the full (!) ID of each document, e.g. comments/1337. Because we're only interested in the 1337 part — which is the actual integer ID we deal with in our application — we split the string when indexing the documents:

Map = comments => from comment in comments
                  select new QueryResult
                  {
                      ID = int.Parse(comment.DocumentID.ToString().Split('/')[1]),
                      Author = comment.Author,
                      TopicID = comment.TopicID,
                      Text = comment.Text.Value,

                      // More stuff happening here (loading documents, ...)
                  };

The Map expression as listed above made the index work for us. We also wrote a little wrapper around the integer ID property because we didn't want to change our codebase to use strings as document IDs:

[JsonIgnore]
private string _documentID { get; set; }

public string DocumentID
{
    get { return _documentID; }
    set
    {
        _documentID = value;
        ID = int.Parse(value.Split('/')[1]);
    }
}

[JsonIgnore]
public int ID { get; private set; }

While the solution may seem a little hacky, it works smoothly. Please note that if you're using this DocumentID property, the corresponding documents' names all have to follow the <collectionName>/<ID> pattern.

Use the coupon code LAUNCHDAY for $10 off!

Learn React

1 Comment

Johannes

To me this does not look like a workaround but as something that you are supposed to do. In an SQL Database the Issue would be the same. Not all entities need an integer primary key, especially not when you plan to access it from the outside and use it for something else than indexing. What makes a row identifyable to a person should be different from what the database uses for its performance optimizations. The conceptual overlap is what produces problems. I think this is a sensible solution. The question then remains: Is RavenDB using a "sensible default" here? Should you care about the db internals when defining document classes? Should an ID be human Readable? Although I hate working with UUIDs they kind of make sense because they make passing them to humans feel weird. You wouldn't use "/blog/article/f89f6a5b-5b5b-456a-88cc-295ef87c3895 " whereas "/blog/article/1234" is not too bad. I believe the answer is - as often: separation of concerns.