
MongoDB

Class-free persistence and multiple inheritance in C# with MongoDB

Much as I appreciate Object Relational Mappers and the C# type system, there’s a lot of work to do if you just want to create and persist a few objects. MongoDB alleviates much of that work with its BSON serialization code, which converts almost any object into a binary-serialized object notation and provides easy round-tripping with JSON.

But there’s no getting around the limitations of C# when it comes to multiple inheritance. You can use interfaces to get most of the benefits, but implementing a tangled set of classes with multiple interfaces on them can lead to a lot of duplicate code.

What if there was a way to do multiple inheritance without ever having to write a class? What if we could simply declare a few interfaces and then ask for an object that implements all of them, with a way to persist it to disk and get it back? What if we could later take one of those objects and add another interface to it? “Crazy talk” I hear you say!

Well, maybe not so crazy … take a look at the open-source project impromptu-interface and you’ll see some of what’s needed to make this a reality. It can take a .NET dynamic object and turn it into an object that implements a specific interface.

Combine that with a simple MongoDB document store and some cunning logic to link the two together and voila, we have persistent objects that can implement any interface dynamically, and there are absolutely no classes in sight anywhere!

Let’s take a look at it in use and then I’ll explain how it works. First, let’s define a few interfaces:

    public interface ILegs
    {
        int Legs { get; set; }
    }

    public interface IMammal
    {
        double BodyTemperatureCelsius { get; set; }
    }

    // Interfaces can use multiple inheritance:

    public interface IHuman: IMammal, ILegs
    {
        string Name { get; set; }
    }

    // We can have interfaces that apply to specific instances of a class: not all humans are carnivores

    public interface ICarnivore
    {
        string Prey { get; set; }
    }

Now let’s take a look at some code to create a few of these new dynamic documents and treat them as implementors of those interfaces. First we need a MongoDB connection:

            var mongoServer = MongoServer.Create(ConnectionString);
            var mongoDatabase = mongoServer.GetDatabase("Remember", credentials);

Next we grab a collection where we will persist our objects.

            var sampleCollection = mongoDatabase.GetCollection<SimpleDocument>("Sample");

Now we can create some objects, adding interfaces to them dynamically, and use those strongly typed interfaces to set properties on them.

            var person1 = new SimpleDocument();
            person1.AddLike<IHuman>().Name = "John";
            person1.AddLike<ILegs>().Legs = 2;
            person1.AddLike<ICarnivore>().Prey = "Cattle";
            sampleCollection.Save(person1);

            var monkey1 = new SimpleDocument();
            monkey1.AddLike<IMammal>();                 // mark as a mammal
            monkey1.AddLike<ILegs>().Legs = 2;
            monkey1.AddLike<ICarnivore>().Prey = "Bugs";
            sampleCollection.Save(monkey1);

Yes, that’s it! That’s all we needed to do to create persisted objects that implement any collection of interfaces. Note how the IHuman is also an IMammal because our code also supports inheritance amongst interfaces. We can load objects back in from MongoDB and get strongly typed versions of them by using .AsLike<T>(), which returns a value of type T, or null if the object doesn’t implement the interface T. But that’s not all: we can even add new interfaces to them later, allowing an object to change type over time! Of course, you could do a lot of this just with dynamic types, but then you lose Intellisense and compile-time checking.
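For example, we could reload the monkey later and promote it (a hypothetical continuation using the objects saved above):

            var monkey = sampleCollection.FindOneById(monkey1.Id);
            var maybeHuman = monkey.AsLike<IHuman>();    // null: monkey1 was never marked as an IHuman
            monkey.AddLike<IHuman>().Name = "Bubbles";   // now it implements IHuman too
            sampleCollection.Save(monkey);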

So next, let’s take a look at how we can query for objects that support a given interface and how we can get strongly typed objects back from MongoDB:

            var query = Query.EQ("int", typeof(IHuman).Name);
            var humans = sampleCollection.Find(query);

            Console.WriteLine("Examine the raw documents");

            foreach (var doc in humans)
            {
                Console.WriteLine(doc.ToJson());
            }

            Console.WriteLine("Use query results strongly typed");

            foreach (IHuman human in humans.Select(m => m.AsLike<IHuman>()))
            {
                Console.WriteLine(human.Name);
            }

            Console.ReadKey();

So how does this ‘magic’ work? First we need a simple document class. It can be any old object class, with no special requirements. At the moment it wraps the interface properties up in a nested document called ‘prop’, which makes them just a little bit harder to query and index, but still fairly easy.

        /// <summary>
        /// A very simple document object
        /// </summary>
        public class SimpleDocument : DynamicObject
        {
            public ObjectId Id { get; set; }

            // All other properties are added dynamically and stored wrapped in another Document
            [BsonElement("prop")]
            protected BsonDocument properties = new BsonDocument();

            /// <summary>
            /// Interfaces that have been added to this object
            /// </summary>
            [BsonElement("int")]
            protected HashSet<string> interfaces = new HashSet<string>();

            /// <summary>
            /// Add support for an interface to this document if it doesn't already have it
            /// </summary>
            public T AddLike<T>()
                where T:class
            {
                interfaces.Add(typeof(T).Name);
                foreach (var @interface in typeof(T).GetInterfaces())
                    interfaces.Add(@interface.Name);
                return Impromptu.ActLike<T>(new Proxy(this.properties));
            }

            /// <summary>
            /// Cast this object to an interface only if it has previously been created as one of that kind
            /// </summary>
            public T AsLike<T>()
                where T : class
            {
                if (!this.interfaces.Contains(typeof(T).Name)) return null;
                else return Impromptu.ActLike<T>(new Proxy(this.properties));
            }

        }
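With this in place, person1 from the earlier example ends up stored roughly as follows (a sketch: field order and the generated ObjectId will differ):

{
  "_id": ObjectId("..."),
  "prop": { "Name": "John", "Legs": 2, "Prey": "Cattle" },
  "int": [ "IHuman", "IMammal", "ILegs", "ICarnivore" ]
}

This is the shape that the Query.EQ("int", ...) search earlier relies on.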

Then we need a simple proxy object to wrap up the properties as a dynamic object that we can feed to Impromptu:

        public class Proxy : DynamicObject
        {
            public BsonDocument document { get; set; }

            public Proxy(BsonDocument document)
            {
                this.document = document;
            }
            public override bool TryGetMember(GetMemberBinder binder, out object result)
            {
                BsonValue res = null;
                this.document.TryGetValue(binder.Name, out res);
                result = res == null ? null : res.RawValue;   // guard against properties that were never set
                return true;            // We always support a member even if we don't have it in the dictionary
            }

            /// <summary>
            /// Set a property (e.g. person1.Name = "Smith")
            /// </summary>
            public override bool TrySetMember(SetMemberBinder binder, object value)
            {
                this.document.Set(binder.Name, BsonValue.Create(value));   // Set, not Add, so a second assignment replaces the value rather than adding a duplicate element
                return true;
            }
        }

And that’s it! There is no other code required. Multiple inheritance and code-free persistent objects are now a reality! All you need to do is design some interfaces, and objects spring magically to life and are persisted easily.

[NOTE: This is experimental code: it's a prototype of an idea that's been bugging me for some time as I look at how to meld Semantic Web classes which have multiple inheritance relationships with C# classes (that don't) and with MongoDB's document-centric storage format. Does everything really have to be stored in a triple-store or is there some hybrid where objects can be stored with their properties and triple-store statements can be reserved for more complex relationships? Can we get semantic web objects back as meaningful C# objects with strongly typed properties on them? It's an interesting challenge and this approach appears to have some merit as a way to solve it.]

MongoDB – Map-Reduce coming from C#

People coming from traditional relational database thinking and LINQ sometimes struggle to understand map-reduce. One way to understand it is to realize that it’s actually the simple composition of some LINQ operators with which you may already be familiar.

Map reduce is in effect a SelectMany() followed by a GroupBy() followed by an Aggregate() operation.

In a SelectMany() you are projecting a sequence, but each element can become multiple elements. This is equivalent to using multiple emit statements in your map operation. The map operation can also choose not to call emit at all, which is like having a Where() clause inside your SelectMany() operation.

In a GroupBy() you are collecting elements with the same key, which is what map-reduce does with the key value that you emit from the map operation.

In the Aggregate() or reduce step you are taking the collections associated with each group key and combining them in some way to produce one result for each key. Often this combination is simply adding up a single ‘1’ value emitted with each key from the map step, but sometimes it’s more complicated.

One thing you should be aware of with map-reduce in MongoDB is that the reduce operation must accept and output the same data type because it may be applied repeatedly to partial sets of the grouped data. In C# your Aggregate() operation would be applied repeatedly on partial sequences to get to the final sequence.
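To make the composition concrete, here is a minimal LINQ sketch (plain C#, no MongoDB involved) of the classic word-count example; all the names are illustrative:

    using System;
    using System.Linq;

    class WordCountSketch
    {
        static void Main()
        {
            var docs = new[] { "the cat sat", "the dog sat" };

            var counts = docs
                // Map: each document emits one (word, 1) pair per word -> SelectMany
                .SelectMany(doc => doc.Split(' ')
                                      .Select(word => new { Key = word, Value = 1 }))
                // Collect the emitted pairs by key -> GroupBy
                .GroupBy(pair => pair.Key)
                // Reduce: combine each group's values; note the output has the
                // same shape as the input, just as MongoDB's reduce requires
                .Select(g => new { Key = g.Key, Value = g.Aggregate(0, (sum, p) => sum + p.Value) });

            foreach (var c in counts)
                Console.WriteLine("{0}: {1}", c.Key, c.Value);   // the: 2, cat: 1, sat: 2, dog: 1
        }
    }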

Custom Serialization for MongoDB – Hashset with IBsonSerializable

The official C# driver for MongoDB does a great job serializing most objects without any extra work. Arrays, Lists and HashSets all round trip nicely and share a common representation in the database as a simple array. This is great if you ever change the C# code from one collection type to another – there’s no migration work to do on the database – you can write a List and retrieve a HashSet and everything just works.

But there are cases where everything doesn’t just work, and one of these is a HashSet with a custom comparer. The MongoDB driver will instantiate a regular HashSet rather than one with the custom comparer when it materializes objects from the database.

Fortunately MongoDB provides several ways to override the default BSON serialization. Unfortunately the documentation doesn’t include an example showing how to do it, so here’s one using the IBsonSerializable option. It shows a custom HashSet with a custom comparer to test for equality. It still serializes to an array in MongoDB, but on deserialization it instantiates the correct HashSet with the custom comparer in place.

using System;
using System.Collections.Generic;
using System.Linq;
using MongoDB.Bson.Serialization;              // IBsonSerializable, IIdGenerator
using MongoDB.Bson.Serialization.Serializers;  // ArraySerializer<T>

/// <summary>
/// A HashSet with a specific comparer that prevents duplicate Entity Ids
/// </summary>
public class EntityHashSet : HashSet<Entity>, IBsonSerializable
{
    private class EntityComparer : IEqualityComparer<Entity>
    {
        public bool Equals(Entity x, Entity y) { return x.Id.Equals(y.Id); }
        public int GetHashCode(Entity obj) { return obj.Id.GetHashCode(); }
    }

    public EntityHashSet()
        : base(new EntityComparer())
    {
    }

    public EntityHashSet(IEnumerable<Entity> values)
        : base (values, new EntityComparer())
    {
    }

    public void Serialize(MongoDB.Bson.IO.BsonWriter bsonWriter, Type nominalType, bool serializeIdFirst)
    {
        if (nominalType != typeof(EntityHashSet)) throw new ArgumentException("Cannot serialize anything but self");
        ArraySerializer<Entity> ser = new ArraySerializer<Entity>();
        ser.Serialize(bsonWriter, typeof(Entity[]), this.ToArray(), serializeIdFirst);
    }

    public object Deserialize(MongoDB.Bson.IO.BsonReader bsonReader, Type nominalType)
    {
        if (nominalType != typeof(EntityHashSet)) throw new ArgumentException("Cannot deserialize anything but self");
        ArraySerializer<Entity> ser = new ArraySerializer<Entity>();
        return new EntityHashSet((Entity[])ser.Deserialize(bsonReader, typeof(Entity[])));
    }

    public bool GetDocumentId(out object id, out IIdGenerator idGenerator)
    {
        id = null;
        idGenerator = null;
        return false;
    }

    public void SetDocumentId(object id)
    {
        return;
    }
}
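To see it in action, here is a hypothetical round trip (Container, “things” and someEntities are illustrative names, not from the original code):

// Hypothetical container class; Entity is assumed to expose an ObjectId Id.
public class Container
{
    public ObjectId Id { get; set; }
    public EntityHashSet Entities { get; set; }
}

// someEntities is any IEnumerable<Entity>. The set is stored as a plain BSON
// array but deserializes back into an EntityHashSet with the
// duplicate-suppressing comparer already in place.
var things = mongoDatabase.GetCollection<Container>("things");
things.Insert(new Container { Entities = new EntityHashSet(someEntities) });
var loaded = things.FindOne();   // loaded.Entities uses EntityComparer, not the default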

A Semantic Web ontology / triple Store built on MongoDB

In a previous blog post I discussed building a semantic triple store using SQL Server. That approach works fine, but I’m struck by how many joins are needed to get any results from the data, and as I look to storing much larger ontologies containing billions of triples there are many potential scalability issues with it. So over the past few evenings I created a semantic store based on MongoDB, taking a different approach to storing the basic building blocks of semantic knowledge representation. For starters, I decided that typical ABox and TBox knowledge has quite different storage requirements: smashing all the complex TBox assertions into simple triples, stringing them together with meta fields, and then immediately joining them back up whenever needed seemed like a bad idea from the NOSQL / document-database perspective.

TBox/ABox: In the ABox you typically find simple triples of the form X-predicate-Y, storing simple assertions about individuals and classes. In the TBox you typically find complex sequents, that’s to say complex logic statements having a head (or consequent) and a body (or antecedents). The head is ‘entailed’ by the body, which means that if you can satisfy all of the body statements then the head is true. In a traditional store all the ABox assertions can be represented as triples, and all the complex TBox assertions use quads with a meta field used solely to rebuild the sequent with a head and a body. The ABox/TBox distinction is, however, somewhat arbitrary (see http://www.semanticoverflow.com/questions/1107/why-is-it-necessary-to-split-reasoning-into-t-box-and-a-box).

I also decided that I wanted to use ObjectIds as the primary way of referring to any Entity in the store. Using the full Uri for every Entity is of course possible, and MongoDB could have used that as the index, but I wanted to make this efficient and easily shardable across multiple MongoDB servers. The MongoDB ObjectId is ideal for that purpose and makes queries and indexing more efficient.

The first step then was to create a collection to hold Entities and permit the mapping from Uri to ObjectId. That was easy: an Entity type inheriting from a Resource type produces a simple document like the one shown below. A unique index on Uri ensures that it’s easy to look up any Entity by Uri and that there can only ever be one Id mapping for any Uri.
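Creating that unique index with the 1.x C# driver looks something like this (a sketch; the collection name “Resources” is my assumption):

// Requires MongoDB.Driver.Builders for IndexKeys / IndexOptions
var resources = mongoDatabase.GetCollection<BsonDocument>("Resources");
resources.EnsureIndex(IndexKeys.Ascending("Uri"), IndexOptions.SetUnique(true));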

RESOURCES COLLECTION - SAMPLE DOCUMENT

{
  "_id": "4d243af69b1f26166cb7606b",
  "_t": "Entity",
  "Uri": "http://www.w3.org/1999/02/22-rdf-syntax-ns#first"
}

Although I should use a proper Uri for every Entity, I also decided to allow arbitrary strings here, so if you are building a simple ontology that never needs to go beyond the bounds of this one system you can forgo namespaces and http:// prefixes and just put a string there, e.g. “SELLS”. Since every Entity reference is immediately mapped to an Id, and that Id is used throughout the rest of the system, it really doesn’t matter much.
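The Uri-to-Id mapping itself is then a simple get-or-create; a hypothetical helper sketch (the unique index above guards against duplicate inserts):

// Look the Uri up in the Resources collection, inserting a new Entity document if absent
static ObjectId GetOrCreateEntityId(MongoCollection<BsonDocument> resources, string uri)
{
    var existing = resources.FindOne(Query.EQ("Uri", uri));
    if (existing != null) return existing["_id"].AsObjectId;

    var doc = new BsonDocument { { "_t", "Entity" }, { "Uri", uri } };
    resources.Insert(doc);               // the driver assigns the _id on insert
    return doc["_id"].AsObjectId;
}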

The next step was to represent simple ABox assertions. Rather than storing each assertion as its own document, I created a document that can hold several assertions all relating to the same subject. Of course, if there are too many assertions you’ll still need to split them across separate documents, but that’s easy to do. This was mainly a convenience while developing the system, as it makes it easy to look at all the assertions concerning a single Entity using MongoVue or the mongo command line interface, but I’m hoping it will also help performance, since typical access patterns bring in all of the statements concerning a given Entity at once.

Where a statement requires a literal the literal is stored directly in the document and since literals don’t have Uris there is no entry in the resources collection.

To make searches for statements easy and fast I added an array field “SPO” which stores the set of all Ids mentioned anywhere in any of the statements in the document. This array is indexed in MongoDB using the multikey array indexing feature, which makes it very efficient to find and fetch every document that mentions a particular Entity. If the Entity only ever appears in the subject position, that search may return just a single document containing all of the assertions about it. For example:

STATEMENTGROUPS COLLECTION - SAMPLE DOCUMENT

{
  "_id": "4d243af99b1f26166cb760c6",
  "SPO": [
    "4d243af69b1f26166cb7606f",
    "4d243af69b1f26166cb76079",
    "4d243af69b1f26166cb7607c"
  ],
  "Statements": [
    {
      "_id": "4d243af99b1f26166cb760c5",
      "Subject": {
        "_t": "Entity",
        "_id": "4d243af69b1f26166cb7606f",
        "Uri": "GROCERYSTORE"
      },
      "Predicate": {
        "_t": "Entity",
        "_id": "4d243af69b1f26166cb7607c",
        "Uri": "SELLS"
      },
      "Object": {
        "_t": "Entity",
        "_id": "4d243af69b1f26166cb76079",
        "Uri": "DAIRY"
      }
    }
	... more statements here ...
  ]
}
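In code, finding every document that mentions an Entity is a single indexed query; a hypothetical sketch using the earlier helper (collection and variable names are mine):

var statementGroups = mongoDatabase.GetCollection<BsonDocument>("StatementGroups");
statementGroups.EnsureIndex(IndexKeys.Ascending("SPO"));   // multikey index over the array

// Matches any document whose SPO array contains the Id, i.e. wherever the
// entity appears as subject, predicate or object
var entityId = GetOrCreateEntityId(resources, "GROCERYSTORE");
var mentions = statementGroups.Find(Query.EQ("SPO", entityId));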

The third and final collection I created is used to store TBox sequents consisting of a head (consequent) and a body (antecedents). Once again I added an array which indexes all of the Entities mentioned anywhere in any of the statements used in the sequent. Below that I have an array of Antecedent statements and then a single Consequent statement. Although the statements don’t really need the full serialized version of an Entity (all they need is the _id), I include the Uri and type for each Entity for now. Variables also have Id values, but unlike Entities, variables are not stored in the Resources collection; they exist only in the Rule collection as part of sequent statements. Variables have no meaning outside a sequent unless they are bound to some other value.

RULE COLLECTION - SAMPLE DOCUMENT

{
  "_id": "4d243af99b1f26166cb76102",
  "References": [
    "4d243af69b1f26166cb7607d",
    "4d243af99b1f26166cb760f8",
    "4d243af99b1f26166cb760fa",
    "4d243af99b1f26166cb760fc",
    "4d243af99b1f26166cb760fe"
  ],
  "Antecedents": [
    {
      "_id": "4d243af99b1f26166cb760ff",
      "Subject": {
        "_t": "Variable",
        "_id": "4d243af99b1f26166cb760f8",
        "Uri": "V3-Subclass8"
      },
      "Predicate": {
        "_t": "Entity",
        "_id": "4d243af69b1f26166cb7607d",
        "Uri": "rdfs:subClassOf"
      },
      "Object": {
        "_t": "Variable",
        "_id": "4d243af99b1f26166cb760fa",
        "Uri": "V3-Class9"
      }
    },
    {
      "_id": "4d243af99b1f26166cb76100",
      "Subject": {
        "_t": "Variable",
        "_id": "4d243af99b1f26166cb760fa",
        "Uri": "V3-Class9"
      },
      "Predicate": {
        "_t": "Variable",
        "_id": "4d243af99b1f26166cb760fc",
        "Uri": "V3-Predicate10"
      },
      "Object": {
        "_t": "Variable",
        "_id": "4d243af99b1f26166cb760fe",
        "Uri": "V3-Something11"
      }
    }
  ],
  "Consequent": {
    "_id": "4d243af99b1f26166cb76101",
    "Subject": {
      "_t": "Variable",
      "_id": "4d243af99b1f26166cb760f8",
      "Uri": "V3-Subclass8"
    },
    "Predicate": {
      "_t": "Variable",
      "_id": "4d243af99b1f26166cb760fc",
      "Uri": "V3-Predicate10"
    },
    "Object": {
      "_t": "Variable",
      "_id": "4d243af99b1f26166cb760fe",
      "Uri": "V3-Something11"
    }
  }
}

That is essentially the whole semantic store. I connected it up to a reasoner and have successfully run a few test cases against it. Next time I get a chance to experiment with this technology I plan to try loading a larger ontology and will rework the reasoner so that it can work directly against the database instead of taking in-memory copies of most queries that it performs.

At this point this is JUST AN EXPERIMENT but hopefully someone will find this blog entry useful. I hope later to connect this up to the home automation system so that it can begin reasoning across an ontology of the house and a set of ABox assertions about its current and past state.

Since I’m still relatively new to the semantic web I’d welcome feedback on this approach to storing ontologies in NOSQL databases from any experienced semanticists.

MongoDB C# Driver – arrays, lists and hashsets

Here’s a nice feature of the C# MongoDB driver: when you save a .NET array, List or HashSet (essentially any IEnumerable<T>) to MongoDB, you can retrieve it as any other IEnumerable<T>. This means you can migrate your business objects between these different representations without having to migrate anything in your database. It also means that any other language can access the same MongoDB database without needing to know anything about .NET data types.

For example, the following classes all serialize to the same BSON data, and a document saved with any of them can be retrieved as any other.

        public class Test1
        {
            [BsonId]
            public ObjectId Id { get; set; }
            public List<string> array { get; set; }
        }

        public class Test2
        {
            [BsonId]
            public ObjectId Id { get; set; }
            public string[] array { get; set; }
        }

        public class Test3
        {
            [BsonId]
            public ObjectId Id { get; set; }
            public HashSet<string> array { get; set; }
        }
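So a document saved through Test1 can be read back through Test2 (a hypothetical sketch; the collection name “tests” is mine):

        var t1 = new Test1 { array = new List<string> { "a", "b" } };
        mongoDatabase.GetCollection<Test1>("tests").Insert(t1);

        // The very same document deserializes equally well as a string[]
        var t2 = mongoDatabase.GetCollection<Test2>("tests").FindOneById(t1.Id);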