Persist or Lose

TL;DR: I suggest that web services should not lose properties of the documents they expose. This can’t be a totally ridiculous idea, since it is how the Google Contacts API works.

I have seen several trends in software development as I have attended CodeMash conferences over the last decade. One of the things that I hear about a lot is that people are excited to use dynamic languages like JavaScript everywhere they can. Another growing trend is the use of document databases or NoSQL.

Both dynamic languages and document databases expose easily extensible objects. If you need to save additional properties on an object it is completely painless. It may be just a few objects that need the additional property or it may be that going forward all objects of that class/type will be getting the additional properties.

Deserialization Exclusion

I have read a lot of code for web services that have a JSON or XML document API yet immediately transform the document into a strongly typed object that loses any additional properties that the document had. Take the following ItemDTO class: it will lose the color property if the JSON document is deserialized into it and then the JSON is discarded.

JSON Document
{
     "id": "ABC",
     "name": "Tree",
     "color": "Green"
}

class ItemDTO
{
    // There is no "color" property, so that value has nowhere to go.
    public String id { get; set; }
    public String name { get; set; }
}
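
To make the loss concrete, here is a minimal sketch of the same round trip in Python rather than C#. The `from_json` helper and the rename step are assumptions made for illustration; only the document and the two-field class come from the example above.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ItemDTO:
    id: str
    name: str

    @classmethod
    def from_json(cls, text: str) -> "ItemDTO":
        data = json.loads(text)
        # Only the fields the class knows about are copied; "color" is dropped here.
        return cls(id=data["id"], name=data["name"])


incoming = '{"id": "ABC", "name": "Tree", "color": "Green"}'

item = ItemDTO.from_json(incoming)   # the original JSON is discarded after this
item.name = "Oak Tree"               # an ordinary update
outgoing = json.dumps(asdict(item))  # what would be saved or sent back

print(outgoing)
```

The printed document has only id and name, so whoever stored color has lost it without any error being raised.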

Keep a Flexible Document

I propose that, instead of deserializing into strongly typed objects, the data object remain a flexible document. In the worst case, a “flexible document” could be a string representation of the entire JSON document. It could also be a parsed object: in JavaScript it is simply an object, in Java it might be a Gson JsonObject, and in C# it might be a Newtonsoft JObject.
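
As a rough sketch of the same idea in Python, where the parsed form is a plain dict: the `rename_item` function is a hypothetical update operation, not part of any particular API.

```python
import json


def rename_item(document_text: str, new_name: str) -> str:
    """Change one known property and leave everything else untouched."""
    doc = json.loads(document_text)  # a plain dict: every property survives parsing
    doc["name"] = new_name
    return json.dumps(doc)


incoming = '{"id": "ABC", "name": "Tree", "color": "Green"}'
updated = rename_item(incoming, "Oak Tree")

print(updated)
```

Because the code never copies the data into a fixed shape, the color property rides along even though this code knows nothing about it.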

What about my relational database?

If you are using a relational database to store the objects/documents then this may seem like an impossibility. With a little thought I am sure you can find a solution that doesn’t take excessive amounts of code or add excessive complication. A few examples:

Option A: One option could be to store the whole document in one field (as JSON or even string if you have to) and pull out properties of the document that you need in the columns (thus the document representation and the column would both have the value). These will stay in sync because you are only writing to the database from one place — right? In some databases you could add constraints (CHECK or trigger or calculated column) to ensure that the JSON and the columns are in sync.
Option B: Another option is to remove the properties from the document/JSON that will be stored in columns. This can take extra time and even more code.
Option C: Only allow extensibility underneath a “customData” property of the document.
Option X: I’m sure you can think of another way to accomplish what is needed…
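
Option A might look something like the following sketch, which uses Python's built-in sqlite3 module. The table layout and the `save_item` and `load_item` helpers are assumptions for illustration; a real schema would differ.

```python
import json
import sqlite3


def save_item(conn: sqlite3.Connection, document_text: str) -> None:
    """Single write path: the columns are always derived from the document."""
    doc = json.loads(document_text)
    conn.execute(
        "INSERT OR REPLACE INTO items (id, name, document) VALUES (?, ?, ?)",
        (doc["id"], doc["name"], document_text),
    )


def load_item(conn: sqlite3.Connection, item_id: str):
    """Return the full stored document as a dict, or None if it is missing."""
    row = conn.execute(
        "SELECT document FROM items WHERE id = ?", (item_id,)
    ).fetchone()
    return json.loads(row[0]) if row else None


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items ("
    "id TEXT PRIMARY KEY, name TEXT NOT NULL, document TEXT NOT NULL)"
)

save_item(conn, '{"id": "ABC", "name": "Tree", "color": "Green"}')
loaded = load_item(conn, "ABC")
name_column = conn.execute(
    "SELECT name FROM items WHERE id = ?", ("ABC",)
).fetchone()[0]
```

The name column is available for indexing and joins, while the document column still carries color and anything else a consumer added.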

What about my nice backend models?

Rather than a lossy deserialization, you could probably use a “wrapper” to provide a nicer interface over your flexible document. It appears that Google Contacts offers many client-side libraries that do this. In one commercial application I worked on, the desktop application data was always passed around as a document (in the form of a DataSet) and we had wrappers or view-models where appropriate for access, but we only had a couple of pieces of code that would ever remove columns that we didn’t expect to be there.

Perhaps JSONPath (supported by Json.NET) or something similar would make the JSON representation easy enough to work with directly, without a wrapper.
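
A wrapper of the kind described above could be as small as this Python sketch. The `ItemView` class and its members are hypothetical names; the point is only that the typed interface reads and writes the underlying document instead of replacing it.

```python
import json


class ItemView:
    """Typed access to the known properties of a flexible document."""

    def __init__(self, document: dict):
        self._doc = document  # the wrapper never copies or trims the document

    @property
    def id(self) -> str:
        return self._doc["id"]

    @property
    def name(self) -> str:
        return self._doc["name"]

    @name.setter
    def name(self, value: str) -> None:
        self._doc["name"] = value

    def to_json(self) -> str:
        # Serializes the whole document, including properties the view ignores.
        return json.dumps(self._doc)


view = ItemView(json.loads('{"id": "ABC", "name": "Tree", "color": "Green"}'))
view.name = "Oak Tree"
saved = json.loads(view.to_json())
```

Backend code gets `view.name` instead of dictionary lookups, and the saved document still contains color.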

Writer’s Contract

To make these “extra” properties work, there has to be a contract about not losing properties. Otherwise as soon as you turn around someone will deserialize the data into their own custom class and lose half of the data. For example, Google Contacts API has this specified:

Note: To ensure forward compatibility, be sure that when you PUT an updated entry you preserve all the XML that was present when you retrieved the entry from the server. Otherwise the ignored elements will be deleted. The Google Data API client libraries all handle this correctly, so if you’re using one of the libraries you’re all set.

If you want to verify that your API consumers are properly preserving additional properties, you could (always, or below production, or for 1% of items, etc.) return an additional property such as “cnf_e3a639c2”: null, where the name of the property is, for example, a hash code generated from the item’s id. As long as the rule for when to return (and when to expect) this “conformance” property is deterministic, you can verify that the property is present on a PUT. This is only needed if you suspect that some of your consumers that are able to update items may be breaking the contract and erasing other consumers’ properties.

Be aware that if your consumers are also storing the documents in their own database, having a unique property name on each document could cause unexpected performance issues for some NoSQL databases.
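
Here is one way the conformance check could be sketched in Python. The use of SHA-256, the eight-character suffix, and all three function names are assumptions; any deterministic function of the item’s id would serve.

```python
import hashlib


def conformance_key(item_id: str) -> str:
    """Derive a deterministic property name from the item's id (assumed scheme)."""
    digest = hashlib.sha256(item_id.encode("utf-8")).hexdigest()
    return "cnf_" + digest[:8]


def add_conformance_property(doc: dict) -> dict:
    """Return a copy of the document with the marker added, as served on a GET."""
    marked = dict(doc)
    marked[conformance_key(doc["id"])] = None
    return marked


def preserves_conformance_property(put_doc: dict) -> bool:
    """Check on a PUT that the consumer sent the marker back."""
    return conformance_key(put_doc["id"]) in put_doc


served = add_conformance_property({"id": "ABC", "name": "Tree"})

good_put = dict(served, name="Oak Tree")      # consumer kept every property
bad_put = {"id": "ABC", "name": "Oak Tree"}   # consumer rebuilt the document
```

A PUT like `bad_put` fails the check, which points to a consumer that deserialized into its own class and dropped what it did not recognize.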

Fewer version changes required

Since no one may lose additional properties under this extensible model, existing consumers won’t have to be updated every time new official properties are added to the API (as long as those new properties are not required and do not have rules based on other things). A complete discussion of document versioning and API versioning is a separate topic, but consumers that add their own properties may also end up adding their own version marker for their set of properties.

Evolving API

Since consumers of the API are free to add additional properties that make sense for them (with some consideration given to the names of the properties), the objects may evolve over time. The persisted objects can be queried to learn what additional properties are being stored on them, and as properties come to be used more frequently, perhaps they can be promoted to “official” properties and documented. If properties start appearing that seem suspect, then a discussion can be started with the consumers that are adding them.
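
Learning which extra properties are in use could start with something as simple as this Python sketch. The `OFFICIAL_PROPERTIES` set, the `extra_property_counts` function, and the sample documents are made up for illustration; in practice the documents would come from your own store.

```python
from collections import Counter

# Properties that are already documented (assumed for this example).
OFFICIAL_PROPERTIES = {"id", "name"}


def extra_property_counts(documents) -> Counter:
    """Count how many documents carry each non-official top-level property."""
    counts = Counter()
    for doc in documents:
        counts.update(key for key in doc if key not in OFFICIAL_PROPERTIES)
    return counts


stored = [
    {"id": "ABC", "name": "Tree", "color": "Green"},
    {"id": "DEF", "name": "Rock", "color": "Gray", "weightKg": 12},
    {"id": "GHI", "name": "Pond"},
]

counts = extra_property_counts(stored)
```

Properties near the top of `counts.most_common()` are candidates for promotion, and unfamiliar names near the bottom are candidates for a conversation.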

Already Happening

I will leave you with this thought. If your API is being consumed by a web application (especially a SPA), then this idea of not losing properties is likely already happening, since that application is likely using JavaScript, has retrieved a JSON object from a GET request to your web service, made modifications to that object, and then PUT it back to your web service. If you introduced a new property today, it is likely already coming back to you on saves.

Ross Coded Classes

Programming tips with no rose colored glasses.