Ross Coded Classes

Programming tips with no rose colored glasses.

Persist or Lose

TL;DR: I suggest that web services should not lose properties of the documents they expose. This can’t be a totally ridiculous idea, since it is how the Google Contacts API works.

I have seen several trends in software development as I have attended CodeMash conferences over the last decade. One of the things that I hear about a lot is that people are excited to use dynamic languages like JavaScript everywhere they can. Another growing trend is the use of document databases or NoSQL.

Both dynamic languages and document databases expose easily extensible objects. If you need to save additional properties on an object it is completely painless. It may be just a few objects that need the additional property or it may be that going forward all objects of that class/type will be getting the additional properties.

Deserialization Exclusion

I have read a lot of code for web services that have a JSON or XML document API yet immediately transform the document into a strongly typed object that loses any additional properties that the document had. Take the following ItemDTO class: it will lose the color property if the JSON document is deserialized into it and then the JSON is discarded.

JSON Document
{
     "id": "ABC",
     "name": "Tree",
     "color": "Green"
}

class ItemDTO
{
     public String id { get; set; }
     public String name { get; set; }
}

Keep a Flexible Document

I propose that, instead of being deserialized into a strongly typed object, the data object remain a flexible document. In the worst case a “flexible document” could be a string representation of the entire JSON document. A flexible document could also be a parsed object: in JavaScript it is simply an object, in Java it might be a GSON JsonObject, and in C# it may be a Newtonsoft JObject.

What about my relational database?

If you are using a relational database to store the objects/documents then this may seem like an impossibility. With a little thought I am sure you can find a solution that doesn’t take excessive amounts of code or add excessive complication. A few examples:

  • Option A: One option could be to store the whole document in one field (as JSON or even string if you have to) and pull out properties of the document that you need in the columns (thus the document representation and the column would both have the value). These will stay in sync because you are only writing to the database from one place — right?  In some databases you could add constraints (CHECK or trigger or calculated column) to ensure that the JSON and the columns are in sync.
  • Option B: Another option is to remove the properties from the document/JSON that will be stored in columns. This can take extra time and even more code.
  • Option C: Only allow extensibility underneath a “customData” property of the document.
  • Option X: I’m sure you can think of another way to accomplish what is needed…
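As a rough illustration of Option B, here is a minimal Java sketch. It assumes the parsed document is held as a plain Map<String, Object>; the DocumentSplitter class, its method names and the choice of "id" and "name" as column-backed properties are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for Option B: properties that have their own columns are
// moved out of the document; everything else stays in the stored document.
final class DocumentSplitter {
    // Assumed column-backed property names, for illustration only.
    static final List<String> COLUMN_PROPERTIES = List.of("id", "name");

    // Returns the column values and removes them from 'remainder' as a side effect.
    static Map<String, Object> extractColumns(Map<String, Object> remainder) {
        Map<String, Object> columns = new LinkedHashMap<>();
        for (String property : COLUMN_PROPERTIES) {
            if (remainder.containsKey(property)) {
                columns.put(property, remainder.remove(property));
            }
        }
        return columns;
    }

    // Rebuilds the full document when the row is read back.
    static Map<String, Object> merge(Map<String, Object> columns,
                                     Map<String, Object> remainder) {
        Map<String, Object> document = new LinkedHashMap<>(columns);
        document.putAll(remainder);
        return document;
    }
}
```

The point of the sketch is that properties the service does not know about (such as color) survive the round trip because they stay in the remainder that is stored as a document.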

What about my nice backend models?

Rather than a lossy deserialization, you could probably use a “wrapper” to provide a nicer interface over your flexible document. It appears that Google Contacts offers many client-side libraries that do this. In one commercial application I worked on, the desktop application data was always passed around as a document (in the form of a DataSet) and we had wrappers or view-models where appropriate for access, but we only had a couple of pieces of code that would ever remove columns that we didn’t expect to be there.
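A minimal Java sketch of such a wrapper follows. It assumes the parsed document is a plain Map<String, Object> standing in for something like a GSON JsonObject; the ItemView class and its method names are hypothetical and not taken from any library.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical typed view over a flexible document. The Map stands in for a
// parsed JSON object; unknown properties are never copied out or dropped.
final class ItemView {
    private final Map<String, Object> document;

    ItemView(Map<String, Object> document) {
        this.document = document;
    }

    String getId() {
        return (String) document.get("id");
    }

    String getName() {
        return (String) document.get("name");
    }

    void setName(String name) {
        document.put("name", name);
    }

    // The full document, including properties this view knows nothing about.
    Map<String, Object> toDocument() {
        return document;
    }

    // Builds the sample document from the earlier JSON example.
    static Map<String, Object> sampleDocument() {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", "ABC");
        doc.put("name", "Tree");
        doc.put("color", "Green");
        return doc;
    }
}
```

Code that only cares about id and name gets typed accessors, while the color property is still present when the document is written back.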

Perhaps JsonPath (supported by Json.net) or something similar may make it easy enough to use as a JSON representation without a wrapper.

Writer’s Contract

To make these “extra” properties work, there has to be a contract about not losing properties. Otherwise as soon as you turn around someone will deserialize the data into their own custom class and lose half of the data. For example, Google Contacts API has this specified:

Note: To ensure forward compatibility, be sure that when you PUT an updated entry you preserve all the XML that was present when you retrieved the entry from the server. Otherwise the ignored elements will be deleted. The Google Data API client libraries all handle this correctly, so if you’re using one of the libraries you’re all set.

If you want to verify that your API consumers are properly preserving additional properties, you could (always, or below production, or for 1% of items, etc.) return an additional property such as “cnf_e3a639c2”:null, where the name of the property is, for example, a hash code generated from the item’s id. As long as the rule for when to return (and therefore when to expect) this “conformance” property is deterministic, you can verify that the property is present on a PUT. This is only needed if you suspect that some of your consumers that are able to update items may be breaking the contract and erasing other consumers’ properties.
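A small Java sketch of the conformance property might look like the following. The “cnf_” prefix comes from the example above; the ConformanceProperty class, the use of String.hashCode() as the hash, and the Map<String, Object> document (which must allow null values, as HashMap does) are assumptions for illustration.

```java
import java.util.Map;

// Hypothetical conformance check. String.hashCode() is an assumed choice of
// hash; any deterministic function of the item's id would do.
final class ConformanceProperty {
    // Deterministic property name derived from the item's id.
    static String nameFor(String itemId) {
        return String.format("cnf_%08x", itemId.hashCode());
    }

    // Adds the marker (with a null value) to a document before it is returned.
    static void addTo(Map<String, Object> document, String itemId) {
        document.put(nameFor(itemId), null);
    }

    // True when a document sent back on a PUT still carries the marker.
    static boolean isPreserved(Map<String, Object> putDocument, String itemId) {
        return putDocument.containsKey(nameFor(itemId));
    }
}
```

A consumer that round-trips the whole document passes the check; one that deserialized into its own class and rebuilt the document from it does not.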

Be aware that if your consumers are also storing the documents in their own database, having a unique property name on each document could cause unexpected performance issues for some NoSQL databases.

Fewer version changes required

Since no consumer is allowed to lose additional properties under this extensible model, existing consumers won’t have to be updated every time new official properties are added to the API (as long as those new properties are not required and do not have rules based on other things). A complete discussion of document versioning and API versioning is a separate topic, but consumers that add their own properties may also end up adding their own version marker for their set of properties.

Evolving API

Since consumers of the API are free to add additional properties that make sense for them (with some consideration given to the names of the properties), the objects may evolve over time. The persisted objects can be queried to learn what additional properties are being stored on them, and as properties start to be used with higher frequency, perhaps they can be promoted to “official” properties and documented. If properties start appearing that seem suspect, then a discussion can be started with the consumers that are adding them.
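How the persisted objects are queried depends on the database, but the idea can be sketched in Java over documents already loaded into memory. The PropertyUsage class is hypothetical, and each document is assumed to be a Map<String, Object>.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical report of which top-level properties are in use.
final class PropertyUsage {
    // Counts how many documents carry each top-level property name.
    static Map<String, Integer> countProperties(List<Map<String, Object>> documents) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map<String, Object> document : documents) {
            for (String property : document.keySet()) {
                counts.merge(property, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

Properties with high counts that are not yet documented are candidates for promotion; rare or odd-looking names are the ones worth asking consumers about.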

Already Happening

I will leave you with this thought. If your API is being consumed by a web application (especially a SPA), then this idea of not losing properties is likely already happening, since that application is likely using JavaScript, has retrieved a JSON object from a GET request to your web service, made modifications to that object and then PUT it back to your web service. If you introduced a new property today, then it is likely already coming back to you on saves.

Diagnosing Excessive SQL Server Table Growth

I was asked for help on a problem that can be simplified to:

A SQL Server is exhibiting excessive disk space usage. Much of that space is used by one table and appears to be unused space within the table. It appears to be growth in the LOB_DATA. In this table one column named body is stored as nvarchar(max). The number of rows in the table is not varying much over time.

 

To assist with diagnosing this I asked some questions starting with the simplest explanations I could think of. 

Question 0: What is the edition and version of the SQL Server instance and what is the database compatibility level? We can use this information to look for known issues.

Question 1: What is the “large value types out of row” setting for the table? You should be able to query something like select large_value_types_out_of_row from sys.tables WHERE 

 

My primary bet is that code is repeatedly writing to the database. That by itself may not be a huge problem, but combined with snapshot isolation being enabled, SQL Server may be keeping ghost records around for transactions that are in progress aiding in the growth.

 

Question 2: Do we know if there are repeated writes to the table (i.e. instead of just one insert for each row that exists, there may be multiple updates or even deletes and inserts replacing the rows)?  

Question 3: If we haven’t confirmed that there are repeated writes, have we proved there are not repeated writes? We might do this by adding a trigger to the table to record at least some minimal data about inserts, updates and deletes.

Question 4: If records are repeatedly being updated, can that be prevented or reduced? One example I have seen in the past is whitespace being trimmed from columns in some places but not all, so the data being synced with the database looks different during the comparison step but becomes identical during the save step.
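The whitespace example in Question 4 can be sketched on the application side in Java. This assumes the save path trims values; the UpdateGuard class and its method names are hypothetical.

```java
import java.util.Objects;

// Hypothetical guard that compares values the same way they will be saved.
final class UpdateGuard {
    // Applies the normalization the save path is assumed to apply (trimming).
    static String normalize(String value) {
        return value == null ? null : value.trim();
    }

    // True only when the normalized incoming value differs from what is stored.
    static boolean needsUpdate(String storedValue, String incomingValue) {
        return !Objects.equals(normalize(storedValue), normalize(incomingValue));
    }
}
```

If the comparison and the save use the same normalization, a value that differs only by trailing whitespace no longer triggers a write.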

 

If Questions 2, 3 or 4 indicate repeated writing of the same values, an INSTEAD OF trigger that skips the UPDATE if all the fields match could be a temporary workaround to prevent the excessive file growth.

 

Question 5: What percentage of rows have DATALENGTH(body) > 7850? A short-term workaround might be to add views and/or triggers that limit the length of body so that it fits within the row, changing the actual storage of body to nvarchar(4000) (or whatever fits) and truncating the rest. If little or no data exceeds what an nvarchar(4000) column can store, then a temporary workaround may be simply changing the column definition from nvarchar(max) to nvarchar(4000); the application code may not need to change at all for that schema change.

Question 6: Has rebuilding the table by copying it to a new table been tried? This is much less likely, but I have seen a problem before where, due to a bug in SQL Server somewhere around 2000-2005, a table had extra space that could not be reclaimed. We had to copy the data into a new table and delete the old one after updating to a version of SQL Server that had the fix to prevent it from happening again.

Resolution

I believe that the resolution for this problem may have been a reduction in the number of updates to the rows of the table combined with a drastic reduction of how many transactions were rolled back after performing updates to the table.

Chromecast for multiple wall displays/dashboards

I did some research on using Chromecasts for dashboard TVs.

A custom Chromecast app can be made in HTML5 but that seems like too much hassle. A simpler solution is to just cast from a computer. Casting a Chrome browser tab works, but doesn’t seem to be the best image quality. A higher quality image is obtained by casting a whole 1920×1080 monitor.

Initial Setup

  1. For the best image quality with Chromecasts, add a 1920×1080 display to the computer.
  2. For each Chromecast, make a new user in the Chrome browser named the same as the Chromecast if the user doesn’t already exist.

Start Casting

For each Chromecast:

  1. Open a new Chrome window as a different “chrome user”
  2. Begin casting the whole 1920×1080 monitor (… menu, Cast…, click “Cast to”, choose “Cast Desktop”, choose the right Chromecast, choose the monitor, click Share)
  3. Optional: Go to http://bigtextbox.com/ and type in the Chromecast name so you can find the window again
  4. Optional: Minimize the Chrome window.

Fix Casting

Find the right Chrome window and follow the same steps as above.

Observations

  • It seems to send somewhere around or under 200 KB/s per Chromecast to stream a 1920×1080 static display (no animations or motion).
  • 2nd generation Chromecast seems to have a better picture quality than 1st generation.
  • The sending computer will use some CPU power for each Chromecast that will depend on the computer used and what is being cast.
  • A dedicated monitor is not really needed on the sending computer and instead a fake HDMI monitor could be used, but the lag will drive you crazy setting it up by using mouse and keyboard while watching a Chromecast.
  • Instead of a dedicated physical monitor and computer, a virtual machine can be used on a computer connected to the same WiFi as the Chromecasts. Then the whole virtual machine can be minimized etc.

Doing work in the middle of a ReactiveX chain

I struggled for a while figuring out how to do some work in the middle of a chain of Observables.

I needed to make four HTTP requests. The second and third can be made at the same time, and I need their combined results for the fourth call, but the application needed the zipped result of the second and third calls, not the response from the fourth.

Observable<User> source = sessionService
        // First HTTP request
        .createSessionRx(username, password)
        
        // Second and third HTTP request
        // When they both are successful emit one observable
        .flatMap(new Func1<AuthorizationToken, Observable<User>>()
        {
            @Override
            public Observable<User> call(final AuthorizationToken token)
            {
                // Result from first request
                String authorization = token.AuthorizationToken;

                // Second and third requests
                final Observable<Session> session = 
                    sessionService.getSessionRx(authorization);
                final Observable<SessionBindings> bindings = 
                    sessionService.getSessionBindingsRx(authorization);
                
                // Combine those together like the teeth of a zipper
                return Observable.zip(
                        // second result
                        session, 
                        // third result
                        bindings,
                        // combining function
                        new Func2<Session, SessionBindings, User>()
                        {
                            @Override
                            public User call(Session t1, SessionBindings t2)
                            {
                                // Use the emitted values, not the Observables
                                return new User(username, t1, t2);
                            }
                        });
            }
        })

        // Now we need to do a little work and perform a fourth HTTP request
        // but the subscription wants the zipped result.
        .flatMap(new Func1<User, Observable<User>>()
        {
            @Override
            public Observable<User> call(final User user)
            {
                doTheWorkRequiredBeforeFourthCall(user);

                // This was the confusing part - I need to return the "prior" 
                // result after another successful HTTP request and finally figured 
                // out you can map the fourth response to the zipped response.
                return sessionService
                        .fourthRequestRx(user.EmailAddress)
                        .map(new Func1<ResponseBody, User>()
                        {
                            @Override
                            public User call(final ResponseBody responseBody)
                            {
                                // We didn't care what the response from the 
                                // fourth request was, as long as it succeeded.

                                // Everything else wants the user emitted earlier.
                                return user;
                            }
                        });
            }
        });

Subscription subscription = source
        .subscribeOn(Schedulers.io())
        .observeOn(AndroidSchedulers.mainThread())
        .subscribe(new Observer<User>()
        {
            @Override
            public void onCompleted()
            {
                Log.e("RX", "Completed");
            }

            @Override
            public void onError(final Throwable e)
            {
                Log.e("RX", "Failed");
            }

            @Override
            public void onNext(final User user)
            {
                Log.e("RX", "Success");
            }
        });
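
For comparison, the same “pass the earlier result through” shape can be sketched without RxJava using the JDK’s CompletableFuture. This is not the code from the application: the four static methods are hypothetical stand-ins for the HTTP requests and simply complete with placeholder values.

```java
import java.util.concurrent.CompletableFuture;

// Sketch of the same chain shape with CompletableFuture instead of Observables.
final class PassThroughChain {
    // Stand-ins for the four HTTP requests; each just completes with a value.
    static CompletableFuture<String> createSession() {
        return CompletableFuture.supplyAsync(() -> "token");
    }

    static CompletableFuture<String> getSession(String token) {
        return CompletableFuture.supplyAsync(() -> "session-for-" + token);
    }

    static CompletableFuture<String> getBindings(String token) {
        return CompletableFuture.supplyAsync(() -> "bindings-for-" + token);
    }

    static CompletableFuture<Integer> fourthRequest(String combined) {
        return CompletableFuture.supplyAsync(() -> 200);
    }

    static CompletableFuture<String> run() {
        return createSession()
                // Second and third requests run concurrently, then are combined.
                .thenCompose(token -> getSession(token)
                        .thenCombine(getBindings(token), (s, b) -> s + "+" + b))
                // Fourth request: its response is dropped and the combined
                // result from the previous step is passed along instead.
                .thenCompose(combined -> fourthRequest(combined)
                        .thenApply(ignoredResponse -> combined));
    }
}
```

Here thenCombine plays the role of zip, and the inner thenApply plays the role of the map that replaces the fourth response with the earlier combined result.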