Ross Coded Classes

Programming tips with no rose-colored glasses.

Persist or Lose

TL;DR: I suggest that web services should not lose properties of the documents they expose. This can't be a totally ridiculous idea, since it is how the Google Contacts API works.

I have seen several trends in software development as I have attended CodeMash conferences over the last decade. One of the things that I hear about a lot is that people are excited to use dynamic languages like JavaScript everywhere they can. Another growing trend is the use of document databases or NoSQL.

Both dynamic languages and document databases expose easily extensible objects. If you need to save additional properties on an object it is completely painless. It may be just a few objects that need the additional property or it may be that going forward all objects of that class/type will be getting the additional properties.

Deserialization Exclusion

I have read a lot of code for web services that have a JSON or XML document API yet immediately transform the document into a strongly typed object that loses any additional properties the document had. Take the following ItemDTO class — it will lose the color property if the JSON document is deserialized into it and the original JSON is then discarded.

JSON Document
{
     "id": "ABC",
     "name": "Tree",
     "color": "Green"
}

class ItemDTO
{
    public string id { get; set; }
    public string name { get; set; }
}
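
To make the loss concrete, here is a minimal sketch of the lossy round trip, assuming Json.NET's JsonConvert and the ItemDTO class above:

using System;
using Newtonsoft.Json;

class LossyRoundTrip
{
    static void Main()
    {
        string incomingJson = "{ \"id\": \"ABC\", \"name\": \"Tree\", \"color\": \"Green\" }";

        // Deserialize into the strongly typed DTO (defined above) and discard the JSON...
        ItemDTO item = JsonConvert.DeserializeObject<ItemDTO>(incomingJson);

        // ...and "color" is gone when the object is serialized again.
        Console.WriteLine(JsonConvert.SerializeObject(item));
        // {"id":"ABC","name":"Tree"}
    }
}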

Keep a Flexible Document

I propose that, instead of deserializing into strongly typed objects, the data remain a flexible document. A "flexible document" could, in the worst case, be a string representation of the entire JSON document. It could also be a parsed object — in JavaScript it is simply an object, in Java it might be a GSON JsonObject, and in C# it might be a Newtonsoft JObject.
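
As a rough C# sketch, assuming Json.NET's JObject: parse the document, work with the properties you know about, and write the whole thing back out so that unknown properties survive.

using System;
using Newtonsoft.Json.Linq;

class FlexibleDocumentExample
{
    static void Main()
    {
        string incomingJson = "{ \"id\": \"ABC\", \"name\": \"Tree\", \"color\": \"Green\" }";

        // Keep the whole document instead of deserializing into ItemDTO.
        JObject item = JObject.Parse(incomingJson);

        // Work with the properties we know about...
        item["name"] = "Bigger Tree";

        // ...and serialize the entire document, "color" included.
        Console.WriteLine(item.ToString());
    }
}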

What about my relational database?

If you are using a relational database to store the objects/documents then this may seem like an impossibility. With a little thought I am sure you can find a solution that doesn’t take excessive amounts of code or add excessive complication. A few examples:

  • Option A: Store the whole document in one field (as JSON, or even as a string if you have to) and pull the properties you need to query out into columns (so the document representation and the columns both hold the value). These will stay in sync because you are only writing to the database from one place — right?  In some databases you could add constraints (a CHECK, a trigger, or a computed column) to ensure that the JSON and the columns stay in sync. A small sketch of this option follows this list.
  • Option B: Another option is to remove the properties from the document/JSON that will be stored in columns. This can take extra time and even more code.
  • Option C: Only allow extensibility underneath a “customData” property of the document.
  • Option X: I’m sure you can think of another way to accomplish what is needed…
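
A minimal sketch of Option A, assuming ADO.NET with SQL Server; the items table and its columns are hypothetical. The full document and the extracted column are written from the same parsed object in the same statement:

using System.Data.SqlClient;
using Newtonsoft.Json.Linq;

static class ItemStore
{
    // Assumes an already-open SqlConnection and a hypothetical "items" table.
    public static void Save(SqlConnection connection, JObject item)
    {
        const string sql =
            "INSERT INTO items (id, name, document) VALUES (@id, @name, @document)";

        using (var command = new SqlCommand(sql, connection))
        {
            command.Parameters.AddWithValue("@id", (string)item["id"]);
            // Extracted column for querying...
            command.Parameters.AddWithValue("@name", (string)item["name"]);
            // ...and the whole document so no property is lost.
            command.Parameters.AddWithValue("@document", item.ToString());
            command.ExecuteNonQuery();
        }
    }
}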

What about my nice backend models?

Rather than a lossy deserialization, you could probably use a "wrapper" to provide a nicer interface over your flexible document. It appears that the Google Contacts API offers many client-side libraries that do this. In one commercial application I worked on, the desktop application data was always passed around as a document (in the form of a DataSet), and we had wrappers or view-models where appropriate for access, but only a couple of pieces of code would ever remove columns that we didn't expect to be there.
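
A hypothetical wrapper along those lines, again assuming Json.NET's JObject:

using Newtonsoft.Json.Linq;

class ItemWrapper
{
    private readonly JObject _document;

    public ItemWrapper(JObject document)
    {
        _document = document;
    }

    // Typed access to the properties we know about.
    public string Id
    {
        get { return (string)_document["id"]; }
        set { _document["id"] = value; }
    }

    public string Name
    {
        get { return (string)_document["name"]; }
        set { _document["name"] = value; }
    }

    // Unknown properties such as "color" are untouched and survive
    // when the full document is serialized again.
    public string ToJson()
    {
        return _document.ToString();
    }
}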

Perhaps JsonPath (supported by Json.net) or something similar may make it easy enough to use as a JSON representation without a wrapper.
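
For example, a small sketch using Json.NET's SelectToken, which accepts a JSONPath expression:

using System;
using Newtonsoft.Json.Linq;

class JsonPathExample
{
    static void Main()
    {
        JObject item = JObject.Parse(
            "{ \"id\": \"ABC\", \"name\": \"Tree\", \"color\": \"Green\" }");

        // SelectToken returns null if the path does not match anything.
        string color = (string)item.SelectToken("$.color");
        Console.WriteLine(color); // Green
    }
}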

Writer’s Contract

To make these "extra" properties work, there has to be a contract about not losing properties. Otherwise, as soon as you turn around, someone will deserialize the data into their own custom class and lose half of it. For example, the Google Contacts API documentation specifies:

Note: To ensure forward compatibility, be sure that when you PUT an updated entry you preserve all the XML that was present when you retrieved the entry from the server. Otherwise the ignored elements will be deleted. The Google Data API client libraries all handle this correctly, so if you’re using one of the libraries you’re all set.

If you want to verify that your API consumers are properly preserving additional properties, you could (always, or only below production, or for 1% of items, etc.) return an additional property such as "cnf_e3a639c2":null, where the name of the property is, for example, a hash generated from the item's id. As long as the decision about when to return (and when to expect) this "conformance" property is deterministic, you can verify that the property is present on a PUT. This is only needed if you suspect that some consumers that are able to update items may be breaking the contract and erasing other consumers' properties.
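
As an illustration only (the naming scheme and hash choice here are my own, not part of any API), the conformance property name could be derived deterministically from the item's id:

using System.Linq;
using System.Security.Cryptography;
using System.Text;
using Newtonsoft.Json.Linq;

static class ConformanceCheck
{
    // Hypothetical scheme: "cnf_" plus the first 4 bytes of SHA-256(id) as hex,
    // e.g. "cnf_e3a639c2". Deterministic, so the service knows which documents
    // should carry the property when they come back on a PUT.
    public static string PropertyNameFor(string itemId)
    {
        using (var sha = SHA256.Create())
        {
            byte[] hash = sha.ComputeHash(Encoding.UTF8.GetBytes(itemId));
            string hex = string.Concat(hash.Take(4).Select(b => b.ToString("x2")));
            return "cnf_" + hex;
        }
    }

    public static bool IsPreservedOnPut(JObject document, string itemId)
    {
        // The consumer passes the contract check if the property survived.
        return document.Property(PropertyNameFor(itemId)) != null;
    }
}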

Be aware that if your consumers are also storing the documents in their own database, having a unique property name on each document could cause unexpected performance issues for some NoSQL databases.

Fewer version changes required

Because consumers must not drop additional properties under this extensible model, existing consumers won't have to be updated every time new official properties are added to the API (as long as those new properties are not required and do not have rules based on other properties). A complete discussion of document versioning and API versioning is a separate topic, but consumers that add their own properties may also end up adding their own version marker for their set of properties.

Evolving API

Since consumers of the API are free to add additional properties that make sense for them (with some consideration given to the names of those properties), the objects may evolve over time. The persisted objects can be queried to learn what additional properties are being stored on them, and as properties come into more frequent use, perhaps they can be promoted to "official" properties and documented. If properties start appearing that seem suspect, a discussion can be started with the consumers that are adding them.

Already Happening

I will leave you with this thought. If your API is consumed by a web application (especially a SPA), then this idea of not losing properties is likely already happening: that application is probably using JavaScript, has retrieved a JSON object from a GET request to your web service, made modifications to that object, and then PUT it back to your web service. If you introduced a new property today, it is likely already coming back to you on saves.

Locking on rows or values that don’t exist in PostgreSQL


I came across a situation where two concurrent requests were trying to create the same resource in the database, and I wanted the other request to block until the resource was inserted and then continue without error.

A similar problem: determining the largest invoice number in a system and acquiring the next number for the invoice you are inserting, without errors caused by concurrent requests inserting the same invoice number.

I had to think about this one quite a bit to find alternatives. I researched how to lock a non-existent record, and it appears to me that you can't in PostgreSQL, even at ISOLATION LEVEL SERIALIZABLE.

I did find that at READ COMMITTED, "UPSERT" (INSERT … ON CONFLICT) can solve the problem by blocking until the other transaction inserts the record and then not attempting to insert it again.

-- set up a test table
create table t (id text primary key, other text);

This works: Using ON CONFLICT to make sure that the record is only inserted once and is not updated by a second concurrent transaction:

-- run two transactions with these statements (statement for statement)
begin transaction isolation level read committed;
select txid_current();
-- optionally abort if it already exists
select * from t where id = 'tryupsert';
-- optionally if found exit

INSERT INTO t (id, other) VALUES ('tryupsert', txid_current()::text)
ON CONFLICT (id) DO NOTHING;

-- second transaction blocks on the above
select * from t where id = 'tryupsert';
commit;
-- second transaction unblocked now
-- The first transaction inserted the row if it did not exist
-- The second transaction executed the DO NOTHING.

This works: Using ON CONFLICT to make sure that the record exists and the last transaction wins on the other column value (all transactions change the value of the row):

-- run two transactions with these statements (statement for statement)
begin transaction isolation level read committed;
select txid_current();

INSERT INTO t (id, other)
SELECT 'tryupsert' AS id, txid_current()::text AS other
ON CONFLICT (id) DO UPDATE SET (id, other) = (excluded.id, excluded.other)
RETURNING *;

-- second transaction blocks on the above
commit;
-- second transaction unblocked now
-- The first transaction inserted the row if it did not exist
-- or updated it if it did exist.
-- The second transaction updated the row.

This works: Using advisory locks:

-- run two transactions with these statements (statement for statement)
begin transaction isolation level read committed;
select txid_current();
select * from t where id = 'advisory';
-- if found exit

select pg_advisory_xact_lock(hashtext('advisory'));
-- second transaction blocks on the above
select * from t where id = 'advisory';
-- if found exit
-- second transaction will unblock when first tx commits and see the row 
-- was already inserted from the first transaction and skip the insert.

INSERT INTO t (id, other) VALUES ('advisory', txid_current()::text);

select * from t where id = 'advisory';
commit;

Using SERIALIZABLE doesn't work. This makes sense after understanding it better (yet again, because I keep forgetting how it works; it is designed for less blocking and for code that is built to retry on serialization failure).

begin transaction isolation level serializable;
select * from t where id = 'tryserializable' for update;
-- if found exit

-- second transaction does not block
insert into t values ('tryserializable', txid_current()::text);
-- second transaction blocks on the above
select * from t where id = 'tryserializable';
-- id | other
-- -----------------+-------
-- tryserializable | 628
commit;

The second transaction ends with this error:

ERROR: could not serialize access due to read/write dependencies among transactions
DETAIL: Reason code: Canceled on identification as a pivot, during write.
HINT: The transaction might succeed if retried.

Diagnosing Excessive SQL Server Table Growth

I was asked for help on a problem that can be simplified to:

A SQL Server database is exhibiting excessive disk space usage. Much of that space is used by one table and appears to be unused space within the table, specifically growth in the LOB_DATA allocation unit. In this table, one column named body is stored as nvarchar(max). The number of rows in the table is not varying much over time.

To assist with diagnosing this I asked some questions, starting with the simplest explanations I could think of.

Question 0: What is the edition and version of the SQL Server instance and what is the database compatibility level? We can use this information to look for known issues.

Question 1: What is the “large value types out of row” setting for the table? You should be able to query something like select large_value_types_out_of_row from sys.tables where name = 'YourTableName' (substituting the actual table name).

My primary bet is that code is repeatedly writing to the database. That by itself may not be a huge problem, but combined with snapshot isolation being enabled, SQL Server may be keeping ghost records around for transactions that are in progress, contributing to the growth.

Question 2: Do we know if there are repeated writes to the table (i.e., instead of just one insert for each row that exists, there may be multiple updates, or even deletes and inserts replacing the rows)?

Question 3: If we haven’t confirmed that there are repeated writes, have we proved there are not repeated writes? We might do this by adding a trigger to the table to record at least some minimal data about inserts, updates and deletes.

Question 4: If records are repeatedly being updated, can that be prevented or reduced? One example I have seen in the past: whitespace was trimmed from columns in some code paths but not others, so the data being synced with the database looked different during the comparison process but became identical during the save process, causing the same values to be rewritten.

If Questions 2, 3, or 4 indicate repeated writing of the same values, an INSTEAD OF trigger that skips the UPDATE when all the fields match could be a temporary workaround to prevent the excessive file growth.

Question 5: What percentage of rows have DATALENGTH(body) > 7850? A short-term workaround might be to add views and/or triggers that limit the length of the body so it fits in-row, changing the actual storage of body to nvarchar(4000) (or whatever fits) and truncating the rest. If there isn't any (or much) data exceeding what an nvarchar(4000) column could store, then perhaps a temporary workaround is changing the column definition from nvarchar(max) to nvarchar(4000); the application code may not even need to change for that schema change.

Question 6: Has rebuilding the table by copying it to a new table been tried? This is much less likely, but I have seen a problem before where, due to a bug (somewhere around the SQL Server 2000-2005 era), a table held extra space that could not be reclaimed. We had to copy the data into a new table and delete the old one, after updating to a version of SQL Server that had the fix to prevent it from happening again.

Resolution

I believe the resolution for this problem may have been a reduction in the number of updates to the rows of the table, combined with a drastic reduction in how many transactions were rolled back after performing updates to the table.

Chromecast for multiple wall displays/dashboards

I did some research on using Chromecasts for dashboard TVs.

A custom Chromecast app can be made in HTML5, but that seems like too much hassle. A simpler solution is to just cast from a computer. Casting a Chrome browser tab works but doesn't give the best image quality. A higher-quality image is obtained by casting a whole 1920×1080 monitor.

Initial Setup

  1. For the best image quality with Chromecasts, add a 1920×1080 display to the computer.
  2. For each Chromecast, make a new user in the Chrome browser named the same as the Chromecast if the user doesn’t already exist.

Start Casting

For each Chromecast:

  1. Open a new Chrome window as a different “chrome user”
  2. Begin casting the whole 1920×1080 monitor (… menu, Cast…, click “Cast to”, choose “Cast Desktop”, choose the right Chromecast, choose the monitor, click Share)
  3. Optional: Go to http://bigtextbox.com/ and type in the Chromecast name so you can find the window again
  4. Optional: Minimize the Chrome window.

Fix Casting

Find the right Chrome window and follow the same steps above.

Observations

  • It seems to send somewhere around or under 200 KB/s per Chromecast to stream a 1920×1080 static display (no animations or motion).
  • 2nd generation Chromecast seems to have a better picture quality than 1st generation.
  • The sending computer will use some CPU power for each Chromecast that will depend on the computer used and what is being cast.
  • A dedicated monitor is not really needed on the sending computer; a fake HDMI display adapter could be used instead, but the lag will drive you crazy when setting things up with mouse and keyboard while watching the Chromecast output.
  • Instead of a dedicated physical monitor and computer, a virtual machine can be used on a computer connected to the same WiFi as the Chromecasts. Then the whole virtual machine can be minimized, etc.

Doing work in the middle of a ReactiveX chain

I struggled for a while figuring out how to do some work in the middle of a chain of Observables.

I needed to make four HTTP requests. The second and third can be done at the same time, and I need their combined results to make the fourth call; the rest of the application also needs that combined result rather than the fourth call's response.

Observable<User> source = sessionService
        // First HTTP request
        .createSessionRx(username, password)
        
        // Second and third HTTP request
        // When they both are successful emit one observable
        .flatMap(new Func1<AuthorizationToken, Observable<User>>()
        {
            @Override
            public Observable<User> call(final AuthorizationToken token)
            {
                // Result from first request
                String authorization = token.AuthorizationToken;

                // Second and third requests
                final Observable<Session> session = 
                    sessionService.getSessionRx(authorization);
                final Observable<SessionBindings> bindings = 
                    sessionService.getSessionBindingsRx(authorization);
                
                // Combine those together like the teeth of a zipper
                return Observable.zip(
                        // second result
                        session, 
                        // third result
                        bindings,
                        // combining function
                        new Func2<Session, SessionBindings, User>()
                        {
                            @Override
                            public User call(Session t1, SessionBindings t2)
                            {
                                return new User(username, t1, t2);
                            }
                        });
            }
        })

        // Now we need to do a little work and perform a fourth HTTP request
        // but the subscription wants the zipped result.
        .flatMap(new Func1<User, Observable<User>>()
        {
            @Override
            public Observable<User> call(final User user)
            {
                doTheWorkRequiredBeforeFourthCall(user);

                // This was the confusing part: I need to return the "prior"
                // result after another successful HTTP request, and finally figured
                // out that you can map the fourth response to the zipped result.
                return sessionService
                        .fourthRequestRx(user.EmailAddress)
                        .map(new Func1<ResponseBody, User>()
                        {
                            @Override
                            public User call(final ResponseBody responseBody)
                            {
                                // We didn't care what the response from the 
                                // fourth request was, as long as it succeeded.

                                // Everything else wants the user emitted earlier.
                                return user;
                            }
                        });
            }
        });

Subscription subscription = source
        .subscribeOn(Schedulers.io())
        .observeOn(AndroidSchedulers.mainThread())
        .subscribe(new Observer<User>()
        {
            @Override
            public void onCompleted()
            {
                Log.e("RX", "Completed");
            }

            @Override
            public void onError(final Throwable e)
            {
                Log.e("RX", "Failed");
            }

            @Override
            public void onNext(final User user)
            {
                Log.e("RX", "Success");
            }
        });

Installing Multiple Instances (Clusters) of PostgreSQL on CentOS 7

Today I installed multiple instances of PostgreSQL 9.4 on one CentOS 7 machine. I learned a little bit about systemd and how it can adjust the Out-Of-Memory killer with OOMScoreAdjust.

I also learned about data page checksums that were added in PostgreSQL 9.3 but not enabled by default. (Coming from SQL Server this corresponds to PAGE_VERIFY CHECKSUM which is a per-database setting. CHECKSUM was introduced in SQL Server 2005 and is the default for new databases and can be changed at any time.)

I happened to find a similar post about installing multiple PostgreSQL instances, but I already had the PostgreSQL binaries installed; I just needed to configure additional services to run the different clusters.

# Installing multiple instances of Postgresql 9.4 on CentOS 7 64-bit using systemd
# This disables the Out-Of-Memory killer for the main process

# Change these three variables to pick an instance name, port 
# and data directory. I suggest using a directory name that is the
# same as the instance name but you can make that different.
MYPGINSTANCE=cluster1
MYPGPORT=5432
MYPGDATA=/data/pgsql/9.4/${MYPGINSTANCE}

# Create a data directory and change its ownership
mkdir -p ${MYPGDATA}
chown postgres:postgres ${MYPGDATA}

# initialize the database using user postgres and enable page checksums (introduced in 9.3)
su postgres -c "/usr/pgsql-9.4/bin/initdb -k -D ${MYPGDATA} -U postgres"

# Use heredoc to create a new .service file for this instance:
cat >/etc/systemd/system/postgresql-9.4-${MYPGINSTANCE}.service <<EOF
.include /lib/systemd/system/postgresql-9.4.service

[Service]
Environment=PGDATA=${MYPGDATA}
Environment=PGPORT=${MYPGPORT}
EOF

# Let's see if this looks right...
cat /etc/systemd/system/postgresql-9.4-${MYPGINSTANCE}.service

# Enable the service at startup, start it, and check its status
systemctl enable postgresql-9.4-${MYPGINSTANCE}
systemctl start postgresql-9.4-${MYPGINSTANCE}
systemctl status postgresql-9.4-${MYPGINSTANCE}

# Let's see the logs for this new service before the postgresql logging takes over
journalctl -u postgresql-9.4-${MYPGINSTANCE}

Programmer Expectations

Here are a few things that I have thought about and expect from programmers, whether they are on my team or not. I have tried to be honest with myself and really think about what I have written. I hope that readers will use these thoughts constructively to find something they can improve upon in their work as programmers or otherwise, either from what I have said or some thought of your own that my words sparked.

-Ross Bradbury

Don’t Reinvent the Wheel

Search for similar functionality that already exists in the solution. This pertains to functions, user interface, data constraints, etc. Building something from scratch should not be the first option, even though it is often the most fun. Reuse existing practices, controls, or code when possible. The amount of time spent searching for similarities should be proportional to the complexity of the new work. Include our own code base and the third-party code bases we already use. Also consider third-party solutions we don't already use for non-trivial functionality. Do not hesitate to ask other team members if they know of similar functionality within the application already, but please do perform a search of the resources you know on your own first.

Consider Uses and Contracts (Explicit and Implied)

Search for all uses of members, classes, controls, components, database views, table schemas, etc. that have been modified in such a way that the existing contract has changed. This requires critical thinking about what can be considered a contract, for example:

  • Could a function never return null before, but now it can?
  • Did dependency on string casing change in a way that wasn’t clearly a bug fix?  Even if it was a bug fix, does anything depend on the prior behavior?

Follow the tentacles in order to be as confident as reasonable that changes will not cause something else to break.

Know, Understand and Test Your Changes

Examine and understand all code differences, including changes in designer-generated code, before committing changes (checking them in). If you do not know why a change happened, it is likely it was accidental. More often than I would expect, files get checked in that should not have been changed (for example .sln, .vsmdi, .reSharper) or that do not compile. Be careful to understand not only what the code does, but how the code works; there is a difference between syntax and semantics. Knowing what the code does keeps you out of the realm of cargo cult programming.

The compiler is not a code tester; every error the compiler finds can be taken as a personal indicator that more errors remain even after the code compiles. Waiting for the compiler is also not the most efficient use of time. Ideally, high-coverage unit tests should be written to test the new code. With or without unit tests, make sure that all of the code has executed before checking in changes; if you must cover the code manually, do so, for example, by setting breakpoints and removing them as you hit them. Too often, when looking at code from a bug report, code is found that handles a rare condition but could never have executed successfully even though it compiles; it was code that was never run before it was checked in.

Do Not Use a Shotgun to Kill Bugs

When fixing bugs (or really when changing any existing code), it is important to understand all of what the existing code does. Not every bug can be reproduced in running code, but that may actually be a positive thing. It should be possible to see where the code can go wrong and understand why it happens, though it may take time and practice to develop that skill. Fixes should be justifiable. For example, fixes checked in with comments like "this should probably fix the crash," or preventing a null reference by skipping the code when the reference is null without understanding why or how the reference is null, are not correct solutions and just end up moving the bugs to different places. Do not re-write, re-implement or re-factor code that you do not understand and do not know the requirements of; doing so will only move the bug or create new bugs.

Attention to Detail

This is listed as a requirement on just about every job posting I’ve ever read, but sometimes it isn’t easy to understand what it means.

Proofread

One of the things attention to detail means to me is that the same care given to the actual code is also given to everything else, specifically including all comments, whether in code or in changeset notes. Typographical errors (slips of the hand or finger) found in comments can be taken as an indication of carelessness, and I feel they have the potential to significantly reduce trust in the quality of the code. While spelling, grammar, and punctuation mistakes can be errors of ignorance (impossible for the original author to catch), the majority of typos can be found by a single proofread. If the comments weren't proofread, was the code?

Create and Maintain Documentation

Document code where it is non-trivial to understand what is going on. When changing code, double-check the documentation of the modified members and class for accuracy. Comment changes committed to source control with more than just the bug number fixed; explain a little about what changed, and preferably why. It is an art to balance which comments go in the source code and which go in the source control comments.

Ask Why? and Consider the Problem Being Solved

Consider the problem being solved by the requested feature or change rather than jumping straight to coding. Consider what the problem is separately from exactly how the analyst has asked for it to work (the why vs. the how). Discuss your understanding of the problem with the analyst to confirm that you understand it. Once you understand what the problem is and why the analyst has requested that the feature work the way it does, you have a much better chance of finding the best design. It is best to do this as early in the design as possible, because your input is more likely to influence the design. Even if the design seems finalized, programmers can often come up with scenarios that were not considered. Do not be afraid of suggesting simplifications to the design, but offer these as suggestions rather than complaints. Be prepared to have your suggestions turned down, but use it as an opportunity to learn more about the what and the why; it is likely that a simpler approach was already considered and the reasons it was not good enough simply were not documented.

As you consider the design, try to think critically about what can happen if the system is used in ways other than what the analysts think the normal use will be. It is very tempting to jump in and implement exactly what was asked for. The skill of taking some time to think critically and identify problems while they are still easy to avoid is applicable to almost everything.

Typesafe access to nullable DataColumns

I recommend using these standards when reading values from a DataRow.

  • Prefer not to use the as operator with the DataRow indexer because it hides type conversion errors and it is slower than other options.
  • Use the Field extension method from System.Data.DataSetExtensions.dll for value-type columns that allow DBNull.
  • Use casting for columns that do not allow DBNull.
  • Prefer checking DataRow.IsNull and casting for reference-type columns that allow DBNull.
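
For the last bullet, a small sketch of the IsNull-plus-cast pattern; the notes column name here is just for illustration:

using System.Data; // Field<T> comes from System.Data.DataSetExtensions.dll

static class DataRowExamples
{
    static string ReadNotes(DataRow row)
    {
        // Preferred for a reference-type column that allows DBNull:
        // check IsNull, then cast.
        return row.IsNull("notes") ? null : (string)row["notes"];
    }

    static string ReadNotesViaField(DataRow row)
    {
        // Field<string> is also type-safe; per the timings below it is a bit slower.
        return row.Field<string>("notes");
    }
}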

The extension methods Field and SetField are in System.Data.DataSetExtensions.dll (System.Data namespace).

// Instead of using the as operator to access a DateTime? type column...
var entryDate = row["entry_date"] as DateTime?;
// Use the Field extension method instead.
var entryDate = row.Field<DateTime?>("entry_date");

All the signatures that the DataRow indexer accepts are accepted by the Field method.

It really is more of a type-safety and understanding/clarity issue than a performance one. Timings for 50 million calls (my recommended uses are annotated):

DateTime, AllowDBNull = true, not-null:

 row.Field<DateTime?>(column)                        4172 ms Type-safe and fastest
 row.IsNull(column) ? null : (DateTime?)row[column]  9722 ms
 row[column] as DateTime?                            9025 ms Hides InvalidCastException

DateTime, AllowDBNull = true, null:

 row.Field<DateTime?>(column)                        3935 ms Type-safe and pretty fast
 row.IsNull(column) ? null : (DateTime?)row[column]  3080 ms
 row[column] as DateTime?                            5429 ms Hides InvalidCastException

DateTime, AllowDBNull = false, not-null:

 row.Field<DateTime>(column)                         4639 ms
 (DateTime) row[column]                              2708 ms Type-safe and fast

String, AllowDBNull = false, not-null:

 row.Field<string>(column)                           3362 ms
 (string) row[column]                                1317 ms Type-safe and fast

String, AllowDBNull = true, not-null:

 row.Field<string>(column)                           3467 ms less to type, typesafe, slower.
 row.IsNull(column) ? null : (string) row[column]    2106 ms A lot to type, but fast
 row[column] as string                               1320 ms Hides InvalidCastException

String, AllowDBNull = true, null:

 row.Field<string>(column)                           2857 ms less to type, typesafe, slower.
 row.IsNull(column) ? null : (string) row[column]    1182 ms A lot to type, but fast
 row[column] as string                               1279 ms Hides InvalidCastException

DataRow.IsNull(column) is preferred over testing for DBNull.Value

Please try to use the IsNull method on DataRow to determine whether a value is null. DataRow/DataColumn does not actually store null or DBNull; it has a BitArray that records which columns in the row hold a non-null value. Retrieving the column's value for the row returns the DBNull.Value sentinel, extra work that is avoided by calling the IsNull method. Ideally this performance difference is never observable, but I feel IsNull tends to make the code cleaner and is the more correct implementation.


// This is better than any of the many possible 
// variations of checking the value

myDataRow.IsNull(someColumnName)

// discouraged

myDataRow[someColumnName].Equals(DBNull.Value)

myDataRow[someColumnName] == DBNull.Value

myDataRow[someColumnName] is DBNull

!(myDataRow[someColumnName] is Int32) // (or appropriate type)

Using ReSharper 4.1 to edit C# XML Documentation Comments

This is a repost of an earlier article, updated to use ReSharper 4.1.

I have recently been working on improving some of our code documentation.  I’ve found ReSharper to be very helpful in this area. (I’m using ReSharper 4.1 with the “Visual Studio” keyboard shortcuts.)
 
ReSharper’s Surround With Templates
 
I find a lot of places where paragraphs (<para>) need to be inserted, or where a member or class has been referenced by name only, and a reference using the <see> tag would be better.  One thing that I have found extremely helpful is to add my own “Surround With” templates in ReSharper.  Now I just have to highlight the text that I want to be in its own paragraph, and press Ctrl+E, U, D.  If I want to change a member name to a see tag I press Ctrl+E, U, E or use the ReSharper | Code | Surround with… menu and select the template I want.
 
Adding your own templates to ReSharper is easy:
  1. Navigate to ReSharper | Live Templates… | Surround Templates.
  2. Expand User Templates.
  3. Expand C#.
  4. Click the New Template toolbar button.
  5. Fill in a Description.
  6. Choose where it will be Available by clicking the hyperlink.  For these XML code documentation comments, I choose Language-specific for C# and check Everywhere (everywhere because none of the other options seemed to work for the comments.)
  7. Edit the template text to wrap $SELECTION$ with the text you want.
  8. Choose OK.
  9. If you like, you may now choose a position for your new template in the quick access list to choose which letter it can be accessed by.  Drag your template into the list on the right to choose a shortcut.

Here are some of the Surround With templates I’ve set up for editing documentation comments:

Name           Template Text
<para>         <para>$SELECTION$</para>
<see>          <see cref="$SELECTION$"/>
<see> label    <see cref="$SELECTION$">$SELECTION$</see>
<note>         <note type="caution">$SELECTION$</note>
 
I’ve found that using the template to insert the <see> tags sometimes causes ReSharper or Visual Studio to incorrectly complain that the code won’t compile by highlighting it; I can usually keep working normally, but if I get tired of the squiggly lines or it gets too confused I just close the file and open it again. I find this a minor inconvenience compared to the amount of time it saves.
 
ReSharper’s Live Templates
 
If you want to be able to insert text with ReSharper but you don’t want the text to surround the selection, you can use a Live Template.
 
Adding your own templates to ReSharper is pretty easy:
  1. Navigate to ReSharper | Live Templates | Live Templates.
  2. Expand User Templates.
  3. Expand C#.
  4. Click the New Template toolbar button.
  5. Fill in a shortcut and description. (I usually set the description to the same thing as the template text, since these are usually small.)
  6. Choose where it will be Available by clicking the hyperlink.  For these XML code documentation comments, I choose Language-specific for C# and check Everywhere (everywhere because none of the other options seemed to work for the comments.)
  7. Edit the template text you want to insert.
  8. Choose OK.

Here are some of the Live templates I’ve set up for editing documentation comments:

Abbreviation   Template Text
seet           <see langword="true"/>
seef           <see langword="false"/>
seen           <see langword="null"/>