Author’s note: Since publishing this, I’ve learned that the array representation was first proposed in a joint paper by George James and Rob Tweed, A Universal NoSQL Engine, Using a Tried and Tested Technology.

Given the ubiquity of MUMPS in Health IT, and the increasing use of JSON as a data format, there is a need for a simple way of converting between the two formats. But before presenting such a mapping, let us briefly review the two formats.

In MUMPS, arrays are key/value pairs in which keys may be organized hierarchically. Not only are

ROOT(1)="abc"
ROOT(2)="def"

and

ROOT("abc")="def"

legal arrays, but so is

ROOT("abc")="d"
ROOT("abc",0,12)="ef"
ROOT(4,"def")=1.1

There are a few things to notice here. For one thing, numbers and strings may be freely intermixed as subscripts. In fact, the language itself doesn’t distinguish between 1 and "1". We could have quoted the numbers appearing in the above array and the semantics would have been the same. The next thing to notice is that values may be associated with array nodes at any subscript level, not just the deepest one, but intermediate nodes are not required to have values: ROOT("abc") has one above, while ROOT("abc",0) does not.
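To make the "values at any level" point concrete, here is a minimal Python sketch that models the array above as a dictionary keyed by tuples of subscripts. The model and the helper name `data` are assumptions made for illustration; the helper is only a loose analogue of MUMPS's $DATA function, which reports whether a node has a value, descendants, both, or neither.

```python
# Hypothetical model: a MUMPS array as a dict keyed by subscript tuples.
root = {
    ("abc",): "d",
    ("abc", 0, 12): "ef",
    (4, "def"): 1.1,
}


def data(nodes, *subs):
    """Loose analogue of $DATA: returns 0, 1, 10 or 11 for the given subscripts.

    The ones digit says whether the node has a value, the tens digit
    says whether it has descendants.
    """
    has_value = subs in nodes
    has_descendants = any(
        len(path) > len(subs) and path[:len(subs)] == subs for path in nodes
    )
    return 10 * has_descendants + has_value


print(data(root, "abc"))      # 11: a value and descendants
print(data(root, "abc", 0))   # 10: descendants only
print(data(root, 4, "def"))   # 1: a value only
print(data(root, "xyz"))      # 0: nothing there
```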

By contrast, JSON objects are key/value pairs, the keys of which are strings. The values may be strings, numbers, arrays, other objects, or the special values true, false, and null. For a full description of the format, including railroad diagrams for the syntax, see ECMA-404.

Just keep in mind that

  1. JSON objects are key/value pairs enclosed in curly braces ({}).
  2. The keys must be strings, but values may be primitives, arrays, or other objects.
  3. Arrays are sequences of values enclosed in square brackets ([]). Examples include [1, 2, 3], [1, "two", 3] and [].

Now, let’s consider how a JSON object might be encoded in MUMPS. The following approach appears to be folklore, but it is described by Rob Tweed on his blog, The EWD Files, in JSON – Interfacing VistA (and Other Legacy MUMPS Systems). The idea is to use the fact that MUMPS allows strings as subscripts, and represent a JSON object as an array having, as its subscripts, the keys of the object. For example, we would represent

{ "one": 1, "two": 2, "three": 3 }

as

ROOT("one")=1
ROOT("two")=2
ROOT("three")=3

If an object has other objects as values, we can then just add a layer of subscripts. For example,

ROOT("data","one")=1
ROOT("data","two")=2

would represent

{"data": {"one": 1, "two": 2} }
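The object-to-subscript mapping described so far can be sketched in a few lines of Python. This is an illustration under assumptions, not anyone's published implementation: the function names `encode_object` and `format_node` are made up here, the MUMPS array is modeled as a dictionary keyed by subscript tuples, and only nested objects with string or numeric leaves are handled.

```python
import json


def encode_object(obj, prefix=()):
    """Flatten nested JSON objects into {subscript tuple: leaf value}."""
    nodes = {}
    for key, item in obj.items():
        path = prefix + (key,)
        if isinstance(item, dict):
            # A nested object just adds one more layer of subscripts.
            nodes.update(encode_object(item, path))
        else:
            nodes[path] = item
    return nodes


def mumps_literal(value):
    """Render a string or number as a MUMPS literal (quotes are doubled)."""
    if isinstance(value, str):
        return '"' + value.replace('"', '""') + '"'
    return str(value)


def format_node(name, path, value):
    """Render one node in the NAME(sub1,sub2,...)=value style used above."""
    subscripts = ",".join(mumps_literal(s) for s in path)
    return f"{name}({subscripts})={mumps_literal(value)}"


nodes = encode_object(json.loads('{"data": {"one": 1, "two": 2}}'))
for path, value in nodes.items():
    print(format_node("ROOT", path, value))
# ROOT("data","one")=1
# ROOT("data","two")=2
```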

But what about arrays? An obvious idea is to use numeric subscripts. For example,

ROOT("data",1)=1
ROOT("data",2)=2
ROOT("data",3)=3

would represent

{"data": [1, 2, 3] }

Unfortunately, there is a problem. If we were to write

ROOT("data","1")=1
ROOT("data","2")=2
ROOT("data","3")=3

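we would probably want it to be interpreted as

{"data": {
    "1": 1,
    "2": 2,
    "3": 3 } }

(However unnatural it is to write JSON this way.) But because MUMPS does not distinguish between 1 and "1", these are exactly the same nodes as in the array example, so the two cases cannot be told apart.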
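The collision can be demonstrated with a small Python sketch. It rests on assumptions: the MUMPS array is modeled as a dictionary keyed by subscript tuples, and the hypothetical helper `canonical` imitates MUMPS's treatment of "1" and 1 as the same subscript, simplified here to non-negative integers only.

```python
def canonical(subscript):
    """Treat a canonical integer string such as "1" as the number 1.

    Simplification: only non-negative integers without leading zeros.
    """
    if (
        isinstance(subscript, str)
        and subscript.isascii()
        and subscript.isdigit()
        and str(int(subscript)) == subscript
    ):
        return int(subscript)
    return subscript


def naive_encode(value, prefix=()):
    """Encode objects by key and arrays by 1-based position, with no markers."""
    if isinstance(value, dict):
        items = value.items()
    elif isinstance(value, list):
        items = enumerate(value, start=1)
    else:
        return {prefix: value}
    nodes = {}
    for key, item in items:
        nodes.update(naive_encode(item, prefix + (canonical(key),)))
    return nodes


as_array = naive_encode({"data": [1, 2, 3]})
as_object = naive_encode({"data": {"1": 1, "2": 2, "3": 3}})
print(as_array == as_object)  # True: the two encodings are indistinguishable
```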

As an aside, in JavaScript, indexing with square brackets is equivalent to property access, so a[1] and a["1"] refer to the same element, but JSON is not JavaScript.

One possible solution is to add a new node that identifies arrays as arrays and objects as objects. In particular, we could write

ROOT("data",0)="0^array"
ROOT("data",1)=1
ROOT("data",2)=2
ROOT("data",3)=3

and

ROOT("data",0)="1^object"
ROOT("data","1")=1
ROOT("data","2")=2
ROOT("data","3")=3

This is a bit ugly, but it removes the ambiguity.
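As a rough check that the marker node really does remove the ambiguity, here is a Python sketch of an encoder and decoder that round-trip through a dictionary keyed by subscript tuples. Everything except the two marker strings is an assumption made for illustration: the function names, the dictionary model, the decision to place a marker under every container (including the root), and the simplified `canonical` helper. The sketch does not handle an object that itself has a key "0", since that would collide with the marker subscript.

```python
ARRAY_MARK = "0^array"
OBJECT_MARK = "1^object"


def canonical(subscript):
    """Treat a canonical non-negative integer string such as "1" as 1."""
    if (
        isinstance(subscript, str)
        and subscript.isascii()
        and subscript.isdigit()
        and str(int(subscript)) == subscript
    ):
        return int(subscript)
    return subscript


def encode(value, prefix=()):
    """Encode JSON-like data, tagging each container at subscript 0."""
    if isinstance(value, list):
        nodes = {prefix + (0,): ARRAY_MARK}
        items = enumerate(value, start=1)
    elif isinstance(value, dict):
        nodes = {prefix + (0,): OBJECT_MARK}
        items = value.items()
    else:
        return {prefix: value}
    for key, item in items:
        nodes.update(encode(item, prefix + (canonical(key),)))
    return nodes


def decode(nodes, prefix=()):
    """Rebuild the JSON-like value rooted at the given subscripts."""
    if prefix in nodes:
        return nodes[prefix]  # a leaf value
    mark = nodes.get(prefix + (0,))
    depth = len(prefix)
    children = []
    for path in nodes:
        if len(path) > depth and path[:depth] == prefix:
            child = path[depth]
            if child != 0 and child not in children:
                children.append(child)
    if mark == ARRAY_MARK:
        return [decode(nodes, prefix + (c,)) for c in sorted(children)]
    if mark == OBJECT_MARK:
        return {str(c): decode(nodes, prefix + (c,)) for c in children}
    raise ValueError(f"no type marker under {prefix!r}")


doc = {"data": [1, 2, 3], "ids": {"1": 1, "2": 2}}
print(decode(encode(doc)) == doc)  # True
```

Calling `encode([1, 2, 3], prefix=("data",))` produces the same four nodes as the first listing above, with the marker at `("data", 0)`.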