Health Level Seven (usually abbreviated HL7) version 2 is a standard for data exchange between (and within) healthcare systems, developed by Health Level Seven International. It is only one of several standards developed by HL7, but it is one of the best known and most widely deployed. Among other things, this means that a working knowledge of HL7 version 2 is a crucial skill for health IT (HIT) professionals. It continues to be updated; at the time of writing, the most recent version was 2.8. Some other HL7 standards you may have heard of are version 3, the Clinical Document Architecture (CDA), and FHIR.

In HL7 version 2 (and version 3), systems interact by exchanging messages. By contrast, CDA is an architecture for representing clinical documents, such as orders, discharge summaries, or the continuity of care document (a summary of the information a patient may wish to make available to a new provider when moving or changing providers for any reason).

Because of the traditional method of encoding HL7 version 2 messages, it is sometimes (perhaps disparagingly) referred to as “vertical bar syntax”. To see why, let’s look at an excerpt of a Patient Identification (PID) segment.


Fields in HL7 v2 messages are separated by a field separator character. It is traditionally “|”, but it can be set to a different character if desired. The segment always begins with a standard identifier (in this case “PID”), which allows the parser to determine the type of segment. Three of the fields that follow have no value. PID.1 is an optional Set ID, a sequence number that distinguishes repeated PID segments within a message. PID.2 is the patient identifier, but this field has been withdrawn from the current version of the specification in favor of PID.3, which is a required patient identifier list. In this case, there is only one identifier, but if there were more than one (say, two), they would all be listed in PID.3, separated by the repetition separator (traditionally “~”).
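To make the field and repetition separators concrete, here is a minimal Python sketch. The segment string (including its identifier values and name) and the `split_fields` helper are hypothetical, and the sketch assumes the traditional “|” and “~” separators instead of reading them from the message header.

```python
# Hypothetical PID excerpt: the identifiers and name are invented for illustration.
PID_SEGMENT = "PID|||12345~67890||Doe^John"

FIELD_SEP = "|"       # traditional field separator (assumed, not read from MSH)
REPETITION_SEP = "~"  # traditional repetition separator (assumed)


def split_fields(segment: str) -> dict[str, str]:
    """Map "PID.1", "PID.2", ... to raw field values by position."""
    seg_id, *values = segment.split(FIELD_SEP)
    # Unvalued fields are kept as empty strings so every later field
    # keeps its position (and therefore its meaning).
    return {f"{seg_id}.{i}": value for i, value in enumerate(values, start=1)}


fields = split_fields(PID_SEGMENT)
identifiers = fields["PID.3"].split(REPETITION_SEP)

print(fields["PID.1"] == "", fields["PID.2"] == "")  # True True
print(identifiers)                                   # ['12345', '67890']
print(fields["PID.5"])                               # Doe^John
```

Keeping the empty strings matters: a field’s meaning comes purely from its position, which is also why a withdrawn field such as PID.2 has to remain in the segment as an empty slot.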


Originally, there was a single patient identifier field. Then a repeating field was added for additional identifiers, and later PID.2 was withdrawn from the specification so that all the identifiers are placed in a single field. Because fields are identified by their position in the segment, a withdrawn field cannot simply be dropped: PID.2 must be left unvalued so that PID.3 and the fields after it keep their positions. At one point, an alternate patient identifier could be placed in PID.4, but this field was also withdrawn from the specification (for similar reasons). Instead, all patient identifiers are now placed in a single repeating field, PID.3. (Incidentally, this is a good example of how HL7 v2 is an evolving specification.)

The next field of interest, PID.5, is the patient name, and it is divided into a number of components. You probably noticed that the patient’s first and last names were separated by a new character (“^”), the component separator. HL7 (hereafter, this will mean HL7 version 2 unless specified otherwise) defines a number of data types. The data type for a personal name is XPN (extended person name). As of version 2.8, it includes 15 different components. We won’t list them here, but they can be found in chapter 2A of the specification.

Incidentally, the remaining separators are the subcomponent separator (traditionally “&”), the escape character (traditionally “\”), and, in newer versions of the specification, a truncation character (“#”), which indicates that a value was truncated because it exceeded the maximum field length. Not all fields permit this; for example, identifiers cannot be truncated.
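The separators are not hard-coded: they are declared at the start of the message header (MSH) segment, where MSH.1 is the field separator and MSH.2 lists the component, repetition, escape, and subcomponent characters, in that order. The sketch below reads them from a hypothetical MSH prefix, splits a name into its first few XPN components, and undoes the escape sequences for the delimiter characters (\F\, \S\, \R\, \T\, and \E\). The helper names are invented, the truncation character and all other escape sequences are ignored, and only the first five XPN components are labelled.

```python
# Hypothetical start of a message header; only MSH.1 and MSH.2 are used here.
MSH_PREFIX = "MSH|^~\\&|"

# First five XPN components (the data type defines 15 in version 2.8).
XPN_LABELS = ["family name", "given name",
              "second and further given names", "suffix", "prefix"]


def read_delimiters(msh: str) -> dict[str, str]:
    """Return the delimiters declared by MSH.1 and MSH.2."""
    field = msh[3]                          # MSH.1: the character after "MSH"
    encoding = msh[4:].split(field, 1)[0]   # MSH.2, e.g. ^~\&
    return {"field": field, "component": encoding[0],
            "repetition": encoding[1], "escape": encoding[2],
            "subcomponent": encoding[3]}


def unescape(text: str, d: dict[str, str]) -> str:
    """Replace the delimiter escape sequences \\F\\, \\S\\, \\R\\, \\T\\, \\E\\."""
    esc = d["escape"]
    codes = {"F": d["field"], "S": d["component"], "R": d["repetition"],
             "T": d["subcomponent"], "E": esc}
    out, i = [], 0
    while i < len(text):
        end = text.find(esc, i + 1) if text[i] == esc else -1
        if end != -1 and text[i + 1:end] in codes:
            out.append(codes[text[i + 1:end]])  # a known delimiter escape
            i = end + 1
        else:
            out.append(text[i])                 # an ordinary character
            i += 1
    return "".join(out)


delims = read_delimiters(MSH_PREFIX)
# Split on the real component separator first, then unescape each piece.
components = [unescape(c, delims)
              for c in "Doe\\T\\Smith^John^Q^JR".split(delims["component"])]
name = dict(zip(XPN_LABELS, components))
print(name["family name"], "/", name["given name"])  # Doe&Smith / John
```

The order of operations is the design point: splitting happens before unescaping, so an escaped delimiter (here the “&” inside the family name) is never mistaken for a real one.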

We now have some idea of how a single segment is structured. What of an entire message? Messages are sequences of segments, each ending with a carriage return (<cr>) character. But before looking more closely at how this works, we need to talk about trigger events. HL7 is an event-driven protocol: messages are sent in response to real-world events. For example, an Admission/Discharge/Transfer (ADT) message may be triggered in a number of ways. Some of the trigger events are:

  1. Admit/Visit Notification (A01)
  2. Transfer (A02)
  3. Discharge/End Visit (A03)
  4. Register a Patient (A04)
  5. Pre-admit a Patient (A05)

There are too many event codes associated with this message type to list here; as of version 2.8 of the specification, there are 55.
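Reading a message therefore starts with two steps: splitting it into segments at each carriage return, and looking at MSH.9 (message type) to find the message code, trigger event, and message structure. A sketch in Python, assuming a hypothetical message whose application names, timestamp, and control ID are invented, and assuming the traditional “^” component separator:

```python
# Hypothetical message: application names, timestamp, and IDs are invented.
MESSAGE = (
    "MSH|^~\\&|SENDAPP|SENDFAC|RECVAPP|RECVFAC|20240101120000||"
    "ADT^A05^ADT_A05|MSG00001|P|2.8\r"
    "PID|||12345||Doe^John\r"
)


def segments(message: str) -> list[str]:
    """Split on the <cr> segment terminator, dropping the empty tail."""
    return [s for s in message.split("\r") if s]


def msh_field(msh: str, n: int) -> str:
    """Return MSH.n. MSH.1 is the field separator itself, so numbering is shifted."""
    sep = msh[3]
    if n == 1:
        return sep
    # After splitting, index 1 holds MSH.2, index 2 holds MSH.3, and so on.
    return msh.split(sep)[n - 1]


segs = segments(MESSAGE)
# MSH.9 components: message code, trigger event, message structure.
code, event, structure = msh_field(segs[0], 9).split("^")
print([s[:3] for s in segs])   # ['MSH', 'PID']
print(code, event, structure)  # ADT A05 ADT_A05
```

The off-by-one in `msh_field` is a real quirk of the header segment: because MSH.1 is the separator character, it never appears as a split field of its own.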

As you might expect, the data requirements for each event type are different, so a message structure is defined for each message/event pair (say, ADT^A01), though several pairs may share the same structure. The structure of the message (as a sequence of segments) is described using a regular grammar. For example, for ADT^A05 (pre-admit) the structure is:

MSH, [{SFT}], EVN, PID, [PD1], [{ARV}], [{ROL}], [{NK1}], PV1, [PV2], ...

I have truncated the message grammar for brevity. There are only a few rules needed to make sense of this notation.

  1. Every segment is identified by its three-letter name.
  2. Segments surrounded by square brackets ([]) are optional.
  3. Segments surrounded by curly braces ({}) may repeat one or more times, so [{...}] means zero or more repetitions.
  4. Entire groups of segments can be placed in either type of bracket (e.g., {a, b}, [c], d matches abcd, ababcd, and ababd, but not acd).
  5. Except as modified by repetition and optionality, segments appear in the order listed.

A message parser typically reads a message one segment at a time and checks the resulting sequence of segments against the grammar. Because the grammar is regular, this can be done with, for instance, a finite state machine.
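One way to implement the checking step is to translate the bracket notation into a regular expression, which describes the same class of languages a finite state machine recognizes (Python’s `re` engine happens to use backtracking rather than an explicit state machine, but the language matched is the same). The sketch below handles only the notation from the rules above; the `grammar_to_regex` and `conforms` helpers are hypothetical, and the ADT-style grammar is a simplified illustration, not the full published structure.

```python
import re


def grammar_to_regex(grammar: str) -> re.Pattern:
    """Translate HL7-style bracket notation into a regex over "ID," tokens."""
    pieces = []
    for token in re.findall(r"[A-Za-z0-9]+|[\[\]{}]", grammar):
        if token in ("[", "{"):
            pieces.append("(?:")                  # open a group
        elif token == "]":
            pieces.append(")?")                   # [...]: optional
        elif token == "}":
            pieces.append(")+")                   # {...}: one or more repetitions
        else:
            pieces.append(re.escape(token) + ",")  # a segment ID
    return re.compile("".join(pieces))


def conforms(pattern: re.Pattern, segment_ids: list[str]) -> bool:
    """True if the whole sequence of segment IDs matches the grammar."""
    return pattern.fullmatch("".join(s + "," for s in segment_ids)) is not None


# Rule 4's toy example.
toy = grammar_to_regex("{a, b}, [c], d")
print([conforms(toy, list(s)) for s in ["abcd", "ababcd", "ababd", "acd"]])
# [True, True, True, False]

# A simplified, hypothetical ADT-style grammar (not the full published structure).
adt = grammar_to_regex("MSH, [{SFT}], EVN, PID, [PD1], [{NK1}], PV1, [PV2]")
print(conforms(adt, ["MSH", "EVN", "PID", "NK1", "NK1", "PV1"]))  # True
print(conforms(adt, ["MSH", "EVN", "PV1"]))                        # False: no PID
```

Appending a comma to every segment ID keeps tokens from running together, so a pattern for one segment can never accidentally match part of another.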

Looking back at our example, certain essential segments (such as PID) cannot be omitted, but others are optional. This makes sense: the patient identification segment (PID) is needed, but the additional data in PD1 might not be. Similarly, visit information (PV1) is required, but next of kin (NK1) may not be required at pre-admit time.

Once a message is sent, it must be processed and (if acknowledgments are requested) acknowledged by the receiving system.

As a concrete example, ADT^A05 is acknowledged at the application level by ACK^A05 (application acknowledgments have trigger events, too).
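An acknowledgment reuses the original header with the sender and receiver swapped and MSH.9 set to ACK plus the original trigger event, followed by an MSA segment whose MSA.1 holds the acknowledgment code and whose MSA.2 echoes the original message control ID (MSH.10). A minimal sketch, assuming a hypothetical incoming message and a hypothetical `build_ack` helper; a production ACK would usually carry more (for example, an ERR segment when reporting errors).

```python
# Hypothetical incoming message; all identifiers are invented.
INCOMING = (
    "MSH|^~\\&|SENDAPP|SENDFAC|RECVAPP|RECVFAC|20240101120000||"
    "ADT^A05^ADT_A05|MSG00001|P|2.8\r"
    "PID|||12345||Doe^John\r"
)


def build_ack(incoming: str, ack_code: str, control_id: str, timestamp: str) -> str:
    """Build a minimal ACK (MSH + MSA) for `incoming` (hypothetical helper)."""
    msh = incoming.split("\r")[0]
    sep = msh[3]
    f = msh.split(sep)              # f[n - 1] holds MSH.n for n >= 2
    enc = f[1]
    comp = enc[0]
    trigger = f[8].split(comp)[1]   # MSH.9.2, the trigger event, e.g. "A05"
    header = sep.join([
        "MSH", enc,
        f[4], f[5],                 # we were their receiving application/facility
        f[2], f[3],                 # they become our receiving application/facility
        timestamp, "",              # MSH.7 date/time; MSH.8 security left empty
        comp.join(["ACK", trigger, "ACK"]),  # MSH.9, e.g. ACK^A05^ACK
        control_id,                 # MSH.10 for the ACK itself
        f[10], f[11],               # echo processing ID (MSH.11) and version (MSH.12)
    ])
    msa = sep.join(["MSA", ack_code, f[9]])  # MSA.2 echoes the original MSH.10
    return header + "\r" + msa + "\r"


ack = build_ack(INCOMING, "AA", "ACK00001", "20240101120005")
print(ack.replace("\r", "\n"))
# MSH|^~\&|RECVAPP|RECVFAC|SENDAPP|SENDFAC|20240101120005||ACK^A05^ACK|ACK00001|P|2.8
# MSA|AA|MSG00001
```

The same MSA.1 field carries either an accept (commit) code or an application code, so one helper like this could emit both kinds of acknowledgment.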

This leaves us with the matter of message orchestration. Of course, there are complexities beyond this process, but we will consider only the basic “enhanced mode” acknowledgment flow.

  1. The origin system sends a message to a peer system. The peer need not be the destination system; it could be an intermediary such as an enterprise service bus (ESB) or interface engine, or it could be the destination system itself.
  2. If the receiving system is able to read and save the message, it sends a commit acknowledgment (“CA”) over the same connection. If an error occurs, such as not being able to read the full message, or if an exception occurs, it sends a commit error (“CE”). If for some reason the message should not be accepted (e.g., if it’s a duplicate), the system returns a commit reject (“CR”). Note, by the way, that these terms are historical, and have nothing to do with database commits.
  3. This process is then repeated for each intermediate system, if the message requires several steps for delivery. Remember, the CA, CE, and CR ACKs are hop-by-hop messages.
  4. When the destination system receives the message, it takes the requested action (e.g., record lab results), and sends a CA to the system from which it received the message.
  5. Messages are queued at each step until they are acknowledged (positively or negatively). If you think about it, this implies a lock-step protocol in which messages are held until the message ahead of them is delivered.
  6. Next, the application acknowledgment (which may be a message containing quite a bit of data) is sent back to the origin system. A positive application acknowledgment carries the code “AA” (application accept); it is also possible to send an application error (“AE”) or application reject (“AR”).
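The commit side of this flow can be simulated in a few lines. In this single-threaded Python sketch, each hypothetical `Hop` stores what it accepts and answers CA, CE, or CR, and `forward` only sends the next queued message after the current one has been acknowledged, which is the lock-step behaviour described in the steps above. The validity check (does the text start with “MSH”?) and the duplicate test (a repeated control ID) are deliberately crude stand-ins for real parsing and duplicate detection.

```python
from collections import deque


class Hop:
    """Hypothetical store-and-forward node: an interface engine, ESB, or endpoint."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.queue: deque[tuple[str, str]] = deque()  # (control ID, raw message)
        self.committed: set[str] = set()              # control IDs already stored

    def receive(self, control_id: str, raw: str) -> str:
        """Return an accept acknowledgment code: CA, CE, or CR."""
        if not raw.startswith("MSH"):
            return "CE"   # crude stand-in for "could not read the message"
        if control_id in self.committed:
            return "CR"   # duplicate: refuse to accept it again
        self.committed.add(control_id)
        self.queue.append((control_id, raw))  # safely stored, so commit
        return "CA"


def forward(src: Hop, dst: Hop) -> list[tuple[str, str]]:
    """Send src's queued messages to dst strictly one at a time."""
    acks = []
    while src.queue:
        control_id, raw = src.queue[0]       # peek: the head stays queued
        code = dst.receive(control_id, raw)  # until dst acknowledges it
        acks.append((control_id, code))
        src.queue.popleft()                  # acknowledged (CA, CE, or CR): release it
    return acks


HEADER = "MSH|^~\\&|"  # hypothetical minimal message text
origin, engine, dest = Hop("origin"), Hop("engine"), Hop("destination")
origin.queue.extend([("MSG1", HEADER), ("MSG1", HEADER), ("MSG2", "garbled")])
print(forward(origin, engine))  # [('MSG1', 'CA'), ('MSG1', 'CR'), ('MSG2', 'CE')]
print(forward(engine, dest))    # [('MSG1', 'CA')]
```

What a sender does after a CE or CR (retry, alert an operator, or discard) is a local policy that this sketch does not model; it simply records the code and moves on.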

Of course, there is a lot of detail not considered here, but that, in a nutshell, is how HL7 version 2 works at an operational level. What we have not focused on here is the role of HL7 messages and segments as a content model for healthcare. We also have not considered the use of controlled vocabularies, either in the form of HL7 tables, or vocabularies such as SNOMED CT, LOINC, or ICD-10.

That will have to wait for future posts.