Developing a Complex External DSL. Part 1.

The use of a domain-specific language, or DSL, is becoming a realistic and even necessary solution for developers on all sorts of software projects. You’ve heard about DSLs, and you may know that they come in two styles, internal and external. But what are internal and external DSLs? When would you decide to use one or the other? And, primarily, how would you go about developing a complex external DSL? This article answers these questions, with a focus on developing a complex external DSL.

Defining Domain-Specific Language

A domain-specific language (DSL) is a computer language that is developed to specialize in addressing the needs of a given problem domain. The domain itself could be many things. It could be specific to an industry, such as insurance, education, aerospace, medicine, etc., or to a technology or methodology, such as JEE, .NET, database, services, messaging, architecture, or domain-driven design.

The reason I would develop a DSL is to make the challenges of a domain I am working in easier and more elegant to deal with. That’s because the language I create will be just what I need to address my unique set of challenges, and no more than that. Of course, if I provide my language for others to use, it may have to broaden a bit to address what they need, but still nothing more. The goal of the effort is to make it feel more natural to use the DSL than to use a general purpose programming language or some other non-targeted tool.

An important distinction for this article is the one between internal DSLs and external DSLs. Each of these is a style of DSL, and it is important to understand which style is appropriate for a specific problem domain. It is out of the scope of this article to delve deeply into what defines a DSL in general and what makes a DSL internal or external. Martin Fowler and others have led such thought experiments and I suggest that you read their work for more detail on the subject. However, I do provide some basic context setting here.

Internal DSL

The language that is developed could be very closely tied to and implemented on top of the primary general purpose programming language in use in your project, such as Java, C#, or Ruby.

The Rails framework has been called a Ruby-based DSL that manages web applications written in Ruby. One of the reasons that Rails is called a DSL is that some of the Ruby language features are used to make programming against Rails seem different from programming against the general purpose Ruby language. Because it was created with Ruby as its foundation, Rails, when thought of as a language, had a big head start in becoming a language in its own right.

I am not sure if Dave Thomas (PragDave) views Rails as a whole as a DSL, but he notes that several features of Rails are supported by different DSLs. The example he presents is Active Record Declarations as a DSL. By using some simple jargon specific to domain model entity associations, Rails developers allow the DSL to manage all sorts of complex infrastructure and operations behind the scenes while themselves focusing on high-level entity association concepts.

Whether its creators or huge consumer base view Rails as a whole as a DSL, or only some features within Rails (Active Record Declarations), what I have been discussing here is an internal DSL. This DSL style has been labeled internal because, again, it is closely tied to and implemented on top of a primary programming language, but employs techniques to make it seem like a specialized language in its own right.

One of the key defining characteristics of the API of a framework or library fitting the bill of internal DSL, according to Martin Fowler and Eric Evans, is that it has a fluent interface. It basically means that you can stitch together short object expressions to form longer expressions that read more naturally.

I have been using and designing fluent APIs off and on for a while. For example, much of my early experience with fluent APIs was in Smalltalk. There are two ways to develop and use fluent interfaces in Smalltalk. First you can make the answer (return value) from one object message expression (method invocation) become the receiver of the next message:

1 + 2 + 4

Here the number (object) 1 receives the + message with the number (object) 2 as its argument and the resulting number 3 (implicit) itself becomes the receiver of the next + message with the argument of number 4. (For clarity, number literals are not primitives in Smalltalk; they are first-class objects.) Of course this seems natural to all of us, so much so that its roots aren’t even in programming. Well, that’s the point. I could have accomplished the same thing like this:

result := 1 add: 2.
result := result add: 4.

But that’s not natural or fluent to most. At first glance the fluent expression says it’s clearly 7; not so the second example. And since this technique is not limited to a numeric math domain, the fluent nature of Smalltalk programming makes it easy to deal with the domain-specific aspects of many different domains. You are welcome to look into Smalltalk cascades, which is a second language facility that supports fluent interfaces. I demonstrated the first approach since it is supported by modern object-oriented languages (but in some cases not in the same way with number literals).
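The same contrast can be sketched in a mainstream object-oriented language. The Python below is only an illustration: the Quantity class and its add method are hypothetical names invented for this sketch, not part of any library. Each call answers a new object, so the result of one call can be the receiver of the next.

```python
class Quantity:
    """Hypothetical value object used only to illustrate call chaining."""

    def __init__(self, value):
        self.value = value

    def add(self, amount):
        # Answer a new Quantity so the result can receive the next call.
        return Quantity(self.value + amount)


# Fluent style: each answer becomes the receiver of the next call.
fluent = Quantity(1).add(2).add(4)

# Step-by-step style: same result, but the intent is spread over several lines.
result = Quantity(1)
result = result.add(2)
result = result.add(4)
```

Both forms compute the same quantity. The fluent form simply reads as one expression, which is the property a fluent API tries to preserve for richer domains.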

The significance here is that the fluency around a given API is designed in to make dealing with the given problem domain more elegant and efficient, even natural to experts within the domain. That’s what makes it a DSL.

Of course we are each free to think of things like this in a manner that makes most sense to us. Whether we personally choose to think of a given API as a DSL is our choice. But it is important to recognize that Martin Fowler and others see what I have described above as internal DSLs. They tend to have a lot of influence on what the rest of the industry thinks and does. It’s a nomenclature that is likely to stick around and come up in discussions for a while.

In my experience we tend to design and develop internal DSLs when we need a technical API for ourselves and fellow developers to use. It helps when the general purpose programming language on which we base the internal DSL has a rich set of facilities that support this style. Clearly some languages such as Smalltalk and Ruby make this easier. Some languages such as Java and C# make it less easy. Targeting a fluent API and other internal DSL facilities at capable software developers cuts complexity and development time. However, if we are trying to simplify things for, and give more power to, non-programmer domain experts, internal DSLs are not an option.

External DSL

Frankly I never thought of using, designing, and developing fluent APIs as work in domain-specific languages. So I have to admit that because of a long history with fluent APIs, it is difficult for me to shoehorn the concept into the internal DSL category. But I’m learning. On the other hand, when I first read about DSLs I immediately associated the definition with my work of creating a number of “little languages.” Because of this I’d suggest that if you had trouble swallowing the definition of internal DSL above, this flavor will taste much better.

Defining an external DSL is much easier than defining an internal DSL. It is analogous to creating a general purpose programming language that is either compiled or interpreted. The language has a formal grammar; that is to say, the language is constrained to allow only well-defined keywords and expression types. The source code of a program written in the language is kept in one or more files that have a text format, or might be more of a tabular or even graphical format. In the textual DSL case you’d create the source file using a text editor or a full-fledged IDE. You then compile the source code and run it as part of a resulting program, or otherwise run the source code directly through an interpreter.

The main difference between a general purpose language source and an external DSL source is that when compiled the DSL is generally not output as a directly executable program (but it could be). Generally the external DSL will be translated into a resource that is compatible with the core application’s operational environment, or it may be translated into source code of the same primary general purpose programming language used by and built as part of the core application.

An example of an external DSL that is translated into a resource that is consumed by the application is the object-relational mapping files used by Hibernate and NHibernate. Another example is presented by Jay Fields of ThoughtWorks on “Business Natural Languages.” You can imagine an external DSL that contains metadata that your application needs to, say, validate user input. You would read the metadata, transform it into an efficient internal format, and use it at runtime.
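As a minimal sketch of the validation case, assume a made-up one-line-per-field syntax of the form "field is type [min N] [max N]". The syntax, the field names, and the functions load_rules and is_valid are all hypothetical and exist only for this illustration. The metadata is read once into a dictionary of rules and then consulted at runtime.

```python
RULES_SOURCE = """
# hypothetical validation DSL: <field> is <type> [min N] [max N]
age is integer min 0 max 150
name is text max 40
"""


def load_rules(source):
    """Transform the DSL text into a dict of rules keyed by field name."""
    rules = {}
    for line in source.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        words = line.split()
        if len(words) < 3 or words[1] != "is" or len(words) % 2 == 0:
            raise ValueError(f"cannot parse rule: {line!r}")
        field, kind, options = words[0], words[2], words[3:]
        if kind not in ("integer", "text"):
            raise ValueError(f"unknown type: {kind!r}")
        rule = {"kind": kind}
        for key, value in zip(options[0::2], options[1::2]):
            if key not in ("min", "max"):
                raise ValueError(f"unknown option: {key!r}")
            rule[key] = int(value)
        rules[field] = rule
    return rules


def is_valid(rules, field, raw):
    """Check raw user input (a string) against the rule for one field."""
    rule = rules[field]
    if rule["kind"] == "integer":
        try:
            measured = int(raw)  # min/max bound the numeric value
        except ValueError:
            return False
    else:
        measured = len(raw)  # for text, min/max bound the length
    if "min" in rule and measured < rule["min"]:
        return False
    if "max" in rule and measured > rule["max"]:
        return False
    return True


RULES = load_rules(RULES_SOURCE)
```

The application code never sees the DSL text after loading; it only calls is_valid with the prepared rules, which is the "efficient internal format" idea in miniature.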

Examples of external DSLs that get translated into the source code of a target application are the languages developed using the MetaCase MetaEdit+ Workbench and the JetBrains Meta Programming System (MPS). Another example is Markus Völter’s work on “Architecture as Language.” In this example Markus is able to define a software architecture, validate it, and generate code from textual architecture descriptions.
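The generative case can be illustrated on a much smaller scale than any of these tools. The sketch below assumes a hypothetical entity-description syntax and two hypothetical functions, parse_entities and generate_python; it bears no relation to how MetaEdit+, MPS, or Markus Völter’s tooling actually work. It reads a textual description and emits Python source code for matching classes.

```python
MODEL_SOURCE = """
entity Customer
  name: str
  email: str
entity Order
  total: float
"""


def parse_entities(source):
    """Read the hypothetical DSL into a list of (entity, fields) pairs."""
    entities = []
    for line in source.splitlines():
        if not line.strip():
            continue
        if line.startswith("entity "):
            entities.append((line.split()[1], []))
        elif entities and ":" in line:
            field, type_name = (part.strip() for part in line.split(":", 1))
            entities[-1][1].append((field, type_name))
        else:
            raise ValueError(f"cannot parse line: {line!r}")
    return entities


def generate_python(entities):
    """Emit Python source text that declares one dataclass per entity."""
    lines = ["from dataclasses import dataclass", ""]
    for name, fields in entities:
        lines += ["", "@dataclass", f"class {name}:"]
        body = [f"    {field}: {type_name}" for field, type_name in fields]
        lines += body or ["    pass"]
        lines.append("")
    return "\n".join(lines)


generated = generate_python(parse_entities(MODEL_SOURCE))
print(generated)
```

In a real project the generated text would be written to a file and built with the rest of the application rather than printed.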

An external DSL may be used to directly support the design and development effort and thus be used by software developers. In that case it could be used to generate code that consumes a fluent API developed as an internal DSL. An external DSL is also appropriate for use by non-programmer domain experts, if properly designed and targeted for that class of user.

In many cases an external DSL is best when accompanied by language support, such as tooling. When only a small group of software developers, including the DSL’s authors, are its users, you may only need a simple text editor. But when the DSL is distributed outside its team of creators, or when it is targeted at non-programmer domain experts, an editor with syntax highlighting and code assist is essential to the success of the DSL. Other tooling may also be appropriate or necessary.

Language Complexity

I consider a complex external DSL one that:

1. Is not easy to parse. A comma delimited file of text records is relatively easy to parse. A language such as Java is not. A complex external DSL is somewhere in between the two, and probably leans closer to the Java language than to a CSV in terms of complexity. (You may still want to develop a formal grammar for a CSV anyway, but my point is that it is not essential and it is possibly quicker not to.)

2. Requires a complex internal representation once parsed. The complex representation is a tree or object graph that contains the source artifact expressions in a convenient and optimized state. This representation supports validation as well as interpretation, generative tooling, or both.

3. Facilitates the generation of one, several, or even many complex target artifacts from a single source file or multiple source files. This of course is only the case if the language is used by generative tooling, whether in addition to or instead of being interpreted. If you only decorate the target artifact with a few minor details not provided by the source DSL, I’d question why the target source language was not used in the first place.
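To make the first two points a little more concrete, here is a deliberately small sketch. It assumes a toy expression language with numbers, names, +, *, and parentheses; the grammar and the functions tokenize, parse, undefined_names, and evaluate are hypothetical and far simpler than a real complex DSL. The parser turns text into a tree of tuples, and that one tree then serves both validation and interpretation.

```python
import re

TOKEN = re.compile(r"\s*(\d+|[A-Za-z_]\w*|[+*()])")


def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise SyntaxError(f"cannot tokenize input near position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse(tokens):
    """expr := term ('+' term)*   term := factor ('*' factor)*
    factor := NUMBER | NAME | '(' expr ')'"""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def expr():
        node = term()
        while peek() == "+":
            advance()
            node = ("add", node, term())
        return node

    def term():
        node = factor()
        while peek() == "*":
            advance()
            node = ("mul", node, factor())
        return node

    def factor():
        token = advance()
        if token is None:
            raise SyntaxError("unexpected end of input")
        if token == "(":
            node = expr()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if token.isdigit():
            return ("num", int(token))
        if token[0].isalpha() or token[0] == "_":
            return ("name", token)
        raise SyntaxError(f"unexpected token {token!r}")

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


def undefined_names(node, known):
    """Validation pass: collect names the tree uses that are not known."""
    if node[0] == "num":
        return set()
    if node[0] == "name":
        return set() if node[1] in known else {node[1]}
    return undefined_names(node[1], known) | undefined_names(node[2], known)


def evaluate(node, env):
    """Interpretation pass: compute the value of the same tree."""
    if node[0] == "num":
        return node[1]
    if node[0] == "name":
        return env[node[1]]
    left, right = evaluate(node[1], env), evaluate(node[2], env)
    return left + right if node[0] == "add" else left * right


tree = parse(tokenize("base * 2 + fee"))
```

A generator would be a third function walking the same tree, which is why the internal representation matters more than the parsing technique that produced it.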

With the above context you may be wondering how you’d actually implement a complex external DSL. I describe that in the next section.
