Developing a Complex External DSL. Part 2.
Design and Development
Designing and developing any complex language is a big challenge. Even if you have a good idea of what you want out of a language, working out the details of a complex language can bend your mind. It seems that just about the time you think you’ve got all the language features identified and the syntax worked out, your future user base (including yourself) thinks of something new. This happens even more often than it does when designing a general-purpose language.
Of course, like anything useful in our field, a language will undergo enhancement after enhancement over time. One thing we need to do is make sure that we can support those enhancements over time. The language design, therefore, needs to embrace change. Further, a good language design will make developing the language far easier.
I mention a wide range of language syntaxes in this section, including graphical and textual. However, I must limit my scope to just one syntax type in order to keep this article to a reasonable length. Thus, I have chosen to focus on textual DSLs. A textual DSL is easily understood, and is likely applicable even when a graphical DSL is in use (see why below).
Syntax Design For Now and Later
Fortunately most language designers don’t wake up in the morning and say “I’m going to create a language today, and I wonder what it will be.” If we are considering the development of a language we have a good idea why. This is important because if we really don’t have a clear vision for the language our resulting design will be just as weak. So an important first step in language design is to know a lot about what you want your language to do.
Knowing what you want your language to do doesn’t necessarily mean you know what its syntax will be. A language’s syntax is not only key to its usability, but can also influence its ability to adapt to enhancements. Of course a language’s syntax must be appropriate to its audience. If your language’s users are technically inclined, perhaps even programmers, the syntax you choose will be different from the syntax you would choose for non-programmer domain experts.
In discussing syntax I am not limiting my context to textual DSLs, even though that is my focus for the remainder of the article. It is possible that your language will have two syntaxes: one that the language user interfaces with and the one that gets saved in a file, and gets parsed and translated. In such a case you will likely develop the user-centric syntax as a graphical user interface (such as a table with rows and columns) or even a glyph/shape based language (similar to Visio diagrams) where the user draws the syntax instead of writing it. In the background, at the file level, your language can be as technical as necessary. We tend to think of graphical syntaxes as models, but models are not limited to graphics just as complex DSLs are not limited to the textual variety. As you will soon see, at some point you will start to think of your language in terms of a model no matter what its syntax is like.
If the technical syntax of your complex DSL is not flexible enough to support future enhancements you face the danger of limited or zero backward compatibility. That is, you may create a new language feature that invalidates sources that existed before the enhancements. I believe that this danger is greater when supporting dual syntaxes. Of course it is possible to provide a source file (model) upgrade transformation tool. However, even file syntax upgrade utilities have limited benefits depending on the nature and complexities of the enhancements. They also are an unwelcome necessity to the user and spell resistance to upgrades.
Here are a few suggestions for choosing an appropriate and extensible syntax, listed in the order in which they become relevant as your design effort advances from early to later stages:
1. Study other languages. Think about why languages such as Java and C# are designed the way they are and why they are so successful. Certainly, the Java and C# syntaxes were designed for acceptance by communities of developers that existed before these newer languages. But there are other reasons. Languages with block-based scoping are inherently extensible because new blocks of various types may be added adjacent to existing block types and also nested within them. This does not mean that you should reinvent such languages. It means that you can reuse certain aspects of successful languages in your own. Make sure you think about other languages such as Ruby, Smalltalk, Perl, and Python. What do you like and dislike about them? If you could change and blend any set of languages, what would you do? Can you reuse aspects of an existing successful language to improve your language ideas?
2. Experiment with various syntax ideas using agile techniques. How does it feel to write using each syntax? What do others think of the experimental syntaxes? Can the syntaxes be defined as a formal grammar? What would it be like to support tooling for your favorite syntax?
3. Identify as many language features as possible. Experiment with various syntaxes (as suggested in point #2) that support the top 70-80% of those features and withhold the rest. Once you think you have a winning syntax, consider what it would require to add the remaining 20-30% of withheld features. Is the first version of the language brittle or is it extensible? Besides withholding and later adding features, purposely implement some of the syntax incorrectly. Consider what is involved in correcting the mistakes and ask yourself what would happen if you had to continue to support both the wrong syntax and the new corrected and enhanced syntax. Is there something about the syntax that makes dealing with these issues better or worse? Is there anything you could do to the syntax to make the issues easier to deal with?
4. Present your language before communities of potential users. What do they think of the syntax? Is it intimidating even to smart people? Ask the people you want to use your language for their honest opinion.
5. Beta test your language. It is easier to get users to accept high-impact changes to a language at version 0.9 than at or above version 1.0. Prepare your beta testers for the possibility that their feedback will result in changes that make the source artifacts they have already created obsolete.
Of course if you are developing a graphical DSL and it will never be authored outside the graphical environment, it is appropriate and perhaps best to use XML. However, I would never suggest that an XML schema be used as a source syntax for any directly edited textual DSL. Just imagine your language users authoring Ant (or NAnt) programs and I think you’ll get my drift. As Martin Fowler stated: “XML makes it easy to parse, although not as easily readable as a custom format might be. People do write plug-ins for IDEs to help manipulate the XML files for those who find that angle brackets hurt the eyes.” If you are designing a directly edited textual DSL today, avoid XML-based syntax like the plague.
As a final point of emphasis about DSL syntax: a complex textual DSL needs to be definable as a formal grammar using BNF (or EBNF). If your language cannot be expressed as a formal grammar then it is going to be very difficult or impossible to parse. More on parsing and BNF is provided below.
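To make "definable as a formal grammar" concrete, here is a minimal sketch in Python. The grammar in the comment, the `entity` keyword, and every function and class name are hypothetical, invented only for illustration. The sketch shows that once a small block-based syntax is written down as EBNF, each rule maps almost mechanically onto one method of a hand-written recursive-descent parser.

```python
import re

# Hypothetical grammar, written in EBNF:
#   model  = { entity } ;
#   entity = "entity" IDENT "{" { field } "}" ;
#   field  = IDENT ":" IDENT ";" ;

TOKEN = re.compile(r"\s*([A-Za-z_]\w*|[{}:;])")


def tokenize(source):
    """Split source text into identifier and punctuation tokens."""
    tokens, pos = [], 0
    source = source.rstrip()
    while pos < len(source):
        match = TOKEN.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at or after offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    """Recursive-descent parser: one method per grammar rule."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, literal):
        token = self.peek()
        if token != literal:
            raise SyntaxError(f"expected {literal!r}, got {token!r}")
        self.pos += 1

    def ident(self):
        token = self.peek()
        if token is None or not token.isidentifier():
            raise SyntaxError(f"expected identifier, got {token!r}")
        self.pos += 1
        return token

    def model(self):
        # model = { entity } ;
        entities = []
        while self.peek() is not None:
            entities.append(self.entity())
        return entities

    def entity(self):
        # entity = "entity" IDENT "{" { field } "}" ;
        self.expect("entity")
        name = self.ident()
        self.expect("{")
        fields = []
        while self.peek() != "}":
            fields.append(self.field())
        self.expect("}")
        return {"entity": name, "fields": fields}

    def field(self):
        # field = IDENT ":" IDENT ";" ;
        name = self.ident()
        self.expect(":")
        type_name = self.ident()
        self.expect(";")
        return (name, type_name)


def parse(source):
    return Parser(tokenize(source)).model()
```

Calling `parse("entity EmailAddress { address: String; }")` yields one entity with a single `address` field, while a source that omits the colon is rejected with a `SyntaxError`. A syntax that cannot be written down as rules like these has no such direct route to a parser.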
Designing the Language Metamodel
Think of source code written in a language’s grammar (syntax) as a model of concepts that you are describing. The concepts you are describing could be data, structure, and behavior, which are typical concepts in computerlandia. From the language designer’s point of view, the descriptions of these concepts are a model, not just source code. Thus, when you parse a source model and put its representational contents into objects, the objects are called a metamodel.
Even if a language source artifact is loaded into an abstract syntax tree (AST), an AST is a metamodel of sorts. An AST is a metamodel of the parts of a source syntax pertinent to describing its abstract structure, albeit closely tied to the syntax. In any event, I’d suggest that many complex textual DSLs should not be loaded into an AST but rather into a richer metamodel. I favor making the metamodel a graph (as necessary) that is much like the Model layer of the Model-View-Controller pattern; a Domain Model. In this case, however, the graph is not a model but a metamodel of the source model. (Note that Martin Fowler uses the name Semantic Model for what I here call Metamodel. He also defines this concept as an object model that is a Domain Model.)
Although this topic follows that of syntax design, it does not mean that the language’s metamodel cannot be considered prior to a finalized syntax. The fact is, your language’s metamodel is as important to the internal workings of your DSL as your syntax is to its external acceptance and future enhancements. The metamodel’s design can begin even before the language syntax is conceived because the metamodel is not (or should not be) strongly tied to a syntax.
To illustrate this, James Gosling recently called Java’s formal grammar (syntax) “a fraud” (video at approximately 27:00 and also 60:00) because the language’s original design didn’t call for a C-like syntax. Nonetheless, Java internally still had interfaces, classes, methods, fields, threads, and primitives, and ran on bytecode. Had there been no effort to attract C/C++ programmers to Java by use of a familiar syntax, Java could have ended up looking much different to us than it does today. However, you can be sure that Java’s metamodel didn’t have to change (or at least not drastically) in order to facilitate a C-like syntax (except perhaps for concepts such as pre- and post-increments, unless those were already supported by a non-C syntax). That’s because the underlying metamodel defined the language’s concepts in an abstract way that could be mapped from multiple syntaxes. This feature is what makes the Java VM a great host for scripting languages such as Groovy and JRuby.
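The independence of metamodel and syntax can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the two toy surface syntaxes, the regular-expression "parsers", and the class names. The point is only that two very different-looking sources can map onto the same metamodel objects.

```python
import re
from dataclasses import dataclass, field


@dataclass
class MetaField:
    name: str
    type_name: str


@dataclass
class MetaClass:
    name: str
    fields: list = field(default_factory=list)


def parse_brace_syntax(source):
    """Hypothetical C-like syntax: class Name { Type field; ... }"""
    match = re.fullmatch(r"\s*class\s+(\w+)\s*\{(.*)\}\s*", source, re.DOTALL)
    if match is None:
        raise SyntaxError("not a brace-style class declaration")
    meta_class = MetaClass(match.group(1))
    for type_name, name in re.findall(r"(\w+)\s+(\w+)\s*;", match.group(2)):
        meta_class.fields.append(MetaField(name, type_name))
    return meta_class


def parse_keyword_syntax(source):
    """Hypothetical keyword syntax:
         class Name
           field name of Type
         end
    """
    lines = [line.strip() for line in source.splitlines() if line.strip()]
    if len(lines) < 2 or lines[-1] != "end":
        raise SyntaxError("expected 'class Name ... end'")
    header = re.fullmatch(r"class\s+(\w+)", lines[0])
    if header is None:
        raise SyntaxError("expected 'class Name' on the first line")
    meta_class = MetaClass(header.group(1))
    for line in lines[1:-1]:
        declaration = re.fullmatch(r"field\s+(\w+)\s+of\s+(\w+)", line)
        if declaration is None:
            raise SyntaxError(f"bad field declaration: {line!r}")
        meta_class.fields.append(MetaField(declaration.group(1), declaration.group(2)))
    return meta_class


from_braces = parse_brace_syntax("class EmailAddress { String address; }")
from_keywords = parse_keyword_syntax(
    "class EmailAddress\n  field address of String\nend"
)
```

Because both parsers target the same metamodel classes, `from_braces == from_keywords` holds, and anything written against the metamodel (validation, generation) need not know which syntax the user typed.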
When thinking about a metamodel, remember that it is an object model that holds meta information about the source model. Thus, every concept of your language should be expressed richly in the metamodel. Let’s take a familiar example, a metamodel of an object-oriented language. The classes in the object-oriented language metamodel would include, among others, MetaClass, MetaField, and MetaMethod.
The class MetaClass would contain metadata about the class in the source model that it represents. For instance, if some source code defined a class named EmailAddress then you’d have a MetaClass instance that has a name attribute/field set to the string “EmailAddress”. Class MetaClass would also contain a collection of MetaField instances and another collection of MetaMethod instances. If the model source class EmailAddress had a field declared with the name address then the MetaClass would contain at least one element in its fields collection, an instance of class MetaField that has its name attribute/field set to the string “address”. Further, each MetaField instance would have a reference to the MetaClass type of that model field. Thus, the metamodel forms a graph.
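The EmailAddress example can be sketched as follows. This is a minimal illustration in Python, not a prescribed design: the constructor signatures, the `add_field` and `add_method` helpers, and the `ListNode` class are all assumptions added for the example.

```python
class MetaClass:
    """Metadata about one class declared in the source model."""

    def __init__(self, name):
        self.name = name
        self.fields = []   # MetaField instances
        self.methods = []  # MetaMethod instances

    def add_field(self, name, type_):
        meta_field = MetaField(name, type_)
        self.fields.append(meta_field)
        return meta_field

    def add_method(self, name, return_type=None):
        meta_method = MetaMethod(name, return_type)
        self.methods.append(meta_method)
        return meta_method


class MetaField:
    """Metadata about one field; `type` refers to another MetaClass."""

    def __init__(self, name, type_):
        self.name = name
        self.type = type_


class MetaMethod:
    """Metadata about one method; `return_type` is a MetaClass or None."""

    def __init__(self, name, return_type=None):
        self.name = name
        self.return_type = return_type


# The source class EmailAddress with a field named address.
string_type = MetaClass("String")
email_address = MetaClass("EmailAddress")
email_address.add_field("address", string_type)

# A hypothetical self-referencing class: its field's type is the class
# itself, which is why the structure is a graph and not a tree.
list_node = MetaClass("ListNode")
list_node.add_field("next", list_node)
```

Here `email_address.fields[0].type` is the very same object as `string_type`, so following type references walks from one MetaClass to another, and the `ListNode` example shows that such a walk can lead back to where it started.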
You would use meta classes analogous to this example, but for the particular model representation expressed in your DSL. I suggest considering a meta class hierarchy that starts with the abstract base class MetaObject. MetaObject would be used to provide default state and behavior for all meta subclasses. For example, perhaps many of the meta objects your language supports have a name. In that case class MetaObject would contain the name attribute/field and accessors on behalf of all subclasses. After you have defined a useful MetaObject you can start designing the full class hierarchy of your metamodel. Of course, over time, as your metamodel grows, you would refactor common state and behavior into class MetaObject.
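One way such a hierarchy might start is sketched below in Python. Only the idea of an abstract MetaObject that owns the name comes from the text; the `kind` attribute and the `describe` method are hypothetical, added to show what "default behavior" shared by all subclasses could look like.

```python
from abc import ABC, abstractmethod


class MetaObject(ABC):
    """Abstract base class holding state common to all meta classes."""

    def __init__(self, name):
        self._name = name

    @property
    def name(self):
        # Accessor provided once, on behalf of every subclass.
        return self._name

    @property
    @abstractmethod
    def kind(self):
        """Hypothetical hook: each subclass names the concept it models."""

    def describe(self):
        # Default behavior inherited by every subclass.
        return f"{self.kind} {self.name}"


class MetaClass(MetaObject):
    kind = "class"

    def __init__(self, name):
        super().__init__(name)
        self.fields = []
        self.methods = []


class MetaField(MetaObject):
    kind = "field"

    def __init__(self, name, type_):
        super().__init__(name)
        self.type = type_


class MetaMethod(MetaObject):
    kind = "method"


email_address = MetaClass("EmailAddress")
address = MetaField("address", MetaClass("String"))
email_address.fields.append(address)
```

With this in place, `email_address.describe()` returns `"class EmailAddress"` and `address.describe()` returns `"field address"`, while `MetaObject` itself cannot be instantiated. When a second piece of state turns out to be shared (say, a source location), it has one obvious home.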
To take this approach one step further, if you are familiar with Eric Evans’ “Domain-Driven Design” patterns (DDD), you could apply those patterns to your metamodel. I use this approach for my DomainMETHOD tool, which is a DSL that facilitates DDD and generates a working application domain layer. So I get the best of both worlds: I design and develop my tool using DDD, and I support the design and generation of working DDD-based domain layers. My tool’s design is complete with Entities, Value Objects, Aggregates, Repositories, and more. I store metamodel object Aggregates that are read from model source in my MetaObjectRepository. I use a ProjectRepository to find project configurations and custom generation directives, and to create, find, and store generated target artifacts, which are eventually persisted to output files. I also have a repository that I use for finding and for managing the state of source templates, aptly named TemplateRepository.
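For readers unfamiliar with the Repository pattern, the following Python sketch shows the general shape such a metamodel repository could take. It is not the DomainMETHOD implementation, whose internals are not shown here; the in-memory dictionary, the method names, and the choice of MetaClass as the aggregate root are all assumptions made for illustration.

```python
class MetaClass:
    """Stand-in aggregate root: a class read from the model source."""

    def __init__(self, name):
        self.name = name
        self.fields = []


class MetaObjectRepository:
    """Hypothetical in-memory repository of metamodel aggregates.

    Collaborators such as generators ask the repository for meta objects
    instead of holding references to the parser's output directly.
    """

    def __init__(self):
        self._aggregates = {}  # aggregate root name -> aggregate root

    def save(self, aggregate_root):
        self._aggregates[aggregate_root.name] = aggregate_root

    def find_by_name(self, name):
        # Returns None when no aggregate with that name was stored.
        return self._aggregates.get(name)

    def find_all(self):
        return list(self._aggregates.values())


repository = MetaObjectRepository()
repository.save(MetaClass("EmailAddress"))
repository.save(MetaClass("Customer"))
```

A generator could then call `repository.find_by_name("EmailAddress")` without knowing whether the aggregate came from a freshly parsed file or from some cache, which is the decoupling the pattern is meant to provide.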
Using DDD can be a practical and powerful means of conceptualizing, designing, and implementing your metamodel. This might not always work out depending on the DSL characteristics, but if it can you should consider using DDD.