Domain Specific Language (DSL) for People in a Hurry – The Matrix version

What if I told you that you need to create an application that can transform a customer’s data, spread across multiple files, into a single, unified model? Pretty straightforward, right?

Now imagine you had a second customer whose data came in a completely different layout, and you had to transform that too. The files are five times the size of the first customer’s!

A few painstaking hours later, you may have transformed it to the unified model. Not exactly easy, but doable.

Alright! Now imagine data from 100 customers, each of which could have any number of files in any format (TXT, CSV, EDI, XML, JSON and so on) and of any size, and you need to transform all of it.

At this point, you realize that you’ve prepped for hand-to-hand combat instead of a nunchuck fight!

When the problem statement is polymorphic in nature, there are a handful of approaches you can take to architect the solution.

Enter Domain Specific Language (DSL)

A DSL is what you turn to when dealing with complex business rules specific to a domain. You can abstract complex logic into simple DSL operators, and business users can then write simple rules with those operators to transform the data, aka “meta-programming”.

Now, you have a choice!

You could take the “Blue pill”, wake up in your bed, and code your application in a general-purpose programming language.

Or you could take the “Red pill”, stay in ‘DSL’ wonderland, and let me show you how deep the rabbit hole goes!

Made your choice? Let’s get started!

A DSL can easily spiral out of control if it is not properly designed. Any new structure that is unfamiliar to the wider world would practically be a new programming language!

So, we need a structure that is familiar to all, but configurable to our needs. On top of this, we need a structure that would be easy for users to adhere to.

Enter LISP

LISP is the second-oldest high-level programming language still in common use (only Fortran is older). It has been around for over half a century, and its code is written as lists, the same structure it uses for data. This enables the meta-programming we are trying to achieve.

Before seeing how to use LISP for a DSL, we first need to understand what an S-expression is.

An S-expression (s-expr or s-exp) is simply a way to represent a nested structure. This nested structure could be anything: data, algorithms, code, arithmetic formulas and so on.
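To make the "nested structure" idea concrete, here is a minimal sketch of how an S-expression could be modelled in Scala. The names SExpr, Atom, SList and render are my own, chosen for illustration, and are not taken from any particular library.

```scala
object SExprModel {
  // An S-expression is either an atom or a list of S-expressions.
  sealed trait SExpr
  final case class Atom(value: String) extends SExpr
  final case class SList(items: List[SExpr]) extends SExpr

  // Render the tree back into its parenthesised text form.
  def render(expr: SExpr): String = expr match {
    case Atom(v)      => v
    case SList(items) => items.map(render).mkString("(", " ", ")")
  }

  // The arithmetic formula 1 + (2 * 3), written as nested lists.
  val formula: SExpr =
    SList(List(Atom("+"), Atom("1"), SList(List(Atom("*"), Atom("2"), Atom("3")))))
}
```

Calling SExprModel.render(SExprModel.formula) gives back the text (+ 1 (* 2 3)), so the same tree works as both data and code.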

Consider an example:

I want to read the date input of two customers, one in the US and the other in the UK, where the date format is different. The application should support an expression like:

(parse-date ("20/06/1988" "dd/MM/yyyy") "yyyy-MM-dd")

Here the “parse-date” operator reads the raw string in the given format and converts it to a date in the desired format. The operator implementation could be written in any programming language; I chose Scala and Spark.

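import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions.to_date

// Operator behind parse-date: reads the "date" column using the given format
case class ParseDate(dateFormat: String) {
  def getDate(df: DataFrame): Column = to_date(df("date"), dateFormat)
}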

As you may have noticed, it is a pretty simple representation, since we adopted the S-expression model to represent the transformation. You can write different LISP rules for different customers to fit their models.
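If you want to try the idea without a Spark cluster, here is a rough sketch of the same operator in plain Scala, using java.time in place of Spark. The parseDate function and its parameter names are hypothetical, and I am assuming the UK customer sends day-first dates while the US customer sends month-first dates.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

object ParseDateSketch {
  // Hypothetical stand-in for the parse-date operator: read `raw` using
  // `inputFormat`, then write it back out using `outputFormat`.
  def parseDate(raw: String, inputFormat: String, outputFormat: String): String = {
    val parsed = LocalDate.parse(raw, DateTimeFormatter.ofPattern(inputFormat))
    parsed.format(DateTimeFormatter.ofPattern(outputFormat))
  }

  // UK customer: day first.
  val uk: String = parseDate("20/06/1988", "dd/MM/yyyy", "yyyy-MM-dd")

  // US customer: month first.
  val us: String = parseDate("06/20/1988", "MM/dd/yyyy", "yyyy-MM-dd")
}
```

Both customers end up with the same unified value, 1988-06-20, and the only thing that differs between them is the format string in the rule.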

No matter how complex the overall solution is, we can break it down into simple S-expressions, as explained above, and construct the complete model piece by piece using LISP and S-expressions.
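Here is a small sketch of what "piece by piece" could look like in practice: a tiny evaluator that works out the inner expressions first and then applies the operator at the head of each list. The operator names (trim, upper, concat) and the operators table are made up for this example; a real framework would register its own.

```scala
object EvalSketch {
  sealed trait SExpr
  final case class Atom(value: String) extends SExpr
  final case class SList(items: List[SExpr]) extends SExpr

  // An operator takes its already-evaluated arguments and returns a value.
  type Operator = List[String] => String

  val trim: Operator   = args => args.head.trim
  val upper: Operator  = args => args.head.toUpperCase
  val concat: Operator = args => args.mkString

  // Hypothetical operator table: the names business users may write in a rule.
  val operators: Map[String, Operator] =
    Map("trim" -> trim, "upper" -> upper, "concat" -> concat)

  // Evaluate inner expressions first, then apply the operator at the head of the list.
  def eval(expr: SExpr): String = expr match {
    case Atom(v) => v
    case SList(Atom(op) :: args) =>
      val fn = operators.getOrElse(op, throw new IllegalArgumentException(s"Unknown operator: $op"))
      fn(args.map(eval))
    case SList(_) =>
      throw new IllegalArgumentException("Expected an operator at the head of the list")
  }

  // The rule (concat (upper (trim " neo ")) "-" "001"), built by hand.
  val rule: SExpr = SList(List(
    Atom("concat"),
    SList(List(Atom("upper"), SList(List(Atom("trim"), Atom(" neo "))))),
    Atom("-"),
    Atom("001")
  ))
}
```

Evaluating EvalSketch.rule yields NEO-001. Supporting a new customer quirk then means adding one small operator to the table rather than rewriting the pipeline.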

Now that we have our ingredients (DSL for the framework, LISP for the syntactic structure, S-expressions for atom representation), let’s start designing the application.

Our first step would be to define a grammar for the LISP that is being written by the users. With ANTLR, you can define this grammar in .g files (.g4 in ANTLR 4), which keeps the users from getting carried away!

There are quite a few projects on GitHub that are available for this, so pick a beast that suits you.
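To give a feel for what the grammar buys you, here is a hand-rolled sketch of an S-expression parser in plain Scala. It is only a stand-in for what a grammar-generated parser would do, not ANTLR itself. All names are hypothetical, and it assumes rules are typed with straight double quotes.

```scala
object SExprParser {
  sealed trait SExpr
  final case class Atom(value: String) extends SExpr
  final case class SList(items: List[SExpr]) extends SExpr

  // Tokens: "(", ")", a quoted string, or a bare symbol.
  // Characters that match none of these are skipped in this sketch.
  private val token = """\(|\)|"[^"]*"|[^\s()"]+""".r

  def tokenize(src: String): List[String] = token.findAllIn(src).toList

  // Parse one expression and return it with the tokens not yet consumed.
  private def parseExpr(tokens: List[String]): (SExpr, List[String]) = tokens match {
    case "(" :: rest => parseList(rest, Nil)
    case ")" :: _    => throw new IllegalArgumentException("Unexpected ')'")
    case tok :: rest => (Atom(tok.stripPrefix("\"").stripSuffix("\"")), rest)
    case Nil         => throw new IllegalArgumentException("Unexpected end of input")
  }

  // Collect items until the matching ")" is found.
  private def parseList(tokens: List[String], acc: List[SExpr]): (SExpr, List[String]) =
    tokens match {
      case ")" :: rest => (SList(acc.reverse), rest)
      case Nil         => throw new IllegalArgumentException("Missing ')'")
      case _ =>
        val (item, rest) = parseExpr(tokens)
        parseList(rest, item :: acc)
    }

  // Parse a whole rule; anything left over after the first expression is an error.
  def parse(src: String): SExpr = {
    val (expr, rest) = parseExpr(tokenize(src))
    require(rest.isEmpty, s"Trailing tokens: ${rest.mkString(" ")}")
    expr
  }
}
```

A rule with unbalanced parentheses is rejected before any customer data is touched, which is the kind of guard rail the grammar is there to provide.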

Once we have the grammar files, choose your favorite programming language to start coding. The beauty of this framework is that you could change the technology stack anytime and it would not impact the business rules as they are encapsulated in the LISP. You could just rewire your code base, and things would work as they always have.
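As a rough illustration of that rewiring, the sketch below keeps one rule as plain data and runs it through two interchangeable implementations. The Backend trait, ParseDateRule and both backend objects are hypothetical names, and the two JDK date APIs merely stand in for two different technology stacks.

```scala
import java.text.SimpleDateFormat
import java.time.LocalDate
import java.time.format.DateTimeFormatter

object BackendSwapSketch {
  // The business rule, held as plain data that never mentions an implementation.
  final case class ParseDateRule(raw: String, inputFormat: String, outputFormat: String)

  // Hypothetical backend contract; each technology stack supplies its own version.
  trait Backend {
    def parseDate(rule: ParseDateRule): String
  }

  // Backend built on java.time.
  object JavaTimeBackend extends Backend {
    def parseDate(rule: ParseDateRule): String =
      LocalDate
        .parse(rule.raw, DateTimeFormatter.ofPattern(rule.inputFormat))
        .format(DateTimeFormatter.ofPattern(rule.outputFormat))
  }

  // Backend built on the older java.text API.
  object SimpleDateFormatBackend extends Backend {
    def parseDate(rule: ParseDateRule): String = {
      val parsed = new SimpleDateFormat(rule.inputFormat).parse(rule.raw)
      new SimpleDateFormat(rule.outputFormat).format(parsed)
    }
  }

  val rule: ParseDateRule = ParseDateRule("20/06/1988", "dd/MM/yyyy", "yyyy-MM-dd")
}
```

Both backends turn the rule into 1988-06-20, so the users' rule stays untouched while the code underneath is swapped.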

Once you have created your S-expression operator support, you can get the users to start writing the LISP files to transform the customer data.