How does one handle Data in Process? It depends!

Yesterday, I had the opportunity to lead a session at Camunda Unconference.

This was a rather unique event, with an “unconference” format designed specifically for the community; it was conducive to great peer-to-peer discussions and the exchange of ideas, experiences and learnings.

What was also interesting was that the topics for the sessions were sourced from within the community, and each topic was voted onto a final shortlist. The sessions themselves were discussion-led, to encourage collaboration and creativity.

Data in Process

I had voted for “Data in Process” as one of the topics; it has always intrigued me, and needless to say, I was thrilled at the opportunity to lead the session!

For the past several years, I have been consulting for customers who build cloud-native applications and use BPM engines such as Camunda for their business processes. “Control flow” and “data flow” are key to any business process solution that these customers build.

Based on my experience across several projects, I believe designing for “how to handle data in business process” is a particularly complex design problem, and is key to the success of your implementation.

You can create a business process instance that has all the data required. Or you can just pass references to external data, and the business process retrieves the data during run-time. Or you can use a mixed strategy. There really is no ‘one size fits all’ solution to this question.

This personal observation was validated during the session, when one participant mentioned how they discussed “Data in Process” when they began their BPM journey a few years ago.

There are a lot of nuances to the topic irrespective of the BPM engine or tool you use. And it is primarily a design decision.

I structured the discussion around a “The Why, The What and The How” framework.

The Why

In a business process, data is necessary not just to work with from a business perspective, but also to drive the business process to the correct next steps. Data can be:

  • input by the users
  • loaded from external systems
  • used for decision making
  • generated or transformed during the execution of the business process

One participant also pointed out that data for KPIs and reporting is of interest. When you operate business process workflows in production, KPIs such as the average time to complete a workflow or the time required for a particular step are very important. This type of data is inferred from workflow execution.
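As a rough illustration of KPI data being inferred from execution, here is a minimal Python sketch. The `HISTORY` records, their layout and the function names are hypothetical and not tied to any engine's history API; the sketch only shows how step and instance durations can be derived from start and end timestamps.

```python
from datetime import datetime
from statistics import mean

# Hypothetical execution history: (instance id, step name, start, end).
HISTORY = [
    ("p1", "review", datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 9, 30)),
    ("p1", "approve", datetime(2024, 1, 1, 9, 30), datetime(2024, 1, 1, 10, 0)),
    ("p2", "review", datetime(2024, 1, 1, 11, 0), datetime(2024, 1, 1, 11, 10)),
    ("p2", "approve", datetime(2024, 1, 1, 11, 10), datetime(2024, 1, 1, 13, 0)),
]


def average_step_minutes(history, step):
    """Average duration, in minutes, of one named step across all instances."""
    durations = [
        (end - start).total_seconds() / 60
        for _, name, start, end in history
        if name == step
    ]
    return mean(durations)


def average_instance_minutes(history):
    """Average end-to-end duration, in minutes, per process instance."""
    first, last = {}, {}
    for instance_id, _, start, end in history:
        first[instance_id] = min(first.get(instance_id, start), start)
        last[instance_id] = max(last.get(instance_id, end), end)
    return mean(
        (last[i] - first[i]).total_seconds() / 60 for i in first
    )
```

None of these numbers are stored as business data; they fall out of the timestamps the workflow leaves behind as it runs.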

The What

The most complex decision while designing a BPM solution, in my experience, has been the type of data, or “what” you want included in the business process.

Do you “store the entire payload,” or “store only references to data in external systems,” or do you use a mixed strategy?

I’ve come to realize the answer is, always, “it depends,” and that practical experiences and learnings across projects are what help identify and refine that strategy.

As expected, this is the area where we spent most of our discussion time. I took back some interesting insights myself, as I learnt from the experiences of the participants.

The discussions were lively, with several of us agreeing and disagreeing on the thoughts shared. This only pushed us to reflect more keenly on our own experiences, which brought up additional points for us to consider. Here’s how we began:

  • “Store only references to data.” Most participants agreed this is the typical default everyone should turn to.
  • “Store the entire payload.” Choose this approach provided you don’t have any other system of record.
  • In many cases a “mixed strategy” would be ideal.
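To make the three options concrete, here is a minimal Python sketch that is independent of any BPM engine. The order data, the `ORDERS` dictionary standing in for a system of record, and all function and variable names are hypothetical; the sketch only illustrates what ends up stored with the process instance under each strategy.

```python
# Hypothetical stand-in for an external system of record.
ORDERS = {
    "order-42": {"customer": "ACME", "amount": 1250.0, "items": ["A1", "B7"]},
}


def start_with_payload(order_id):
    """Store the entire payload: the instance carries a copy of the data."""
    return {"order": dict(ORDERS[order_id])}


def start_with_reference(order_id):
    """Store only a reference: the instance carries just the key."""
    return {"orderId": order_id}


def start_mixed(order_id):
    """Mixed strategy: the key plus the one field that drives control flow."""
    order = ORDERS[order_id]
    return {"orderId": order_id, "amount": order["amount"]}


def needs_approval(variables):
    """A gateway-style condition over the instance's variables.

    It goes back to the system of record only when the amount
    is not already held by the process instance.
    """
    if "amount" in variables:
        amount = variables["amount"]
    elif "order" in variables:
        amount = variables["order"]["amount"]
    else:
        amount = ORDERS[variables["orderId"]]["amount"]
    return amount > 1000
```

In this sketch, only the reference-only variant has to call out to the system of record to evaluate the condition, while the payload variant duplicates every field. The mixed variant keeps the one value the decision needs and leaves the rest where it lives.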

One point that stood out for me was how the availability of the “source system” was a challenge for the “Store only references to data” option. One participant explained that their source system was a mainframe, and that the workflow process stalled whenever the source system was unavailable.

Another problem highlighted was the performance of process execution, if we were to access data from a source system each time we needed it.
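One possible mitigation for both the availability and the performance concern, offered here as my own illustration rather than something the session concluded, is to cache resolved references. The Python sketch below assumes a hypothetical `fetch` callable for the source system and a hypothetical `SourceUnavailable` error; it serves repeated reads from memory and falls back to the last known value when the source system is down.

```python
class SourceUnavailable(Exception):
    """Raised by the hypothetical source-system client when it cannot respond."""


class CachingResolver:
    """Resolves references to external data, caching what it has seen."""

    def __init__(self, fetch):
        self._fetch = fetch  # callable: key -> data; may raise SourceUnavailable
        self._cache = {}
        self.calls = 0       # number of remote calls attempted

    def get(self, key, refresh=False):
        # Serve from the cache unless the caller explicitly asks for fresh data.
        if key in self._cache and not refresh:
            return self._cache[key]
        self.calls += 1
        try:
            value = self._fetch(key)
        except SourceUnavailable:
            if key in self._cache:
                # Source is down: fall back to the last known (possibly stale) value.
                return self._cache[key]
            raise
        self._cache[key] = value
        return value
```

The trade-off is staleness: the process keeps moving, but it may act on data that has since changed in the source system, so this only suits data where that is acceptable.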

Ease of testing and troubleshooting was a grey area, with some participants strongly disagreeing that it is easier with one approach than the other. This is primarily a result of the number of tools and techniques available to help test and debug activities.

We used a Miro board for our discussion, which I believe is a great tool for collaboration! Here is our final board. The discussions were rich, and we ran out of time before we could arrive at a consensus!

The How

We discussed options such as “process variables,” solutions that allow us to model complex data objects, concurrent access issues and so on. I only wish we had had more time to discuss this aspect!
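To illustrate the kind of concurrent access issue that came up, here is a minimal Python sketch of a version check on a set of process variables. `VariableStore` and `StaleWrite` are hypothetical names, and this is not a description of how any particular engine handles concurrency; it only shows how a second writer working from an outdated snapshot can be detected instead of silently overwriting the first.

```python
from dataclasses import dataclass, field


class StaleWrite(Exception):
    """Raised when a write is based on an outdated snapshot of the variables."""


@dataclass
class VariableStore:
    """Hypothetical holder for one process instance's variables."""

    values: dict = field(default_factory=dict)
    version: int = 0

    def read(self):
        # Return a copy of the variables and the version it was read at.
        return dict(self.values), self.version

    def write(self, updates, expected_version):
        # Reject the write if someone else has changed the variables since the read.
        if expected_version != self.version:
            raise StaleWrite(
                f"expected version {expected_version}, found {self.version}"
            )
        self.values.update(updates)
        self.version += 1
        return self.version
```

If two tasks read the variables at the same version and both try to write, the first write succeeds and the second raises `StaleWrite`, at which point the caller can re-read and decide whether to retry.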

At the end of the day, it was an amazing event, made thoroughly engaging and interactive through the “unconference” format!

Thank you, Camunda, the rest of the event organizing team, and participants for the opportunity!

References

  1. Best Practice document from Camunda. https://camunda.com/best-practices/handling-data-in-processes/
    This article provides great information about data in process and how to handle it.
  2. Great tool to handle data in process – something that I learnt of during the conference. https://www.holunda.io/camunda-bpm-data/quick-start/
