In my previous post on NiFi, I mentioned the differences between flow-based programming and other models. This post will help bridge the gap for those new to the flow-based model and NiFi development.
What is flow-based programming?
In brief, flow-based programming (FBP) is a development approach where a series of processes send messages (data) along one or more configurable paths.
Think of it as a bunch of processes that read input and provide output to a message bus. The processes don't need to know anything about where a message came from or who is going to consume it next. They only need to perform their function and send the result to a pre-determined spot.
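A loose everyday analogue (not NiFi itself) is a Unix shell pipeline: each process reads from its input, does one job, and writes to its output without knowing what sits before or after it.

```shell
# Each stage performs one function and passes its result along.
# None of the three commands knows anything about its neighbors.
printf 'banana\napple\nbanana\ncherry\n' | sort | uniq -c
```

Swap a stage out or reroute the pipe and the other stages keep working unchanged, which is the same re-wiring property FBP gives you at a larger scale.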
Comparison with other programming models
More mainstream models like procedural, OO, or functional programming are structured like a hierarchical tree. The top-level routine invokes another down a branch, which may do the same many times along various paths, eventually returning an outcome to the top level.
FBP is like a series of robots (processors) that perform a singular function and don’t need to know what is before or after them.
The benefit of the FBP approach is twofold:
- The processors are easy to re-use.
- The pathways connecting the processors are changeable without affecting processor behavior.
Yes, this is simplified for illustration purposes. One can argue that some programming models have similar characteristics, but that’s another blog post…
An Assembly Line of Asynchronous Processors
In NiFi, there are hundreds of out-of-the-box processors available; the latest count was 292.
From the first node to the last, each processor needs to know nothing about any other. It performs a task and routes the resulting data down one or more paths.
The data passed from processor to processor is a flow file. Flow files consist of content and attributes. Evaluating the attributes determines, among other things, which path a flow file takes on exit.
NiFi also makes conditional routing easy. Send results that pass a condition down one path, and those that don’t down another. Want to change the destination processor of one of those paths? Drag and drop the connecting path where you want it to be.
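As a small illustration, the RouteOnAttribute processor routes flow files by evaluating Expression Language conditions against their attributes. The route names below are hypothetical; `fileSize` is a standard flow file attribute.

```
# RouteOnAttribute dynamic properties (route name -> condition)
large_files : ${fileSize:gt(1048576)}
small_files : ${fileSize:le(1048576)}
```

Each property becomes an outgoing relationship, so changing a destination really is just dragging the connection to a different processor.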
One of the most common concerns for any NiFi Development project is scalability. Since NiFi itself is a bit of a black box, does this make it more difficult to scale?
Actually, quite the opposite in my experience. NiFi scales at the processor level rather than the module level. Deployed in a cluster coordinated by ZooKeeper, each processor scales based on demand, and you can define a processor's scaling limits as part of its configuration.
This gives a NiFi implementation a scalability edge because the scaling logic is inherent in the platform.
What NiFi offers with scalability is the best of both worlds: granular scalability within configurable limits, without having to alter how modules deploy.
Configuration vs code
Most of the NiFi development work I've done has required minimal custom coding to process data end-to-end. There are places where a scripting processor performs a specific function, but those cases are rare. I'd say fewer than 5% of the processors I've implemented with clients have needed a custom script.
The nice thing about the scripting processor is that it's there when you need it. But be careful: inexperienced NiFi programmers overuse it when they don't know about the other processors that can do the job.
What is NiFi Expression Language?
OK. The NiFi Expression Language has a learning curve. My first attempts at using it were frustrating. I caught on, but it reminded me of my early experience with Regular Expressions. OK, maybe not that bad. But close.
You'll have to learn the NiFi Expression Language because you'll use it when configuring processor properties. NiFi processor documentation is pretty clear about which properties support the Expression Language.
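A few representative expressions give a feel for the syntax; the `filename` attribute is standard, and the rest are function calls from the Expression Language itself.

```
${filename}                          # value of the "filename" attribute
${filename:toUpper()}                # chain functions onto an attribute
${filename:contains('.csv'):not()}   # build boolean conditions
${now():format('yyyy-MM-dd')}        # work with dates
```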
Read more about NiFi Expression Language here.
Easy to Test in Parallel
One of the nicest parts of using NiFi is that it can run in parallel with existing processes. Pull the data from the same source, run it through the NiFi application, and compare the output.
In some cases, you can use NiFi to compare the two outputs for you! Here’s one way you can do that:
- Write the results from your NiFi application into a DistributedMapCache NiFi service.
- Create a processor that reads the results from your non-NiFi application and use a DetectDuplicate processor to find duplicates in the cache.
- If every result is found to be a duplicate, the two outputs should be identical.
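You can run the same kind of parallel-run check outside NiFi, too. As a minimal sketch (assuming both applications write one record per line, and using hypothetical file names), run in bash:

```shell
# Compare two result files regardless of line order.
# diff exits 0 only when the sorted outputs are identical.
diff <(sort nifi_output.txt) <(sort legacy_output.txt) && echo "outputs match"
```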
Give NiFi a Try
It’s easy to try NiFi Development using Docker. Use the commands below to get NiFi up and running on your local machine. I encourage you to check out what’s possible with this unique and powerful tool.
docker run --rm -p 8080:8080 -p 8181:8181 webcenter/alpine-nifi
Give it a minute or two, then open: http://localhost:8080/nifi.
If you missed it, you can read the first NiFi post here.