Features

In this section, we describe some of the more advanced features of NStack that you can use when building your modules and composing them together to build workflows.

Composition

Workflows can contain as many steps as you like, as long as the output type of one matches the input type of the next. For instance, let’s say we wanted to create the following workflow, based on the Iris example in In-Depth Tutorial - Productionising a Classifier and available on GitHub:

  • Expose an HTTP endpoint which takes four Doubles
  • Send these Doubles to our classifier, Iris.Classify, which will tell us the species of the iris
  • Count the number of characters in the species of the iris using our Demo.numChars function
  • Write the result to the log

We could write the following workflow:

module Iris.Workflow:0.0.1-SNAPSHOT
import Iris.Classify:0.0.1-SNAPSHOT as Classifier
import Demo:0.0.1-SNAPSHOT as Demo

def multipleSteps =
  Sources.http<(Double, Double, Double, Double)> { http_path = "/irisendpoint" } |
  Classifier.predict |
  Demo.numChars |
  Sinks.log<Integer>

Note

numChars and predict can be composed together because their types – or schemas – match. If predict wasn’t configured to output Text, or numChars wasn’t configured to take Text as input, NStack would not let you build the workflow above.
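
For illustration, a mismatched composition such as the following sketch (reusing the imports above) would be rejected at build time, because the HTTP source produces a tuple of Doubles while Demo.numChars expects Text:

def badWorkflow =
  Sources.http<(Double, Double, Double, Double)> { http_path = "/irisendpoint" } |
  Demo.numChars |
  Sinks.log<Integer>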

Streaming multiple values

Sometimes it’s desirable to return more than one value from a function. For example, we might want to asynchronously query an HTTP endpoint and process each response independently. If we don’t care about any connection between the different results, we can return each result independently.

Let’s look at a toy example: a function that takes in a list of numbers and returns them as strings. In this case, each transformation takes a significant amount of time to compute.

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
NumString:0.0.1-SNAPSHOT Service
"""
import time

import nstack

class Module(nstack.Module):
    def stringToNum(self, xs):
        return [self.transform(x) for x in xs]

    def transform(self, x):
        time.sleep(5) # TODO: Work out how to make this more efficient
        return str(x)

If we don’t need the entire list at once, we can change this to a Python generator. Rather than working on a list, our next function will have to work on an individual string. That is, when we return a generator, each output is passed individually to the next function.

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
NumString:0.0.1-SNAPSHOT Service
"""
import time

import nstack

class Module(nstack.Module):
    def stringToNum(self, xs):
        return (self.transform(x) for x in xs)

    def transform(self, x):
        time.sleep(5) # TODO: Work out how to make this more efficient
        return str(x)
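
The practical difference is easiest to see in plain Python, outside of NStack. In this illustrative sketch, the list version computes every result up front, while the generator version produces each result only when the consumer asks for the next value:

import time

def slow_str(x):
    time.sleep(5)  # stands in for an expensive transformation
    return str(x)

def as_list(xs):
    return [slow_str(x) for x in xs]   # eager: ~5s per element before anything is returned

def as_generator(xs):
    return (slow_str(x) for x in xs)   # lazy: nothing is computed until requested

gen = as_generator([1, 2, 3])
print(next(gen))  # "1" appears after ~5s; 2 and 3 have not been computed yet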

Workflow Reuse

All of the workflows that we have written so far have been fully composed, which means that they contain both a source and a sink. Often, you want to split sources, sinks, and functions into separate pieces that you can share and reuse. In this case, we say that a workflow is partially composed, which simply means it does not contain both a source and a sink. These workflows cannot be started by themselves, but they can be shared and attached to other sources and/or sinks to become fully composed.

For instance, we could combine Iris.Classify.predict and Demo.numChars from the previous example to form a new workflow, speciesLength, like so:

module Iris.Workflow:0.0.1-SNAPSHOT
import Iris.Classify:0.0.1-SNAPSHOT as Classifier
import Demo:0.0.1-SNAPSHOT as Demo

def speciesLength = Classifier.predict | Demo.numChars

Because our workflow Iris.Workflow.speciesLength has not been connected to a source or a sink, it is itself still a function. If we build this workflow, we can see speciesLength alongside our other functions by using the list command:

~/Iris.Workflow/ $ nstack list functions
Iris.Classify:0.0.1-SNAPSHOT
  predict : (Double, Double, Double, Double) -> Text
Demo:0.0.1-SNAPSHOT
  numChars : Text -> Integer
Iris.Workflow:0.0.1-SNAPSHOT
  speciesLength : (Double, Double, Double, Double) -> Integer

As we would expect, the input type of the workflow is the input type of Iris.Classify.predict, and the output type is the output type of Demo.numChars. Like any other function, it must be connected to a source and a sink to become fully composed, which means we can use this workflow in another workflow:

module Iris.Endpoint:0.0.1-SNAPSHOT
import Iris.Workflow:0.0.1-SNAPSHOT as IrisWF
def http = Sources.http<(Double, Double, Double, Double)> |
  IrisWF.speciesLength |
  Sinks.log<Integer>

Often you will want to reuse a source or a sink without reconfiguring it. To do this, we can similarly separate sources and sinks into their own definitions, like so:

module Iris.Workflow:0.0.1-SNAPSHOT
import Iris.Classify:0.0.1-SNAPSHOT as Classifier

def httpEndpoint = Sources.http<(Double, Double, Double, Double)> { http_path = "/speciesLength" }
def logSink = Sinks.log<Text>

def speciesWf = httpEndpoint | Classifier.predict | logSink

Separating sources and sinks becomes useful when you’re connecting to more complex integrations that you don’t want to configure each time you use them; often you will want to reuse a source or sink across multiple workflows. In the following example, we define a module which provides a source and a sink, both of which sit on top of Postgres.

module Iris.DB:0.0.1-SNAPSHOT

def petalsAndSepals = Sources.postgres<(Double, Double, Double, Double)> {
  pg_database = "flowers",
  pg_query = "SELECT * FROM iris"
}

def irisSpecies = Sinks.postgres<Text> {
  pg_database = "flowers",
  pg_table = "iris"
}

If we built this module, petalsAndSepals and irisSpecies could themselves be used in other modules as sources and sinks.
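
For example, a fully composed workflow could read from one and write to the other. This is a hypothetical sketch, assuming the module above has been built and that Iris.Classify is available:

module Iris.FullPipeline:0.0.1-SNAPSHOT
import Iris.DB:0.0.1-SNAPSHOT as DB
import Iris.Classify:0.0.1-SNAPSHOT as Classifier

def fullPipeline = DB.petalsAndSepals | Classifier.predict | DB.irisSpecies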

We may also want to add functions to do some pre- or post-processing to a source or sink. For instance:

module IrisCleanDbs:0.0.1-SNAPSHOT

import PetalTools:1.0.0 as PetalTools
import TextTools:1.1.2 as TextTools
import Iris.DB:0.0.1-SNAPSHOT as DB

def roundedPetalsSource = DB.petalsAndSepals | PetalTools.roundPetalLengths
def irisSpeciesUppercase = TextTools.toUppercase | DB.irisSpecies

Because roundedPetalsSource is a combination of a source and a function, it is still a valid source. Similarly, irisSpeciesUppercase is a combination of a function and a sink, so it is still a valid sink.
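
For example, they could be used anywhere an ordinary source or sink is expected. In this hypothetical sketch, we assume Iris.Classify is also imported into the module above, and that PetalTools.roundPetalLengths preserves the four-Double tuple type:

import Iris.Classify:0.0.1-SNAPSHOT as Classifier

def roundedSpeciesWf = roundedPetalsSource | Classifier.predict | irisSpeciesUppercase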

Because NStack functions, sources, and sinks can be composed and reused, you can build powerful abstractions over your infrastructure.

Versioning

Modules in NStack are versioned with a 3-digit suffix that is intended to follow semantic versioning, e.g.:

Demo:0.0.1

This is specified in nstack.yaml for code-based modules, and in module.nml for workflow modules. A module of a specific version is completely immutable: it is not possible to build another copy of the module with the same version without deleting it first.
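
For instance, the version forms part of the module name in nstack.yaml. This is an illustrative sketch; the other fields in your file may differ:

# nstack.yaml
name: Demo:0.0.1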

Snapshots

When creating a new module, i.e. with nstack init, your module will have the version number 0.0.1-SNAPSHOT. The SNAPSHOT tag tells NStack to let you overwrite the module every time you build. This is helpful for development, as you do not need to constantly increase the version number. When you deem your module ready for release, you can remove the SNAPSHOT suffix and NStack will create an immutable version, 0.0.1.

Note

Care is needed when importing SNAPSHOT modules: NStack will warn you if your snapshot module changes in such a way that your imports or pipeline are no longer valid, and will ask you to rebuild if needed. You can also resolve this using project files, which rebuild all dependencies as needed (see NStack Projects).

Configuration

In addition to receiving input at runtime, modules, sources, and sinks often need to be configured by a workflow author. To do this, we use braces and pass in a list of named parameters:

Sources.postgres<Text> {
  pg_host = "localhost",
  pg_port = "5432",
  pg_user = "user",
  pg_password = "123456",
  pg_database = "db",
  pg_query = "SELECT * FROM tbl;"
}

For sources and sinks, some parameters are mandatory, and some provide sensible defaults. This is documented in Supported Integrations.

To pass configuration parameters to a module, we use the same syntax:

FirstLastName.full_name { first_name = "John" }

NStack passes configuration parameters to your module as a dictionary, args, which is available on your module’s base class. For instance, in Python you can access configuration parameters in the following manner:

class Module(nstack.Module):

    def full_name(self, second_name):
        full_name = "{} {}".format(self.args.get("first_name", "Tux"), second_name)
        return full_name
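
Because the example falls back to a default via args.get, the same function can also be used in a workflow without a configuration block, in which case first_name defaults to "Tux":

FirstLastName.full_name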

Framework Modules

It is often useful to create a common parent module with dependencies already installed, either to save time or for standardisation. NStack supports this with Framework Modules: simply create a new module as above using nstack init framework [parent], and modify the resulting nstack.yaml as needed.

You can then build this module using nstack build, and refer to it from later modules within the parent field of their nstack.yaml config file.
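
For example, a module built on top of a framework module might reference it as follows. This is a hypothetical sketch: MyCompany.PythonBase is an assumed module name, and the exact reference format may differ in your NStack version:

# nstack.yaml of a module that uses the framework module as its parent
parent: MyCompany.PythonBase:0.0.1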

NStack Projects

When using NStack, you may find that you are working on several different modules at once, importing them into a main module where they are composed into a workflow. In these cases it can be cumbersome to manually rebuild every module to ensure all changes are propagated. For such cases NStack provides projects, which logically group a set of modules together so that they are all built in the correct order.

Assuming your modules are all contained as directories within ./modules, an NStack project can be formed by creating a file called nstack-project.yaml in that directory, e.g.

modules/
├── Acme.PCA/
├── Acme.CustomerChurn/
└── nstack-project.yaml

where nstack-project.yaml simply contains the list of module directories, which can be ordered as needed, e.g.

# NStack Acme Project File
modules:
  - Acme.PCA
  - Acme.CustomerChurn

Simply run nstack build from the root modules directory and all listed modules will be compiled in the order given.