Persistence

By default, Automatiko runs without any persistent storage, so it can be used without additional components. In many cases, however, workflow instances are long-lived and require storage to persist their state.

Automatiko comes with several options to choose from:

File system based storage
Database based storage
Amazon DynamoDB based storage
Apache Cassandra based storage

More options are on the way and will be listed here as soon as they are implemented.

File system based storage

The most basic, yet quite powerful, storage option is based on the file system. Each workflow instance is stored in a dedicated file in the configured directory.

In addition to the file itself, Automatiko stores some metadata that makes it possible to look up instances efficiently without loading them completely.

Depending on the file system capabilities, this metadata is stored as:

extended file attributes, if the file system supports them
a dot file next to the actual file, as a fallback when the file system does not support extended attributes

Configuration

To use file system based persistence, your service must have the following dependency:

<dependency>
  <groupId>io.automatiko.addons</groupId>
  <artifactId>automatiko-filesystem-persistence-addon</artifactId>
</dependency>

Note that each persistence addon also comes with a job service implementation that persists timers defined in workflows.

Additional configuration should be placed in application.properties

Property name	Environment variable	Description	Required	Default value	BuildTime only
quarkus.automatiko.persistence.type		Specify what persistence should be used			Yes
quarkus.automatiko.persistence.filesystem.path	QUARKUS_AUTOMATIKO_PERSISTENCE_FILESYSTEM_PATH	Location on file system that will be used to store persistent state	Yes		
quarkus.automatiko.jobs.type		Specifies type of jobs implementation to be used			Yes
quarkus.automatiko.jobs.filesystem.path	QUARKUS_AUTOMATIKO_JOBS_FILESYSTEM_PATH	Location on file system where jobs persistent state will be stored	Yes		
quarkus.automatiko.jobs.filesystem.threads	QUARKUS_AUTOMATIKO_JOBS_FILESYSTEM_THREADS	Specifies how many threads should be used for jobs execution			
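As a rough illustration, the properties above might be combined in application.properties as follows. This is only a sketch: the value filesystem for the two type properties is an assumption (the accepted values are not listed in the table), and the directories and thread count are hypothetical.

```properties
# Assumption: "filesystem" is the value that selects this addon
quarkus.automatiko.persistence.type=filesystem
# Hypothetical directory; use any location writable by the service
quarkus.automatiko.persistence.filesystem.path=/var/lib/automatiko/workflows

# Assumption: the jobs service uses the same "filesystem" value
quarkus.automatiko.jobs.type=filesystem
# Hypothetical directory for persisted timers
quarkus.automatiko.jobs.filesystem.path=/var/lib/automatiko/jobs
# Illustrative value only
quarkus.automatiko.jobs.filesystem.threads=2
```

The path and threads properties can alternatively be supplied through the environment variables listed in the table, for example QUARKUS_AUTOMATIKO_PERSISTENCE_FILESYSTEM_PATH.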

This storage option can be used for pretty much any use case but it has certain drawbacks depending on the file system being used. It might not be the best option when horizontal scalability is important.

Database based storage

Database based storage is mainly intended for database record processing, as it requires workflow data objects to be one of the following:

simple types
entities
collections of entities

Its main applicability is to store workflow state next to the data it is processing.

The data model of the workflow becomes an entity as well, which is represented as a table at the database level.

Every workflow definition is stored in a dedicated table that is constructed based on the workflow identifier and version. The same table also holds all simple type data objects and references any entity data objects.

Figure: database persistence model

Green tables are those defined as data objects and annotated as entities. Orange tables are generated based on workflow definitions:

vacations is the top level (public) process for handling vacation requests
requests is the sub workflow (private) that handles an individual request

The blue table is used by the jobs service, which also stores its data in the database to support persistent timers defined in the workflow.

Data management

Since workflows operate on database records (entities defined as data objects), Automatiko provides an option to manage these records. One important aspect is what should happen to the data after a workflow instance completes.

Automatiko provides two options:

keep the data as it is at workflow instance completion
remove the data that the workflow instance has worked with

By default, data is not removed when a workflow instance completes.
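As a sketch of the second option, the remove-at-completion property from the configuration table in this section could be switched on in application.properties:

```properties
# Remove entities created during instance execution once the instance completes
# (the documented default is false, which keeps the data)
quarkus.automatiko.persistence.db.remove-at-completion=true
```

Since the table marks this property as build time only, changing it should require rebuilding the service rather than just restarting it.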

Configuration

To use database based persistence, your service must have the following dependency:

<dependency>
  <groupId>io.automatiko.addons</groupId>
  <artifactId>automatiko-db-persistence-addon</artifactId>
</dependency>

Note that each persistence addon also comes with a job service implementation that persists timers defined in workflows.

Additional configuration should be placed in application.properties

Table 1. Automatiko specific configuration properties
Property name	Environment variable	Description	Required	Default value	BuildTime only
quarkus.automatiko.persistence.type		Specify what persistence should be used	No		Yes
quarkus.automatiko.persistence.db.remove-at-completion		Specifies if entities created during instance execution should be removed when instance completes	No	false	Yes

quarkus.automatiko.jobs.db.interval	QUARKUS_AUTOMATIKO_JOBS_DB_INTERVAL	Specifies the interval (in minutes) at which to look for the next chunk of jobs to execute	No	60	No
quarkus.automatiko.jobs.db.threads	QUARKUS_AUTOMATIKO_JOBS_DB_THREADS	Specifies how many threads should be used for job execution	No	1	No

Table 2. Datasource specific configuration properties
Property name	Environment variable	Description	Required	BuildTime only
quarkus.datasource.db-kind		Specifies what kind of database should be used	Yes	Yes
quarkus.datasource.username	QUARKUS_DATASOURCE_USERNAME	Specifies the user name used to connect to the database	Yes	No
quarkus.datasource.password	QUARKUS_DATASOURCE_PASSWORD	Specifies the password used to connect to the database	Yes	No
quarkus.datasource.jdbc.url	QUARKUS_DATASOURCE_JDBC_URL	Specifies the URL used to connect to the database	Yes	No
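The properties from both tables might be combined as in the following sketch. The database kind, credentials, and JDBC URL are hypothetical placeholders for a local PostgreSQL instance, and the example assumes the matching Quarkus JDBC driver extension is already among the service dependencies.

```properties
# Jobs service: poll for the next chunk of jobs every 10 minutes, run them on 2 threads
# (illustrative values; the documented defaults are 60 and 1)
quarkus.automatiko.jobs.db.interval=10
quarkus.automatiko.jobs.db.threads=2

# Datasource: hypothetical local PostgreSQL database
quarkus.datasource.db-kind=postgresql
quarkus.datasource.username=automatiko
quarkus.datasource.password=changeme
quarkus.datasource.jdbc.url=jdbc:postgresql://localhost:5432/automatiko
```

In deployed environments the user name, password, and URL are usually better supplied through the environment variables listed in the table than committed to application.properties.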

Full configuration reference for data source can be found here

DynamoDB based storage

DynamoDB based storage relies on Amazon's managed DynamoDB service. It offloads most of the storage heavy lifting (such as replication and scalability) to a managed persistence service.

Automatiko takes advantage of the highly scalable nature of DynamoDB, whose key-value model fits the workflow data model well.

Every workflow definition is stored in a dedicated table that is constructed based on the workflow identifier and version, so each workflow definition has its own storage location. By default, Automatiko creates all required tables, so there is no need to dive into the details, but automatic table creation can be disabled if the need arises.

The table has the following attributes:

InstanceId - (of type string) this is the unique id of each workflow instance
Content - (of type binary) this is the actual workflow instance serialized
Tags - (of type list) this is the workflow instance tags - it also includes instance id and business key

In addition, when a table is created, its read and write capacity are both set to a default value of 10. These values can be changed via configuration properties.
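For instance, the capacity of automatically created tables could be raised with the properties listed in the configuration table of this section. The values below are illustrative only, not recommendations.

```properties
# Capacity applied to workflow tables that Automatiko creates (default is 10 for both)
quarkus.automatiko.persistence.dynamodb.read-capacity=25
quarkus.automatiko.persistence.dynamodb.write-capacity=25

# Alternative: manage tables yourself and turn automatic creation off
# quarkus.automatiko.persistence.dynamodb.create-tables=false
```

The jobs service has its own equivalents under quarkus.automatiko.jobs.dynamodb.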

Configuration

To use DynamoDB based persistence, your service must have the following dependency:

<dependency>
  <groupId>io.automatiko.addons</groupId>
  <artifactId>automatiko-dynamodb-persistence-addon</artifactId>
</dependency>

Note that each persistence addon also comes with a job service implementation that persists timers defined in workflows.

Additional configuration should be placed in application.properties

Table 3. Automatiko specific configuration properties
Property name	Environment variable	Description	Required	Default value	BuildTime only
quarkus.automatiko.persistence.type		Specify what persistence should be used	No		Yes
quarkus.automatiko.persistence.dynamodb.create-tables	QUARKUS_AUTOMATIKO_PERSISTENCE_DYNAMODB_CREATE_TABLES	Specifies if tables should be automatically created	No	true	No
quarkus.automatiko.persistence.dynamodb.read-capacity	QUARKUS_AUTOMATIKO_PERSISTENCE_DYNAMODB_READ_CAPACITY	Specifies read capacity to be applied to created DynamoDB tables	No	10	No
quarkus.automatiko.persistence.dynamodb.write-capacity	QUARKUS_AUTOMATIKO_PERSISTENCE_DYNAMODB_WRITE_CAPACITY	Specifies write capacity to be applied to created DynamoDB tables	No	10	No

quarkus.automatiko.jobs.dynamodb.create-tables	QUARKUS_AUTOMATIKO_JOBS_DYNAMODB_CREATE_TABLES	Specifies if tables should be automatically created	No	true	No
quarkus.automatiko.jobs.dynamodb.read-capacity	QUARKUS_AUTOMATIKO_JOBS_DYNAMODB_READ_CAPACITY	Specifies read capacity to be applied to created DynamoDB tables	No	10	No
quarkus.automatiko.jobs.dynamodb.write-capacity	QUARKUS_AUTOMATIKO_JOBS_DYNAMODB_WRITE_CAPACITY	Specifies write capacity to be applied to created DynamoDB tables	No	10	No
quarkus.automatiko.jobs.dynamodb.interval	QUARKUS_AUTOMATIKO_JOBS_DYNAMODB_INTERVAL	Specifies the interval (in minutes) at which to look for the next chunk of jobs to execute	No	60	No
quarkus.automatiko.jobs.dynamodb.threads	QUARKUS_AUTOMATIKO_JOBS_DYNAMODB_THREADS	Specifies how many threads should be used for job execution	No	1	No

Table 4. DynamoDB specific configuration properties
Property name	Environment variable	Description	Required	BuildTime only
quarkus.dynamodb.endpoint-override	QUARKUS_DYNAMODB_ENDPOINT_OVERRIDE	The endpoint URI with which the SDK should communicate	Yes	No
quarkus.dynamodb.aws.region	QUARKUS_DYNAMODB_AWS_REGION	An Amazon Web Services region that hosts the given service	Yes	No
quarkus.dynamodb.aws.credentials.type	QUARKUS_DYNAMODB_AWS_CREDENTIALS_TYPE	Configure the credentials provider that should be used to authenticate with AWS	Yes	No
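A minimal sketch of the DynamoDB client settings, assuming a DynamoDB Local instance listening on its default port 8000. The endpoint and region are placeholders, and default is used as the credentials type so that credentials are resolved through the standard AWS provider chain.

```properties
# Hypothetical endpoint of a locally running DynamoDB Local instance
quarkus.dynamodb.endpoint-override=http://localhost:8000
# Placeholder region; use the region that hosts your tables
quarkus.dynamodb.aws.region=eu-central-1
# Resolve credentials from the default AWS provider chain
quarkus.dynamodb.aws.credentials.type=default
```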

Full configuration reference for DynamoDB can be found here

Apache Cassandra based storage

Apache Cassandra based storage relies on an advanced distributed data store for highly available use cases. It enables the use of multiple data centers to replicate workflow instance data securely and make it available in all locations.

The table has the following attributes:

InstanceId - (of type text) this is the unique id of each workflow instance
Content - (of type blob) this is the actual workflow instance serialized
Tags - (of type set) this is the workflow instance tags - it also includes instance id and business key

In addition to tables, Automatiko also creates the keyspace with the configured name. If no name is given, it defaults to automatiko.

Automatic keyspace creation is the most basic setup, as it uses the simple replication strategy with a replication factor of 1. For more advanced (production-like) use cases, the keyspace should be created manually and its name given via configuration.

In addition to its table, each workflow definition gets an index so that it can be queried more efficiently.
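For the manual setup, a configuration along these lines might be used, relying on the properties listed in the configuration table of this section. The keyspace name workflows is hypothetical and is assumed to have been created beforehand with the desired replication settings.

```properties
# The keyspace already exists, so skip automatic creation
quarkus.automatiko.persistence.cassandra.create-keyspace=false
# Hypothetical name of the manually created keyspace
quarkus.automatiko.persistence.cassandra.keyspace=workflows

# Same settings for the jobs service
quarkus.automatiko.jobs.cassandra.create-keyspace=false
quarkus.automatiko.jobs.cassandra.keyspace=workflows
```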

Configuration

To use Apache Cassandra based persistence, your service must have the following dependency:

<dependency>
  <groupId>io.automatiko.addons</groupId>
  <artifactId>automatiko-cassandra-persistence-addon</artifactId>
</dependency>

Note that each persistence addon also comes with a job service implementation that persists timers defined in workflows.

Additional configuration should be placed in application.properties

Table 5. Automatiko specific configuration properties
Property name	Environment variable	Description	Required	Default value	BuildTime only
quarkus.automatiko.persistence.type		Specify what persistence should be used	No		Yes
quarkus.automatiko.persistence.cassandra.create-keyspace	QUARKUS_AUTOMATIKO_PERSISTENCE_CASSANDRA_CREATE_KEYSPACE	Specifies if keyspace should be automatically created	No	true	No
quarkus.automatiko.persistence.cassandra.create-tables	QUARKUS_AUTOMATIKO_PERSISTENCE_CASSANDRA_CREATE_TABLES	Specifies if tables should be automatically created	No	true	No
quarkus.automatiko.persistence.cassandra.keyspace	QUARKUS_AUTOMATIKO_PERSISTENCE_CASSANDRA_KEYSPACE	Specifies the keyspace name to be used for tables	No	automatiko	No

quarkus.automatiko.jobs.cassandra.create-keyspace	QUARKUS_AUTOMATIKO_JOBS_CASSANDRA_CREATE_KEYSPACE	Specifies if keyspace should be automatically created	No	true	No
quarkus.automatiko.jobs.cassandra.create-tables	QUARKUS_AUTOMATIKO_JOBS_CASSANDRA_CREATE_TABLES	Specifies if tables should be automatically created	No	true	No
quarkus.automatiko.jobs.cassandra.keyspace	QUARKUS_AUTOMATIKO_JOBS_CASSANDRA_KEYSPACE	Specifies the keyspace name to be used for tables	No	automatiko	No
quarkus.automatiko.jobs.cassandra.interval	QUARKUS_AUTOMATIKO_JOBS_CASSANDRA_INTERVAL	Specifies the interval (in minutes) at which to look for the next chunk of jobs to execute	No	60	No
quarkus.automatiko.jobs.cassandra.threads	QUARKUS_AUTOMATIKO_JOBS_CASSANDRA_THREADS	Specifies how many threads should be used for job execution	No	1	No

Table 6. Cassandra specific configuration properties
Property name	Environment variable	Description	Required	Default value	BuildTime only
quarkus.cassandra.contact-points	QUARKUS_CASSANDRA_CONTACT_POINTS	Specifies the URL of the Cassandra database	Yes		No
quarkus.cassandra.local-datacenter	QUARKUS_CASSANDRA_LOCAL_DATACENTER	Specifies the data center to be used on Cassandra	Yes		No
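A minimal sketch of the connection settings, assuming a single Cassandra node running locally on the default CQL port. Both values are placeholders; datacenter1 is the data center name a default single-node installation typically reports.

```properties
# Hypothetical contact point: a local node on the default CQL port
quarkus.cassandra.contact-points=localhost:9042
# Data center name as reported by the cluster (placeholder)
quarkus.cassandra.local-datacenter=datacenter1
```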

Full configuration reference for Apache Cassandra can be found here