Handling files
One of the common requirements when working with workflows is the ability to handle files or documents. These can be of various types, such as
- documents (Word, Excel, PDF, etc.)
- images
- text files
Regardless of the file type, the file needs to be stored as part of the execution of a workflow instance.
Automatiko comes with the following file types that abstract the concept of binary data
- Content based file - a file that contains its data
- Location based file - a file that references its data by location, usually represented as a URL
These two file types are provided as the following implementation classes, which should be used within the workflow definition as types of data objects (aka variables).
- Content based file - io.automatiko.engine.workflow.file.ByteArrayFile
- Location based file - io.automatiko.engine.workflow.file.UrlFile
These types are properly configured so they can be used as API types, and they expose only relevant information.
By default, file types are treated exactly the same way as any other type and are therefore stored together with the other data objects of the workflow instance. This can cause issues depending on the type of persistence store used and the size of the files being managed.
To address this, there are addons that bring in various back ends that can store files in a reliable and secure way. Currently the following addons are available
- File System
- Amazon S3
- Google Storage
- MongoDB based on GridFS
Depending on the selected file addon, additional configuration is needed. Regardless of the addon used, they all work in the same way: they intercept the use of content based files (ByteArrayFile) and replace them with a dedicated flavor that extends ByteArrayFile.
The most important role of the addon is to offload the content of the file from the workflow instance and instead provide a link to get the content on demand. At the same time, a call to the content method of the ByteArrayFile automatically loads the content, so when the content is actually needed during workflow instance execution it can be easily retrieved.
An example is when such a file needs to be sent as an email attachment, where the content is mandatory.
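The on-demand loading described above can be pictured with a small, self-contained sketch. It does not use Automatiko's classes: LazyContentFile, its loader function, and the in-memory map standing in for the external store are all hypothetical names chosen for illustration. The real addon flavors extend ByteArrayFile and may be implemented quite differently.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in, NOT an Automatiko class: a file that keeps only a
// name and a URL and loads its bytes the first time content() is called.
final class LazyContentFile {
    private final String name;
    private final String url;
    private final Function<String, byte[]> loader; // fetches bytes for a URL
    private byte[] content;                        // null until first access

    LazyContentFile(String name, String url, Function<String, byte[]> loader) {
        this.name = name;
        this.url = url;
        this.loader = loader;
    }

    String name() { return name; }
    String url() { return url; }
    boolean isLoaded() { return content != null; }

    // Loads the content on the first call and caches it for later calls.
    byte[] content() {
        if (content == null) {
            content = loader.apply(url);
        }
        return content;
    }

    public static void main(String[] args) {
        // In-memory map standing in for the external store (assumption).
        Map<String, byte[]> store = new HashMap<>();
        store.put("/files/test.txt", "content".getBytes(StandardCharsets.UTF_8));

        LazyContentFile file = new LazyContentFile("test.txt", "/files/test.txt", store::get);
        System.out.println(file.isLoaded()); // nothing has been fetched yet
        System.out.println(new String(file.content(), StandardCharsets.UTF_8));
        System.out.println(file.isLoaded()); // cached after the first access
    }
}
```

The point of the pattern is that the workflow instance carries only the name and the link, while code that truly needs the bytes, such as an email task, pays the loading cost at the moment it asks for them.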
All content based files (ByteArrayFile) are intercepted and replaced with the flavor of that type provided by the addon.
At the time the file is stored, its content is removed and stored externally, and a url to fetch the file is set.
So when a service receives a content based file through the service interface in the following format
{
  "file": {
    "content": "Y29udGVudA==",
    "name": "test.txt"
  }
}
it will be replaced with an addon specific file that looks like
{
  "file": {
    "name": "test.txt",
    "attributes": null,
    "url": "http://localhost:8080/management/files/download/files/1.0/5184aa18-0b93-4ba0-abba-636d95ce91e0/file/test.txt"
  }
}
As can be seen, the content of the file is no longer shipped; instead, the url is set to the location where the content can be fetched.
The host and port of the url are set based on the quarkus.automatiko.service-url property.
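For clients that build or read such payloads, the following sketch shows how the content value from the first payload can be produced and decoded. It assumes the content field carries standard Base64, which is what the sample value Y29udGVudA== suggests; the class name FileContentCodec is made up for this example and is not part of Automatiko.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Illustrative helper (hypothetical name) for the "content" field of a
// content based file payload, assuming standard Base64 encoding.
final class FileContentCodec {

    // Encodes raw bytes into the text form carried by the "content" field.
    static String encode(byte[] raw) {
        return Base64.getEncoder().encodeToString(raw);
    }

    // Decodes the "content" field back into raw bytes.
    static byte[] decode(String encoded) {
        return Base64.getDecoder().decode(encoded);
    }

    public static void main(String[] args) {
        String fromPayload = "Y29udGVudA==";
        System.out.println(new String(decode(fromPayload), StandardCharsets.UTF_8)); // prints "content"
        System.out.println(encode("content".getBytes(StandardCharsets.UTF_8)));      // prints "Y29udGVudA=="
    }
}
```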
File System AddOn
The File System AddOn stores file content on the local file system. The data is not persisted with the workflow instance itself but externalized. In addition, a dedicated endpoint to download the file is provided.
To enable this addon, add the following dependency to the project
<dependency>
  <groupId>io.automatiko.addons</groupId>
  <artifactId>automatiko-files-filesystem-addon</artifactId>
</dependency>
In addition, add the following property to application.properties
quarkus.automatiko.files.fs.location=ABSOLUTE_PATH
Replace ABSOLUTE_PATH with the actual path where files should be stored.
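Since the property expects an absolute path, it can help to sanity-check a candidate value before putting it into the configuration. The sketch below is only an illustration using the standard java.nio.file API; StorageLocationCheck and problemWith are hypothetical names, and the addon itself may validate or create the directory differently.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical pre-flight check for the value of
// quarkus.automatiko.files.fs.location; not part of the addon.
final class StorageLocationCheck {

    // Returns null when the location looks usable, otherwise a short reason.
    static String problemWith(String location) {
        if (location == null || location.trim().isEmpty()) {
            return "location is empty";
        }
        Path path = Paths.get(location);
        if (!path.isAbsolute()) {
            return "location is not an absolute path";
        }
        if (Files.exists(path) && !Files.isDirectory(path)) {
            return "location exists but is not a directory";
        }
        return null;
    }

    public static void main(String[] args) {
        String candidate = args.length > 0 ? args[0] : "relative/dir";
        String problem = problemWith(candidate);
        System.out.println(problem == null ? "ok" : problem);
    }
}
```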
Amazon S3 AddOn
The Amazon S3 AddOn stores file content in an Amazon S3 bucket. The data is not persisted with the workflow instance itself but externalized. In addition, a dedicated endpoint to download the file is provided.
To enable this addon, add the following dependency to the project
<dependency>
  <groupId>io.automatiko.addons</groupId>
  <artifactId>automatiko-files-s3-addon</artifactId>
</dependency>
In addition, add the following properties to application.properties
quarkus.automatiko.files.s3.bucket=BUCKET_NAME
quarkus.s3.aws.region=YOUR_REGION
quarkus.s3.aws.credentials.type=default
Replace BUCKET_NAME with the actual name of the bucket where files should be stored and YOUR_REGION with the region where your S3 bucket was created.
quarkus.s3.aws.credentials.type - use the default credentials provider chain, which looks for credentials in this order:
- Java System Properties - aws.accessKeyId and aws.secretAccessKey
- Environment Variables - AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
- Credential profiles file at the default location (~/.aws/credentials) shared by all AWS SDKs and the AWS CLI
- Credentials delivered through the Amazon ECS if the AWS_CONTAINER_CREDENTIALS_RELATIVE_URI environment variable is set and the security manager has permission to access the variable
- Instance profile credentials delivered through the Amazon EC2 metadata service
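When credentials are not picked up as expected, it can be useful to see which source in the order above is populated on a given machine. The following is a rough diagnostic sketch using only the JDK; it does not call the AWS SDK and only mirrors the documented order. CredentialSourceCheck and firstAvailable are hypothetical names, and the last step cannot be checked locally, so it is reported as a fallback.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.Properties;

// Hypothetical diagnostic helper, not part of the AWS SDK or Automatiko.
final class CredentialSourceCheck {

    // Returns a label for the first source, in the documented order, that
    // looks populated. It never reads or prints secret values.
    static String firstAvailable(Properties sys, Map<String, String> env, Path profileFile) {
        if (sys.getProperty("aws.accessKeyId") != null
                && sys.getProperty("aws.secretAccessKey") != null) {
            return "system properties";
        }
        if (env.get("AWS_ACCESS_KEY_ID") != null
                && env.get("AWS_SECRET_ACCESS_KEY") != null) {
            return "environment variables";
        }
        if (Files.isReadable(profileFile)) {
            return "credential profiles file";
        }
        if (env.get("AWS_CONTAINER_CREDENTIALS_RELATIVE_URI") != null) {
            return "ECS container credentials";
        }
        // The EC2 metadata service is not probed here (assumption: fallback).
        return "EC2 instance profile (or none)";
    }

    public static void main(String[] args) {
        Path profiles = Paths.get(System.getProperty("user.home"), ".aws", "credentials");
        System.out.println(firstAvailable(System.getProperties(), System.getenv(), profiles));
    }
}
```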
Google Storage AddOn
The Google Storage AddOn stores file content in a Google Storage bucket. The data is not persisted with the workflow instance itself but externalized. In addition, a dedicated endpoint to download the file is provided.
To enable this addon, add the following dependency to the project
<dependency>
  <groupId>io.automatiko.addons</groupId>
  <artifactId>automatiko-files-google-storage-addon</artifactId>
</dependency>
In addition, add the following properties to application.properties
quarkus.automatiko.files.google-storage.bucket=BUCKET_NAME
quarkus.google.cloud.project-id=PROJECT_ID
quarkus.google.cloud.service-account-location=/path/service-account-key.json
Replace BUCKET_NAME with the actual name of the bucket where files should be stored and PROJECT_ID with the Google Cloud project to be used. Lastly, point to the service account key used to authorize access to the Google Cloud Storage service.
MongoDB (GridFS) AddOn
The MongoDB (based on GridFS) AddOn stores file content in a MongoDB instance. The data is not persisted with the workflow instance itself but externalized. In addition, a dedicated endpoint to download the file is provided.
See the MongoDB documentation for more information about GridFS.
To enable this addon, add the following dependency to the project
<dependency>
  <groupId>io.automatiko.addons</groupId>
  <artifactId>automatiko-files-mongodb-addon</artifactId>
</dependency>
In addition, the following properties can be set in application.properties
quarkus.automatiko.files.mongodb.database=NAME_OF_DB
quarkus.automatiko.files.mongodb.chunk-size=12345
Replace NAME_OF_DB with the name of the database where files should be stored; if not given, it defaults to automatiko. The chunk size can also be given (in bytes) to control the size of the chunks stored in MongoDB via GridFS.
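To get a feel for the chunk size setting, the arithmetic can be sketched as follows. GridFS splits a file into chunks of the configured size, with the last chunk holding the remainder, so the chunk count is the file size divided by the chunk size, rounded up. GridFsChunkMath is a hypothetical helper written for this example; it performs no MongoDB calls.

```java
// Hypothetical helper illustrating GridFS chunk arithmetic; no MongoDB access.
final class GridFsChunkMath {

    // Number of chunks needed: ceil(fileSizeBytes / chunkSizeBytes).
    static long chunkCount(long fileSizeBytes, int chunkSizeBytes) {
        if (fileSizeBytes < 0 || chunkSizeBytes <= 0) {
            throw new IllegalArgumentException("sizes must be non-negative and chunk size positive");
        }
        return (fileSizeBytes + chunkSizeBytes - 1) / chunkSizeBytes;
    }

    // Size of the final chunk, which may be smaller than the configured size.
    static long lastChunkSize(long fileSizeBytes, int chunkSizeBytes) {
        if (chunkCount(fileSizeBytes, chunkSizeBytes) == 0) {
            return 0;
        }
        long remainder = fileSizeBytes % chunkSizeBytes;
        return remainder == 0 ? chunkSizeBytes : remainder;
    }

    public static void main(String[] args) {
        // A 1,000,000 byte file with the chunk size from the sample property.
        System.out.println(chunkCount(1_000_000L, 12345));    // prints 82
        System.out.println(lastChunkSize(1_000_000L, 12345)); // prints 55
    }
}
```

A smaller chunk size means more chunk documents per file, while a larger one means fewer but bigger documents, so the value is a trade-off that depends on typical file sizes.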
Using files as part of data objects (POJOs)
Files are usually used directly as the type of data objects, but sometimes files need to be embedded into other types. For example, an email message can consist of both a body and attachments, which requires files to be included in the object representing the message.
To make this work, such an object needs to implement the io.automatiko.engine.api.workflow.files.HasFiles<T> interface.
This interface provides access to the files managed by the instance and accepts the augmented versions of the files that are changed by the storage mechanism in use.
io.automatiko.engine.api.workflow.files.HasFiles<T> uses a parameterized type, which can be one of the following
- a single file represented by io.automatiko.engine.api.workflow.files.File<T>
- a collection of files represented by Collection<io.automatiko.engine.api.workflow.files.File<T>>
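As an illustration of the collection variant, here is a sketch of an email message that carries attachments. To keep it compilable on its own, it declares local stand-ins for File<T> and HasFiles<T>; the method names files(), augmentFiles(...), name(), and content() are assumptions based on the description above, so check the actual interfaces in your Automatiko version before copying the shape. EmailMessage and SimpleFile are hypothetical classes.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

final class EmailMessageSketch {

    // Local stand-ins so the sketch compiles on its own. Method names are
    // assumptions; the real interfaces live in io.automatiko.engine.api.workflow.files.
    interface File<T> {
        String name();
        T content();
    }

    interface HasFiles<T> {
        T files();                      // exposes the embedded files
        void augmentFiles(T augmented); // accepts versions changed by the storage mechanism
    }

    static final class SimpleFile implements File<byte[]> {
        private final String name;
        private final byte[] content;

        SimpleFile(String name, byte[] content) {
            this.name = name;
            this.content = content;
        }

        @Override
        public String name() { return name; }

        @Override
        public byte[] content() { return content; }
    }

    // An email with a body and attachments, using the collection variant.
    static final class EmailMessage implements HasFiles<Collection<File<byte[]>>> {
        private final String body;
        private Collection<File<byte[]>> attachments = new ArrayList<>();

        EmailMessage(String body) { this.body = body; }

        String body() { return body; }

        void attach(File<byte[]> file) { attachments.add(file); }

        @Override
        public Collection<File<byte[]>> files() { return attachments; }

        @Override
        public void augmentFiles(Collection<File<byte[]>> augmented) {
            this.attachments = new ArrayList<>(augmented);
        }
    }

    public static void main(String[] args) {
        EmailMessage message = new EmailMessage("See attachment");
        message.attach(new SimpleFile("test.txt", "content".getBytes(StandardCharsets.UTF_8)));

        // Simulates the storage mechanism handing back replaced files
        // whose content has been offloaded.
        List<File<byte[]>> replaced = new ArrayList<>();
        replaced.add(new SimpleFile("test.txt", null));
        message.augmentFiles(replaced);

        System.out.println(message.files().size());
    }
}
```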