Reference Architecture

Baseline reference architecture for a Nuix REST Cluster

This reference architecture describes the recommended baseline infrastructure for deploying a Nuix REST cluster. It is based on our architectural best practices and is intended as a guide for teams deploying this general-purpose infrastructure. The specifications here are the minimum recommended baseline for most Nuix REST clusters.

Nuix Core Engine REST Cluster

Nuix introduced clustering into the Core Engine REST service in response to conversations with customers and partners, who wanted to manage work across multiple virtual or physical servers to optimize tasks, increase automation, and increase throughput. We designed the Nuix REST Cluster to provide:

  • Horizontal scalability.
  • Optimization of computing resources.
  • Reduction of bespoke code under maintenance.

REST Cluster Core Concepts

Producer Nodes

Producer nodes handle synchronous tasks, such as returning function status, and place asynchronous, worker-based operations in the cluster task queue. Producer nodes do not themselves perform asynchronous or worker-based operations such as processing, optical character recognition (OCR), or export.

Consumer Nodes

Consumer nodes pick up and execute asynchronous or worker-based operations. These nodes poll the task queue and use a filter chain to select the tasks they are capable of processing; a consumer can claim a task only if the task passes every filter in the chain. Because each consumer claims only work it can currently take on, this results in a natural load-balancing effect.

Filter Chain

The Nuix REST cluster uses a filter chain to determine if a cluster task can be claimed by a consumer node.

Task Status Filter

Ensures the consumer node will only claim a task in the NOT_STARTED state.

Capacity Filter

Ensures the consumer node will only claim a task if the task executor on the node has capacity.

Node Tag Filter

It is possible to “steer” a task to a specific node in the cluster. If node steering is applied to a task, this filter ensures the consumer node will only claim the task if the task is intended for this node.

Worker Based Task Filter

Ensures the consumer node will only claim a task provided it has enough workers to complete the task.

Memory Filter

Ensures the consumer node will only claim a task provided its current memory usage does not exceed the limit specified in the configuration.
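
The filter chain amounts to a list of predicates that must all pass before a consumer claims a task. The Python sketch below models that idea; the Task and Node classes, their fields, and the percentage-based memory threshold are hypothetical stand-ins chosen for illustration, not the Nuix REST implementation or API.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Task:
    """Hypothetical cluster task, for illustration only."""
    status: str                        # e.g. "NOT_STARTED"
    workers_required: int = 0          # 0 for tasks that need no workers
    target_node: Optional[str] = None  # set when the task is steered to a node


@dataclass
class Node:
    """Hypothetical snapshot of a consumer node's current state."""
    name: str
    free_executor_slots: int
    free_workers: int
    memory_used_pct: float
    memory_limit_pct: float  # stands in for the configured memory threshold


Filter = Callable[[Task, Node], bool]


def task_status_filter(task: Task, node: Node) -> bool:
    return task.status == "NOT_STARTED"


def capacity_filter(task: Task, node: Node) -> bool:
    return node.free_executor_slots > 0


def node_tag_filter(task: Task, node: Node) -> bool:
    # Unsteered tasks may run anywhere; steered tasks only on their target node.
    return task.target_node is None or task.target_node == node.name


def worker_based_task_filter(task: Task, node: Node) -> bool:
    return task.workers_required <= node.free_workers


def memory_filter(task: Task, node: Node) -> bool:
    return node.memory_used_pct <= node.memory_limit_pct


FILTER_CHAIN: Sequence[Filter] = (
    task_status_filter,
    capacity_filter,
    node_tag_filter,
    worker_based_task_filter,
    memory_filter,
)


def can_claim(task: Task, node: Node, chain: Sequence[Filter] = FILTER_CHAIN) -> bool:
    # all() stops at the first filter that rejects the task.
    return all(check(task, node) for check in chain)
```

With this sketch, a consumer with 18 free workers can claim an unsteered NOT_STARTED task that needs 8 workers, but rejects the same task once it is steered to a different node.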

Architecture Diagram

Configuration Best Practices

Cluster size and configuration depend on the amount of data to be processed and the processing rate you need. Throughput also varies by operating system.

For a 1 TB/h scenario, we benchmarked Ubuntu and Windows clusters running engine version 9.6.0.122. The Ubuntu cluster achieved a sustained processing rate of 1086.66 GB/h, or 6.037 GB/h per worker. The Windows cluster achieved a sustained processing rate of 1038.29 GB/h, or 5.2439 GB/h per worker. For more information, see the Nuix REST Cluster Whitepaper.

Nuix repeated the 1 TB/h scenario on Ubuntu and Red Hat Enterprise Linux (RHEL) with engine version 9.10.11.720. The Ubuntu cluster achieved a sustained processing rate of 1884.3906 GB/h, or 10.4688 GB/h per worker; the RHEL cluster achieved 1802.9138 GB/h, or 10.0162 GB/h per worker.
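
Each per-worker figure is the sustained cluster rate divided by the total number of workers. The Python sketch below reproduces the published figures, assuming the benchmark clusters used the worker counts from the 1 TB/h columns of the Configuration Guidelines table (180 workers on Linux, 198 on Windows). The consumers_needed helper is our own rough estimate rather than an official Nuix sizing method; with the 9.6.0.122 rates it happens to reproduce the consumer counts in that table.

```python
import math


def per_worker_rate(cluster_gb_per_h: float, total_workers: int) -> float:
    """Sustained cluster throughput divided evenly across all workers."""
    return cluster_gb_per_h / total_workers


def consumers_needed(target_gb_per_h: float, gb_per_h_per_worker: float,
                     workers_per_consumer: int = 18) -> int:
    """Smallest whole number of consumers whose workers together meet the target."""
    gb_per_h_per_consumer = gb_per_h_per_worker * workers_per_consumer
    return math.ceil(target_gb_per_h / gb_per_h_per_consumer)


# Assumed worker counts: 180 (10 x 18) on Linux and 198 (11 x 18) on Windows,
# taken from the 1 TB/h columns of the Configuration Guidelines table.
BENCHMARKS = {
    "Ubuntu, 9.6.0.122": (1086.66, 180),
    "Windows, 9.6.0.122": (1038.29, 198),
    "Ubuntu, 9.10.11.720": (1884.3906, 180),
    "RHEL, 9.10.11.720": (1802.9138, 180),
}

for name, (cluster_rate, workers) in BENCHMARKS.items():
    print(f"{name}: {per_worker_rate(cluster_rate, workers):.4f} GB/h per worker")

linux = per_worker_rate(1086.66, 180)
windows = per_worker_rate(1038.29, 198)
for target in (500, 1000):
    print(f"{target} GB/h: {consumers_needed(target, linux)} Linux consumers, "
          f"{consumers_needed(target, windows)} Windows consumers")
```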

Minimum Physical Hardware Specifications

Resource                Linux/Ubuntu/Windows
CPU [1]                 20
Memory (GiB) [2]        256
Network (Gbps)          10
Root OS Disk (GB)       512
Temp Disk (GB) [3]      1024
Total Disk Size (GB)    1536
Shared Disk Size (TB)   5
Disk Bandwidth (Mbps)   1000
Disk Speed (MB/s)       500

Configuration Guidelines

Setting                   Linux/Ubuntu  Windows   Linux/Ubuntu  Windows  OCR
                          500 GB/h      500 GB/h  1 TB/h        1 TB/h
Producers                 1             1         1             1        -
Consumers                 5             6         10            11       1
Total Workers             90            108       180           198      8
Workers per Consumer      18            18        18            18       8
Memory per Worker (GB)    11            11        11            11       16
REST Heap Size (GB)       16            16        16            16       16
  1. The number of CPUs should be at least 2 + Workers per Consumer (for example, 2 + 18 = 20 in the hardware table above).

  2. Memory should be greater than: (2 * REST Heap Size) + (Memory Per Worker * Workers Per Consumer). After configuring worker memory, ideally there will be 25%-30% free memory on the host.

  3. As a general guideline, the temp disk should be 4 times the size of the largest container you intend to ingest and be a fast disk.
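
The three footnote rules can be checked mechanically. This Python sketch encodes them for a single consumer host; ConsumerProfile and host_shortfalls are hypothetical names, the CPU rule is read as a lower bound because the 20-CPU minimum above equals 2 + 18, and memory is treated as plain GB without converting between the GB and GiB units the tables use.

```python
from dataclasses import dataclass


@dataclass
class ConsumerProfile:
    """Hypothetical per-consumer settings from the Configuration Guidelines table."""
    workers_per_consumer: int
    memory_per_worker_gb: float
    rest_heap_gb: float


def min_cpus(p: ConsumerProfile) -> int:
    # Footnote 1: 2 + workers per consumer (read here as a lower bound).
    return 2 + p.workers_per_consumer


def min_memory_gb(p: ConsumerProfile) -> float:
    # Footnote 2: (2 * REST heap size) + (memory per worker * workers per consumer).
    return 2 * p.rest_heap_gb + p.memory_per_worker_gb * p.workers_per_consumer


def min_temp_disk_gb(largest_container_gb: float) -> float:
    # Footnote 3: about 4 times the largest container you intend to ingest.
    return 4 * largest_container_gb


def host_shortfalls(p: ConsumerProfile, cpus: int, memory_gb: float) -> list[str]:
    """Describe each CPU or memory rule the host fails; empty if it passes."""
    problems = []
    if cpus < min_cpus(p):
        problems.append(f"CPUs: have {cpus}, need at least {min_cpus(p)}")
    if memory_gb <= min_memory_gb(p):
        problems.append(f"memory: have {memory_gb} GB, need more than {min_memory_gb(p)} GB")
    return problems


standard = ConsumerProfile(workers_per_consumer=18, memory_per_worker_gb=11, rest_heap_gb=16)
ocr = ConsumerProfile(workers_per_consumer=8, memory_per_worker_gb=16, rest_heap_gb=16)
```

For the 18-worker profile the rules give 20 CPUs and more than 230 GB of memory, which the 256 GiB minimum satisfies; the 8-worker OCR profile needs 10 CPUs and more than 160 GB. A 1024 GB temp disk suits containers of up to about 256 GB.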

Application Configuration

Configurable properties for the Nuix RESTful Service application are located within the application.properties file. To access this file, navigate to the settings directory within the Nuix RESTful Service installation directory.

  • Windows default: C:\Program Files\Nuix\Nuix RESTful Service\settings\application.properties
  • Linux default: /opt/nuix-restful-service/settings/application.properties

Java Virtual Machine (JVM) settings for Nuix RESTful Service, including those in its default configuration, are located in the Nuix-REST.vmoptions file within the Nuix RESTful Service installation directory.

  • Windows default: C:\Program Files\Nuix\Nuix RESTful Service\Nuix-REST.vmoptions
  • Linux default: /opt/nuix-restful-service/Nuix-REST.vmoptions

Licensing Configuration

We recommend using Nuix Cloud License Server (CLS) when working with the Nuix platform.
To configure the Nuix REST cluster to use CLS, set the following properties in the application.properties file.

Property Example Value
nuix.registry.servers licence-api.nuix.com
nuix.license.source cloud-server

Storage Configuration

We recommend three disks for a REST cluster installation:

  • A root OS disk of at least 512 GB.
  • A local temp disk with fast read/write access, of at least 1024 GB, for temporary data and worker-based operations.
  • A shared disk of at least 5 TB for shared data and case data.

Shared data and case data can reside on a SAN or NAS as needed; this storage requires maximum throughput for optimal performance.
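
Before installing, it can help to confirm that each node's disks meet these minimums. The Python sketch below uses shutil.disk_usage; the mount points are taken from the Temporary Directories and Shared Directories sections, reading GB and TB as decimal units is an assumption, and undersized_disks is a hypothetical helper name.

```python
import shutil

GB = 10**9
TB = 10**12

# Minimum capacities from the recommendations above (decimal units assumed).
MINIMUM_BYTES = {
    "/": 512 * GB,
    "/opt/nuix/tmp": 1024 * GB,
    "/opt/nuix/shared": 5 * TB,
}


def undersized_disks(minimums: dict[str, int]) -> dict[str, int]:
    """Return {path: total_bytes} for every path whose filesystem is too small.

    shutil.disk_usage reports the filesystem that contains each path, so a
    directory that is not a separate mount reports the size of its parent mount.
    """
    result = {}
    for path, minimum in minimums.items():
        total = shutil.disk_usage(path).total
        if total < minimum:
            result[path] = total
    return result
```

shutil.disk_usage raises FileNotFoundError if a path does not exist, so run undersized_disks(MINIMUM_BYTES) after the disks are mounted; an empty result means every disk meets its minimum.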

Temporary Directories

Assuming the temp disk is mounted at /opt/nuix/tmp/, we recommend the following temporary directories:

  • /opt/nuix/tmp/
  • /opt/nuix/tmp/java_io/
  • /opt/nuix/tmp/shared/
  • /opt/nuix/tmp/bulk/
  • /opt/nuix/tmp/worker/

These temporary directories should be configured with the following JVM settings:

  • -Djava.io.tmpdir=/opt/nuix/tmp/java_io/
  • -Dnuix.processing.sharedTempDirectory=/opt/nuix/tmp/shared/
  • -Dnuix.investigator.bulkProcessingTime=/opt/nuix/tmp/bulk/
  • -Dnuix.worker.tmpdir=/opt/nuix/tmp/worker/
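
Generating the directories and the matching -D options from one mapping keeps the two lists above in sync. A minimal Python sketch, assuming the temp disk is already mounted and that the returned lines are copied into Nuix-REST.vmoptions by hand; prepare_temp_dirs is a hypothetical helper, and the property names are taken from the list above (java.io.tmpdir is the standard JDK temp-directory property).

```python
from pathlib import Path

# Temp-disk subdirectory -> JVM system property that should point at it,
# following the two lists above.
TEMP_SUBDIRS = {
    "java_io": "java.io.tmpdir",
    "shared": "nuix.processing.sharedTempDirectory",
    "bulk": "nuix.investigator.bulkProcessingTime",
    "worker": "nuix.worker.tmpdir",
}


def prepare_temp_dirs(root: Path) -> list[str]:
    """Create the temp layout under root and return the matching -D options."""
    options = []
    for subdir, prop in TEMP_SUBDIRS.items():
        path = root / subdir
        # parents=True also creates root; exist_ok=True makes reruns harmless.
        path.mkdir(parents=True, exist_ok=True)
        options.append(f"-D{prop}={path}/")
    return options
```

Call prepare_temp_dirs(Path("/opt/nuix/tmp")) as a user with write access to that mount; the call is safe to repeat because existing directories are left in place.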

Shared Directories

Assuming the shared disk is mounted at /opt/nuix/shared/, we recommend the following shared directories:

  • /opt/nuix/shared/
  • /opt/nuix/shared/nuix-cases/
  • /opt/nuix/shared/nuix-exports/
  • /opt/nuix/shared/raw-data/
  • /opt/nuix/shared/thumbnails/
  • /opt/nuix/shared/user-scripts/
  • /opt/nuix/shared/user-data/

Update the following properties in the application.properties file.

Property Example Value
inventoryLocations /opt/nuix/shared/nuix-cases/
nuix.engine.userDataDirs /opt/nuix-restful-service/engine/user-data,/opt/nuix/share
exportsFolder /opt/nuix/shared/nuix-exports/
searchThumbnailsExportDirectory /opt/nuix/shared/thumbnails
userScriptsLocation /opt/nuix/shared/user-scripts

SSL Configuration

It is recommended that SSL be implemented with a valid certificate obtained from a trusted Certificate Authority (CA). To enable SSL, Nuix RESTful Service requires:

  • A PKCS12 (Recommended) or JKS keystore
  • A PEM- or DER-encoded certificate

The following properties should be updated in the application.properties file.

Property Example Value
server.port 443
server.ssl.key-store-password changeit
server.ssl.key-store /path/to/keystore/nuixrest.p12
server.ssl.key-store-type PKCS12
server.ssl.key-alias nuixrest
server.ssl.key-password changeit
server.ssl.enabled-protocols TLSv1.2
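
Once SSL is enabled, a quick client-side check confirms that the service completes a TLS 1.2 handshake with a certificate the client trusts. This Python sketch uses the standard ssl module; the helper names and the host name in the usage note are hypothetical.

```python
import socket
import ssl


def tls12_client_context() -> ssl.SSLContext:
    """Client context that verifies the server certificate and speaks only TLS 1.2."""
    context = ssl.create_default_context()  # CA verification and hostname checks on
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    context.maximum_version = ssl.TLSVersion.TLSv1_2
    return context


def negotiated_tls_version(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Connect to host:port and return the negotiated protocol, e.g. "TLSv1.2".

    Raises ssl.SSLError (including ssl.SSLCertVerificationError) if the
    handshake fails, for example when the certificate is not trusted.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with tls12_client_context().wrap_socket(sock, server_hostname=host) as tls:
            return tls.version()
```

For example, negotiated_tls_version("nuix-rest.example.com") should return "TLSv1.2"; a self-signed certificate or one from an untrusted CA raises ssl.SSLCertVerificationError instead. Because the client is pinned to TLS 1.2, the check does not show whether other protocol versions are disabled on the server.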

Network Configuration

The following table outlines ports that you may want to open depending on your environment.
At a minimum, you will want to open the REST application port, Hazelcast, and Derby Server ports.

Port Description
8080 REST application port (default)
80 REST application port (HTTP)
443 REST application port (HTTPS)
5601-5700 Hazelcast
1527-1597 Derby Server
9200, 9300 Elasticsearch
27443 Nuix Management Server
3389 RDP
5985, 5986 WinRM Service (HTTP, HTTPS)
22 SSH
2049 NFS Server
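
To verify firewall rules between nodes, a TCP connect test from one node to another is usually enough. The Python sketch below expands port specifications written as in the table and tries each port; expand_ports and is_port_open are hypothetical helpers.

```python
import socket


def expand_ports(spec: str) -> list[int]:
    """Expand a spec such as "5601-5700" or "9200, 9300" into individual ports."""
    ports: list[int] = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            start, end = (int(p) for p in part.split("-", 1))
            ports.extend(range(start, end + 1))  # ranges are inclusive
        else:
            ports.append(int(part))
    return ports


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False
```

A port reports open only if something is listening on it, so expect only some ports in the Hazelcast and Derby ranges to respond. A port behind a firewall that silently drops packets waits for the full timeout, so keep the timeout short when scanning whole ranges.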

Derby Server Configuration

Prior to engine version 9.11.2.3117, Nuix RESTful Service clustering could not use native Lucene/Derby cases and required Elasticsearch cases. From that version onward, Lucene/Derby cases can be used with Nuix RESTful Service clustering. To enable this feature, set the following properties in the application.properties file.

Property Example Value
cluster.enabled true
cluster.rejectDerbyLuceneCases false
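
The property tables in this document can be applied with any text editor. For scripted or repeatable deployments, the Python sketch below (a hypothetical helper, not a Nuix tool) sets key=value pairs in the text of application.properties. It recognizes only single-line key=value entries: keys written with ":" or whitespace separators, and values continued across lines, are not matched. Because backslash is an escape character in Java properties files, Windows paths should be written with forward slashes or doubled backslashes.

```python
def set_properties(text: str, updates: dict[str, str]) -> str:
    """Return properties text with each key in updates set to its new value.

    Simplified sketch: comment lines (# or !) and unrelated lines are copied
    unchanged, every existing occurrence of an updated key is rewritten, and
    keys that are not already present are appended at the end.
    """
    seen = set()
    lines = []
    for line in text.splitlines():
        stripped = line.lstrip()
        if stripped and stripped[0] not in "#!" and "=" in stripped:
            key = stripped.split("=", 1)[0].strip()
            if key in updates:
                lines.append(f"{key}={updates[key]}")
                seen.add(key)
                continue
        lines.append(line)
    lines.extend(f"{key}={value}" for key, value in updates.items() if key not in seen)
    return "\n".join(lines) + "\n"
```

For example, passing {"cluster.enabled": "true", "cluster.rejectDerbyLuceneCases": "false"} applies the Derby settings above. Read the file, pass its text through set_properties, and write the result back, keeping a backup of the original.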