Reference Architecture
This reference architecture provides a recommended baseline infrastructure architecture to deploy a Nuix REST cluster. It is based on our architectural best practices and is intended as a guide for teams deploying this general-purpose infrastructure. The information here is the minimum recommended baseline for most Nuix REST clusters.
Nuix Core Engine REST Cluster
Nuix introduced clustering into the Core Engine REST service in response to conversations with customers and partners, who sought to manage work across multiple virtual or physical servers to optimize tasks, increase automation, and increase throughput. We designed the Nuix REST Cluster to provide:
- Horizontal scalability.
- Optimization of computing resources.
- Reduction of bespoke code under maintenance.
REST Cluster Core Concepts
Producer Nodes
Producer nodes can respond to synchronous tasks, such as returning function status, and are responsible for placing asynchronous worker-based operations in the cluster task queue. Producer nodes do not perform asynchronous tasks or worker-based operations such as processing, optical character recognition (OCR), or export.
Note
A synchronous task is one that completes immediately. Examples of synchronous tasks in Nuix are invoking an operation or retrieving the status of an operation.
Note
An asynchronous task is one that does not complete immediately and should be monitored through execution until complete. Examples include ingestion of any data into a Nuix case, adding items to item sets for deduplication, OCR, export, and most other operations.
Important
We are evaluating high availability (HA) configurations for producers. Until those tests are complete, we recommend a single producer.
Consumer Nodes
Consumer nodes pick up and execute asynchronous or worker-based operations. Using a filter chain, these nodes poll the task queue and take on the tasks they are capable of processing. If a task passes every filter, the consumer can claim it. This results in a natural load-balancing effect.
Filter Chain
The Nuix REST cluster uses a filter chain to determine if a cluster task can be claimed by a consumer node.
Note
A filter chain is a sequence of checks to ensure that asynchronous operations are properly assigned to a consumer node and executed optimally.
Task Status Filter
Ensures the consumer node will only claim a task in the NOT_STARTED state.
Capacity Filter
Ensures the consumer node will only claim a task if the task executor on the node has capacity.
Node Tag Filter
It is possible to “steer” a task to a specific node in the cluster. If node steering is applied to a task, this filter ensures the consumer node will only claim the task if the task is intended for this node.
Worker Based Task Filter
Ensures the consumer node will only claim a task provided it has enough workers to complete the task.
Memory Filter
Ensures the consumer node will only claim a task provided its current memory usage is not beyond what is specified in the configuration.
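The filter chain described above can be pictured as a list of checks that must all pass before a consumer claims a task. The following Python sketch is illustrative only: the class names, fields, the `RUNNING` status used in the usage note, and the memory threshold are hypothetical and do not reflect the actual Nuix REST implementation or its API.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Task:
    """Hypothetical cluster task; field names are illustrative only."""
    status: str = "NOT_STARTED"
    target_node_tag: Optional[str] = None  # set only when the task is steered
    workers_required: int = 0


@dataclass
class ConsumerNode:
    """Hypothetical consumer node state; field names are illustrative only."""
    tag: str
    free_executor_slots: int
    free_workers: int
    memory_used_fraction: float
    max_memory_fraction: float = 0.9  # assumed configuration value


Filter = Callable[[ConsumerNode, Task], bool]

FILTER_CHAIN: List[Filter] = [
    # Task Status Filter: only unstarted tasks may be claimed
    lambda node, task: task.status == "NOT_STARTED",
    # Capacity Filter: the task executor must have room
    lambda node, task: node.free_executor_slots > 0,
    # Node Tag Filter: a steered task must be intended for this node
    lambda node, task: task.target_node_tag in (None, node.tag),
    # Worker Based Task Filter: enough workers must be available
    lambda node, task: node.free_workers >= task.workers_required,
    # Memory Filter: current usage must not exceed the configured limit
    lambda node, task: node.memory_used_fraction <= node.max_memory_fraction,
]


def can_claim(node: ConsumerNode, task: Task) -> bool:
    """Return True only if every filter in the chain passes."""
    return all(check(node, task) for check in FILTER_CHAIN)
```

In this sketch, a node with 8 free workers could claim a task that needs 8 workers, but would skip one that needs 9, one that is steered to a different tag, or one whose status is no longer NOT_STARTED. Because each consumer evaluates the chain against its own state, busy nodes naturally leave tasks in the queue for less loaded nodes.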
Architecture Diagram
Configuration Best Practices
Cluster size and configuration will vary depending on the amount of data that needs to be processed and how fast you would like to process that data. Throughput will also vary depending on the operating system.
For a 1 TB/h scenario, we benchmarked both Ubuntu and Windows clusters with engine version 9.6.0.122. The Ubuntu cluster achieved a sustained processing rate of 1086.66 GB/h, or 6.037 GB/h per worker. The Windows cluster achieved a sustained processing rate of 1038.29 GB/h, or 5.2439 GB/h per worker. Please see our Nuix REST Cluster Whitepaper for more information.
Nuix repeated the 1 TB/h scenario with engine version 9.10.11.720 for Ubuntu and RHEL. The Ubuntu cluster achieved a sustained processing rate of 1884.3906 GB/h, or 10.4688 GB/h per worker. The RHEL cluster achieved a sustained processing rate of 1802.9138 GB/h, or 10.0162 GB/h per worker.
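The per-worker figures are consistent with dividing the sustained cluster rate by the total worker count. The sketch below reproduces that arithmetic. It assumes the benchmarks used the 1 TB/h worker totals from the Configuration Guidelines table (180 workers for Linux clusters, 198 for Windows); the function name is illustrative.

```python
def per_worker_rate(cluster_gb_per_hour: float, total_workers: int) -> float:
    """Sustained cluster throughput divided evenly across all workers."""
    return cluster_gb_per_hour / total_workers


# (sustained GB/h, assumed total workers) for each benchmarked cluster
benchmarks = {
    "Ubuntu, engine 9.6.0.122": (1086.66, 180),
    "Windows, engine 9.6.0.122": (1038.29, 198),
    "Ubuntu, engine 9.10.11.720": (1884.3906, 180),
    "RHEL, engine 9.10.11.720": (1802.9138, 180),
}

for name, (rate, workers) in benchmarks.items():
    print(f"{name}: {per_worker_rate(rate, workers):.4f} GB/h per worker")
```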
Minimum Physical Hardware Specifications
Specification | Linux/Ubuntu/Windows |
---|---|
CPU [1] | 20 |
Memory (GiB) [2] | 256 |
Network (Gbps) | 10 |
Root OS Disk (GB) | 512 |
Temp Disk (GB) [3] | 1024 |
Total Disk Size (GB) | 1536 |
Shared Disk Size (TB) | 5 |
Disk Bandwidth (Mbps) | 1000 |
Disk Speed (MB/s) | 500 |
Configuration Guidelines
Setting | Linux/Ubuntu 500 GB/h | Windows 500 GB/h | Linux/Ubuntu 1 TB/h | Windows 1 TB/h | OCR |
---|---|---|---|---|---|
Producers | 1 | 1 | 1 | 1 | - |
Consumers | 5 | 6 | 10 | 11 | 1 |
Total Workers | 90 | 108 | 180 | 198 | 8 |
Workers per Consumer | 18 | 18 | 18 | 18 | 8 |
Memory per Worker (GB) | 11 | 11 | 11 | 11 | 16 |
REST Heap Size (GB) | 16 | 16 | 16 | 16 | 16 |
1. CPUs should be greater than 2 + Workers Per Consumer.
2. Memory should be greater than (2 * REST Heap Size) + (Memory Per Worker * Workers Per Consumer). After configuring worker memory, ideally there will be 25%-30% free memory on the host.
3. As a general guideline, the temp disk should be 4 times the size of the largest container you intend to ingest and be a fast disk.
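These sizing guidelines are simple arithmetic, so they can be checked quickly for a planned consumer. The helper below is a minimal sketch, not part of any Nuix tooling; the function names are hypothetical, and the inputs in the usage example come from the Configuration Guidelines table.

```python
def cpu_floor(workers_per_consumer: int) -> int:
    """CPU count the host should exceed: 2 + Workers Per Consumer."""
    return 2 + workers_per_consumer


def memory_floor_gb(rest_heap_gb: int, memory_per_worker_gb: int,
                    workers_per_consumer: int) -> int:
    """Memory the host should exceed:
    (2 * REST Heap Size) + (Memory Per Worker * Workers Per Consumer)."""
    return 2 * rest_heap_gb + memory_per_worker_gb * workers_per_consumer


def temp_disk_gb(largest_container_gb: int) -> int:
    """Guideline temp disk size: 4 times the largest container to ingest."""
    return 4 * largest_container_gb


# Standard consumer: 18 workers, 11 GB per worker, 16 GB REST heap
print(cpu_floor(18), memory_floor_gb(16, 11, 18))

# OCR consumer: 8 workers, 16 GB per worker, 16 GB REST heap
print(cpu_floor(8), memory_floor_gb(16, 16, 8))
```

For the standard consumer this gives a floor of 20 CPUs and 230 GB of memory. The guidelines ask for more than these values, plus the recommended free-memory headroom, so treat the results as lower bounds rather than targets.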
Note
Any linear increase in consumers or workers requires a corresponding increase in network and disk bandwidth.
Application Configuration
Configurable properties for the Nuix RESTful Service application are located within the application.properties file. To access this file, navigate to the settings directory within the Nuix RESTful Service installation directory.
- Windows default: C:\Program Files\Nuix\Nuix RESTful Service\settings\application.properties
- Linux default: /opt/nuix-restful-service/settings/application.properties
The default configuration of Nuix RESTful Service includes Java Virtual Machine (JVM) settings. To access these settings, locate the following file within the Nuix RESTful Service installation directory.
- Windows default: C:\Program Files\Nuix\Nuix RESTful Service\Nuix-REST.vmoptions
- Linux default: /opt/nuix-restful-service/Nuix-REST.vmoptions
Licensing Configuration
We recommend using Nuix Cloud License Server (CLS) when working with the Nuix platform.
To configure the Nuix REST cluster to use CLS, set the following properties in the application.properties file.
Property | Example Value |
---|---|
nuix.registry.servers | licence-api.nuix.com |
nuix.license.source | cloud-server |
Storage Configuration
We recommend three disks for a REST Cluster installation: a root OS disk with a minimum of 512 GB; a local temp disk with fast read/write access and a minimum of 1024 GB for temporary data and worker-based operations; and a shared disk with a minimum of 5 TB for shared data and case data. Shared data and case data can reside on SAN/NAS as needed and require maximum throughput for optimal performance.
Temporary Directories
The following settings are recommended for the temp directory, assuming that the temp disk is mounted at /opt/nuix/tmp/.
- /opt/nuix/tmp/
- /opt/nuix/tmp/java_io/
- /opt/nuix/tmp/shared/
- /opt/nuix/tmp/bulk/
- /opt/nuix/tmp/worker/
These temporary directories should be configured with the following JVM settings:
- -Djava.io.tmpdir=/opt/nuix/tmp/java_io/
- -Dnuix.processing.sharedTempDirectory=/opt/nuix/tmp/shared/
- -Dnuix.investigator.bulkProcessingTime=/opt/nuix/tmp/bulk/
- -Dnuix.worker.tmpdir=/opt/nuix/tmp/worker/
Shared Directories
The following settings are recommended for the shared disk assuming that the disk was mounted to /opt/nuix/shared/.
- /opt/nuix/shared/
- /opt/nuix/shared/nuix-cases/
- /opt/nuix/shared/nuix-exports/
- /opt/nuix/shared/raw-data/
- /opt/nuix/shared/thumbnails/
- /opt/nuix/shared/user-scripts/
- /opt/nuix/shared/user-data/
Update the following properties in the application.properties file.
Property | Example Value |
---|---|
inventoryLocations | /opt/nuix/shared/nuix-cases/ |
nuix.engine.userDataDirs | /opt/nuix-restful-service/engine/user-data,/opt/nuix/share |
exportsFolder | /opt/nuix/shared/nuix-exports/ |
searchThumbnailsExportDirectory | /opt/nuix/shared/thumbnails |
userScriptsLocation | /opt/nuix/shared/user-scripts |
SSL Configuration
It is recommended that SSL be implemented with a valid certificate obtained from a trusted Certificate Authority (CA). To enable SSL, Nuix RESTful Service requires:
- A PKCS12 (Recommended) or JKS keystore
- A PEM or DER based certificate
The following properties should be updated in the application.properties file.
Property | Example Value |
---|---|
server.port | 443 |
server.ssl.key-store-password | changeit |
server.ssl.key-store | /path/to/keystore/nuixrest.p12 |
server.ssl.key-store-type | PKCS12 |
server.ssl.key-alias | nuixrest |
server.ssl.key-password | changeit |
server.ssl.enabled-protocols | TLSv1.2 |
Network Configuration
The following table outlines ports that you may want to open depending on your environment.
At a minimum, you will want to open the REST application port, Hazelcast, and Derby Server ports.
Port | Description |
---|---|
8080 | REST application port default |
80 | REST application port (HTTP) |
443 | REST application port (HTTPS) |
5601-5700 | Hazelcast |
1527-1597 | Derby Server |
9200, 9300 | Elasticsearch |
27443 | Nuix Management Server |
3389 | RDP |
5986 | WinRM Service Port |
22 | SSH |
2049 | NFS Server |
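When validating firewall rules between nodes, it can help to expand the port entries from the table and probe them from another host. The following is a minimal sketch using only the Python standard library. The function names and the host name in the usage note are hypothetical, and the probe covers IPv4 TCP only.

```python
import socket
from typing import List


def parse_ports(spec: str) -> List[int]:
    """Expand a table entry such as '9200, 9300' or '1527-1597' into port numbers."""
    ports: List[int] = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            low, high = (int(p) for p in part.split("-"))
            ports.extend(range(low, high + 1))  # ranges are inclusive
        elif part:
            ports.append(int(part))
    return ports


def is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        return sock.connect_ex((host, port)) == 0
```

For example, `[p for p in parse_ports("8080") if not is_open("consumer-01.example.internal", p)]` would list the REST port if it is unreachable on that (hypothetical) host. A port only reports as open when a service is actually listening on it, so run the probe after the relevant services have started.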
Derby Server Configuration
Prior to engine version 9.11.2.3117, it was not possible to use native Lucene/Derby cases. Instead, REST required the use of Elasticsearch cases only. Lucene/Derby cases can now be used with Nuix RESTful Service Clustering. To ensure this feature is enabled, set the following properties in the application.properties file.
Property | Example Value |
---|---|
cluster.enabled | true |
cluster.rejectDerbyLuceneCases | false |