2026 March Release

“Private Cloud” Operating Model

The following requirements apply to operating the Fabasphere in the “Private Cloud” operating model.

Infrastructure

The following infrastructure is required.

Kubernetes Cluster

  • Red Hat OpenShift (at least version 4.17) or
  • k3s (at least version 1.31.0)

Data Storage via NFS File Share

  • 3 x NFS file shares (version NFSv3 or NFSv4.1)
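Each NFS file share is typically made available to the cluster as a PersistentVolume. A minimal sketch for one of the three shares, assuming a hypothetical NFS server address, export path, and volume name (all placeholders, not part of the Fabasphere deployment):

```yaml
# Illustrative only: server, path, name, and capacity are placeholders.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: fabasphere-store-1
spec:
  capacity:
    storage: 10Ti
  accessModes:
    - ReadWriteMany
  mountOptions:
    - nfsvers=4.1        # matches the NFSv4.1 requirement above
  nfs:
    server: nfs1.example.com
    path: /exports/fabasphere/store1
```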

Database

  • PostgreSQL (version 17.6)

Container Registry

  • Container registry (e.g., Harbor or JFrog Artifactory) for synchronizing the Fabasphere images from registry.fabasoft.com

Operation

The following requirements apply to operation.

Required Services

  • Load balancer (recommendation: nginx)
  • OpenLDAP (at least version 2.6.10)

Note: The required services are not part of the Fabasphere deployment.

Optional Services

  • KEDA Operator
  • Istio
  • Kubernetes cluster logging stack
  • Kubernetes cluster monitoring stack

Note: The optional services are not part of the Fabasphere deployment.

Configuration Management/Deployment

  • Git (e.g., GitLab, Gitea)
  • Deployment tool (e.g., Argo CD)
  • Alternatively, deployment via Helm (version 3)

Note: The required tools are not part of the Fabasphere deployment.
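With a Git-plus-Argo CD setup, the deployment is typically described declaratively as an Argo CD `Application` that points at the Helm chart in the repository. A minimal sketch, assuming a hypothetical repository URL, chart path, and target namespace (all placeholders):

```yaml
# Illustrative only: repoURL, path, and namespace are placeholders.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: fabasphere
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://git.example.com/ops/fabasphere-deployment.git
    targetRevision: main
    path: helm/fabasphere
  destination:
    server: https://kubernetes.default.svc
    namespace: fabasphere
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
```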

External Cluster Access (TCP)

IP addresses must be provided for services with the service type “LoadBalancer” (e.g., via MetalLB).
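In a MetalLB setup, a service of type “LoadBalancer” receives an external IP from a configured address pool. A minimal sketch, assuming hypothetical service, selector, and pool names (all placeholders):

```yaml
# Illustrative only: names, ports, and the address pool are placeholders.
apiVersion: v1
kind: Service
metadata:
  name: fabasphere-web
  annotations:
    metallb.universe.tf/address-pool: fabasphere-pool
spec:
  type: LoadBalancer
  selector:
    app: fabasphere-web
  ports:
    - name: https
      port: 443
      targetPort: 8443
```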

Resources and Scaling

For the operation of the Fabasphere services in the “Private Cloud” operating model, at least the following resources per service are required (1000 registered users, 10 TB of data).

| Service | CPU (Requested) | RAM | Persistent Storage | Remarks |
| --- | --- | --- | --- | --- |
| COO Service | 4 | 16 GB | 128 MB | DTM logs need persistent storage. |
| Storage Service | 4 | 8 GB | 3 x 10 TB | 3 x NFS shares for redundant storage (no replicas). |
| Web Service | 2 | 16 GB | - | As the number of objects increases, an increase in RAM is recommended. As user request load increases, an increase in CPU capacity is required. |
| AT Service | 2 | 16 GB | - | |
| IdP | 2 | 6 GB | - | |
| EventQ | 2 | 2 GB | 2 GB | |
| DTS | 16 | 64 GB | 8 GB | |
| MIS | 4 | 12 GB | 50 GB | |
| EXTCACHE | 2 | 8 GB | - | |
| OData | 2 | 8 GB | - | Optional |
| OpenAPI | 2 | 8 GB | - | Optional |
| State Server | 2 | 4 GB | - | |
| COODOTNET | 4 | 16 GB | - | Optional |

For an exemplary standard operation with 1000 registered users and 10 TB of data, at least the following number of replicas per service is required:

| Service | Number | Remarks on Scaling |
| --- | --- | --- |
| COO Service | 3 | A dedicated database is required for each COO service. The minimum required resources of the individual COO service instances may vary depending on the configured object placement and may require different configurations. |
| Storage Service | 3 | Two replicas are required to be active for handling requests. The third replica writes an online backup of the data and serves as a fallback. |
| Web Service | 12 | Each web service provides a maximum of 64 threads. As the number of requests increases, the number of instances must be increased accordingly. |
| AT Service | 2 | Depending on the number of automated tasks to be processed, it may be necessary to increase the number of AT services. |
| IdP | 2 | The IdP is operated redundantly with two replicas. Higher scaling is not required. |
| EventQ | 3 | Three replicas are required for operating the EventQ. This number must neither be increased nor decreased. |
| DTS | 1 | DTS consists of individual microservices for the respective conversion tools. Scaling is only possible at the tool level. Automatic scaling mechanisms are provided to enable load-dependent scaling. |
| MIS | 1 | MIS consists of individual microservices. Higher scaling is not required. |
| EXTCACHE | 3 | It is recommended to provide one replica per worker node of the orchestration platform. |
| OData | 2 | With increased request load, it may be necessary to increase the number of replicas. If predominantly larger data queries are performed, an increase in RAM is required. |
| OpenAPI | 2 | With increased request load, it may be necessary to increase the number of replicas. |
| State Server | 2 | With increased request load, it may be necessary to increase the number of replicas. |
| COODOTNET | 2 | With increased request load through OData and OpenAPI, it may be necessary to increase the number of replicas. |

The specifications regarding resources and scaling are based on independent empirical values and represent the minimum requirements. Depending on the hardware used, the actual request load, and user behavior, an increase in resources or instances may be necessary.
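The minimum cluster footprint implied by the two tables above can be estimated by multiplying each service's per-replica request by its replica count. A minimal planning sketch (figures taken from the tables, optional services included; actual sizing must follow the caveats above):

```python
# Minimum per-replica requests (CPU cores, RAM in GB) and replica counts,
# taken from the resource and scaling tables above (optional services included).
services = {
    "COO Service":     (4, 16, 3),
    "Storage Service": (4, 8, 3),
    "Web Service":     (2, 16, 12),
    "AT Service":      (2, 16, 2),
    "IdP":             (2, 6, 2),
    "EventQ":          (2, 2, 3),
    "DTS":             (16, 64, 1),
    "MIS":             (4, 12, 1),
    "EXTCACHE":        (2, 8, 3),
    "OData":           (2, 8, 2),
    "OpenAPI":         (2, 8, 2),
    "State Server":    (2, 4, 2),
    "COODOTNET":       (4, 16, 2),
}

# Sum requested resources across all replicas of all services.
total_cpu = sum(cpu * n for cpu, _, n in services.values())
total_ram_gb = sum(ram * n for _, ram, n in services.values())
print(f"Requested CPU total: {total_cpu} cores")   # 108 cores
print(f"Requested RAM total: {total_ram_gb} GB")   # 486 GB
```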

Mindbreeze AI

Mindbreeze AI is operated on the same Kubernetes cluster. The required language model must be obtained directly, for example, from Hugging Face. Mindbreeze AI requires a “Persistent Volume Claim” to store the data necessary for AI use cases.

Recommendations:

  • To improve performance, it is recommended to run Mindbreeze AI pods on servers with GPUs.
  • For the operation of large language models (LLMs), it is recommended to provide dedicated servers with graphics cards (GPUs) in the Kubernetes cluster.
  • For fail-safe operation, it is recommended to run two servers per LLM.
  • One graphics card per server is recommended, each fully dedicated to the LLM.
  • The LLM should have at least 7 billion parameters.
  • For more general usability, the LLM should be multilingual or at least offer good support for the languages used.
  • Depending on the use case, the LLM should provide at least 8 to 10 tokens per second per user.
  • The LLM used should be instruction-tuned or chat-tuned.
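The throughput guideline above translates into an aggregate figure once the expected number of concurrent users is known. A minimal sizing sketch, where the concurrent-user count is a hypothetical planning assumption, not a value from this document:

```python
# Rough sizing aid: aggregate token throughput an LLM deployment must sustain.
# Guideline above: at least 8-10 tokens per second per user.
tokens_per_second_per_user = 10   # upper end of the guideline
concurrent_users = 25             # hypothetical planning figure

required_throughput = tokens_per_second_per_user * concurrent_users
print(f"Required aggregate throughput: {required_throughput} tokens/s")  # 250 tokens/s
```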

Supported GPUs:

Mindbreeze AI supports CUDA GPUs with the following compute capability versions: 6.0, 6.1, 7.0, 7.5, 8.0, 9.0

A list of devices is maintained in NVIDIA’s external “CUDA GPU Compute Capability” overview.