What is Aerospike:

Aerospike is a distributed NoSQL database that supports the key-value and document-oriented data models. It is designed to provide robustness and strong consistency with no downtime, and it is built on a “shared nothing” architecture. Its main advantages are:

  1. Speed: low latency is maintained at high scale, which enables better decisions in real time.
  2. Ease of Deployment and Management.
  3. Low Total Cost of Ownership: thanks to a hybrid memory architecture and compression, Aerospike provides a significantly lower TCO (~20%) than first-generation NoSQL and relational databases.

What Is “Shared Nothing” Architecture?

  1. Eliminates Single Points of Failure: each node has its own CPU, memory, and storage, and nodes share none of these. In shared systems, a single point of failure can take down your site or app entirely.
  2. Avoids Unexpected Downtime: because nodes are independent, the cluster can heal itself to some extent when a node fails, which is another line of defense against unexpected downtime.

Keywords

  1. Sets: A set is similar to a collection in MongoDB or a table in an RDBMS. It contains many records, each made up of bins.
  2. Records: A record is similar to a row in an RDBMS. Each record has one primary key (PK) and one or more bins, and a set can contain many records.
  3. Bins: A bin is similar to a column in an RDBMS, and an index can be added on a bin. The difference is that bins are more flexible and dynamic: one record can have many bins, and a single bin can store any data type (integer, string, bytes, etc.).
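
The set/record/bin hierarchy above can be illustrated with a toy in-memory model. This is only a sketch of the data model, not the Aerospike client API: the class names `ToyNamespace` and `Record` are hypothetical, and the only borrowed idea is that a record is addressed by a (namespace, set, user key) triple. It shows that two records in the same set can carry entirely different bins and value types.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Tuple

# Hypothetical key type mirroring the (namespace, set, user key) triple
# used to address a record.
Key = Tuple[str, str, Any]


@dataclass
class Record:
    # Bin name -> value; bins are not declared up front, unlike RDBMS columns.
    bins: Dict[str, Any] = field(default_factory=dict)


class ToyNamespace:
    """In-memory toy model of the hierarchy namespace > set > record > bins."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.sets: Dict[str, Dict[Any, Record]] = {}  # set name -> {user key: record}

    def put(self, key: Key, bins: Dict[str, Any]) -> None:
        namespace, set_name, user_key = key
        if namespace != self.name:
            raise KeyError(f"key belongs to namespace {namespace!r}, not {self.name!r}")
        records = self.sets.setdefault(set_name, {})
        record = records.setdefault(user_key, Record())
        record.bins.update(bins)  # add new bins or overwrite existing ones

    def get(self, key: Key) -> Dict[str, Any]:
        _, set_name, user_key = key
        return dict(self.sets[set_name][user_key].bins)


ns = ToyNamespace("test")
ns.put(("test", "users", "u1"), {"name": "Alice", "age": 30})
# A second record in the same set can carry completely different bins and types.
ns.put(("test", "users", "u2"), {"name": "Bob", "tags": ["admin"], "avatar": b"\x89PNG"})
print(ns.get(("test", "users", "u1")))  # {'name': 'Alice', 'age': 30}
```

Writing to an existing record here merges the new bins into it, so a record can gain bins over time without any schema change.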

Technology Behind the Aerospike Database

Transaction processing:

  1. Reads and writes data upon request while providing consistency and isolation guarantees, using synchronous and asynchronous replication.
  2. Proxies requests to an alternate node if a node becomes unavailable, and resolves conflicts and duplicates after the node rejoins the cluster.

System-level optimizations:

  1. Multi-Core Systems: improves latency by reducing data sharing across multiple regions.
  2. Context Switching: keeps costly context switches to a minimum.
  3. Data Structure Design: safe, concurrent read, write, and delete access to the index tree without holding multiple locks.
  4. Scheduling and Prioritization: in addition to key-value operations, Aerospike supports batch queries, scans, and secondary index queries, which have to be scheduled and prioritized alongside regular reads and writes.
  5. Memory Allocation.

These optimizations aim to make the performance of database operations predictable.

Cross-datacenter replication (XDR):

  1. Remote cluster management
  2. Data shipping
  3. Pipelining

Storage: Aerospike has been designed from the ground up to leverage SSD technology. This allows Aerospike to manage dozens of terabytes of data on a single machine with sub-millisecond record access times. Aerospike supports three kinds of storage structures: Hybrid-Memory, All-Flash, and In-Memory.

Cluster management:

  1. Clustering subsystem
  2. Exchange subsystem
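
The index-tree point in the list above (safe concurrent read, write, and delete access without holding multiple locks) can be approximated with per-partition locking. This is a minimal sketch under stated assumptions, not Aerospike's implementation: a plain dict stands in for each index tree, and `PartitionedIndex` is a hypothetical name. Each operation hashes the key to one partition and holds only that partition's lock.

```python
import threading
from typing import Any, Dict, Hashable, List

_MISSING = object()  # sentinel so stored None values are handled correctly


class PartitionedIndex:
    """Toy index split into independent per-partition trees.

    A dict stands in for each index tree, and each tree has its own lock.
    Every read, write, or delete holds exactly one lock, never several.
    """

    def __init__(self, n_partitions: int = 4096) -> None:
        self._trees: List[Dict[Hashable, Any]] = [{} for _ in range(n_partitions)]
        self._locks = [threading.Lock() for _ in range(n_partitions)]

    def _slot(self, key: Hashable) -> int:
        return hash(key) % len(self._trees)

    def put(self, key: Hashable, value: Any) -> None:
        i = self._slot(key)
        with self._locks[i]:
            self._trees[i][key] = value

    def get(self, key: Hashable, default: Any = None) -> Any:
        i = self._slot(key)
        with self._locks[i]:
            return self._trees[i].get(key, default)

    def delete(self, key: Hashable) -> bool:
        i = self._slot(key)
        with self._locks[i]:
            return self._trees[i].pop(key, _MISSING) is not _MISSING

    def size(self) -> int:
        # Visits partitions one at a time, so it also holds a single lock at a time.
        total = 0
        for lock, tree in zip(self._locks, self._trees):
            with lock:
                total += len(tree)
        return total


def writer(index: PartitionedIndex, start: int) -> None:
    for k in range(start, start + 1000):
        index.put(k, k * k)


index = PartitionedIndex()
threads = [threading.Thread(target=writer, args=(index, t * 1000)) for t in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(index.size(), index.get(4321))  # 8000 18671041
```

Because no operation ever waits on a second lock while holding a first one, this design cannot deadlock, and writers to different partitions do not contend with each other.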

Aerospike Client

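Aerospike provides client libraries for many programming languages, and applications use these libraries to connect to the cluster and perform operations. The client keeps track of the cluster's partition map, so it can send each request directly to the node that holds the record.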

How Does Aerospike Distribute Data Randomly?

  1. Each record's key, together with its set name, is hashed with RIPEMD-160 into a 20-byte digest. 12 bits of this digest are used as the partition ID.
  2. 12 bits give 2^12 = 4096 partitions per namespace. Each namespace has its own partition map.
  3. The digest and some additional metadata are stored in the primary index in RAM.
  4. The master and replica nodes for each partition are decided when the cluster is formed.
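
The digest-to-partition step above can be sketched in a few lines. Two assumptions are labeled here and in the code: SHA-1 (also 20 bytes) replaces RIPEMD-160 because RIPEMD-160 is missing from some Python/OpenSSL builds, and the hashed input is simplified to set name plus key. The bit extraction (first two digest bytes read little-endian, masked to 12 bits) is modeled on the client libraries, but treat it as an assumption too. The functions `key_digest` and `partition_id` are hypothetical.

```python
import hashlib
from collections import Counter

N_PARTITIONS = 4096  # 2**12: a partition ID fits in 12 bits


def key_digest(set_name: str, user_key: str) -> bytes:
    """Stand-in for the record digest.

    Assumption: SHA-1 replaces RIPEMD-160, and the hashed input is
    simplified to the set name followed by the key.
    """
    return hashlib.sha1(set_name.encode() + user_key.encode()).digest()


def partition_id(digest: bytes) -> int:
    # Assumption: read the first two digest bytes as little-endian, keep 12 bits.
    return int.from_bytes(digest[:2], "little") & (N_PARTITIONS - 1)


# Hash many keys and look at how evenly they spread over the partitions.
counts = Counter(partition_id(key_digest("users", f"user-{i}")) for i in range(50_000))
print(len(counts), min(counts.values()), max(counts.values()))
```

Because a good hash behaves like a uniform random function, keys land on partitions roughly evenly no matter how the application chooses its keys, which is what "randomly" means here.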

What happens when a node fails?

In an example four-node cluster, if node #3 has a hardware failure, nodes #1, #2, and #4 automatically detect the failure. Node #3 is the master for one quarter of the data, but those partitions also exist as replicas on nodes #1, #2, and #4. These nodes automatically perform data migration, copying the replica partitions to create new masters. For example, partition 23 is replicated on node #4 and copied to node #2, which becomes the new master for partition 23. At the same time, your application (which includes the Aerospike Smart Client) becomes aware of the node #3 failure and automatically calculates the new partition map. The process runs in reverse when a node is added to the cluster.
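
The failover above can be sketched as a small simulation. It is a simplified variant, not Aerospike's algorithm: the round-robin placement, the node names, and the functions `build_partition_map` and `handle_node_failure` are hypothetical. The sketch promotes the surviving replica to master and adds another live node as a new replica; in a real cluster, Aerospike's own partition-assignment algorithm decides which node becomes master (node #2 rather than node #4 in the example above), and migration moves the actual data.

```python
from typing import Dict, List

PartitionMap = Dict[int, List[str]]  # partition ID -> [master, replica, ...]


def build_partition_map(nodes: List[str], n_partitions: int = 4096,
                        replication_factor: int = 2) -> PartitionMap:
    """Hypothetical round-robin placement; entry 0 is the master."""
    rf = min(replication_factor, len(nodes))
    return {
        pid: [nodes[(pid + i) % len(nodes)] for i in range(rf)]
        for pid in range(n_partitions)
    }


def handle_node_failure(pmap: PartitionMap, failed: str,
                        live_nodes: List[str]) -> PartitionMap:
    """Remove the failed node from every partition.

    The first surviving copy becomes master, and other live nodes are added
    until the copy count is restored. With replication_factor=1 the failed
    node's partitions have no surviving copy, so their data would be lost.
    """
    new_map: PartitionMap = {}
    for pid, owners in pmap.items():
        survivors = [n for n in owners if n != failed]
        target = min(len(owners), len(live_nodes))
        for candidate in live_nodes:
            if len(survivors) >= target:
                break
            if candidate not in survivors:
                survivors.append(candidate)
        new_map[pid] = survivors
    return new_map


nodes = ["node1", "node2", "node3", "node4"]
before = build_partition_map(nodes)
lost = [pid for pid, owners in before.items() if owners[0] == "node3"]
after = handle_node_failure(before, "node3", [n for n in nodes if n != "node3"])
print(len(lost), before[2], after[2])  # 1024 ['node3', 'node4'] ['node4', 'node1']
```

The 1024 partitions that node3 mastered are exactly one quarter of 4096, matching the "one quarter of the data" in the four-node example.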

Key Decisions

  1. Supported persistence modes are in-memory and hybrid (memory + persistence). Aerospike recommends SSDs for persistent storage of data.
  2. Each namespace (similar to a schema) is configured separately, so different namespaces can use different persistence types.
  3. Replication factor: the number of copies of the data that the cluster keeps. For example, a replication factor of 2 means storing two copies of the data: a master and a replica. Choose it based on the number of nodes in the cluster and how much redundancy the data needs; the minimum is 1 and the maximum is the number of nodes in the cluster.
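
The replication-factor rule above (at least 1, at most the number of nodes) also drives capacity planning. This is a rough sizing sketch, assuming data spreads evenly across nodes and ignoring index and storage overhead; the function names and the 500 GB figure are hypothetical.

```python
def effective_replication_factor(configured_rf: int, cluster_size: int) -> int:
    """Copies actually kept: at least 1, never more than the number of nodes,
    because one node does not store two copies of the same partition."""
    if configured_rf < 1 or cluster_size < 1:
        raise ValueError("replication factor and cluster size must be >= 1")
    return min(configured_rf, cluster_size)


def storage_per_node(unique_data_gb: float, configured_rf: int, cluster_size: int) -> float:
    """Rough per-node footprint, assuming data is spread evenly over the nodes."""
    copies = effective_replication_factor(configured_rf, cluster_size)
    return unique_data_gb * copies / cluster_size


# Hypothetical sizing: 500 GB of unique data.
print(storage_per_node(500, 2, 4))  # 250.0
print(storage_per_node(500, 3, 2))  # 500.0 (only 2 copies fit on 2 nodes)
```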

In-memory namespace:

namespace <namespace-name> {
    memory-size 4G               # 4GB of memory to be used for index and data
    replication-factor 2         # For multiple nodes, keep 2 copies of the data
    high-water-memory-pct 60     # Evict non-zero TTL data if capacity exceeds 60% of 4GB
    stop-writes-pct 90           # Stop writes if capacity exceeds 90% of 4GB
    default-ttl 0                # Writes from clients that do not provide a TTL default to 0 (never expire)
    storage-engine memory        # Store data in memory only
}

Persistent namespace (data stored on a device and also kept in memory):

namespace <namespace-name> {
    memory-size <SIZE>G                   # Maximum memory allocation for secondary indexes (if any)
    storage-engine device {               # Configure the storage engine to use persistence. Maximum size is 2 TiB
        file /opt/aerospike/<filename>    # Location of the data file on the server
        filesize <SIZE>G                  # Maximum size of each file in GiB
        data-in-memory true               # All data should also be kept in memory
    }
}
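
The thresholds in the in-memory example work out as follows: with memory-size 4G, high-water-memory-pct 60 starts eviction above 2.4 GB, and stop-writes-pct 90 stops writes above 3.6 GB. The sketch below only does that arithmetic; `namespace_action` is a hypothetical helper, not a server API, and it treats usage as a single number.

```python
def namespace_action(used_gb: float, memory_size_gb: float = 4,
                     high_water_memory_pct: float = 60,
                     stop_writes_pct: float = 90) -> str:
    """Classify memory usage against the thresholds from the in-memory example."""
    evict_at = memory_size_gb * high_water_memory_pct / 100  # 2.4 GB with the defaults
    stop_at = memory_size_gb * stop_writes_pct / 100         # 3.6 GB with the defaults
    if used_gb > stop_at:
        return "stop writes (and keep evicting)"
    if used_gb > evict_at:
        return "evict non-zero TTL records"
    return "normal"


for used in (2.0, 3.0, 3.8):
    print(used, namespace_action(used))
```

Because stop-writes-pct sits above high-water-memory-pct, eviction gets a chance to free memory before writes are refused.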

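Strong Consistency (SC) and Availability (AP) Modes:

AP: The AP mode has the “eventual consistency” guarantee that a typical NoSQL database provides.

SC: The SC mode guarantees that all writes to a single record are applied in a specific, sequential order and are not reordered or skipped; to keep this guarantee during a network partition, some partitions may become unavailable.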

When to Use Aerospike vs. Redis:

Choose Aerospike when you need scalability and elasticity.

Some challenges in Redis:

  1. One master, multiple slaves: write throughput is limited by the single machine the master runs on.
  2. Redis executes commands on a single thread, so there is no vertical scalability in terms of CPU.
  3. Real-time master-slave synchronization issues: with a huge volume of writes on the master, every change has to be synchronized to the slaves. Slaves sometimes had to be taken offline to resynchronize, because they could not sync huge chunks of data while also serving incoming requests from the RTB application.
  4. No convenient way to store different types of data in the same database: we had to store different entities in different Redis instances and deal with multiple connections on different ports.

How Aerospike helps:

  1. Partitioning: each namespace is split into 4096 partitions, which are spread across all nodes in the cluster. Every node is master for a share of the partitions, which increases write throughput.
  2. Aerospike is multithreaded, which makes the most effective use of the machine's resources.
  3. No downtime for master-replica synchronization: replicas are updated as part of each write, and you can configure the write policy so that a write request is considered finished only after the replica confirms it.
  4. Namespaces: different types of data can be stored in the same cluster under different namespaces, with the hierarchy namespace > set > record.
  5. SSD or in-memory storage: Aerospike can store data on SSDs or in memory. Redis is in-memory only, which makes it very costly at scale, whereas Aerospike offers competitive performance on SSDs.
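
The partitioning point above is what lifts the single-master write limit: each write goes to the master of its key's partition, so spreading partition masters over nodes spreads the writes. The sketch below counts masters per node under a hypothetical round-robin assignment (Aerospike's real assignment differs), assuming keys hash evenly across partitions; `masters_per_node` is an illustrative name.

```python
from collections import Counter
from typing import List

N_PARTITIONS = 4096


def masters_per_node(nodes: List[str]) -> Counter:
    """Hypothetical round-robin assignment of partition masters to nodes."""
    return Counter(nodes[pid % len(nodes)] for pid in range(N_PARTITIONS))


single = masters_per_node(["node1"])
cluster = masters_per_node(["node1", "node2", "node3", "node4"])
print(single["node1"])  # 4096: one node takes every write
print({node: n / N_PARTITIONS for node, n in cluster.items()})  # each node masters 25%
```

With a single master, adding machines only adds read replicas; with partitioned masters, each added node takes over a share of the write load.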
