You are viewing the documentation for Pilosa v1.1.

Glossary

Anti-entropy: A periodic process that compares each shard and its replicas across the cluster to repair inconsistencies.

Attribute: Attributes can be associated with both rows and columns. This metadata is kept separately from the core binary matrix in a BoltDB store.

Bit: Bits are the fundamental unit of data in Pilosa. A bit lives in a field, at the intersection of a row and column.

Bitmap: The on-disk and in-memory representation of a row. Implemented with Roaring.

BSI: Bit-sliced indexing is the method Pilosa uses to represent multi-bit integers. Integer values are stored in int fields, and can be used for Range, Min, Max, and Sum queries.
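
As a rough illustration of the bit-sliced idea (not Pilosa's actual implementation), the sketch below stores each binary digit of a set of non-negative integers as its own bitmap of column IDs, then computes a sum from the size of each slice alone. The function names and the use of Python sets as stand-ins for bitmaps are assumptions made for this example.

```python
def bsi_encode(values, bit_depth):
    """Build one 'bitmap' (here a set of column IDs) per binary digit.

    values maps column ID -> non-negative integer below 2**bit_depth.
    """
    slices = [set() for _ in range(bit_depth)]
    for column, value in values.items():
        if not 0 <= value < (1 << bit_depth):
            raise ValueError(f"value {value} does not fit in {bit_depth} bits")
        for i in range(bit_depth):
            if (value >> i) & 1:
                slices[i].add(column)
    return slices


def bsi_sum(slices):
    """Sum all stored values: slice i contributes (its bit count) * 2**i."""
    return sum(len(bitmap) << i for i, bitmap in enumerate(slices))


values = {0: 5, 1: 3, 7: 12}           # column ID -> integer value
slices = bsi_encode(values, bit_depth=4)
total = bsi_sum(slices)                # equals 5 + 3 + 12
```

The sum never reads the individual values back; it only counts bits per slice, which is why aggregate queries fit this layout well.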

Cluster: A cluster consists of one or more nodes which share a cluster configuration. The cluster also defines how data is replicated and how internode communication is coordinated. Pilosa does not have a leader node, all data is evenly distributed, and any node can respond to queries.

Column: Columns are the fundamental horizontal data axis within Pilosa. Columns are global to all fields within an index.

Fragment: A Fragment is the intersection of a field and a shard in an index.

Field: Fields are used to group rows into different categories. Row IDs are namespaced by field such that the same row ID in a different field refers to a different row. For ranked fields, rows are kept in sorted order within the field. Each field is one of four types: set, int, time, or mutex. For more information, see data model and Creating fields.

Frame: Prior to Pilosa 1.0, fields were known as frames.

Gossip: A protocol used by Pilosa for internal communication.

Index: An Index is a top level container in Pilosa, analogous to a database in an RDBMS. Queries cannot operate across multiple indexes.

Jump Consistent Hash: A fast, minimal memory, consistent hash algorithm that evenly distributes the workload even when the number of buckets changes.
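
For readers who want to see the algorithm itself, here is a Python transcription of jump consistent hash as published by Lamping and Veach. It is a sketch for illustration; how Pilosa invokes the hash internally is not shown here, and the function name is this example's own.

```python
def jump_consistent_hash(key, num_buckets):
    """Map a 64-bit integer key to a bucket in [0, num_buckets)."""
    if num_buckets <= 0:
        raise ValueError("num_buckets must be positive")
    mask = (1 << 64) - 1          # emulate unsigned 64-bit arithmetic
    key &= mask
    b, j = -1, 0
    while j < num_buckets:
        b = j
        # Linear congruential step produces the next pseudo-random value.
        key = (key * 2862933555777941757 + 1) & mask
        # Jump ahead to the next bucket count at which this key would move.
        j = int((b + 1) * (float(1 << 31) / float((key >> 33) + 1)))
    return b
```

When the bucket count grows from n to n + 1, a key either keeps its bucket or moves to the new bucket n, which is what keeps data movement small when the cluster changes size.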

Max: A PQL query that returns the maximum integer value stored in an integer field.

MaxShard: The total number of shards allocated to handle the current set of columns. This value is important for all nodes to efficiently distribute queries. MaxShard is zero-indexed, so if an index contains six shards, its MaxShard will be 5.

Min: A PQL query that returns the minimum integer value stored in an integer field.

Node: An individual running instance of Pilosa server which belongs to a cluster.

Partition: The consistent hash maps keys to partitions (or locations on the unit circle), based on a preset maximum number of partitions. Partitions are then evenly mapped to physical nodes. To add nodes to the cluster, the partitions must be remapped, and data is then redistributed across the new cluster topology. DefaultPartitionN is 256. It can be modified, but only at compile time, and before ingesting any data.
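
The two-step mapping described above (key to partition, then partition to node) can be sketched as follows. Everything here is an assumption made for illustration: the key layout (index name plus shard number), the choice of 64-bit FNV-1a as the hash, and the plain modulo used as a stand-in for the partition-to-node step. None of it is taken from Pilosa's source.

```python
import struct

DEFAULT_PARTITION_N = 256  # value quoted in the glossary entry


def fnv1a_64(data):
    """64-bit FNV-1a hash of a bytes object."""
    h = 0xCBF29CE484222325                      # FNV offset basis
    for byte in data:
        h ^= byte
        h = (h * 0x100000001B3) & 0xFFFFFFFFFFFFFFFF  # FNV prime, mod 2**64
    return h


def partition_for(index, shard, partition_n=DEFAULT_PARTITION_N):
    """Hypothetical key -> partition step; fixed for a given partition_n."""
    key = index.encode("utf-8") + struct.pack(">Q", shard)
    return fnv1a_64(key) % partition_n


def node_for_partition(partition, nodes):
    """Stand-in partition -> node step (a real cluster would use a
    consistent hash here so that few partitions move on resize)."""
    return nodes[partition % len(nodes)]


p = partition_for("repository", 3)
owner = node_for_partition(p, ["node0", "node1", "node2"])
```

Note that `partition_for` does not depend on the node list: adding a node changes only the second step, which is the remapping the entry refers to.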

PQL: Pilosa Query Language.

Protobuf: Protocol Buffers is a binary serialization format which Pilosa uses for internal messages, and can be used by clients as an alternative to JSON.

Range: A PQL query that returns bits based on comparison to timestamps, set according to the time quantum.

Range (BSI): A PQL query that returns bits based on comparison to integers stored in BSI fields.

Replica: A copy of a fragment on a different node than the original. The cluster.replicas configuration parameter determines how many replicas of a fragment exist in the cluster. This includes the original, so a value of 1 means no extra copies are made.
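
To make the counting rule concrete, the sketch below picks the nodes that would hold a fragment for a given replica count. The placement strategy (the primary node followed by its successors in the node list, wrapping around) and the function name are assumptions for illustration only.

```python
def replica_nodes(nodes, primary_index, replicas):
    """Return the nodes holding a fragment, original included.

    replicas counts the original copy, so replicas=1 yields one node.
    """
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    count = min(replicas, len(nodes))   # cannot exceed the cluster size
    return [nodes[(primary_index + i) % len(nodes)] for i in range(count)]


cluster = ["node0", "node1", "node2"]
single = replica_nodes(cluster, primary_index=2, replicas=1)  # original only
double = replica_nodes(cluster, primary_index=2, replicas=2)  # one extra copy
```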

Roaring Bitmap: The compressed bitmap format which Pilosa uses to implement bitmaps, for both storage and logical query operations.

Row: Rows are the fundamental vertical data axis within Pilosa. They are namespaced to each field within an index. Represented as a Bitmap.

Slice: Prior to Pilosa 1.0, shards were known as slices.

Shard: Columns are sharded on a preset width. Shards are operated on in parallel and are evenly distributed across the cluster via a consistent hash.

ShardWidth: This is the number of columns in a shard. ShardWidth defaults to 2^20, or about one million (1,048,576). It can be modified, but only at compile time, and before ingesting any data.
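
The arithmetic behind sharding is simple enough to show directly. The sketch below assumes the default width of 2^20 (1,048,576) columns; the helper names are this example's own.

```python
SHARD_WIDTH = 1 << 20  # 1,048,576 columns per shard (the default)


def shard_of(column_id, shard_width=SHARD_WIDTH):
    """Shard that contains the given column ID."""
    return column_id // shard_width


def position_in_shard(column_id, shard_width=SHARD_WIDTH):
    """Offset of the column within its shard."""
    return column_id % shard_width


def max_shard(column_ids, shard_width=SHARD_WIDTH):
    """Zero-indexed number of the highest shard in use."""
    return max(shard_of(c, shard_width) for c in column_ids)


columns = [0, 1_048_575, 1_048_576, 5 * SHARD_WIDTH + 3]
highest = max_shard(columns)  # six shards (0 through 5) are covered
```

Column 1,048,575 is the last column of shard 0 and column 1,048,576 is the first column of shard 1, so an index whose largest column ID falls in the sixth shard has a maximum shard of 5.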

Sum: A PQL query that returns the sum of integers stored in an integer field.

Time quantum: Defines the granularity to be used for Range queries on time fields.

TOML: The language used for Pilosa’s configuration file.

TopN: A PQL query that returns a list of rows, sorted by the count of columns set in the row, within a specified field.
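
Conceptually, the result can be thought of as the output of the sketch below, which counts the columns set in each row of one field and returns the largest counts. It computes exact counts and ignores any caching or approximation a real server may apply; the data layout and the tie-break on row ID are assumptions for the example.

```python
def top_n(rows, n):
    """Return up to n (row_id, count) pairs, highest count first.

    rows maps row ID -> set of column IDs set in that row.
    """
    counts = [(row_id, len(columns)) for row_id, columns in rows.items()]
    # Sort by count descending, then by row ID for a stable order.
    counts.sort(key=lambda pair: (-pair[1], pair[0]))
    return counts[:n]


field_rows = {1: {10, 11, 12}, 2: {40}, 3: {10, 11}}
result = top_n(field_rows, 2)
```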

View: Views separate the different data layouts within a Field. The primary view is standard, which represents the typical row/column data. Time-based field views are automatically generated for each time quantum. Views are internally managed by Pilosa, and never exposed directly via the API. This simplifies the functional interface by separating it from the physical data representation.
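
One way to picture the per-quantum views is as one view name for each unit in the time quantum. The sketch below generates such names for a timestamp; the naming scheme (`standard_` plus a truncated timestamp) and the quantum letters Y, M, D, H are assumptions for illustration, since views are internal and not part of the API.

```python
from datetime import datetime

# strftime pattern for each assumed quantum unit
_UNIT_FORMATS = {"Y": "%Y", "M": "%Y%m", "D": "%Y%m%d", "H": "%Y%m%d%H"}


def views_for_time(timestamp, quantum, base="standard"):
    """Hypothetical view names that a timestamped bit would be written to."""
    return [
        f"{base}_{timestamp.strftime(_UNIT_FORMATS[unit])}"
        for unit in quantum
    ]


names = views_for_time(datetime(2018, 3, 7, 15), "YMD")
```

A coarser quantum produces fewer views per bit and therefore less storage, at the cost of coarser time ranges in queries.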

