You are viewing the documentation for Pilosa v0.10. View the latest documentation for Pilosa v1.0.2.

Glossary

Anti-entropy: A periodic process that compares each slice and its replicas across the cluster to repair inconsistencies.

Attribute: Attributes can be associated with both rows and columns. This metadata is kept separately from the core binary matrix in a BoltDB store.

Bit: Bits are the fundamental unit of data in Pilosa. A bit lives in a frame, at the intersection of a row and column.

Bitmap: The on-disk and in-memory representation of a row. Implemented with Roaring. Bitmap is also the basic PQL query for reading a Bitmap.

BSI: Bit-sliced indexing is the method Pilosa uses to represent multi-bit integers. Integer values are stored in fields, and can be used for Range, Min, Max, and Sum queries.
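
To make the idea of bit-sliced indexing concrete, here is a minimal Python sketch. It keeps one set of column IDs per bit position and computes a sum by counting the columns in each bit plane. The function names `bsi_encode` and `bsi_sum` are hypothetical, and the sketch assumes non-negative integers of a fixed bit depth; it is an illustration of the technique, not Pilosa's actual storage layout.

```python
def bsi_encode(values, bit_depth):
    """Return one set of column IDs per bit position, least significant first.

    `values` maps column ID -> non-negative integer (illustrative input shape).
    """
    planes = [set() for _ in range(bit_depth)]
    for column, value in values.items():
        if not 0 <= value < (1 << bit_depth):
            raise ValueError(f"value {value} does not fit in {bit_depth} bits")
        for i in range(bit_depth):
            if (value >> i) & 1:
                planes[i].add(column)
    return planes


def bsi_sum(planes, columns=None):
    """Sum the stored values, optionally restricted to a set of columns."""
    total = 0
    for i, plane in enumerate(planes):
        selected = plane if columns is None else plane & columns
        # Each column present in bit plane i contributes 2**i to the sum.
        total += len(selected) << i
    return total


values = {0: 5, 1: 3, 7: 12}
planes = bsi_encode(values, bit_depth=4)
print(bsi_sum(planes))          # 20
print(bsi_sum(planes, {0, 7}))  # 17
```

The sum never touches the individual integers: it only needs a count per bit plane, which is why this layout fits a bitmap index.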

Cluster: A cluster consists of one or more nodes which share a cluster configuration. The cluster also defines how data is replicated throughout and how internode communication is coordinated. Pilosa does not have a leader node, all data is evenly distributed, and any node can respond to queries.

Column: Columns are the fundamental horizontal data axis within Pilosa. Columns are global to all frames within an index.

Field: A group of rows used to store integer values with BSI, for use in Range and Sum queries.

Fragment: A Fragment is the intersection of a frame and a slice in an index.

Frame: Frames are used to group rows into different categories. Row IDs are namespaced by frame such that the same row ID in a different frame refers to a different row. For ranked frames, rows are kept in sorted order within the frame.

Gossip: A protocol used by Pilosa for internal communication.

Index: An Index is a top level container in Pilosa, analogous to a database in an RDBMS. Queries cannot operate across multiple indexes.

Jump Consistent Hash: A fast, minimal memory, consistent hash algorithm that evenly distributes the workload even when the number of buckets changes.
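
As an illustration, the published jump consistent hash algorithm can be sketched in a few lines of Python. The function name `jump_consistent_hash` is chosen here for the example, and the 64-bit wraparound that C performs implicitly is emulated with a mask. This is a sketch of the general algorithm, not a copy of Pilosa's implementation.

```python
MASK64 = (1 << 64) - 1  # emulate unsigned 64-bit overflow


def jump_consistent_hash(key, num_buckets):
    """Map a 64-bit integer key to a bucket in [0, num_buckets)."""
    if num_buckets <= 0:
        raise ValueError("num_buckets must be positive")
    key &= MASK64
    b, j = -1, 0
    while j < num_buckets:
        b = j
        # Linear congruential step that drives the pseudo-random jumps.
        key = (key * 2862933555777941757 + 1) & MASK64
        j = int((b + 1) * ((1 << 31) / ((key >> 33) + 1)))
    return b


# When a bucket is added, a key either stays put or moves to the new bucket.
for key in range(5):
    print(key, jump_consistent_hash(key, 4), jump_consistent_hash(key, 5))
```

The loop at the end shows the property that makes the hash "consistent": growing from 4 to 5 buckets only ever moves a key into the new bucket, never between existing ones.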

Max: A PQL query that returns the maximum integer value stored in BSI fields.

MaxSlice: The total number of slices allocated to handle the current set of columns. This value is important for all nodes to efficiently distribute queries.

Min: A PQL query that returns the minimum integer value stored in BSI fields.

Node: An individual running instance of Pilosa server which belongs to a cluster.

Partition: The consistent hash maps keys to partitions (or locations on the unit circle), based on a preset maximum number of partitions. Partitions are then evenly mapped to physical nodes. To add nodes to the cluster, the partitions must be remapped, and data is then redistributed across the new cluster topology. DefaultPartitionN is 256. It can be modified, but only at compile time, and before ingesting any data.
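
The first step, mapping a key to one of a fixed number of partitions, might look like the following Python sketch. It assumes the key is an index name plus a slice number and that the hash is 64-bit FNV-1a; both choices, and the names `fnv1a_64` and `partition`, are assumptions made for illustration rather than a statement of what Pilosa does. The second step (mapping partitions to nodes) is not shown.

```python
import struct

DEFAULT_PARTITION_N = 256          # value given in the glossary entry
FNV64_OFFSET = 0xCBF29CE484222325  # standard FNV-1a 64-bit offset basis
FNV64_PRIME = 0x100000001B3        # standard FNV 64-bit prime
MASK64 = (1 << 64) - 1


def fnv1a_64(data):
    """Compute the 64-bit FNV-1a hash of a bytes object."""
    h = FNV64_OFFSET
    for byte in data:
        h ^= byte
        h = (h * FNV64_PRIME) & MASK64
    return h


def partition(index, slice_id, partition_n=DEFAULT_PARTITION_N):
    """Map an (index name, slice number) key to a partition ID (hypothetical keying)."""
    data = index.encode("utf-8") + struct.pack(">Q", slice_id)
    return fnv1a_64(data) % partition_n


for slice_id in range(4):
    print(slice_id, partition("myindex", slice_id))
```

Because the partition count is fixed, the result for a given key never changes when nodes join or leave; only the partition-to-node mapping does.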

PQL: Pilosa Query Language.

Protobuf: Protocol Buffers is a binary serialization format which Pilosa uses for internal messages, and can be used by clients as an alternative to JSON.

Range: A PQL query that returns bits based on comparison to timestamps, set according to the time quantum.

Range (BSI): A PQL query that returns bits based on comparison to integers stored in BSI fields.

Replica: A copy of a fragment on a different node than the original. The cluster.replicas configuration parameter determines how many replicas of a fragment exist in the cluster. This includes the original, so a value of 1 means no extra copies are made.

Roaring Bitmap: The compressed bitmap format which Pilosa uses to implement bitmaps, for both storage and logical query operations.

Row: Rows are the fundamental vertical data axis within Pilosa. They are namespaced to each frame within an index. Represented as a Bitmap.

Slice: Columns are sharded on a preset width. Each shard is referred to as a slice in Pilosa. Slices are operated on in parallel and are evenly distributed across the cluster via a consistent hash.

SliceWidth: This is the number of columns in a slice. SliceWidth defaults to 2^20 (1,048,576), or about one million. It can be modified, but only at compile time, and before ingesting any data.
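
The relationship between a column ID and its slice is plain integer arithmetic. The Python sketch below assumes the default width of 2^20 (1,048,576) columns per slice; the helper names `slice_of` and `offset_in_slice` are hypothetical.

```python
SLICE_WIDTH = 1 << 20  # 1,048,576 columns per slice (default width)


def slice_of(column_id):
    """Return the slice that holds the given column."""
    return column_id // SLICE_WIDTH


def offset_in_slice(column_id):
    """Return the column's position within its slice."""
    return column_id % SLICE_WIDTH


print(slice_of(1_048_575))         # 0, last column of the first slice
print(slice_of(1_048_576))         # 1, first column of the second slice
print(slice_of(3_000_000))         # 2
print(offset_in_slice(3_000_000))  # 902848
```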

Sum: A PQL query that returns the sum of integers stored in BSI fields.

Tanimoto: Used for similarity queries on Pilosa data. The Tanimoto Coefficient between two Bitmaps A and B is the ratio of the size of their intersection to the size of their union (|A∩B|/|A∪B|).
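
The coefficient can be computed directly from the definition. In this Python sketch each bitmap is modeled as a set of column IDs; the function name `tanimoto` and the choice to return 0.0 for two empty bitmaps are assumptions made for the example.

```python
def tanimoto(a, b):
    """Return |A ∩ B| / |A ∪ B| for two collections of column IDs."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Both bitmaps are empty; returning 0.0 is a convention chosen here.
        return 0.0
    return len(a & b) / len(union)


print(tanimoto({1, 2, 3, 4}, {3, 4, 5}))  # 0.4
print(tanimoto({1, 2}, {1, 2}))           # 1.0
print(tanimoto({1, 2}, {3, 4}))           # 0.0
```

In the first call the intersection is {3, 4} and the union is {1, 2, 3, 4, 5}, giving 2/5.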

Time quantum: Defines the granularity to be used for time Range queries.
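
One way to picture a time quantum is as a list of units (year, month, day, hour), each of which yields one time-based view for a given timestamp. The Python sketch below assumes a quantum written as a string such as "YMDH" and a view naming scheme of the form `<view>_<timestamp prefix>`; the naming scheme and the function name `views_by_time` are assumptions for illustration.

```python
from datetime import datetime

# strftime pattern for each quantum unit (assumed unit letters).
FORMATS = {"Y": "%Y", "M": "%Y%m", "D": "%Y%m%d", "H": "%Y%m%d%H"}


def views_by_time(view, timestamp, quantum):
    """Return one view name per unit in the quantum for the given timestamp."""
    names = []
    for unit in quantum:
        if unit not in FORMATS:
            raise ValueError(f"unknown time quantum unit: {unit!r}")
        names.append(f"{view}_{timestamp.strftime(FORMATS[unit])}")
    return names


ts = datetime(2017, 3, 9, 14)
print(views_by_time("standard", ts, "YMD"))
# ['standard_2017', 'standard_201703', 'standard_20170309']
```

A coarser quantum such as "Y" produces fewer views per bit, at the cost of coarser time ranges.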

TOML: The language used for Pilosa’s configuration file.

TopN: A PQL query that returns a list of row IDs, sorted by the count of bits set in the row, within a specified frame.
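
Conceptually, the result can be produced by counting the bits in each row and sorting. The Python sketch below models a frame as a mapping from row ID to a set of column IDs; the function name `top_n`, the exact counting, and the tie-break by row ID are assumptions for illustration and say nothing about how Pilosa computes the result internally.

```python
def top_n(frame, n):
    """Return up to n (row_id, count) pairs, highest bit count first."""
    counts = [(row_id, len(columns)) for row_id, columns in frame.items()]
    # Sort by descending count; break ties by ascending row ID (an assumption).
    counts.sort(key=lambda pair: (-pair[1], pair[0]))
    return counts[:n]


frame = {
    10: {1, 2, 3},
    11: {4},
    12: {1, 2, 3, 4, 5},
    13: {7, 8, 9},
}
print(top_n(frame, 2))  # [(12, 5), (10, 3)]
```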

View: Views separate the different data layouts within a Frame. The primary view is standard, which represents the typical row/column data. Time-based frame views are automatically generated for each time quantum. Views are internally managed by Pilosa, and never exposed directly via the API. This simplifies the functional interface by separating it from the physical data representation.

