You are viewing the documentation for Pilosa master. View the latest documentation for Pilosa v0.8.8.

Tutorials

Setting Up a Secure Cluster

Introduction

Pilosa supports encrypting communication to and between the nodes of a cluster using TLS. In this tutorial, we will set up a three-node Pilosa cluster running on a single computer. The same steps work for a multi-computer cluster, but that also requires setting up firewalls and other platform-specific configuration, which is out of the scope of this tutorial.

This tutorial assumes that you are using a UNIX-like system, such as Linux or MacOS. Windows Subsystem for Linux (WSL) works equally well on Windows 10 systems.

Installing Pilosa and Creating the Directory Structure

If you haven’t already done so, install the Pilosa server on your computer. For Linux and WSL (Windows Subsystem for Linux), follow the Installing on Linux instructions. For MacOS, follow the Installing on MacOS instructions. We do not provide precompiled releases for other platforms, but you can always compile Pilosa yourself from source; see Build from Source.

After installing Pilosa, you may have to add it to your $PATH. Check that you can run Pilosa from the command line:

pilosa --help

Let’s create a directory to hold all of the tutorial files and switch to it:

mkdir -p "$HOME/pilosa-tls-tutorial" && cd "$HOME/pilosa-tls-tutorial"

Creating the TLS Certificate and Gossip Key

Securing a Pilosa cluster consists of securing the communication between nodes using TLS and Gossip encryption. Pilosa Enterprise additionally supports authentication and other security features, but those are not covered in this tutorial.

The first step is acquiring an SSL certificate. You can buy a commercial certificate or obtain a Let’s Encrypt certificate, but we will use a self-signed certificate for practical reasons. Self-signed certificates are not recommended in production, since they make man-in-the-middle attacks easy.

The following command creates a 2048-bit self-signed wildcard certificate for *.pilosa.local that is valid for 10 years (3650 days).

openssl req -x509 -newkey rsa:2048 -keyout pilosa.local.key -out pilosa.local.crt -days 3650 -nodes -subj "/C=US/ST=Texas/L=Austin/O=Pilosa/OU=Com/CN=*.pilosa.local"

The command above creates two files in the current directory:

* pilosa.local.crt is the SSL certificate.
* pilosa.local.key is the private key file, which must be kept secret.

Having created the SSL certificate, we can now create the gossip encryption key. The gossip encryption key file must be exactly 16, 24, or 32 bytes long, which selects AES-128, AES-192, or AES-256 encryption, respectively. Reading random bytes from the cryptographically secure /dev/random serves this purpose well:

head -c 32 /dev/random > pilosa.local.gossip32

We should now have pilosa.local.gossip32, containing 32 random bytes, in the current directory.
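Because the key length selects the AES variant, it can be worth confirming the file's size before starting the nodes. The following is a minimal POSIX shell sketch; the size check is our own convenience, not a Pilosa tool, and it only creates the key (the same way as above) if the file does not exist yet.

```shell
#!/bin/sh
key_file="pilosa.local.gossip32"

# Create the key as shown above, but only if it is not there yet.
[ -f "$key_file" ] || head -c 32 /dev/random > "$key_file"

# wc -c prints the byte count; tr strips the padding some wc versions add.
size=$(wc -c < "$key_file" | tr -d ' ')

# 16, 24 and 32 bytes select AES-128, AES-192 and AES-256 respectively.
case "$size" in
  16) echo "$key_file: AES-128 gossip key" ;;
  24) echo "$key_file: AES-192 gossip key" ;;
  32) echo "$key_file: AES-256 gossip key" ;;
  *)  echo "$key_file: invalid size ($size bytes)" >&2; exit 1 ;;
esac
```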

Creating the Configuration Files

Pilosa supports passing configuration items using the command line, environment variables or a configuration file. We will use the last option in this tutorial and create three configuration files for our three nodes.

Create node1.config.toml in the project directory and paste the following in it:

# node1.config.toml

data-dir = "node1_data"
bind = "https://01.pilosa.local:10501"

[cluster]
hosts = ["https://01.pilosa.local:10501", "https://02.pilosa.local:10502", "https://03.pilosa.local:10503"]

[tls]
certificate = "pilosa.local.crt"
key = "pilosa.local.key"
skip-verify = true

[gossip]
seed = "01.pilosa.local:15000"
port = 15000
key = "pilosa.local.gossip32"

Create node2.config.toml in the project directory and paste the following in it:

# node2.config.toml

data-dir = "node2_data"
bind = "https://02.pilosa.local:10502"

[cluster]
hosts = ["https://01.pilosa.local:10501", "https://02.pilosa.local:10502", "https://03.pilosa.local:10503"]

[tls]
certificate = "pilosa.local.crt"
key = "pilosa.local.key"
skip-verify = true

[gossip]
seed = "01.pilosa.local:15000"
port = 16000
key = "pilosa.local.gossip32"

Create node3.config.toml in the project directory and paste the following in it:

# node3.config.toml

data-dir = "node3_data"
bind = "https://03.pilosa.local:10503"

[cluster]
hosts = ["https://01.pilosa.local:10501", "https://02.pilosa.local:10502", "https://03.pilosa.local:10503"]

[tls]
certificate = "pilosa.local.crt"
key = "pilosa.local.key"
skip-verify = true

[gossip]
seed = "01.pilosa.local:15000"
port = 17000
key = "pilosa.local.gossip32"

Here is an explanation of the configuration items:

* data-dir points to the directory where the Pilosa server writes its data. If it doesn’t exist, the server creates it.
* bind is the address on which the server listens for incoming requests. The address is composed of three parts: scheme, host, and port. The default scheme is http, so we explicitly specify https to use the HTTPS protocol for communication between nodes.
* The [cluster] section contains the cluster settings. Its most important field is hosts, which lists the addresses of all nodes in the cluster, including the node itself. See Cluster Configuration for other settings.
* The [tls] section contains the TLS settings, including the paths to the SSL certificate and its private key. Setting skip-verify to true disables host name verification and other certificate checks; we need it here because our certificate is self-signed. Do not set skip-verify to true on production servers.
* The [gossip] section contains settings for the gossip protocol. seed is the host and port of the main gossip node, which coordinates the other nodes. port is the port on which this node listens for gossip traffic; it must be different for each node when all nodes run on the same computer, otherwise all nodes can use the same value. Finally, key points to the gossip encryption key we created earlier.
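Since the three files differ only in the node number, the HTTPS port and the gossip port, they can also be generated with a small loop instead of being pasted by hand. This sketch writes the same content as the three files above; the port arithmetic ($((14 + n))000 yields 15000, 16000 and 17000) simply follows this tutorial's numbering and is not a Pilosa convention.

```shell
#!/bin/sh
# Write node1.config.toml .. node3.config.toml; only the node number,
# the bind port (10501..10503) and the gossip port (15000..17000) change.
for n in 1 2 3; do
  cat > "node${n}.config.toml" <<EOF
# node${n}.config.toml

data-dir = "node${n}_data"
bind = "https://0${n}.pilosa.local:1050${n}"

[cluster]
hosts = ["https://01.pilosa.local:10501", "https://02.pilosa.local:10502", "https://03.pilosa.local:10503"]

[tls]
certificate = "pilosa.local.crt"
key = "pilosa.local.key"
skip-verify = true

[gossip]
seed = "01.pilosa.local:15000"
port = $((14 + n))000
key = "pilosa.local.gossip32"
EOF
done
```

The unquoted EOF delimiter is deliberate: it lets the shell expand ${n} and the arithmetic inside the here-document.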

Final Touches Before Running the Cluster

Before running the cluster, let’s make sure that 01.pilosa.local, 02.pilosa.local and 03.pilosa.local resolve to an IP address. If you are running the whole cluster on your computer, adding them to /etc/hosts is enough. Below is one of the many ways of doing that (mind the >>, which appends to the file instead of overwriting it):

sudo sh -c 'printf "\n127.0.0.1 01.pilosa.local 02.pilosa.local 03.pilosa.local\n" >> /etc/hosts'

Ensure we can access the hosts in the cluster:

ping -c 1 01.pilosa.local
ping -c 1 02.pilosa.local
ping -c 1 03.pilosa.local

If any of the commands above reports ping: unknown host, make sure your /etc/hosts contains the failing hostname.
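If ping is unavailable or blocked on your system, you can inspect the hosts file directly. This sketch defines a hypothetical helper, check_hosts, that reports which of the three tutorial hostnames are missing from a given file. It relies on grep's -F (fixed string) and -w (whole word) options, which both GNU and BSD grep support.

```shell
#!/bin/sh
# check_hosts FILE (hypothetical helper): print each tutorial hostname that
# does not appear in FILE and return non-zero if any is missing.
check_hosts() {
  rc=0
  for h in 01.pilosa.local 02.pilosa.local 03.pilosa.local; do
    # -F treats the dots literally; -w avoids matching e.g. x01.pilosa.local.
    if ! grep -qFw "$h" "$1"; then
      echo "missing from $1: $h" >&2
      rc=1
    fi
  done
  return "$rc"
}

check_hosts /etc/hosts && echo "all tutorial hostnames are in /etc/hosts"
```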

Running the Cluster

Let’s open three terminal windows and run each node in its window. This will enable us to better observe what’s happening on which node.

Switch to the first terminal window, change to the project directory and start the first node:

cd $HOME/pilosa-tls-tutorial
pilosa server -c node1.config.toml

Switch to the second terminal window, change to the project directory and start the second node:

cd $HOME/pilosa-tls-tutorial
pilosa server -c node2.config.toml

Switch to the third terminal window, change to the project directory and start the third node:

cd $HOME/pilosa-tls-tutorial
pilosa server -c node3.config.toml

Let’s make sure that all three Pilosa servers are running and connected to each other:

curl -k --ipv4 https://01.pilosa.local:10501/status

The -k flag tells curl not to verify the certificate the server presents (necessary here because it is self-signed), and --ipv4 works around an issue on MacOS where curl requests take a long time if the address resolves to 127.0.0.1. You can leave --ipv4 out on Linux and WSL.

All nodes should be in the UP state:

{"status":{"Nodes":[{"Host":"01.pilosa.local:10501","State":"UP"},{"Host":"02.pilosa.local:10502","State":"UP"},{"Host":"03.pilosa.local:10503","State":"UP"}]}}
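With more nodes, checking the response by eye gets tedious. This minimal sketch counts the nodes that report "State":"UP", assuming the compact JSON shown above (no whitespace inside the objects); a JSON tool such as jq would be more robust if you have it installed. The response is hard-coded so the snippet runs without a cluster, and count_up_nodes is a hypothetical helper name.

```shell
#!/bin/sh
# count_up_nodes (hypothetical helper): read a /status response on stdin and
# print how many nodes report "State":"UP".
count_up_nodes() {
  # grep -o prints each match on its own line; wc -l counts the lines.
  grep -o '"State":"UP"' | wc -l | tr -d ' '
}

# Sample response copied from above. Against a live cluster, use:
#   curl -sk --ipv4 https://01.pilosa.local:10501/status | count_up_nodes
response='{"status":{"Nodes":[{"Host":"01.pilosa.local:10501","State":"UP"},{"Host":"02.pilosa.local:10502","State":"UP"},{"Host":"03.pilosa.local:10503","State":"UP"}]}}'

up=$(printf '%s\n' "$response" | count_up_nodes)
echo "$up of 3 nodes are UP"
```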

Running Queries

Having confirmed that our cluster is running correctly, let’s run a few queries. Before that, we need to create an index and a frame:

curl -k --ipv4 https://01.pilosa.local:10501/index/sample-index -d ''
{}

This creates the index sample-index with default options. Let’s create the frame now:

curl -k --ipv4 https://01.pilosa.local:10501/index/sample-index/frame/sample-frame -d ''
{}

We just created the frame sample-frame with default options.

Let’s run a SetBit query:

curl -k --ipv4 https://01.pilosa.local:10501/index/sample-index/query -d 'SetBit(frame="sample-frame", rowID=1, columnID=100)'
{"results":[true]}

Confirm that the bit was indeed set:

curl -k --ipv4 https://01.pilosa.local:10501/index/sample-index/query -d 'Bitmap(frame="sample-frame", rowID=1)'
{"results":[{"attrs":{},"bits":[100]}]}

The same response should be returned when querying other nodes in the cluster:

curl -k --ipv4 https://02.pilosa.local:10502/index/sample-index/query -d 'Bitmap(frame="sample-frame", rowID=1)'
{"results":[{"attrs":{},"bits":[100]}]}

What’s Next?

Check out our Administration Guide to learn more about making the most of your Pilosa cluster, and the Configuration Documentation for the available options to configure Pilosa.

Using Integer Field Values

Introduction

Pilosa can store integer values associated with the columns in an index, and those values are used to support range and aggregate queries. In this tutorial we will show how to set up integer fields, populate them with data, and query them. The example index we’re going to create represents fictional patients at a medical facility and various bits of information about those patients.

First, create an index called patients:

curl localhost:10101/index/patients \
     -X POST 
{}

In addition to storing rows of bits, a frame can also contain fields that store integer values. The next command creates the measurements frame with the rangeEnabled option set and three integer fields (age, weight, tcells), each with its own minimum and maximum value.

curl localhost:10101/index/patients/frame/measurements \
     -X POST \
     -d '{"options":{
              "rangeEnabled": true,
              "fields": [
                  {"name": "age", "type": "int", "min": 0, "max": 120},
                  {"name": "weight", "type": "int", "min": 0, "max": 500},
                  {"name": "tcells", "type": "int", "min": 0, "max": 2000}
              ]
         }}'
{}

If you need to, you can add fields to an existing frame by posting to the Create Field endpoint.

Next, let’s populate our fields with data. There are two ways to get data into fields: use the SetFieldValue() PQL function to set fields individually, or use the pilosa import command to import many values at once. First, let’s set some field data using PQL.

This query sets the age, weight, and t-cell count for the patient with ID 1 in our system:

curl localhost:10101/index/patients/query \
     -X POST \
     -d 'SetFieldValue(columnID=1, frame="measurements", age=34, weight=128, tcells=1145)'
{"results":[null]}
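For a handful of values, SetFieldValue() calls can also be batched into a single request body, one call per line, assuming the query endpoint accepts several calls at once the way it does for SetBit() in the attribute tutorial below. This sketch generates such a body with awk; the file name set-ages.pql is hypothetical.

```shell
#!/bin/sh
# Turn "columnID,age" rows into SetFieldValue() calls, one per line.
# set-ages.pql is a hypothetical file name used only for this sketch.
printf '1,34\n2,57\n3,19\n' |
  awk -F, '{ printf "SetFieldValue(columnID=%s, frame=\"measurements\", age=%s)\n", $1, $2 }' \
  > set-ages.pql

cat set-ages.pql

# Send the whole batch in one request; --data-binary keeps the newlines,
# which plain -d @file would strip:
#   curl localhost:10101/index/patients/query -X POST --data-binary @set-ages.pql
```

For more than a few dozen values, the pilosa import command described next is the better fit.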

When we need to load a lot of data at once, we can use the pilosa import command, which imports data into Pilosa from a CSV file.

Assuming we have a file called ages.csv that is structured like this:

1,34
2,57
3,19
4,40
5,32
6,71
7,28
8,33
9,63

where the first column of the CSV represents the patient ID and the second column represents the patient’s age, we can import the data into our age field by running this command:

pilosa import -i patients -f measurements --field age ages.csv
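To follow along, you can create ages.csv with a here-document. The awk check afterwards is an optional sketch of our own that flags malformed rows and ages outside the 0 to 120 range declared for the age field, before anything is sent to Pilosa.

```shell
#!/bin/sh
# Write the sample data: one "columnID,age" pair per line, no header.
cat > ages.csv <<'EOF'
1,34
2,57
3,19
4,40
5,32
6,71
7,28
8,33
9,63
EOF

# Flag rows that do not have exactly two fields or whose age falls outside
# the min/max (0..120) declared for the age field; exit status 1 if any.
awk -F, 'NF != 2 || $2 < 0 || $2 > 120 { print "bad row " NR ": " $0; bad = 1 }
         END { exit bad }' ages.csv && echo "ages.csv looks valid"
```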

Now that we have some data in our index, let’s run a few queries to demonstrate how to use that data.

To find all patients over the age of 40, run a Range query against the age field:

curl localhost:10101/index/patients/query \
     -X POST \
     -d 'Range(frame="measurements", age > 40)'
{"results":[{"attrs":{},"bits":[2,6,9]}]}

You can find a list of supported range operators in the Range Query documentation.
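When checking results, it can help to compute the expected answer independently. This sketch applies the same age > 40 condition to ages.csv with awk, recreating the file from the sample data if it is missing. It is only a local cross-check, not how Pilosa evaluates the query.

```shell
#!/bin/sh
# Recreate the sample file if it is not in the current directory.
[ -f ages.csv ] || printf '1,34\n2,57\n3,19\n4,40\n5,32\n6,71\n7,28\n8,33\n9,63\n' > ages.csv

# Print the column IDs whose age is strictly greater than 40, comma-separated,
# to compare with the "bits" list returned by the Range query.
expected=$(awk -F, '$2 > 40 { printf "%s%s", sep, $1; sep = "," } END { print "" }' ages.csv)
echo "$expected"
```

Note that patient 4 (age 40) is excluded because the comparison is strict, which matches the [2,6,9] returned above.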

To find the average age of all patients, run a Sum query:

curl localhost:10101/index/patients/query \
     -X POST \
     -d 'Sum(frame="measurements", field="age")'
{"results":[{"sum":377,"count":9}]}

The results you get from the Sum query contain the sum of all values as well as the count of columns that have a value. To get the average, divide sum by count.
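As a worked example, the response above gives 377 / 9 ≈ 41.89. The division can be scripted; this sketch extracts sum and count from the compact response with sed, assuming non-negative integer values (true here because the age field's minimum is 0), and guards against a zero count.

```shell
#!/bin/sh
# Sample Sum() response copied from above.
response='{"results":[{"sum":377,"count":9}]}'

# Extract the two numbers; the patterns assume non-negative integers.
sum=$(printf '%s\n' "$response" | sed 's/.*"sum":\([0-9]*\).*/\1/')
count=$(printf '%s\n' "$response" | sed 's/.*"count":\([0-9]*\).*/\1/')

# Shell arithmetic is integer-only, so let awk do the division.
avg=$(awk -v s="$sum" -v c="$count" 'BEGIN { if (c > 0) printf "%.2f", s / c; else print "n/a" }')
echo "average age: $avg"
```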

You can also pass a filter to the Sum() function. For example, to find the average age of all patients over 40:

curl localhost:10101/index/patients/query \
     -X POST \
     -d 'Sum(Range(frame="measurements", age > 40), frame="measurements", field="age")'
{"results":[{"sum":191,"count":3}]}

Notice in this case that the count is only 3 because of the age > 40 filter applied to the query.
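To cross-check the filtered result locally, this awk sketch sums only the ages above 40 in ages.csv (recreated from the sample data if missing). It reproduces sum 191 and count 3, for an average of about 63.67; like the other local checks, it only mirrors the query's logic and is not how Pilosa computes it.

```shell
#!/bin/sh
# Recreate the sample file if it is not in the current directory.
[ -f ages.csv ] || printf '1,34\n2,57\n3,19\n4,40\n5,32\n6,71\n7,28\n8,33\n9,63\n' > ages.csv

# Mirror Sum(Range(frame="measurements", age > 40), ...): keep only rows with
# age > 40, then report their sum, their count and the resulting average.
result=$(awk -F, '$2 > 40 { s += $2; c++ }
  END { printf "sum=%d count=%d avg=%.2f", s, c, (c ? s / c : 0) }' ages.csv)
echo "$result"
```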

Storing Row and Column Attributes

Introduction

Pilosa can store arbitrary values associated with any row or column. In Pilosa, these are referred to as attributes, and they can be of type string, integer, boolean, or float. In this tutorial we will store some attribute data and then run some queries that return that data.

First, create an index called books to use for this tutorial:

curl localhost:10101/index/books \
     -X POST
{}

Next, create a frame in the books index called members which will represent library members who have read books.

curl localhost:10101/index/books/frame/members \
     -X POST \
     -d '{}'
{}

Now, let’s add some books to our index.

curl localhost:10101/index/books/query \
     -X POST \
     -d 'SetColumnAttrs(columnID=1, name="To Kill a Mockingbird", year=1960)
         SetColumnAttrs(columnID=2, name="No Name in the Street", year=1972)
         SetColumnAttrs(columnID=3, name="The Tipping Point", year=2000)
         SetColumnAttrs(columnID=4, name="Out Stealing Horses", year=2003)
         SetColumnAttrs(columnID=5, name="The Forever War", year=2008)'
{"results":[null,null,null,null,null]}

And add some members.

curl localhost:10101/index/books/query \
     -X POST \
     -d 'SetRowAttrs(frame="members", rowID=10001, fullName="John Smith")
         SetRowAttrs(frame="members", rowID=10002, fullName="Sue Perkins")
         SetRowAttrs(frame="members", rowID=10003, fullName="Jennifer Hawks")
         SetRowAttrs(frame="members", rowID=10004, fullName="Pedro Vazquez")
         SetRowAttrs(frame="members", rowID=10005, fullName="Pat Washington")'
{"results":[null,null,null,null,null]}

At this point we can retrieve one of the member records by querying its row:

curl localhost:10101/index/books/query \
     -X POST \
     -d 'Bitmap(frame="members", rowID=10002)'
{"results":[{"attrs":{"fullName":"Sue Perkins"},"bits":[]}]}

Now let’s set some bits in the matrix, where each set bit (member row, book column) records that the member has read that book.

curl localhost:10101/index/books/query \
     -X POST \
     -d 'SetBit(frame="members", rowID=10001, columnID=3)
         SetBit(frame="members", rowID=10001, columnID=5)

         SetBit(frame="members", rowID=10002, columnID=1)
         SetBit(frame="members", rowID=10002, columnID=2)
         SetBit(frame="members", rowID=10002, columnID=4)

         SetBit(frame="members", rowID=10003, columnID=3)

         SetBit(frame="members", rowID=10004, columnID=4)
         SetBit(frame="members", rowID=10004, columnID=5)

         SetBit(frame="members", rowID=10005, columnID=1)
         SetBit(frame="members", rowID=10005, columnID=2)
         SetBit(frame="members", rowID=10005, columnID=3)
         SetBit(frame="members", rowID=10005, columnID=4)
         SetBit(frame="members", rowID=10005, columnID=5)'
{"results":[true,true,true,true,true,true,true,true,true,true,true,true,true]}
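Typing out every SetBit() call by hand becomes error-prone as the matrix grows. This sketch generates the same 13 calls from a compact reading list; the rowID:columnIDs format and the file name reads.pql are inventions for this example only.

```shell
#!/bin/sh
# One line per member: rowID, a colon, then the comma-separated column IDs
# (books) that member has read. This format is specific to this sketch.
reading_list='10001:3,5
10002:1,2,4
10003:3
10004:4,5
10005:1,2,3,4,5'

# Emit one SetBit() call per (member, book) pair.
printf '%s\n' "$reading_list" | awk -F: '{
  n = split($2, books, ",")
  for (i = 1; i <= n; i++)
    printf "SetBit(frame=\"members\", rowID=%s, columnID=%s)\n", $1, books[i]
}' > reads.pql

cat reads.pql
# Send the batch with:
#   curl localhost:10101/index/books/query -X POST --data-binary @reads.pql
```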

Now pull the record for Sue Perkins again.

curl localhost:10101/index/books/query \
     -X POST \
     -d 'Bitmap(frame="members", rowID=10002)'
{"results":[{"attrs":{"fullName":"Sue Perkins"},"bits":[1,2,4]}]}

Notice that the result set now contains a list of integers in the bits attribute. These integers match the column IDs of the books that Sue has read.

In order to retrieve the attribute information that we stored for each book, we need to add a URL parameter columnAttrs=true to the query.

curl 'localhost:10101/index/books/query?columnAttrs=true' \
     -X POST \
     -d 'Bitmap(frame="members", rowID=10002)'
{
  "results":[{"attrs":{"fullName":"Sue Perkins"},"bits":[1,2,4]}],
  "columnAttrs":[
    {"id":1,"attrs":{"name":"To Kill a Mockingbird","year":1960}},
    {"id":2,"attrs":{"name":"No Name in the Street","year":1972}},
    {"id":4,"attrs":{"name":"Out Stealing Horses","year":2003}}
  ]
}

The book attributes are included in the columnAttrs field of the response.
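If you only need the book titles, they can be pulled out of the columnAttrs list. This sketch runs grep and sed over the response from above collapsed onto one line, and assumes the titles contain no escaped quotes; for anything beyond a quick look, a real JSON parser such as jq is the safer choice.

```shell
#!/bin/sh
# The columnAttrs response from above, on a single line.
response='{"results":[{"attrs":{"fullName":"Sue Perkins"},"bits":[1,2,4]}],"columnAttrs":[{"id":1,"attrs":{"name":"To Kill a Mockingbird","year":1960}},{"id":2,"attrs":{"name":"No Name in the Street","year":1972}},{"id":4,"attrs":{"name":"Out Stealing Horses","year":2003}}]}'

# grep -o isolates each "name":"..." pair (the leading quote keeps it from
# matching "fullName"); sed then strips the key and the surrounding quotes.
titles=$(printf '%s\n' "$response" | grep -o '"name":"[^"]*"' | sed 's/^"name":"//; s/"$//')
printf '%s\n' "$titles"
```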

Finally, if we want to find out which books were read by both Sue and Pedro, we just perform an Intersect query on those two members:

curl 'localhost:10101/index/books/query?columnAttrs=true' \
     -X POST \
     -d 'Intersect(Bitmap(frame="members", rowID=10002), Bitmap(frame="members", rowID=10004))'
{
  "results":[{"attrs":{},"bits":[4]}],
  "columnAttrs":[
    {"id":4,"attrs":{"name":"Out Stealing Horses","year":2003}}
  ]
}

Notice that we don’t get row attributes for a complex query, but we still get the column attributes (in this case, the book information).
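Conceptually, Intersect keeps only the column IDs present in both rows. This sketch reproduces the example's result locally with comm, using the bits lists returned for Sue (1, 2, 4) and Pedro (4, 5); comm requires sorted input, so both lists go through sort first. It illustrates the set operation only, not how Pilosa computes it, and the file names sue.txt and pedro.txt are hypothetical.

```shell
#!/bin/sh
# Column IDs (books) from the Bitmap results for rows 10002 and 10004.
printf '1\n2\n4\n' | sort > sue.txt
printf '4\n5\n' | sort > pedro.txt

# comm -12 suppresses lines unique to either file, leaving the common ones.
common=$(comm -12 sue.txt pedro.txt)
echo "read by both: $common"
```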