Getting Started with ClickHouse

ClickHouse is a column-oriented database system with SQL-like interface. Its main selling point is high performance when handling enormous datasets in near real time. The sacrifice you'll need to make is very limited support for updates, deletes, or transactions. If your dataset is enormous, and you mostly read, aggregate, and insert, it could work for you.

Which arguably makes it more suitable as an secondary database, not as primary one.

Run ClickHouse with Docker

You can use ClickHouse with ClickHouse Cloud, and that's what most documentation refers to, but for some practice, you can run it locally as well.

The easiest way is to use Docker. To run a local server execute this command:

$ docker run -d --name clickhouse-server clickhouse/clickhouse-server

Then we can connect to it:

$ docker exec -it clickhouse-server clickhouse-client

If you need multiple servers, just change clickhouse-server accordingly.

Let's run some queries

d7ba07d71412 :) select 2 + 2

SELECT 2 + 2

Query id: 1565da74-dc33-4d1f-84ae-d5241b052b5c

┌─plus(2, 2)─┐
│          4 │
└────────────┘

1 row in set. Elapsed: 0.001 sec.
d7ba07d71412 :) select concat('Hello', ', ', 'world!')

SELECT concat('Hello', ', ', 'world!')

Query id: 090687b1-3a8f-4300-9368-579a87c348c0

┌─concat('Hello', ', ', 'world!')─┐
│ Hello, world!                   │
└─────────────────────────────────┘

1 row in set. Elapsed: 0.001 sec.

FizzBuzz

We don't have any data yet, but we can select from system.numbers - that's a basically infinite virtual table of all numbers starting from 0.

d7ba07d71412 :) select if(number % 15 = 0, 'FizzBuzz', if(number % 5 = 0, 'Buzz', if(number % 3 = 0, 'Fizz', toString(number)))) as FizzBuzz from system.numbers where number >= 1 limit 100

SELECT if((number % 15) = 0, 'FizzBuzz', if((number % 5) = 0, 'Buzz', if((number % 3) = 0, 'Fizz', toString(number)))) AS FizzBuzz
FROM system.numbers
WHERE number >= 1
LIMIT 100

Query id: 8badf288-95b5-4121-b4ad-18730fbfafc1

┌─FizzBuzz─┐
│ 1        │
│ 2        │
│ Fizz     │
│ 4        │
│ Buzz     │
│ Fizz     │
│ 7        │
│ 8        │
│ Fizz     │
│ Buzz     │
│ 11       │
│ Fizz     │
│ 13       │
│ 14       │
│ FizzBuzz │
│ 16       │
│ 17       │
│ Fizz     │
│ 19       │
│ Buzz     │
│ Fizz     │
│ 22       │
│ 23       │
│ Fizz     │
│ Buzz     │
│ 26       │
│ Fizz     │
│ 28       │
│ 29       │
│ FizzBuzz │
│ 31       │
│ 32       │
│ Fizz     │
│ 34       │
│ Buzz     │
│ Fizz     │
│ 37       │
│ 38       │
│ Fizz     │
│ Buzz     │
│ 41       │
│ Fizz     │
│ 43       │
│ 44       │
│ FizzBuzz │
│ 46       │
│ 47       │
│ Fizz     │
│ 49       │
│ Buzz     │
│ Fizz     │
│ 52       │
│ 53       │
│ Fizz     │
│ Buzz     │
│ 56       │
│ Fizz     │
│ 58       │
│ 59       │
│ FizzBuzz │
│ 61       │
│ 62       │
│ Fizz     │
│ 64       │
│ Buzz     │
│ Fizz     │
│ 67       │
│ 68       │
│ Fizz     │
│ Buzz     │
│ 71       │
│ Fizz     │
│ 73       │
│ 74       │
│ FizzBuzz │
│ 76       │
│ 77       │
│ Fizz     │
│ 79       │
│ Buzz     │
│ Fizz     │
│ 82       │
│ 83       │
│ Fizz     │
│ Buzz     │
│ 86       │
│ Fizz     │
│ 88       │
│ 89       │
│ FizzBuzz │
│ 91       │
│ 92       │
│ Fizz     │
│ 94       │
│ Buzz     │
│ Fizz     │
│ 97       │
│ 98       │
│ Fizz     │
│ Buzz     │
└──────────┘

100 rows in set. Elapsed: 0.002 sec. Processed 65.41 thousand rows, 523.27 KB (31.21 million rows/s., 249.67 MB/s.)

d7ba07d71412 :)

Same notes:

  • even though ClickHouse isn't relational database, even surprsingly complex SQL just works
  • system.numbers is implicitly ordered, so system.numbers where number >= 1 limit 100 will return numbers from 1 to 100
  • we need explicit typecast in if(number % 3 = 0, 'Fizz', toString(number)), SQL databases tend to do a lot of automatic type conversions, ClickHouse less so

Connect from Python

We won't be coding from REPL, so let's write some code. There's no official Ruby driver, but there's a Python one. And some unofficial Ruby drivers.

$ pip3 install clickhouse-connect

We also need to expose our Docker container's Clickhouse port. First let's remove the old one (we can adjust it on running container, but it's more complex and we don't have any data there anyway):

$ docker stop clickhouse-server
$ docker rm clickhouse-server

And start it with port 8123 mapped:

$ docker run -p 8123:8123 -d --name clickhouse-server clickhouse/clickhouse-server

Python Clickhouse FizzBuzz

Now we can write some Python code to talk to our database:

#!/usr/bin/env python3

import clickhouse_connect

client = clickhouse_connect.get_client()
query = """
  select
    if(number % 15 = 0, 'FizzBuzz',
    if(number % 5 = 0, 'Buzz',
    if(number % 3 = 0, 'Fizz',
    toString(number))))
    as FizzBuzz
  from system.numbers
  where number >= 1
  limit 100
  """

print(client.command(query))

The result of client.command(query) is a multiline string, so it's mostly useful for debugging.

There are other methods that return more obvious data structures like list of tuples, but also Pandas dataframe and so on.

What's next

So we got ClickHouse running. In a future post, we'll make it work with some real data.

Code

All code for this post is available on GitHub.