Getting Started with ClickHouse
ClickHouse is a column-oriented database system with SQL-like interface. Its main selling point is high performance when handling enormous datasets in near real time. The sacrifice you'll need to make is very limited support for updates, deletes, or transactions. If your dataset is enormous, and you mostly read, aggregate, and insert, it could work for you.
Which arguably makes it more suitable as an secondary database, not as primary one.
Run ClickHouse with Docker
You can use ClickHouse with ClickHouse Cloud, and that's what most documentation refers to, but for some practice, you can run it locally as well.
The easiest way is to use Docker. To run a local server execute this command:
$ docker run -d --name clickhouse-server clickhouse/clickhouse-server
Then we can connect to it:
$ docker exec -it clickhouse-server clickhouse-client
If you need multiple servers, just change clickhouse-server
accordingly.
Let's run some queries
d7ba07d71412 :) select 2 + 2
SELECT 2 + 2
Query id: 1565da74-dc33-4d1f-84ae-d5241b052b5c
┌─plus(2, 2)─┐
│ 4 │
└────────────┘
1 row in set. Elapsed: 0.001 sec.
d7ba07d71412 :) select concat('Hello', ', ', 'world!')
SELECT concat('Hello', ', ', 'world!')
Query id: 090687b1-3a8f-4300-9368-579a87c348c0
┌─concat('Hello', ', ', 'world!')─┐
│ Hello, world! │
└─────────────────────────────────┘
1 row in set. Elapsed: 0.001 sec.
FizzBuzz
We don't have any data yet, but we can select from system.numbers
- that's a basically infinite virtual table of all numbers starting from 0.
d7ba07d71412 :) select if(number % 15 = 0, 'FizzBuzz', if(number % 5 = 0, 'Buzz', if(number % 3 = 0, 'Fizz', toString(number)))) as FizzBuzz from system.numbers where number >= 1 limit 100
SELECT if((number % 15) = 0, 'FizzBuzz', if((number % 5) = 0, 'Buzz', if((number % 3) = 0, 'Fizz', toString(number)))) AS FizzBuzz
FROM system.numbers
WHERE number >= 1
LIMIT 100
Query id: 8badf288-95b5-4121-b4ad-18730fbfafc1
┌─FizzBuzz─┐
│ 1 │
│ 2 │
│ Fizz │
│ 4 │
│ Buzz │
│ Fizz │
│ 7 │
│ 8 │
│ Fizz │
│ Buzz │
│ 11 │
│ Fizz │
│ 13 │
│ 14 │
│ FizzBuzz │
│ 16 │
│ 17 │
│ Fizz │
│ 19 │
│ Buzz │
│ Fizz │
│ 22 │
│ 23 │
│ Fizz │
│ Buzz │
│ 26 │
│ Fizz │
│ 28 │
│ 29 │
│ FizzBuzz │
│ 31 │
│ 32 │
│ Fizz │
│ 34 │
│ Buzz │
│ Fizz │
│ 37 │
│ 38 │
│ Fizz │
│ Buzz │
│ 41 │
│ Fizz │
│ 43 │
│ 44 │
│ FizzBuzz │
│ 46 │
│ 47 │
│ Fizz │
│ 49 │
│ Buzz │
│ Fizz │
│ 52 │
│ 53 │
│ Fizz │
│ Buzz │
│ 56 │
│ Fizz │
│ 58 │
│ 59 │
│ FizzBuzz │
│ 61 │
│ 62 │
│ Fizz │
│ 64 │
│ Buzz │
│ Fizz │
│ 67 │
│ 68 │
│ Fizz │
│ Buzz │
│ 71 │
│ Fizz │
│ 73 │
│ 74 │
│ FizzBuzz │
│ 76 │
│ 77 │
│ Fizz │
│ 79 │
│ Buzz │
│ Fizz │
│ 82 │
│ 83 │
│ Fizz │
│ Buzz │
│ 86 │
│ Fizz │
│ 88 │
│ 89 │
│ FizzBuzz │
│ 91 │
│ 92 │
│ Fizz │
│ 94 │
│ Buzz │
│ Fizz │
│ 97 │
│ 98 │
│ Fizz │
│ Buzz │
└──────────┘
100 rows in set. Elapsed: 0.002 sec. Processed 65.41 thousand rows, 523.27 KB (31.21 million rows/s., 249.67 MB/s.)
d7ba07d71412 :)
Same notes:
- even though ClickHouse isn't relational database, even surprsingly complex SQL just works
system.numbers
is implicitly ordered, sosystem.numbers where number >= 1 limit 100
will return numbers from 1 to 100- we need explicit typecast in
if(number % 3 = 0, 'Fizz', toString(number))
, SQL databases tend to do a lot of automatic type conversions, ClickHouse less so
Connect from Python
We won't be coding from REPL, so let's write some code. There's no official Ruby driver, but there's a Python one. And some unofficial Ruby drivers.
$ pip3 install clickhouse-connect
We also need to expose our Docker container's Clickhouse port. First let's remove the old one (we can adjust it on running container, but it's more complex and we don't have any data there anyway):
$ docker stop clickhouse-server
$ docker rm clickhouse-server
And start it with port 8123 mapped:
$ docker run -p 8123:8123 -d --name clickhouse-server clickhouse/clickhouse-server
Python Clickhouse FizzBuzz
Now we can write some Python code to talk to our database:
#!/usr/bin/env python3
import clickhouse_connect
client = clickhouse_connect.get_client()
query = """
select
if(number % 15 = 0, 'FizzBuzz',
if(number % 5 = 0, 'Buzz',
if(number % 3 = 0, 'Fizz',
toString(number))))
as FizzBuzz
from system.numbers
where number >= 1
limit 100
"""
print(client.command(query))
The result of client.command(query)
is a multiline string, so it's mostly useful for debugging.
There are other methods that return more obvious data structures like list of tuples, but also Pandas dataframe and so on.
What's next
So we got ClickHouse running. In a future post, we'll make it work with some real data.