
Erlang Term Storage (ETS)

Posted Friday, July 31st, 2020


I have been working in the Elixir ecosystem for over a year at the time of writing this article, and each time I encounter a new tool in the stack, I get excited to write about it to share knowledge, document it, and solidify my own understanding. ETS is one such tool that really excited me. I hope you find this small article helpful.

Erlang Term Storage (ETS) is a high-performance, powerful in-memory storage engine that is widely used in the Erlang and Elixir ecosystem for data that must live in memory while the runtime is up. ETS is part of OTP and is available by default in any OTP runtime environment (Erlang or Elixir). Before we start exploring ETS, here are some characteristics that are fundamental to how it works and worth remembering whenever you work with it or talk about it. Here is what I could come up with so far.

  • ETS organizes data in dynamic tables that can store tuples.
  • The first element of the tuple is the key by default, for example { key, value1, value2 }. The key position can be changed with the keypos option.
  • :ets is the Erlang module of BIFs used to interact with the data stored in ETS.
  • ETS tables are not garbage collected.
  • Table access rights are defined at creation.
  • By default, only the owner process of the table can write to it, but every other process can read from it.
  • A table is deleted automatically when its owner process terminates. It can also be deleted explicitly with :ets.delete/1.
  • You can transfer ownership of an ETS table from one process to another.
  • The number of tables you can create is limited mainly by available memory.
  • Older OTP releases also capped the number of tables, configurable through ERL_MAX_ETS_TABLES in the environment. Check the documentation for your OTP version, since recent releases describe that limit as partially obsolete.
  • :ets module provides ways to pattern match on your data.
  • ETS tables have three access control definitions.
    • public — Read/Write available to all processes.
    • protected — Read available to all processes. Only writable by owner process. This is the default.
    • private — Read/Write limited to owner process.
  • There are four types of ETS tables you can create.
    • set - The default. Each object has a unique key. This table type is ideal for a standard key/value store with constant-time access. For example {key1, some, values} and {key2, other, values}
    • ordered_set - Each tuple has a unique key, and tuples are kept in ascending Erlang/Elixir term order. Access is slower than for the set type: O(log N), where N is the number of objects stored.
    • bag - Many tuples can have the same key, but each distinct tuple is stored only once. For example {key, some, values} and {key, other, values} is okay, but inserting {key, some, values} twice keeps a single copy.
    • duplicate_bag - Many objects can have the same key and there can be duplicates. For example {key, some, values} and {key, some, values}
  • All ETS data is stored in memory and is therefore lost when the server restarts. There are ways to persist it that I have not explored here. I generally hope you will not need to persist.
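The difference between the table types in the list above is easiest to see by inserting the same tuples into each of them. The following is a minimal sketch, not part of the original walkthrough; the table names and sample tuples are made up for illustration.

```elixir
# Three unnamed tables that differ only in their type.
set = :ets.new(:demo_set, [:set])
bag = :ets.new(:demo_bag, [:bag])
dup = :ets.new(:demo_dup, [:duplicate_bag])

# Insert the same three tuples (two of them identical) into each table.
for table <- [set, bag, dup] do
  :ets.insert(table, {:key, :some, :values})
  :ets.insert(table, {:key, :some, :values})
  :ets.insert(table, {:key, :other, :values})
end

# set keeps only the last tuple written for the key,
# bag keeps each distinct tuple once,
# duplicate_bag keeps every insert.
set_rows = :ets.lookup(set, :key)
bag_rows = :ets.lookup(bag, :key)
dup_rows = :ets.lookup(dup, :key)

IO.inspect(set_rows, label: "set")
IO.inspect(length(bag_rows), label: "bag rows")
IO.inspect(length(dup_rows), label: "duplicate_bag rows")
```

In a set, a later insert with an existing key replaces the earlier tuple, which is why only one row comes back.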

The above bullet points govern how to use ETS safely and with the best efficiency and performance. From here we will explore how to create ETS tables and how to save, look up, and delete data. We will also go in-depth with a case study of using ETS for concurrent reads, atomic writes, and table life management.
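Since table lifetime is tied to the owner process, a small experiment can make the rules concrete. The sketch below is an illustration under assumptions: the helper process, table names, and the :handover gift data are hypothetical. One table stays with its creator and one is handed to the current process with :ets.give_away/3 before the creator stops.

```elixir
parent = self()

# A helper process that creates two tables, gives one away, then waits.
owner =
  spawn(fn ->
    kept = :ets.new(:kept_by_owner, [:set, :public])
    given = :ets.new(:given_away, [:set, :public])
    :ets.insert(given, {:status, :survives})
    # Transfer ownership of `given` to the parent process.
    :ets.give_away(given, parent, :handover)
    send(parent, {:tables, kept, given})

    receive do
      :stop -> :ok
    end
  end)

{kept, given} =
  receive do
    {:tables, kept, given} -> {kept, given}
  end

# Stop the creating process and wait until it is really gone.
ref = Process.monitor(owner)
send(owner, :stop)

receive do
  {:DOWN, ^ref, :process, ^owner, _reason} -> :ok
end

# The table that stayed with the dead process should no longer exist,
# while the one that was given away is still readable.
kept_info = :ets.info(kept)
given_rows = :ets.lookup(given, :status)
given_owner = :ets.info(given, :owner)

IO.inspect(kept_info, label: "kept table info")
IO.inspect(given_rows, label: "given table rows")
IO.inspect(given_owner == self(), label: "current process owns given table")
```

The new owner also receives an {:"ETS-TRANSFER", table, from_pid, gift_data} message, which a long-running process would normally handle.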

Creating ETS Tables

To create a new ETS table, use :ets.new/2, which takes the table name and a list of options such as access rights and table type. This section highlights how to create tables, and we will explore these options as we move along while writing and reading data.

For example, below we create a table with the default characteristics: the set table type and protected access control. Notice that the :named_table option is not given, so the table is accessible only via the reference returned.

iex(1)> :ets.new(:online_users, [])
#Reference<0.1514273008.2690252803.205742>
iex(2)>

We can also create a table with desired characteristics by providing options to :ets.new/2. For example, here we create a table that is public, accessible by its name, and is ordered_set.

iex(2)> :ets.new(:table_name, [:ordered_set, :public, :named_table])
:table_name
iex(3)>
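To see what the access options mean in practice, the sketch below (an illustration with made-up table names) lets a second process read and write a protected and a public table owned by the current process. A write to a protected table from a non-owner raises ArgumentError.

```elixir
# Both tables are owned by the current process.
protected = :ets.new(:protected_demo, [:set, :protected])
public = :ets.new(:public_demo, [:set, :public])
:ets.insert(protected, {:greeting, "hello"})

task =
  Task.async(fn ->
    # Any process may read a protected table.
    read = :ets.lookup(protected, :greeting)

    # Only the owner may write to it; other processes get an ArgumentError.
    protected_write =
      try do
        :ets.insert(protected, {:greeting, "changed"})
      rescue
        ArgumentError -> :write_denied
      end

    # A public table accepts writes from any process.
    public_write = :ets.insert(public, {:greeting, "hi"})

    {read, protected_write, public_write}
  end)

{read, protected_write, public_write} = Task.await(task)

IO.inspect(read, label: "read from protected")
IO.inspect(protected_write, label: "write to protected")
IO.inspect(public_write, label: "write to public")
```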

Let's now look at how to insert data into an ETS table.

Inserting data into ETS table

ETS accepts Elixir/Erlang data structures as they are, so you can insert them without serialization. Below are examples of adding data with different types of data structures to an ETS table (table holds the value returned by :ets.new/2). Note that I used a bag so that several different tuples can share the same key. There is no schema or consistent structure.

defmodule User do
    defstruct name: "Danstan", age: 29
end

iex(18)> :ets.insert(table, {:online, [%User{}, %User{name: "Zemuldo"}]})
true
iex(11)> :ets.insert(table, {"foo", "bar"})
true
iex(12)> :ets.insert(table, {:foo, "bar"})
true
iex(13)> :ets.insert(table, {:foo, "bar", "baz"})
true
iex(14)> :ets.insert(table, {1, %User{}})
true
iex(15)> :ets.insert(table, {1, %User{}})
true
iex(16)> :ets.insert(table, {1, %User{}})
true
iex(17)> :ets.insert(table, {1, %User{}, false})
true
iex(18)> :ets.insert(table, {:online, [%User{}, %User{name: "Zemuldo"}]})
true
iex(22)> :ets.insert(table, {:online, {}})
true
iex(23)> :ets.insert(table, {:online, {:key, "value"}})
true

As you can see, you can insert just about anything into an ETS table. This makes ETS very powerful. How about reading the inserted data?

Reading data in ETS tables

:ets provides the lookup BIF for looking up data by key. Here are examples that look up the data inserted in the previous section.

iex(24)> :ets.lookup(table, :foo)
[{:foo, "bar"}, {:foo, "bar", "baz"}]
iex(25)> :ets.lookup(table, :online)
[
  online: [%User{age: 29, name: "Danstan"}, %User{age: 29, name: "Zemuldo"}],
  online: {},
  online: {:key, "value"}
]
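Besides lookup/2, the :ets module has a few related read functions that are handy for key-based access. The following sketch uses a fresh bag table (a made-up name, not the table from the session above) to show what a missing key returns and how :ets.member/2 and :ets.lookup_element/3 behave.

```elixir
table = :ets.new(:lookup_demo, [:bag])
:ets.insert(table, {:foo, "bar"})
:ets.insert(table, {:foo, "bar", "baz"})

# A missing key is not an error; lookup/2 returns an empty list.
missing = :ets.lookup(table, :nope)

# member/2 only checks whether the key exists.
has_foo = :ets.member(table, :foo)

# lookup_element/3 returns the element at the given position. For a bag
# it returns one value per tuple stored under the key.
second_elements = :ets.lookup_element(table, :foo, 2)

IO.inspect(missing, label: "missing key")
IO.inspect(has_foo, label: "has :foo")
IO.inspect(second_elements, label: "second elements")
```

Unlike lookup/2, :ets.lookup_element/3 raises when the key does not exist, so it suits keys that are known to be present.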

There are times when you don't just want to lookup but match things based on the structure and the values. :ets provides a number of match functions that powerfully search through the tables.

Assume Netflix wants to use ETS to track users who are online, storing each user's id, movie_id, plan, and a timestamp. This ETS table should provide a way to tell:

  • all users online
  • all users online, on a particular plan
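  • all users online, on a particular plan, watching a given movie

Here is the sample data. A first row for user 1, watching movie 12 on the :basic plan, was inserted the same way and shows up in the results further down.

iex(66)> :ets.insert(table, {2, 13, :basic, DateTime.utc_now})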
true
iex(67)> :ets.insert(table, {3, 12, :pro, DateTime.utc_now})
true
iex(68)> :ets.insert(table, {4, 13, :family, DateTime.utc_now})
true
iex(69)> :ets.insert(table, {5, 13, :basic, DateTime.utc_now})
true
iex(70)> :ets.insert(table, {6, 13, :pro, DateTime.utc_now})
true

To see all online users and the movie each of them is currently watching, match on every field:

iex(71)> :ets.match(table, {:"$1", :"$2", :"$3", :"$4"}) 
[
  [1, 12, :basic, ~U[2020-07-31 16:51:21.924221Z]],
  [3, 12, :pro, ~U[2020-07-31 16:51:53.345792Z]],
  [4, 13, :family, ~U[2020-07-31 16:52:13.018037Z]],
  [6, 13, :pro, ~U[2020-07-31 16:52:40.231305Z]],
  [2, 13, :basic, ~U[2020-07-31 16:51:37.729990Z]],
  [5, 13, :basic, ~U[2020-07-31 16:52:26.539982Z]]
]

To see only the online users watching movie_id=12:

iex(72)> :ets.match(table, {:"$1", 12, :"$2", :"$3"})
[
  [1, :basic, ~U[2020-07-31 16:51:21.924221Z]],
  [3, :pro, ~U[2020-07-31 16:51:53.345792Z]]
]
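In a match pattern, only the fields bound to :"$N" variables are returned, and :_ can be used for fields that should match anything without being returned. When the whole tuple is wanted, :ets.match_object/2 is an alternative. A minimal sketch with a made-up table and rows shaped like the ones above:

```elixir
table = :ets.new(:online_users_demo, [:bag])
:ets.insert(table, {1, 12, :basic, DateTime.utc_now()})
:ets.insert(table, {2, 13, :basic, DateTime.utc_now()})
:ets.insert(table, {3, 12, :pro, DateTime.utc_now()})

# Only the user id is bound to a variable, so only it is returned.
user_ids = :ets.match(table, {:"$1", 12, :_, :_})

# match_object/2 returns the complete tuples that fit the pattern.
rows = :ets.match_object(table, {:_, 12, :_, :_})

IO.inspect(Enum.sort(user_ids), label: "user ids watching movie 12")
IO.inspect(Enum.sort(rows), label: "full rows for movie 12")
```

The results are sorted before printing because the order in which a bag returns rows for different keys should not be relied on.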

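To see users online on plan :basic or :pro and watching movie_id=13, they can use fun2ms to build a match specification for the :ets.select/2 BIF. Note that fun2ms is shown here in the IEx shell. In compiled Elixir modules it is generally not usable, because it relies on an Erlang parse transform, so there you would write the match spec by hand or use a library such as Ex2ms.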

iex(115)> q =  :ets.fun2ms(fn({user_id, movie_id, plan, timestamp}) when plan in [:basic, :pro] and movie_id == 13 -> {user_id, movie_id, plan, timestamp} end)
[
  {{:"$1", :"$2", :"$3", :"$4"},
   [
     {:andalso, {:orelse, {:"=:=", :"$3", :basic}, {:"=:=", :"$3", :pro}},
      {:==, :"$2", 13}}
   ], [{{:"$1", :"$2", :"$3", :"$4"}}]}
]
iex(116)> :ets.select(table, q)
[
  {6, 13, :pro, ~U[2020-07-31 16:52:40.231305Z]},
  {2, 13, :basic, ~U[2020-07-31 16:51:37.729990Z]},
  {5, 13, :basic, ~U[2020-07-31 16:52:26.539982Z]}
]
iex(117)>
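A match specification is plain data, so it can also be written by hand, which is useful where fun2ms is not available. The sketch below rebuilds the same query against a made-up table with the same kind of rows. It uses :"$_" in the body, which returns the whole matched object, and adds a second, simpler spec for :ets.select_count/2.

```elixir
table = :ets.new(:select_demo, [:bag])

rows = [
  {1, 12, :basic}, {2, 13, :basic}, {3, 12, :pro},
  {4, 13, :family}, {5, 13, :basic}, {6, 13, :pro}
]

for {user_id, movie_id, plan} <- rows do
  :ets.insert(table, {user_id, movie_id, plan, DateTime.utc_now()})
end

# Each entry is {match_head, guards, body}.
match_spec = [
  {
    # Head: bind the four tuple fields to $1..$4.
    {:"$1", :"$2", :"$3", :"$4"},
    # Guard: (plan is :basic or :pro) and movie_id == 13.
    [
      {:andalso,
       {:orelse, {:"=:=", :"$3", :basic}, {:"=:=", :"$3", :pro}},
       {:==, :"$2", 13}}
    ],
    # Body: return the whole matched tuple.
    [:"$_"]
  }
]

selected = :ets.select(table, match_spec)

# Count users watching movie 13 on any plan except :family.
count_spec = [{{:_, 13, :"$1", :_}, [{:"/=", :"$1", :family}], [true]}]
count = :ets.select_count(table, count_spec)

IO.inspect(Enum.sort(selected), label: "selected")
IO.inspect(count, label: "count")
```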

There is even a function, :ets.select_count/2, that just counts the objects matched by a match spec. An object is counted when the body of the spec returns true for it.

iex(123)> q =  :ets.fun2ms(fn({_, movie_id, plan, _}) when plan in [:basic, :pro] and movie_id == 13 -> true end)
[
  {{:_, :"$1", :"$2", :_},
   [
     {:andalso, {:orelse, {:"=:=", :"$2", :basic}, {:"=:=", :"$2", :pro}},
      {:==, :"$1", 13}}
   ], [true]}
]
iex(124)> :ets.select_count(table, q)
3
iex(125)

The Erlang docs detail many available ways of interacting with ETS. Check the bottom for links.

From here we are going to use a case study of using ETS for parts of a micro-service system that will depend on ETS tables for high-performance in-memory data handling. It is time for some serious beaming right now.

Use Cases of ETS

There are several things we can do with ETS. What fits usually depends on your goals and on the size and type of your data. It is important to state that there are abstractions in Elixir, like Agent, GenServer and Task, that can do what ETS does and would still work fine in some situations.

Process State

It is common to keep state in a gen-server's process state in Elixir/Erlang, and in most cases this is just fine. However, in some cases the state grows large or comes with more advanced read requirements, like the following.

  • Being read by other processes
  • Advanced matching or searching
  • Persistence on process crash or restarts.

There may be other reasons why the gen-server state becomes limiting, but I have seen the above three so far. Let's see an example.

Let's say you have a function that needs to store information like the last time it was called and what inputs it was given. This is easy to do in Elixir as you would just put the function behind a gen-server and store the needed information in the gen-server state. Here is an example.

defmodule StampedFunction do
  use GenServer

  @impl true
  def init(_) do
    {:ok, %{}}
  end

  # Runs the function and records when it was called and with which arguments.
  @impl true
  def handle_cast([arg1, arg2], _state) do
    stamped(arg1, arg2)
    new_state = %{time: DateTime.utc_now(), args: [arg1, arg2]}
    {:noreply, new_state}
  end

  # Returns the latest stamp. Every reader has to go through this process.
  @impl true
  def handle_call(:current_stamp, _from, state) do
    {:reply, state, state}
  end

  def stamped(arg1, arg2) do
    # Do something
    {:ok, arg1, arg2}
  end

  """
  iex(2)> {:ok, pid} = GenServer.start_link(StampedFunction, [])
  {:ok, #PID<0.1083.0>}
  iex(3)> GenServer.cast(pid, [1, 2])
  :ok
  iex(4)> GenServer.call(pid, :current_stamp)
  %{args: [1, 2], time: ~U[2020-07-25 08:03:40.602256Z]}
  iex(5)>
  """
end

The only problem with this is that the current state is only available through handle_call, and a gen-server handles one message at a time. That means concurrent reads are not possible, because every reader waits in the same mailbox. ETS solves this by letting other processes read the table directly, so they don't have to call the gen-server to get the current state. Note that for a state this small, an abstraction like Agent or a plain gen-server is usually enough, so ETS only makes sense here if you really need concurrent reads.
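One possible shape for the ETS variant is sketched below. It is an illustration under assumptions, not a definitive implementation: the module name StampedFunctionEts, the table name :stamped_function, the :stamp key, and the current_stamp/0 helper are all made up. The gen-server stays the only writer, while readers call :ets.lookup/2 from their own process.

```elixir
defmodule StampedFunctionEts do
  use GenServer

  # Hypothetical table name used only in this sketch.
  @table :stamped_function

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, opts)

  # Runs in the caller's process, so many readers can read at once.
  def current_stamp do
    case :ets.lookup(@table, :stamp) do
      [{:stamp, stamp}] -> stamp
      [] -> nil
    end
  end

  @impl true
  def init(_opts) do
    # Protected: only this process writes, every process can read.
    :ets.new(@table, [:set, :protected, :named_table, read_concurrency: true])
    {:ok, nil}
  end

  @impl true
  def handle_cast([arg1, arg2], state) do
    stamped(arg1, arg2)
    stamp = %{time: DateTime.utc_now(), args: [arg1, arg2]}
    :ets.insert(@table, {:stamp, stamp})
    {:noreply, state}
  end

  def stamped(arg1, arg2) do
    # Do something
    {:ok, arg1, arg2}
  end
end

{:ok, pid} = StampedFunctionEts.start_link()
GenServer.cast(pid, [1, 2])
# Synchronous round trip, used here only to wait until the cast was handled.
:sys.get_state(pid)
stamp = StampedFunctionEts.current_stamp()
IO.inspect(stamp, label: "current stamp")
```

Because the table is owned by the gen-server, it disappears if that process crashes, so this does not by itself cover the persistence requirement listed earlier.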

In-Memory Counters

Imagine you have a process that tracks incrementing integer values, like a rate limiter, and exposes these counts to other processes or interfaces. You can store the counts in the process state like this. Note that I am using a map as the store, with key/value pairs holding the counts for, say, different API keys.

defmodule RateLimiter do
  use GenServer

  @impl true
  def init(initial_state) do
    {:ok, initial_state}
  end

  @impl true
  def handle_cast({:update, key}, state) do
    # Start at 1 for a new key, otherwise increment the existing count.
    {:noreply, Map.update(state, key, 1, &(&1 + 1))}
  end

  @impl true
  def handle_call({:get, key}, _from, state) do
    # Unknown keys report a count of 0.
    {:reply, Map.get(state, key, 0), state}
  end
end

"""
iex(15)> {:ok, pid} = GenServer.start_link(RateLimiter, %{})
{:ok, #PID<0.836.0>}
iex(16)> GenServer.cast(pid, {:update, "301454d5-ffbe-44f1-b25d-b9399c303436"})
:ok
iex(17)> GenServer.cast(pid, {:update, "301454d5-ffbe-44f1-b25d-b9399c303436"})
:ok
iex(18)> GenServer.call(pid, {:get, "301454d5-ffbe-44f1-b25d-b9399c303436"})
2
iex(19)>
"""

This works, but it has limitations. For example, it is not easy to read the values from the process state concurrently, because handle_call/3 messages are processed one at a time. Solving this with ETS provides a robust counter table with concurrent reads. Here is an example using ETS to replace the gen-server state.

defmodule RateLimiter do
  use GenServer

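  @table_name :rate_limiter
  @default_count_value 0

  @impl true
  def init(_) do
    # Protected (the default): only this process writes, any process can read.
    :ets.new(@table_name, [:set, :named_table, read_concurrency: true])
    {:ok, []}
  end

  @impl true
  def handle_cast({:update, key}, state) do
    # Atomically add 1 to the counter. If the key is missing, the default
    # tuple {key, 0} is inserted first and then incremented.
    :ets.update_counter(@table_name, key, 1, {key, @default_count_value})
    {:noreply, state}
  end

  # Reads hit ETS directly in the caller's process, not the gen-server.
  def get(key) do
    case :ets.lookup(@table_name, key) do
      [{_, count}] -> count
      _ -> 0
    end
  end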

  """
  iex(3)> {:ok, pid} = GenServer.start_link(RateLimiter, %{})
  {:ok, #PID<0.631.0>}
  iex(5)> GenServer.cast(pid, {:update, "301454d5-ffbe-44f1-b25d-b9399c303436"})
  :ok
  iex(6)> RateLimiter.get("301454d5-ffbe-44f1-b25d-b9399c303436")
  1
  iex(7)> GenServer.cast(pid, {:update, "301454d5-ffbe-44f1-b25d-b9399c303436"})
  :ok
  iex(8)> RateLimiter.get("301454d5-ffbe-44f1-b25d-b9399c303436")
  2
  """
end

Note that the reason why you would choose ETS in a case like this is that :ets.update_counter/4 updates are atomic and reads can run concurrently without going through the gen-server. The functionality is also very powerful. You can have a look here and here for further reading on this.
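The atomicity of update_counter can be checked with a small experiment. The sketch below is a variation on the module above, not the same design: it assumes a public table (made-up name) so that many tasks can write to it directly, and then verifies that no increment was lost.

```elixir
table = :ets.new(:counter_demo, [:set, :public, write_concurrency: true])
key = "api-key-1"

# 50 tasks, each incrementing the same counter 100 times.
1..50
|> Enum.map(fn _ ->
  Task.async(fn ->
    for _ <- 1..100 do
      # {2, 1} means: add 1 to the element at position 2.
      # {key, 0} is the default object used when the key is missing.
      :ets.update_counter(table, key, {2, 1}, {key, 0})
    end
  end)
end)
|> Enum.each(&Task.await/1)

[{^key, total}] = :ets.lookup(table, key)
IO.inspect(total, label: "total")
```

A read-then-write sequence built from lookup/2 and insert/2 would not give this guarantee when several processes write at once.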

Check out this cool gist on ETS counters consistency.

Cache

Most applications depend on Redis or Memcached for in-memory storage. These often exist as third-party or dependency services. In the Erlang/Elixir ecosystem, it is common to use ETS in place of these services because it is very capable of doing the job without adding complexity to your stack.

I did not do an example here because I found these two very well written ETS-based caches you can check out.
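As a rough idea of the shape such a cache takes, here is a minimal sketch of a key/value cache with a time-to-live. It is not taken from either of those libraries. The SimpleCache module, its function names, and the table name are assumptions made for illustration, and a production cache would also need a supervised owner process and periodic cleanup.

```elixir
defmodule SimpleCache do
  # Hypothetical table name used only in this sketch.
  @table :simple_cache

  # The process that calls init/0 becomes the owner of the table.
  def init do
    :ets.new(@table, [:set, :public, :named_table, read_concurrency: true])
    :ok
  end

  # Stores the value together with the monotonic time at which it expires.
  def put(key, value, ttl_ms) do
    expires_at = System.monotonic_time(:millisecond) + ttl_ms
    :ets.insert(@table, {key, value, expires_at})
    :ok
  end

  # Returns {:ok, value} for a live entry and :miss otherwise.
  def get(key) do
    now = System.monotonic_time(:millisecond)

    case :ets.lookup(@table, key) do
      [{^key, value, expires_at}] when expires_at > now ->
        {:ok, value}

      [{^key, _value, _expires_at}] ->
        # Expired entries are removed lazily, on read.
        :ets.delete(@table, key)
        :miss

      [] ->
        :miss
    end
  end
end

SimpleCache.init()
SimpleCache.put(:user_1, %{name: "Danstan"}, 60_000)
SimpleCache.put(:stale, "old", 0)

hit = SimpleCache.get(:user_1)
expired = SimpleCache.get(:stale)
unknown = SimpleCache.get(:nobody)

IO.inspect({hit, expired, unknown}, label: "cache results")
```

Monotonic time is used instead of wall-clock time so that clock adjustments do not extend or shorten an entry's lifetime.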

Challenge for you

Consider a module that tracks users' scores in an online QA game. So it needs to know

  • All the questions loaded in the game.
  • All the users that have joined in the game.
  • The score of each user.
  • The answers to each question by each user.

It also needs to provide functions to read/write the following

  • Add a user to the game
  • Put an answer to the store
  • Read the score by all users on each question
  • Read the total score by each user
  • Read all questions a user answered correctly
  • Read all questions a player answered wrong.

Below is a module that fetches the questions for you.

defmodule OpenTrivia do
  use Tesla

  plug Tesla.Middleware.BaseUrl, "https://opentdb.com/api.php"
  plug Tesla.Middleware.JSON

  def get() do
    with {:ok, response} <- get("/?amount=10&category=27&type=boolean") do
      response
      |> Map.get(:body)
      |> Map.get("results")
      |> Enum.with_index()
      |> Enum.map(fn {question, index} -> Map.put(question, "id", index) end)
    else
      _ -> []
    end
  end
end

Important questions to answer before using ETS.

So ETS is awesome but as you may know, in tech it is not only awesomeness that determines whether or not we use a particular technology. Same here. Here are some questions that you need to answer before using ETS.

  • What is the structure or nature of your data?
  • Does your data need to persist on server restarts?
  • Do you need to worry about the side effects of concurrent writes?
  • Is your data public to other processes?
  • Does your data need to be garbage collected?
  • Does your data need to be accessible in all nodes?

For further reading, here are a few links that will juice up your knowledge and options when it comes to learning or using ETS.

I hope you found this insightful. Thank you. Please share and follow me on Twitter and Github. That will go a long way :-). Cheers!