How to handle streaming data in Streamlit

Examples showing how to use Streamlit to display live streaming data

Streamlit is an awesome library that lets you quickly create clean, good-looking dashboards with very little code.
It works great when you want to build a research service to share with your team and you already have Python code that you can reuse.
Another great use case is converting a Jupyter notebook into a service, which requires almost no development with Streamlit.
However, given the magic nature of Streamlit, it is not exactly clear what approach to take when creating dashboards that display live data streamed from some source. In this short article I present a few different ideas that solve this problem.
Let us start with the simplest of them all.

Using st.session_state and experimental_rerun

I would recommend this approach when you already have a function that loads the most recent data and you only need to find out whether it is time to rerun the script. For example, you could have a simple SQL table with a column named `last_modified_at` that you can query cheaply, or a queue that you can listen to. With this approach, all you have to do is write something like this:

                                
import random
import time

import pandas as pd
import streamlit as st


def has_changed() -> bool:
    return True


def load_most_recent() -> pd.DataFrame:
    return pd.DataFrame({"my_col": list(range(random.randint(1, 10)))})


def wait_for_update():
    # Simplest would be to just periodically check if something has changed
    while True:
        time.sleep(1)
        if has_changed():
            return


df = load_most_recent()

st.write(df)

wait_for_update()
# This tells the Streamlit server to run the script again from the start.
# Newer Streamlit releases provide st.rerun() as the replacement for this call.
st.experimental_rerun()
                                
                            

The main advantage of this approach is its simplicity and the fact that you avoid doing complex operations inside Streamlit.
All of the heavy processing is pushed down to your existing functions that load the most recent data; with Postgres, for example, the database does the work.

The disadvantages are pretty clear too: if you need to query the database on every rerun to load the entire unaggregated dataset, it will be very inefficient.
I would only recommend this approach for displaying aggregated datasets that are not updated too often.
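
The `has_changed` stub above always returns True. As a rough sketch of the `last_modified_at` idea, the check could compare the newest timestamp in the table with the one remembered from the previous run. Everything here is an assumption made for illustration: the `measurements` table, the `fetch_last_modified` helper, the `last_seen` key, and the use of an in-memory SQLite database instead of a real one. A plain dict stands in for `st.session_state`, on the assumption that it can be used like a mutable mapping.

```python
import sqlite3
from typing import Any, MutableMapping, Optional


def fetch_last_modified(conn: sqlite3.Connection) -> Optional[str]:
    # "measurements" is a hypothetical table name.
    # MAX() yields NULL (None in Python) when the table is empty.
    row = conn.execute("SELECT MAX(last_modified_at) FROM measurements").fetchone()
    return row[0]


def has_changed(state: MutableMapping[str, Any], conn: sqlite3.Connection) -> bool:
    # Compare the newest timestamp with the one stored on the previous check,
    # then remember the newest one for the next check.
    latest = fetch_last_modified(conn)
    changed = state.get("last_seen") != latest
    state["last_seen"] = latest
    return changed


# Small demo with an in-memory database
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (value INTEGER, last_modified_at TEXT)")
conn.execute("INSERT INTO measurements VALUES (1, '2024-01-01T00:00:00')")

state: dict = {}  # stand-in for st.session_state in this sketch
first = has_changed(state, conn)   # nothing remembered yet, so this reports a change
second = has_changed(state, conn)  # timestamp is unchanged
conn.execute("INSERT INTO measurements VALUES (2, '2024-01-02T00:00:00')")
third = has_changed(state, conn)   # a newer row has arrived
print(first, second, third)
```

In the Streamlit script, the remembered timestamp would need to survive reruns, which is what keeping it in the session state is meant to achieve here.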

Using placeholder and a loop

This is probably the best way to handle streaming data in Streamlit. To display data that changes over time, we can create an empty placeholder and keep updating it in a while True loop. The biggest selling point of this approach is that you can download the initial data once and then dynamically update parts of it until the page is refreshed or a new tab is opened, at which point it has to start from scratch. It works great for displaying dynamic tables streamed from Kafka or other streaming sources.

A minimal example could look something like this:

                                
import time

import pandas as pd
import streamlit as st

df = pd.DataFrame({"my_col": [1, 2]})

# We create the placeholder once
placeholder = st.empty()

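while True:
    # Append one new row; this stands in for a record arriving from the stream
    next_value = df["my_col"].max() + 1
    df = pd.concat([df, pd.DataFrame({"my_col": [next_value]}, index=[next_value])])

    # Each pass replaces the previous contents of the placeholder.
    # It is important to exit the context of the placeholder in each step of the loop,
    # so the with block ends before the script sleeps.
    with placeholder.container():
        st.dataframe(df)
    # placeholder.dataframe(df) would also work without the with block, since the
    # placeholder has the same methods for displaying data as st
    time.sleep(2)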
                                
                            
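The "download once, then update parts of it" idea can be sketched without Streamlit at all. The snippet below is only an illustration under stated assumptions: `load_initial`, `fake_stream` and `apply_update` are hypothetical names, the events are made up, and a generator stands in for Kafka or whatever source you actually consume.

```python
from typing import Any, Dict, Iterator

Table = Dict[str, Dict[str, Any]]


def load_initial() -> Table:
    # Hypothetical one-off load of the starting data, keyed by row id
    return {"AAA": {"price": 10.0}, "BBB": {"price": 20.0}}


def fake_stream() -> Iterator[Dict[str, Any]]:
    # Stand-in for a real streaming consumer
    yield {"key": "AAA", "price": 10.5}  # updates an existing row
    yield {"key": "CCC", "price": 30.0}  # adds a new row


def apply_update(table: Table, event: Dict[str, Any]) -> None:
    # Upsert: overwrite the fields of the matching row, creating it if missing
    fields = {name: value for name, value in event.items() if name != "key"}
    table.setdefault(event["key"], {}).update(fields)


table = load_initial()
for event in fake_stream():
    apply_update(table, event)
    # In the Streamlit loop, this is the point where the placeholder
    # would be redrawn with the current contents of the table.

print(table)
```

Only the rows touched by an event change, so the initial load is never repeated while the session is alive.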

Sharing state between multiple sessions

Both of the approaches above share the same problem: the data is lost on refresh, and every new client has to start streaming from scratch.
The built-in st.session_state functionality only lets you persist objects within a single user session, that is, within the context of the same connection.
However, there is a third-party library that you can use to maintain server-side state. It is called streamlit-server-state, and it allows you to create objects that are shared between multiple user sessions.
It is pretty simple to use, but it might run into scaling problems in production applications. For those, I would recommend creating a separate backend service that handles the streaming part and either caches the results or saves them in some kind of database (e.g. Postgres, MongoDB, Redis).
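
To make the separation concrete, here is a minimal sketch of that split. It is not how streamlit-server-state works, and it is not a production design: the `LatestResultsCache` class and the `consume` function are hypothetical, the cache is an in-memory stand-in for Redis or a database, and a background thread stands in for the separate backend service.

```python
import threading
from typing import Dict, Iterable, List

Row = Dict[str, int]


class LatestResultsCache:
    """Hypothetical in-memory stand-in for Redis, Postgres or MongoDB."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._rows: List[Row] = []

    def append(self, row: Row) -> None:
        # Called by the streaming side only
        with self._lock:
            self._rows.append(row)

    def snapshot(self) -> List[Row]:
        # Called by readers; returns a copy of the list so that a reader
        # never iterates over it while the writer is appending
        with self._lock:
            return list(self._rows)


def consume(stream: Iterable[Row], cache: LatestResultsCache) -> None:
    # The "backend service": reads the stream and stores the results
    for row in stream:
        cache.append(row)


cache = LatestResultsCache()
stream = ({"my_col": i} for i in range(5))  # stand-in for a real source
worker = threading.Thread(target=consume, args=(stream, cache), daemon=True)
worker.start()
worker.join()  # the demo stream is finite; a real consumer would keep running

# Every dashboard session would read the same stored results
print(cache.snapshot())
```

With this split, a refresh or a new client only costs one read of the stored results, and the streaming itself happens once regardless of how many sessions are open.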

Streamlit is best used for displaying data and creating simple CRUD forms.