· materialize

Materialize: Unable to automatically determine a timestamp for your query; this can happen if your query depends on non-materialized sources

This is another post describing my exploration of Materialize, a SQL streaming database. In this post I’m going to explain a confusing (to me at least) error message that you might come across when you’re getting started.

As I mentioned in my first post about Materialize, the general idea is that you create a source around a data resource and then a view on top of that. Those views can either be materialized or non-materialized.

  • A materialized view embeds a query like a traditional SQL view, but—unlike a SQL view—compute and incrementally update the results of the embedded query.

  • A non-materialized view stores a verbatim query, and provides a shorthand for performing the query.

Before we decide what query the view is going to embed, we need to create a source. In this post, we’ll use the following file that contains one JSON document:

data/covid_sample_small.json
{"id": "1341755954614861826", "conversation_id": "1341755954614861826", "created_at": "2020-12-23 14:42:02 GMT", "date": "2020-12-23", "time": "14:42:02", "timezone": "+0000", "user_id": 856240505826496513, "username": "suriyasubraman", "name": "Suriya Subramanian", "place": "", "tweet": "Impact of COVID-19 On Internet of Things (IoT) Networks Market 2020 Industry Challenges ...  https://t.co/ndGN2xRKzv #iot #data #internetofthings", "language": "en", "mentions": [], "urls": ["http://dlvr.it/RpCyyv"], "photos": [], "replies_count": 0, "retweets_count": 0, "likes_count": 0, "hashtags": ["iot", "data", "internetofthings"], "cashtags": [], "link": "https://twitter.com/SuriyaSubraman/status/1341755954614861826", "retweet": false, "quote_url": "", "video": 0, "thumbnail": "", "near": "London", "geo": "", "source": "", "user_rt_id": "", "user_rt": "", "retweet_id": "", "reply_to": [], "retweet_date": "", "translate": "", "trans_src": "", "trans_dest": ""}
Note

For brevity’s sake, I’m not including instructions for how to install or connect to Materialize in this post. If you want to learn how to do that, see my post about querying JSON files with Materialize.

We can create a source for this file by running the following query:

CREATE SOURCE
CREATE SOURCE covid_sample_source
FROM FILE '/data/covid_sample_small.json'
WITH(tail=true)
FORMAT TEXT;

Next, we’re going to create a materialized view on top of this source so that we can query the data. At the moment we don’t know which fields to include in that view, so let’s see what we’ve got to work with, by running the following query:

SELECT *
FROM covid_sample_source;
Table 1. Results

ERROR: Unable to automatically determine a timestamp for your query; this can happen if your query depends on non-materialized sources

Hmmm, that’s not what I expected to happen! I couldn’t find any documentation explaining what to do about this error, so a trip to the Materialize User Slack was needed!

In a message on there, Frank McSherry pointed out that Materialize needs to determine a time for the query since the source represents a changing source of data. If we wrap the query in a materialized view this problem is taken care for us, otherwise we need to specify a timestamp using the AS OF syntax.

Let’s give that a try:

SELECT *
FROM covid_sample_source
AS OF 1;
Table 2. Results
text mz_line_no

{"id": "1341755954614861826", "conversation_id": "1341755954614861826", "created_at": "2020-12-23 14:42:02 GMT", "date": "2020-12-23", "time": "14:42:02", "timezone": "+0000", "user_id": "856240505826496513", "username": "suriyasubraman", "name": "Suriya Subramanian", "place": "", "tweet": "Impact of COVID-19 On Internet of Things (IoT) Networks Market 2020 Industry Challenges …​ https://t.co/ndGN2xRKzv #iot #data #internetofthings", "language": "en", "mentions": [], "urls": ["http://dlvr.it/RpCyyv"], "photos": [], "replies_count": 0, "retweets_count": 0, "likes_count": 0, "hashtags": ["iot", "data", "internetofthings"], "cashtags": [], "link": "https://twitter.com/SuriyaSubraman/status/1341755954614861826", "retweet": false, "quote_url": "", "video": 0, "thumbnail": "", "near": "London", "geo": "", "source": "", "user_rt_id": "", "user_rt": "", "retweet_id": "", "reply_to": [], "retweet_date": "", "translate": "", "trans_src": "", "trans_dest": ""}

1

That’s more like it! Now we can update our query to extract just the fields that we want using the JSON operators:

SELECT val->>'id' AS id,
       val->>'tweet' AS tweet,
       val->'hashtags' AS hashtags,
       val->>'date' AS date
FROM (SELECT text::jsonb AS val FROM covid_sample_source)
AS OF 1;
Table 3. Results
id tweet hashtags date

1341755954614861826

Impact of COVID-19 On Internet of Things (IoT) Networks Market 2020 Industry Challenges …​ https://t.co/ndGN2xRKzv #iot #data #internetofthings

["iot","data","internetofthings"]

2020-12-23

Once we’re happy with our query, we can create a materialized view around the query, making sure to remove the AS OF bit:

CREATE MATERIALIZED VIEW covid_sample AS
SELECT val->>'id' AS id,
       val->>'tweet' AS tweet,
       val->'hashtags' AS hashtags,
       val->>'date' AS date
FROM (SELECT text::jsonb val FROM covid_sample_source);

We can now query the materialized view like so:

SELECT *
FROM covid_sample;
  • LinkedIn
  • Tumblr
  • Reddit
  • Google+
  • Pinterest
  • Pocket