Most of today's data engineering problems would be solved by the presence of a tool...

endymi0n · on Feb 22, 2021

https://www.getdbt.com/ comes extremely close in my eyes and even tackles the documentation and infra-as-code aspect. We went all in half a year ago and never looked back.

snidane · on Feb 22, 2021

DBT is interesting, but is far from what I'm describing.

1. It is only for structured SQL, not for arbitrary data. I can't use it to unpack raw zipped data, for example.

2. It couples the logic for data transformation with view state management. Actually, it makes you do the state management yourself, so it doesn't really help at all. You'll get burned by storing view state together with your data, e.g. when a batch increment contains no data.

3. It is not built with "incremental materialized views" in mind. It still thinks in terms of a batch refill mode and an incremental mode, according to [0].

It is certainly an improvement over managing SQL scripts by hand, but it is far from the ultimate goal of maintaining materialized views in a declarative way.

[0] https://docs.getdbt.com/docs/building-a-dbt-project/building...
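To make point 2 concrete, here is a minimal Python sketch of the empty-increment problem. It is not dbt's API or anyone's actual implementation; `IncrementalView`, `apply_batch`, and `watermark_from_data` are hypothetical names made up for illustration. The assumption is that batches carry increasing integer ids and that the view is a simple running total per key.

```python
from dataclasses import dataclass, field


@dataclass
class IncrementalView:
    """Hypothetical running-total view (illustrative only, not a real tool's API)."""

    totals: dict = field(default_factory=dict)  # the view's data
    watermark: int = 0  # view state, tracked separately from the data

    def apply_batch(self, batch_id, rows):
        """Apply one increment; return False if the batch was already applied."""
        if batch_id <= self.watermark:
            return False  # replayed batch, nothing to do
        for key, amount in rows:
            self.totals[key] = self.totals.get(key, 0) + amount
        # The watermark advances even when `rows` is empty, so an empty
        # increment is still recorded as processed.
        self.watermark = batch_id
        return True


def watermark_from_data(stored_rows):
    """Anti-pattern: infer progress from (batch_id, key, amount) rows in the table.

    An empty batch leaves no rows behind, so it can never move this value.
    """
    return max((batch_id for batch_id, _, _ in stored_rows), default=0)
```

With state kept apart, applying batch 1 with two rows and then an empty batch 2 leaves `watermark` at 2. Deriving progress from the stored rows instead would still report 1, so batch 2 would look unprocessed on every run.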

smknappy · on Feb 22, 2021

Have a look at https://www.ascend.io. It addresses the issues you highlight:

1. SQL + Python + Java + Scala

2. State management is fully automated

3. Automatic incremental materialized views

sammm · on Feb 22, 2021

Out of interest, how do you do testing? Whenever I have seen dbt used, it is usually data analysts creating new tables on the fly in data warehouse scenarios.

Maybe I am just too used to application developer workflows where models are defined in code and then there are ORMs and schema migration tools to help manage all that.

tehlike · on Feb 22, 2021

RavenDB gets close.

snidane · on Feb 22, 2021

I had in mind OLAP use cases in environments with a lot of both unstructured and structured tabular data. Some kind of scripting is necessary just to structure a bunch of text and JSON files into tables.

RavenDB, on the other hand, seems like an OLTP NoSQL database.
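As a rough illustration of the scripting step mentioned above (and of the "unpack raw zipped data" case earlier in the thread), here is a small Python sketch that turns a zip archive of JSON-lines files into fixed-width rows. The function names and the sample data are made up for illustration, and it assumes every archive member is a UTF-8 text file with one JSON object per line.

```python
import io
import json
import zipfile


def rows_from_zipped_jsonl(zip_bytes, columns):
    """Yield one tuple per JSON line in the archive, in `columns` order.

    Assumes every member is a UTF-8 JSON-lines file; missing keys become None.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            with archive.open(name) as member:
                for raw_line in io.TextIOWrapper(member, encoding="utf-8"):
                    line = raw_line.strip()
                    if not line:
                        continue  # skip blank lines
                    record = json.loads(line)
                    yield tuple(record.get(col) for col in columns)


def make_sample_zip():
    """Build a tiny in-memory archive so the sketch runs without any files."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(
            "events.jsonl",
            '{"user": "a", "amount": 10}\n{"user": "b"}\n',
        )
    return buffer.getvalue()


rows = list(rows_from_zipped_jsonl(make_sample_zip(), ["user", "amount"]))
```

The resulting tuples could then be bulk-loaded into a warehouse table, which is the point where a SQL-only tool can take over.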