
Background: I worked in the old Microsoft Analysis Services (MSAS) stack 8 years ago and have since moved to columnar data formats like Parquet and to row-based MPPs.

For a transition plan, I would say find the dataset that is exploding in size or complexity and start your POC there. I ran customer service datasets on OLAP, so the data only grew at a pretty small scale and the data model didn't change much. OLAP was fine, except that nobody else knew how to maintain it.

The main growing pain is figuring out what your front-end developer flow will be. It needs the same governance as OLAP but will be more democratized, so be ready to make your back end more front-end facing. For MSAS the front end was Excel; more modern systems are much more wide open. The article suggests just SQL, but that can get out of control. How do you avoid reinventing the wheel? How do you prevent a lineage mess of derivative on top of derivative if you give users write access? IMO Tableau is a great product that allows OLAP-like exploration but can take SQL as an input. Just make sure the SQL behind it is kept under some kind of source control and governance.
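One rough way to spot "derivative on top of derivative" is to record which tables each derived table reads from and flag anything too many hops from a raw source. This is only a sketch: the table names, the `LINEAGE` map, and the depth limit of 2 are made up for illustration, and it assumes the lineage graph has no cycles.

```python
# Hypothetical lineage map: each derived table -> the tables it reads from.
# Tables that never appear as keys are treated as raw sources.
LINEAGE = {
    "tickets_wide": ["tickets_raw", "customers"],
    "tickets_weekly": ["tickets_wide"],
    "tickets_weekly_top10": ["tickets_weekly"],
}

def depth(table, lineage=LINEAGE):
    """Number of derivation hops between a table and its furthest raw source."""
    parents = lineage.get(table, [])
    if not parents:
        return 0  # raw source
    return 1 + max(depth(p, lineage) for p in parents)

def too_deep(lineage=LINEAGE, limit=2):
    """Derived tables that sit more than `limit` hops from a raw source."""
    return sorted(t for t in lineage if depth(t, lineage) > limit)

flagged = too_deep()
print(flagged)
```

In practice the map would come from whatever SQL you already have under source control, not from a hand-written dict.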

From the data model perspective, I think the main difference is to make the tables wider and "pre-join" your immutable, higher-cardinality dimensions (e.g. customer). Just be careful with highly mutable data and keep it in separate tables, because it is very painful to rewrite columnar data. E.g. if you partition by date, then to update a single record you rewrite the entire date partition.
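To make the pre-join and the rewrite cost concrete, here is a toy sketch in plain Python. Lists of dicts stand in for columnar files, and all table names, columns, and values are invented for the example. A real engine works on files rather than in-memory lists, but the shape of the problem is the same.

```python
from collections import defaultdict

# Hypothetical data: a small, stable customer dimension and a ticket fact table.
customers = {
    1: {"customer_name": "Acme", "segment": "enterprise"},
    2: {"customer_name": "Globex", "segment": "smb"},
}
tickets = [
    {"ticket_id": 101, "date": "2024-01-01", "customer_id": 1, "status": "open"},
    {"ticket_id": 102, "date": "2024-01-01", "customer_id": 2, "status": "open"},
    {"ticket_id": 103, "date": "2024-01-01", "customer_id": 1, "status": "open"},
    {"ticket_id": 104, "date": "2024-01-02", "customer_id": 2, "status": "open"},
]

def prejoin(facts, dim):
    """Widen each fact row with the stable attributes of its customer."""
    return [{**f, **dim[f["customer_id"]]} for f in facts]

def partition_by_date(rows):
    """Group rows into one 'file' per date."""
    parts = defaultdict(list)
    for r in rows:
        parts[r["date"]].append(r)
    return dict(parts)

def update_record(partitions, date, ticket_id, **changes):
    """Treat each partition as immutable: changing one row rebuilds the whole
    partition. Returns the number of rows that had to be rewritten."""
    rebuilt = [
        {**r, **changes} if r["ticket_id"] == ticket_id else r
        for r in partitions[date]
    ]
    partitions[date] = rebuilt
    return len(rebuilt)

wide = prejoin(tickets, customers)
parts = partition_by_date(wide)

# Closing a single ticket rewrites every row that shares its date.
rows_rewritten = update_record(parts, "2024-01-01", 101, status="closed")
print(rows_rewritten)
```

The mutable `status` column is left in the wide table here on purpose, to show why you would want it in a separate, smaller table.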

(About governance) I mean passive governance, not gatekeeping. Pretend each end user and dataset costs you money. How do you track them passively with some thin yet easily trackable logging? Business unit and a unique job identifier are the bare minimum.
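A minimal sketch of what that thin logging could look like, assuming one JSON line per query. The field names, the `log_usage` helper, and the in-memory `USAGE_LOG` list are all hypothetical stand-ins for whatever append-only file or table you actually use.

```python
import json
import time
from collections import Counter

USAGE_LOG = []  # stand-in for an append-only log file or table

def log_usage(business_unit, job_id, dataset):
    """Record who touched which dataset, without blocking anything."""
    entry = {
        "ts": time.time(),
        "business_unit": business_unit,
        "job_id": job_id,
        "dataset": dataset,
    }
    USAGE_LOG.append(json.dumps(entry))
    return entry

def usage_by(field):
    """Count log entries per value of `field`, e.g. per business unit."""
    return Counter(json.loads(line)[field] for line in USAGE_LOG)

# Hypothetical jobs, each tagged with a business unit and a unique job ID.
log_usage("support", "daily_ticket_rollup", "tickets_wide")
log_usage("support", "csat_dashboard", "tickets_wide")
log_usage("finance", "refund_audit", "refunds")

by_unit = usage_by("business_unit")
by_dataset = usage_by("dataset")
print(by_unit, by_dataset)
```

Nothing here blocks a query. It only leaves a trail you can later group by business unit, job, or dataset when you need to ask who is using what.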


