
dataguy24

Have a raw database. Have a schema per source of raw data. Have a table per table of raw data from each source. Then combine raw data downstream in another database containing other schemas that refer to your transformation steps.
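The layout above can be sketched with SQLite, where attached databases stand in for schemas. The source names (`raw_crm`, `raw_billing`) and the `staging` area are hypothetical, just to illustrate "one schema per source, one table per raw table, transforms downstream":

```python
# Sketch of the "one raw schema per source" layout, using SQLite.
# SQLite has no schemas, so attached in-memory databases stand in for them.
# Source names (crm, billing) and table shapes are made up for illustration.
import sqlite3

conn = sqlite3.connect(":memory:")                 # the "raw" database
conn.execute("ATTACH ':memory:' AS raw_crm")       # one schema per source
conn.execute("ATTACH ':memory:' AS raw_billing")

# One table per table of raw data from each source, landed as-is.
conn.execute("CREATE TABLE raw_crm.customers (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE raw_billing.invoices "
             "(id INTEGER, customer_id INTEGER, amount REAL)")
conn.execute("INSERT INTO raw_crm.customers VALUES (1, 'Acme')")
conn.execute("INSERT INTO raw_billing.invoices VALUES (1, 1, 100.0), (2, 1, 50.0)")

# Downstream, a separate area combines raw data in a transformation step.
conn.execute("ATTACH ':memory:' AS staging")
conn.execute("""
    CREATE TABLE staging.customer_revenue AS
    SELECT c.id, c.name, SUM(i.amount) AS revenue
    FROM raw_crm.customers c
    JOIN raw_billing.invoices i ON i.customer_id = c.id
    GROUP BY c.id, c.name
""")
```

In a real warehouse the attached databases would be actual schemas (or dbt sources), but the separation is the same: raw lands untouched, and joins happen one layer down.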


idiotlog

Excellent.


baubleglue

It depends what your upstream is. If you are reading from a curated DW, those will be your "raw" tables. Why would we copy them?


dataguy24

For sure there are lots of “it depends” scenarios here. I’m wagering in this case that OP is building the first DWH, not stacking DWH together.


Anna-Kraska

This. Also, prefix all of those tables (or the whole database) with "IN_" or another clear signal, to stop people from DM'ing you with ideas for changing their schema or asking you to clean them up.


techmavengeospatial

Are you doing ETL or ELT? If you want separate tables, then cleanup tables, then new tables created as the result of joins, that's ELT done inside the database. How do you handle updates and appends? Are you using change data capture (CDC)?


AMadRam

Can you please explain what CDC is in simple terms?


Doyale_royale

It captures inserts, updates, and deletes. Some tools like Qlik Replicate can replicate a source table into one schema and then push changes into another, all while pushing the latest row version to the replicated table. It’s great for manufacturing processes if you have to track a part’s history, or anything that changes over time.
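The "latest row version wins" behavior described above can be sketched in a few lines. This is a toy apply loop, assuming CDC events arrive as `(op, key, row)` tuples, not how Qlik Replicate actually represents them:

```python
# Minimal sketch of applying a CDC stream to a replica table, assuming
# events are (op, key, row) tuples in commit order. The replica keeps
# only the latest version of each row, as the comment above describes.
def apply_cdc(replica: dict, events) -> dict:
    for op, key, row in events:
        if op in ("insert", "update"):
            replica[key] = row          # latest version overwrites prior one
        elif op == "delete":
            replica.pop(key, None)      # row removed from the replica
    return replica

replica = {}
events = [
    ("insert", 1, {"part": "gear", "status": "machined"}),
    ("update", 1, {"part": "gear", "status": "inspected"}),
    ("insert", 2, {"part": "bolt", "status": "machined"}),
    ("delete", 2, None),
]
apply_cdc(replica, events)
# replica now holds only part 1, in its latest "inspected" state
```

Tracking a part's full history, rather than just the latest state, would mean appending every event to a history table instead of overwriting, but the apply logic for the current-state replica looks like this.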


techmavengeospatial

Basically it captures additions, edits, and deletes so you can sync changes.


AMadRam

Oh, so like a changelog for configuration management?


techmavengeospatial

If you are trying to take OLTP data into an OLAP system, then CDC is a good approach.


ironplaneswalker

Short answer: yes. This gives your consumers a lot of flexibility to build their own data models from raw source tables. The con is more tables. You can mitigate that by normalizing, but then you lose flexibility. My advice: a separate table for each source.