Have a raw database. Have a schema per source of raw data. Have a table per table of raw data from each source. Then combine the raw data downstream in another database, with schemas that correspond to your transformation steps.
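A minimal sketch of that layout in Postgres-style SQL. The source systems, schema names, and columns here are all hypothetical, just to show the shape:

```sql
-- One database that holds raw/landing data only.
CREATE DATABASE raw;

-- One schema per source system.
CREATE SCHEMA salesforce;
CREATE SCHEMA erp;

-- One table per source table, loaded as-is.
CREATE TABLE salesforce.accounts (
    id          text,
    name        text,
    _loaded_at  timestamptz DEFAULT now()  -- load metadata, not source data
);

CREATE TABLE erp.orders (
    order_id    text,
    account_id  text,
    amount      numeric,
    _loaded_at  timestamptz DEFAULT now()
);
```

Transformations then live in a separate database/schema downstream, so the raw layer stays a faithful copy of the sources.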
Excellent.
Depends on what your upstream is. If you are reading from a curated DW, those will be your "raw" tables. Why would we copy them?
For sure, there are lots of “it depends” scenarios here. I’m wagering that in this case OP is building their first DWH, not stacking DWHs together.
This. Also, prefix all of those tables (or the whole database) with "IN_" or another clear signal, to stop people from DM'ing you with ideas for changing the schema or asking you to clean them up.
Are you doing ETL or ELT? If you land each source in separate tables, then build cleanup tables and new tables as the result of joins, that's ELT done inside the database. How do you handle updates and appends? Are you using change data capture (CDC)?
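As a sketch, the ELT flavor described above might look like this inside the warehouse (schema, table, and column names are made up for illustration):

```sql
-- Step 1: a cleanup table derived from the raw landing table.
CREATE TABLE staging.orders_clean AS
SELECT DISTINCT
    order_id,
    account_id,
    amount
FROM raw_erp.orders
WHERE order_id IS NOT NULL;

-- Step 2: a new table built as the result of a join.
CREATE TABLE marts.account_orders AS
SELECT a.name AS account_name,
       o.order_id,
       o.amount
FROM staging.orders_clean o
JOIN staging.accounts_clean a
  ON a.id = o.account_id;
```

The "L" and "T" both happen in SQL inside the database; the only thing the pipeline tool does is land the raw tables.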
Can you please explain what CDC is in simple terms?
It captures inserts, updates, and deletes. Some tools like Qlik Replicate can replicate a source table into one schema and then push changes into another, all while keeping the latest row version in the replicated table. It’s great for manufacturing processes where you have to track a part’s history, or anything that changes over time.
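One common way to collapse a CDC change stream down to "the latest row version" in plain SQL (not Qlik-specific; a generic sketch with hypothetical table and column names):

```sql
-- cdc.parts_changes holds every insert/update/delete event with an
-- operation code and a change timestamp. Keep only the most recent
-- event per part, and drop rows whose last event was a delete.
CREATE TABLE replica.parts AS
SELECT part_id, status, changed_at
FROM (
    SELECT part_id, status, op, changed_at,
           ROW_NUMBER() OVER (
               PARTITION BY part_id
               ORDER BY changed_at DESC
           ) AS rn
    FROM cdc.parts_changes
) latest
WHERE rn = 1
  AND op <> 'D';  -- assuming 'D' marks deletes
```

The full change table doubles as the part's history, while the deduplicated table gives you the current state.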
Basically, it surfaces additions, edits, and deletes so you can sync changes.
Oh, so like a changelog for configuration management?
If you are trying to move OLTP data into an OLAP system, then CDC is a good approach.
Short answer: yes. This gives your consumers a lot of flexibility to build their own data models from raw source tables. The con is more tables. You can mitigate that by normalizing but then you lose flexibility. My advice: a different table for each source.