Arlon's CSUMB Intro to Database Systems CST-363 Module 7 Learning Journal #7/#23 for the week Wed 06/09-Tues 06/15, year 2021

- June 14, 2021

Prompt for week 7:

In your own words what is a data warehouse and why is it important to have another version of the data in a data warehouse?

The data warehouse is your playground for the data you collect with your database program. You can make any program you want to subsequently process and re-process your data, which itself is an asset.

What I am so excited about is the insert with select statements - I'd never seen that before. You can make database upon database based on the data you have in any form you can contrive, that could become helpful for analysis, administration, or tons of other purposes. Tons of ideas for tons of purposes. It's a whole world of building upon and studying the data you collect from your operational database.

Say you have a game, say it's a chess game, and you collect say 100,000 games. At some point you could write a separate program that might go through all the games and start to make decisions about what it might do based on past game history and results. You wouldn't want that to run in the same program as the game program because it's a whole different program, but it's based on the data that came out of the first program, the game. That's just one example.

Customer trends I think is the class example, you have an operational database of customer sales and at some point you get to take all that data, design a brand new whole different database that you dump your operational data into - it's separate from your operational database so it doesn't matter to operations if it is huge - redundant, etc. It's for doing whatever you want to - you can do it and re-do it all day long with fresh operational data if you mess up the warehouse data or whatever. You don't that program into the other program - it's a different program.

The reasons to copy the operational database are many - perhaps not when the database is fresh - but soon after it's fresh - it becomes populated - then you can take that data - extract it - transform it however you want, load it into another database of a completely different design - study, analyze, learn from it however you want, without messing with your operational data.

It's the same for file management, photo management, etc. For example say I go on a hike and take a bunch of pictures - I don't go chopping up the originals, I edit copies, etc, and you always have the original to re-edit in a presumably better and better way as time goes on.

For any warehouses or copies of the data:

a) Need for unique data is out the window - redundancy is okay.
b) The data gets transformed into a whole new form - you just copied it.
c) You get to do whatever you want with your data in the data warehouse copy from a whole world of possibilities, admin, analysis, research, the sky is the limit.

Actual Journal Notes:

We did forward engineer in ER week - this week you can use 'Reverse Engineer' in MySQL Workbench to see the SQL in ER form - very handy. The first part of this week's homework assignment was to study these schemas - an operational sale/customer schema and a matching derived warehouse schema to go with it, which copied the operational data into a new warehouse form using an ETL script. ER is a good way to study a schema, that we learned how to do in this class!

Although I have never seen this stuff before - insert with a select and with stuff like substrings and well like this from line 106 of the hsddw CreateTablesWithData.sql file from our homework this week:

/* load the hsddw customer table from hsd customer */

insert into hsddw.customer 

(customerid, customername, emaildomain, phoneareacode, city, state, zip)
 
select customerid, concat( trim(lastname), ' ', trim(firstname)), substring(emailaddress, locate('@', emailaddress)+1),   
         substring(phone,1,3), city, state, zip 

from hsd.CUSTOMER;

I feel like this is the real SQL. Now we are programming. Taking stuff, doing stuff to it, and putting it somewhere else in a new form - you can do a lot with that paradigm - now I see where this can lead, at least I think I partly see. There is a lot to it, and it seems like a whole world of interesting stuff to dive into - I'm so glad to know you can program the database in a fundamental way - until this class all my database programming - well most of it - has been at the application level rather than in the SQL like it should as this class thankfully demonstrates.

I am wondering if you could use a paradigm like this in SQL to do math like matrix math and transformations and stuff, so it might be fun to look into that.

Thanks for reading!

Search This Blog

Arlon's CSUMB Blog

Arlon's CSUMB Intro to Database Systems CST-363 Module 7 Learning Journal #7/#23 for the week Wed 06/09-Tues 06/15, year 2021

Comments

Post a Comment

Popular posts from this blog

Module 2 Learning Journal 1-19-21

Arlon's CSUMB ProSeminar CST300 Module 8 Learning Journal for the week Wed 02/24-Saturday 02/27, year 2021 - Journal 8

Arlon's CSUMB ProSeminar CST300 Module 4 Learning Journal for the week Wed 1/27-Tues 2/2, year 2021