The other day I did a mini-course online about relational databases with Stanford Online. The course is taught by Stanford Professor Jennifer Widom and is aimed at relative database novices like me.
Jennifer started off the course by providing some useful characteristics of a Database Management System (‘DBMS’):
“Efficient, reliable, , convenient and safe multi-user storage of and access to massive amounts of persistent data”
- Massive scale – Jennifer talked about databases with the potential for terabytes of data.
- Persistent – The database always outlives the programme that will operate on its data.
- Safe – Power or hardware failures for examples shouldn’t affect the database.
- Multi-user – Jennifer talked about concurrency control in this respect.
- Convenient – The convenience of DBMS comes for a large part from ‘physical data independence’, i.e. operations on the data are independent of the way in which the data is laid out.
- Efficient – This relates to database performance, e.g. thousands of queries/updates per second.
- Reliable – Jennifer pointed out that with databases 99.999999% up-time is critical.
Jennifer then went on to break down the structure of a database into to its most basic elements, something which I actually found quite helpful:
- Database is a set of named relations (or tables)
- Each relation has a set of named attributes (or columns)
- Each tuple (or row) has a value for each attribute
- Each attribute has a type (or domain)
- A database schema is a structural description of relations in a database
- An instance contains the actual contents at a given point in time
- “Null” is a special value for “unknown” or “undefined”
- A key is an attribute whose value is unique in each tuple (e.g. a student ID) or a set of attributes whose combined values are unique
- Having unique keys helps to (1) identify specific tuples (columns) (2) fast indexing of the database and (3) refers to tuples of another key
In the session, Jennifer gave a first flavour of how to create relations (table) in SQL, which is commonly used database programming language. I found below example which Jennifer gave to be very helpful:
Create Table Student (ID, name, GPA, photo)
Create Table College (name string*, state char*, enrolment integer*) * These are all attribute types
She then went on to talk a bit about querying relational databases, outlining common steps in creating and using a (relational) database:
- Design schema – Create a schema which describes the relations within a database, using a Data Definition Language (‘DDL’)
- “Bulk load” initial data – Load the initial data into the database
- Repeat – Execute queries and modifications
Jennifer then finished off by giving an example of a SQL query that returns the follow relation: “IDs of students with GPA > 3.7 applying to Stanford”. In SQL this query would look something like this:
Select Student ID
From Student, Apply
Where Student ID = Apply ID
And GPA > 3.7 and college = ‘Stanford”
Main learning point: I’ve clearly got my work cut out for me, but I felt that this first mini-course on relational data models and databases was incredibly helpful and easy to understand. I’m hoping I can build on these first foundations and understand more about how to best structure SQL queries and interpret the relations that these queries return.
2 responses to “Learning about relational data models and databases”
Eight nines reliability is just not possible. That allows only a third of a second of downtime per year. Even with triple-redundant everything, a network connection failover will take more than that (and increasing the sensitivity of any tripwires will not help, as it would introduce spurious flickering failovers).
[…] about the basics of SQL, a special programming language designed for managing data held in a relational database or from stream processing in a […]