Learning about the basic constructs of XML

A few weeks ago I wrote about relational data models and databases and some of the basic principles which I learned about in an online Stanford course. As part of the same course I recently learned about “XML”, which stands for Extensible Markup Language, a standard for data representation and exchange. Having worked in the entertainment industry for a few years now, I’ve often found myself looking at metadata in the form of XML, but it was great to get a good refresher as part of my course.

These are some of the basic constructs of XML:

  • Tagged elements (nested) – an opening and a closing tag
  • Attributes – attributes tend to consist of: a name (unique), an equal sign (=) and an attribute value
  • Text – also known as “character data”, this can be as simple as “Marc Abraham” or “123456”
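To make these constructs concrete, here is a minimal sketch (the element and attribute names are invented for illustration) that parses a tiny XML document with Python’s standard library and pulls out each of the three constructs:

```python
# Minimal illustration of the three XML constructs: a nested tagged
# element, an attribute, and character data ("text").
import xml.etree.ElementTree as ET

doc = """
<person id="123456">
    <name>Marc Abraham</name>
</person>
"""

root = ET.fromstring(doc)
print(root.tag)                # the element name: "person"
print(root.attrib["id"])       # the attribute value: "123456"
print(root.find("name").text)  # character data in the nested element
```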

The instructor, Jennifer Widom, then went on to explain the differences between the relational data model and XML:

Relational data (e.g. SQL):

  1. Structure: Tables
  2. Schema: Fixed in advance
  3. Queries: Simple, nice language
  4. Ordering: None, unordered
  5. Implementation: Native


XML:

  1. Structure: Hierarchical tree, graph
  2. Schema: Flexible, “self-describing”
  3. Queries: Less simple, more complex
  4. Order: Implied ordering
  5. Implementation: Add-on
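A hedged sketch of the same record in both models may help: a flat row with a fixed schema versus a self-describing, nested XML fragment (the field names are my own, purely for illustration):

```python
# The same student record represented two ways.
import xml.etree.ElementTree as ET

# Relational view: one tuple in a "Student" table with fixed columns.
row = {"ID": 123456, "name": "Marc Abraham", "GPA": 3.7}

# XML view: the same data as a hierarchical, self-describing tree.
student = ET.fromstring(
    '<student ID="123456"><name>Marc Abraham</name><GPA>3.7</GPA></student>'
)

# Both views carry the same values; only the structure differs.
assert str(row["ID"]) == student.attrib["ID"]
assert row["name"] == student.find("name").text
```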

With XML, validation is a key aspect. Put simply, it comes down to validating an XML document against an “XSD” (XML Schema Definition). This process determines whether the XML document is valid or invalid (see Fig. 1 below). During the class, Jennifer also highlighted that XML validation involves two files: first, a schema file which contains the XSD; second, the actual data file.
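As a rough illustration of the first half of that process: Python’s standard library can at least check that a document is well-formed, which is a weaker check than schema validity (full XSD validation needs a dedicated validator library):

```python
# Well-formedness check: does the document parse at all?
# This is necessary but not sufficient for validity against an XSD.
import xml.etree.ElementTree as ET

def is_well_formed(xml_text: str) -> bool:
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

print(is_well_formed("<order><item/></order>"))  # True
print(is_well_formed("<order><item></order>"))   # False: mismatched tags
```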

I then struggled a bit when Jennifer talked about “DTDs”. I subsequently learned that “DTD” stands for ‘Document Type Definition’ and is a set of markup declarations which defines the legal building blocks of an XML document.

There are four features of an XML schema which aren’t present in DTDs:

  • Key declarations – In DTDs, document or item IDs have to be globally unique, and an XML ID can be specified through an attribute value only. This means that you can’t identify elements in the XML based on a parent-child relationship (see Fig. 2 below). Key declarations in XML Schema aim to overcome such limitations.
  • Type values – XML Schema has a lot of built-in data types. The most common types are string, decimal, integer, boolean, date and time. I’ve found some useful examples of ‘simple type’ and ‘complex type’ XML schemas (see Fig. 3 below).
  • References – References can refer to already defined keys (see my previous point about key declarations) or to so-called “typed pointers”. A typed pointer must point to a specific element of the XML (e.g. a string) which in turn must conform to the specification as laid out in the pointer.
  • Occurrence constraints – In XML Schema one can specify how many times an element type is allowed to occur, i.e. a minimum and a maximum number of occurrences.
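The last feature above – constraining how often an element may occur – maps to the minOccurs and maxOccurs attributes in an XSD element declaration. As a rough stand-in, the same rule can be checked by hand with the standard library:

```python
# Hand-rolled occurrence check, mimicking what minOccurs/maxOccurs
# express declaratively in an XML Schema.
import xml.etree.ElementTree as ET

def occurs_within(parent, tag, min_occurs, max_occurs):
    """True if `tag` appears between min_occurs and max_occurs times."""
    count = len(parent.findall(tag))
    return min_occurs <= count <= max_occurs

order = ET.fromstring("<order><item/><item/><item/></order>")
print(occurs_within(order, "item", 1, 5))  # True: 3 items, 1..5 allowed
print(occurs_within(order, "item", 4, 5))  # False: too few items
```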

Main learning point: In her online video on the basics of XML, Jennifer Widom provided a useful overview of XML. Even though I had looked at XML schema before, it was good to understand more about some of the foundations behind XML and XML validation.

Fig. 1 – Sample XML validator – Taken from: http://www.deitel.com/articles/xml_tutorials/20060401/XMLStructuringData/XMLStructuringData_Page4.html



Fig. 2 – Sample XML, highlighting XML ID requirement – Taken from: http://msdn.microsoft.com/en-us/library/aa302297.aspx

<?xml version="1.0"?>
<!DOCTYPE orders [
  <!ELEMENT orders ANY>
  <!ELEMENT order ANY>
  <!ATTLIST order
    orderno ID #REQUIRED
    date CDATA #IMPLIED
    shipAddress CDATA #IMPLIED>
]>
<orders>
  <order orderno="id10952" date="4/15/96" shipAddress="Obere Str. 57"/>
  <order orderno="id10535" date="6/13/95" shipAddress="Mataderos 2312"/>
</orders>


Fig. 3 – Examples of simple type and complex type XML schema – Taken from: http://www.xmlmaster.org/en/article/d01/c05/

Simple Type Example

<xs:element name="Department" type="xs:string" />

Here, the part specified with “xs:string” is a built-in simple type in XML Schema. In this example, we have established the definition that the data type for the element called “Department” is a text string.

Complex Type Example

<xs:complexType name="EmployeeType">
  <xs:sequence maxOccurs="unbounded">
    <xs:element ref="Name" />
    <xs:element ref="Department" />
  </xs:sequence>
</xs:complexType>

<xs:element name="Name" type="xs:string" />
<xs:element name="Department" type="xs:string" />

In this case the type name “EmployeeType” is designated by the name attribute of the complexType element. A model group (which determines the order of occurrence of the child elements) is specified within it.

New types are created by placing restrictions on or extending simple or complex types.

Related links for further learning:

  1. http://www.rpbourret.com/xml/XMLAndDatabases.htm
  2. http://stackoverflow.com/questions/966901/modeling-xml-vs-relational-database
  3. http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.1.0/com.ibm.db2.udb.apdv.embed.doc/doc/c0023811.htm
  4. http://www.deitel.com/articles/xml_tutorials/20060401/XMLStructuringData/XMLStructuringData_Page4.html
  5. http://en.wikipedia.org/wiki/Document_type_definition
  6. http://www.w3.org/TR/xmlschema-1/
  7. http://msdn.microsoft.com/en-us/library/aa302297.aspx
  8. http://www.w3schools.com/schema/schema_simple.asp
  9. http://www.xmlmaster.org/en/article/d01/c05/
  10. http://www.w3.org/TR/xptr-framework/
  11. http://www.xmlnews.org/docs/xml-basics.html



Book review: “Web Metrics”

When I asked analytics expert Barry Mann about any good books on analytics, his advice was short and sweet: “simple, read Web Metrics by Jim Sterne”. Even though this book was published back in 2002, Barry recommended it as a great textbook on analytics. And so it proved to be. “Web Metrics – Proven Methods for Measuring Web Site Success” does a great job of distinguishing between the things one wants to measure (and why) and the tools one can use for measuring.

These are the areas of the book that I found most helpful:

  1. Division of analytics tools – Sterne references a useful way of dividing tools into four levels. This breakdown comes from Gartner and offers a handy way of looking at available analytics tools (see Fig. 1 below).
  2. The importance of log files – A great tool to start with is log files. Reading Sterne’s chapter titled “Sawing Logs” really helped me in asking the right questions before starting to look at the log files of a website (see Fig. 2 below).
  3. Understanding your visitor data – The chapter in the book which talks about “valuing visitors” is great in helping you think about different types of visitors and their – expected or desired – behaviours (see Fig. 3 below).
  4. Measuring stickiness – For commercial websites, the question of stickiness is one of branding and persuasion. First, how can we get people to stay longer with our brand? Second, when people engage with our brand, are we leaving the right impression? Sterne provides some useful formulas to calculate stickiness (see Fig. 4 below).
  5. Calculating conversion – Sterne helpfully describes “conversion” as “whatever moment of commitment happens on your site”. He then goes on to elaborate on a number of related metrics: navigation and search impact conversion, depth, recency and frequency, abandonment and checkout (see Fig. 5 below).

Main learning point: Web Metrics by Jim Sterne is a great book for anyone who is either new to the world of analytics or wants to build on a basic understanding. Sterne spends a great amount of time talking about the ‘why’ of certain online metrics and how best to measure them, which I found incredibly helpful.

Fig. 1 – The Gartner Levels of Ambition – Taken from “Web Metrics” by Jim Sterne, Chapter 5, pp. 67-68

  • Level 1: Monitoring – The focus is on website optimisation. You look at server logs to figure out how to get people to stick around longer and give them more of what they want.
  • Level 2: Feedback – Here you pay attention to the site visitor and try to improve the user experience. You look at the amount of time people spend on the site, and you use visitor segmentation to alter the content for different types of visitors.
  • Level 3: Leverage – The focus shifts from the visitor to the customer, and the goal is to increase customer profitability. Customer profiling, dynamic up-selling and cross-selling, and customer satisfaction all come into play.
  • Level 4: Strategic – Now the spotlight is turned around to shine on the company itself in order to optimise the business model: channeling low-margin customers to the competition, tracking lifetime value, and getting into some serious business intelligence analytics.

Fig. 2 – Things that one can learn from looking at log files – Taken from “Web Metrics” by Jim Sterne, Chapter 5, pp. 67-88

  • Search terms used – Use referrer log files to see the search terms that a person has typed into Google. The URL of the page they were on is recorded, and that URL includes the long string of data the search engine used to produce the page, including the search term itself.
  • Most used entry and exit pages on a site – Server logs show the most used entry and exit pages on a site. These are the pages most people use as a door into a website and the last page they looked at just before they left.
  • Number of hits – Log analysis tools like WebTrends provide a good overview of the number of site hits: (1) entire site, (2) average per day and (3) home page.
  • Number of page views – Looking at (1) the number of page views (impressions), (2) average number of page views per day and (3) document views.
  • Visitor sessions – Looking at (1) the number of visitor sessions, (2) average number of visitors per day, (3) average visitor session length, (4) number of unique visitors, (5) international visitor sessions, (6) visitors from the United Kingdom and (7) visitors from unknown origin.
  • Visitors – Looking at (1) number of visitors, (2) visitors who visited once and (3) visitors who visited more than once.

Fig. 3 – Different visitor types and behaviours – Taken from “Web Metrics” by Jim Sterne, Chapter 7, pp. 141-146

  • Unique visitor – The easiest way to track unique visitors is to look at their IP addresses. However, what do you do when two visitors come through the same online gateway or corporate firewall and therefore show up with the same IP address – how do you distinguish between them? Cookies are the best way to work out unique visitors in this scenario.
  • Return visitor – Placing a cookie on a visitor’s computer is the best way so far of telling one visitor from another and knowing if the visitor has been to your site before. Two big drawbacks to this approach though: (1) lots of people are annoyed by cookies and will disable them and (2) many corporate firewalls won’t let cookies through.
  • Qualified visitor – A suspect is somebody who shares characteristics with your current customers. A prospect is somebody who has expressed interest in your products – perhaps by responding to a promotion. A qualified prospect is one who has the need, the desire, and the means to make the buy. WebTrends defines a qualified visit this way: “Visits by customers who are considered qualified as revenue generators. To qualify, a visitor must access specific pages on a web site that are specified by the system administrator.”
  • Stale visitor – Qualified visitors eventually lose their qualifications when they don’t come back for a spell. The length of this spell is likely to depend on the type of product or service that one is selling.
  • User – A visitor is a visitor. They come, they look, they may even become qualified if they stay long enough and dig deep enough. But a user comes to your site repeatedly and for a reason.
  • Churn – Churn measures how much of your customer base rolls over during a given period of time. Divide the number of users who fail to return in a given time period by the total number of users at the end of the time period and you’ve got your baseline.
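Sterne’s churn baseline translates directly into code; the figures below are invented purely for illustration:

```python
# Churn baseline: users who failed to return in the period,
# divided by the total number of users at the end of the period.
def churn_rate(non_returning_users, total_users_at_end):
    return non_returning_users / total_users_at_end

# Illustrative numbers: 300 of 1,200 users did not come back.
print(churn_rate(300, 1200))  # 0.25, i.e. 25% of the base rolled over
```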

Fig. 4 – Ways to calculate stickiness – Taken from “Web Metrics” by Jim Sterne, Chapter 9, pp. 169-171

  • You can easily calculate stickiness by multiplying frequency (F) by duration (D) and reach (R) and come up with a benchmark for your content. You choose whether frequency is measured per day, per week, or per month. Duration can either be calculated in minutes or pages. Reach is a percentage of your total potentially interested universe.
  • For example: your site has acquired a total of 200,000 unique users. Over the past month, 50,000 unique users went to your site. These 50,000 users accounted for a total of 250,000 visits (an average frequency of 5 visits per unique user for the month), and during these visits the users spent a total of 1,000,000 minutes viewing pages on your site. Therefore:
  • Monthly Stickiness = (250,000 visits/50,000 active users) x (1,000,000 minutes/250,000 visits) x (50,000 active users/200,000 total users)
  • Stickiness = Total Amount of Time Spent Viewing All Pages / Total Number of Unique Users
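Recomputing the book’s worked example in code confirms that the long formula and the shortcut give the same answer:

```python
# Sterne's stickiness example: frequency x duration x reach.
visits = 250_000
active_users = 50_000
total_minutes = 1_000_000
total_users = 200_000

frequency = visits / active_users   # 5 visits per active user per month
duration = total_minutes / visits   # 4 minutes per visit
reach = active_users / total_users  # 0.25 of the acquired user base

stickiness = frequency * duration * reach
print(stickiness)                   # 5.0 minutes per total user

# The shortcut formula (total time / total unique users) agrees:
print(total_minutes / total_users)  # 5.0
```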

Fig. 5 – Relevant conversion to consider – Taken from “Web Metrics” by Jim Sterne, Chapter 11, pp. 214-248

  • With navigation and search impact conversion, it’s useful to look at the “first purchase momentum”. This will provide you with insights into the clarity of your navigation; what is the actual number of clicks-to-purchase and how does this compare to the minimum required clicks to first purchase?
  • First purchase momentum = Required clicks to first purchase / Actual clicks to first purchase
  • It can be helpful to look at depth – How many pages of your website did people look at? And at what level of detail? Did people look at any specific product detail?
  • Recency and frequency are about looking at the relationship between visits and purchases. As Sterne points out, “Not all buyers are first-time visitors, and not all first-time visitors are buyers. What’s the relationship? What is the pattern of visits for an individual user?” Marketing professors Wendy Moe and Peter Fader wrote a paper in 2001, which looks at the ability to predict purchasing probabilities for a given visit.
  • Abandonment – Sterne provides some very useful metrics in relation to shopping cart abandonment: (1) the ratio of abandoned carts to completed purchases per day, (2) the number of items per abandoned cart versus completed transactions, (3) the profile of items abandoned versus purchased and (4) the profile of a shopper versus a buyer. The overall abandonment rate is the number of people who commence but do not complete the buying process.
  • Apart from the aforementioned shop-to-purchase ratio, Sterne also looks at yield, which determines the effectiveness of a multi-step process where incremental costs aren’t available, such as creative / banner testing or the comparison of two paths leading to the same page.
  • Net yield = Total Promotion Cost / Total Promotion Result
  • Cost per conversion = Advertising and Promotional Costs / Number of Sales
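The conversion formulas in this figure can be sketched as code; the click counts and promotion figures below are invented for illustration:

```python
# Sterne's conversion formulas from Fig. 5, expressed as functions.
def first_purchase_momentum(required_clicks, actual_clicks):
    return required_clicks / actual_clicks

def net_yield(total_promotion_cost, total_promotion_result):
    return total_promotion_cost / total_promotion_result

def cost_per_conversion(ad_and_promo_costs, number_of_sales):
    return ad_and_promo_costs / number_of_sales

print(first_purchase_momentum(5, 8))    # 0.625: navigation adds 3 extra clicks
print(net_yield(5_000, 250))            # 20.0 cost units per promotion result
print(cost_per_conversion(5_000, 100))  # 50.0 per sale
```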

Related links for further learning:

  1. http://www.webtrends.com/products-solutions/analytics/
  2. http://www.kaushik.net/avinash/excellent-analytics-tip6-measure-days-visits-to-purchase/ 
  3. http://www.brucehardie.com/talks/ho_cba_tut_art_09.pdf
  4. http://www.moblized.com/blog/simple-ways-to-fix-cart-abandonment



Looking at key omni-channel analytics – Part 1

Over the past few weeks I’ve been learning about retailers and how they sell via a multitude of channels. The next thing for me now is to learn about some key omni-channel analytics. Let’s start with some questions to ask when measuring omni-channel retail and marketing:

  • What is the impact of online channels on offline and vice versa? – Given the fluid nature of consumer decision-making, alternating between online and offline, it’s important to measure the impact of online activities on offline and vice versa.
  • What does the conversion path look like? – How and where do we convert people into paying customers? Where do we lose people and why? Which channels contribute to conversion and to what degree?

I’ll start by looking at the impact of online activities on offline conversion. I learned an awful lot from a 2008 blog post on tracking offline conversions by data guru Avinash Kaushik. Before I delve into some of Kaushik’s great suggestions, I want to take a step back and think about potential things to measure and why:

  • What is the impact of online channels on offline conversion? – As a product person, I’m keen to understand the relationship between online activities and actual purchases in-store. This understanding helps me to focus on the right online and offline elements of the value proposition, comprehending which things can be optimised online to achieve a specific outcome in-store.
  • How do I best measure the revenue impact of my website or mobile app in an omni-channel world? – For example, I’ve got a nice eCommerce site or app with a decent amount of traffic, 20% of which gets converted into actual online purchases. However, what happens with the remaining 80% of traffic that doesn’t get converted? Is my website or app delivering some value to this 80%? If so, how? Can we measure this?

Now, let’s look at some practical tips by Kaushik in this respect:

  1. Track online store locator or directions – If I track in an analytics tool the interactions with the URL for Marks & Spencer’s store locator, I can start learning about the number of Unique Visitors that are using the store locator in a certain time period (see Fig. 1 below). In addition, I can look at the number of visits or visitors where a certain post code or town has been entered into the store locator. I can take this insight as a starting point to learn more about the people within a certain geographical area that have a tendency to use the Marks & Spencer site and its store locator. Once a user then goes on to click on “Show on map” or “Enter an address for directions to this store” you can make some inferences about the user’s intentions to actually visit the M&S store in question.
  2. Use of a promo code – Using an online voucher or promo code is an obvious way to combine online tactics with offline conversion (see a John Lewis example in Fig. 2 below). One can use the promo code as an event in an analytics tool and capture data on e.g. the number of codes or vouchers exchanged in-store vs the number of vouchers sent. I guess the only downside is that you’re unable to capture many interesting insights if a user doesn’t redeem her voucher or code.
  3. Controlled experiments – Running controlled experiments was the bit in Kaushik’s piece that intrigued me the most. The idea behind these experiments is to validate retail ideas in the real world (the same as “experimentation” in a ‘lean’ context, which I’ve written about previously). As Kaushik explains, “the core idea is to try something targeted so that you can correlate the data to your offline data sources (even if you can’t merge it) and detect a signal (impact).” I’ve included some prerequisites for successful experiments in Fig. 3 below. One of them is to isolate the experiment to different states that are far from each other. As Kaushik explains, this way you are isolating “pollutants” to your data (things beyond your control that might give you sub optimal results).

Main learning point: Learning about how online can affect offline conversion felt like a good starting point for my getting a better understanding of the world of omni-channel analytics. The next step for me is to find out more about the impact of offline on online conversion: how can we best measure the impact of what happens offline on the conversion online?


Fig. 1 – Screenshot of the results of Marks & Spencer’s store locator




Fig. 2 – Sample John Lewis voucher – Taken from: http://www.dontpayfull.com/at/johnlewis.com



Fig. 3 – Some prerequisites for controlled experiments by (online) retailers:

  • Clearly defined customer segments of a decent size to quantify the impact of the experiment.
  • Design the experiment in such a way that the results can be isolated and compared in a meaningful way (e.g. IKEA umbrella sales on a rainy vs on a sunny day).
  • Random selection of customers in the control group (who get the current offering) and the treatment group (who get the experimental offering).
  • Clear assumptions and hypotheses which underpin the experiment.
  • Create a feedback loop which allows you to measure or observe how customers respond to different experiments.
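The random-selection prerequisite above can be sketched as a simple repeatable split into a control group and a treatment group; the customer IDs here are invented for illustration:

```python
# Randomly split a customer list into a control group (current offering)
# and a treatment group (experimental offering).
import random

def assign_groups(customers, seed=42):
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    shuffled = customers[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

control, treatment = assign_groups(list(range(100)))
print(len(control), len(treatment))  # 50 50
```

In a real experiment you would also check that the two groups are comparable on key attributes (e.g. region, spend) before reading anything into the results.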

Related links for further learning:

  1. https://support.google.com/analytics/answer/1191180?hl=en-GB
  2. http://www.kaushik.net/avinash/multichannel-analytics-tracking-online-impact-offline-campaigns/
  3. http://www.kaushik.net/avinash/tracking-offline-conversions-hope-seven-best-practices-bonus-tips/
  4. http://online-behavior.com/analytics/multi-channel-funnels
  5. http://atlassolutions.com/2014/04/07/atlas-insights-series-is-device-sharing-a-significant-problem-in-ad-delivery-and-measurement/
  6. http://www.kaushik.net/avinash/web-analytics-visitor-tracking-cookies/
  7. http://www.kaushik.net/avinash/excellent-analytics-tip6-measure-days-visits-to-purchase/
  8. http://www.practicalecommerce.com/articles/74215-14-Key-Ecommerce-Events-to-Track-in-Google-Analytics
  9. http://www.shopify.co.uk/blog/15514000-14-ways-to-use-offers-coupons-discounts-and-deals-to-drive-revenue-and-customer-loyalty
  10. http://en.wikipedia.org/wiki/Experiment#Controlled_.28Laboratory.29_experiments
  11. http://www.businessinsider.com/data-revolutionizes-how-companies-sell-2013-4?IR=T
  12. http://sloanreview.mit.edu/article/how-to-win-in-an-omnichannel-world/
  13. https://hbr.org/2011/03/a-step-by-step-guide-to-smart-business-experiments


Learning more about omni-channel retail

I recently did an online retail course and one of the first things that our teacher Ian Jindal said was: “forget thinking about retail in terms of the classic ‘awareness to conversion’ funnel, consumers don’t think that way!”. He went on to explain that consumers nowadays come to a purchasing decision through a multitude of – online and offline – channels, and that they might well skip any customer journeys or experiences carefully designed by a brand or retailer.

As Agile Architect Jason Bloomberg explained in a recent Forbes article: “Omni-channel represents more than simply adding mobile and social media to the mix. It heralds an interconnectedness among touchpoints that, from the perspective of the consumer, blurs the distinction among channels.”

It means that even if the traditional sales funnel with its classic four stages – awareness, consideration, intent and decision – still exists, its inputs are likely to come from a variety of online and offline sources. These sources are typically hard for marketeers to fully control or influence. I like how back in 2009, McKinsey described how consumers are shifting from a funnel to narrowing down their purchase options through more of a ‘loop’ (see Fig. 1 below).

These are some of the things that I learned about how to best cater for the changed ways in which consumers make purchasing decisions:

  1. Zero Moment of Truth – Google’s so-called Zero Moment of Truth is all about people making purchasing decisions at the “Zero Moment”; the precise moment when they have a need, intent or question they want answered online. These questions can be as diverse as “Which family car has got the most boot space?” or “Which jeans make my bottom look small?”. The point for brands is to be able to answer these questions at the right time and for the right customer, and to be at the front of the consumer’s mind when she looks for answers.
  2. Deliver on your promises – In our course, the instructor Ian Jindal stressed the importance of brands delivering on their promises to the consumer. For example, if a retailer promises to deliver within a certain time period, it can’t afford to fall short on this promise. Companies like Amazon and Ocado are all about delivery, so they have to make sure they get this capability absolutely spot on. A recent MCA report mentions the growing need for omni-channel retailers to increase the number of choices of service offerings available to customers. This inevitably increases operational complexity, which means that back-end systems need to deliver on the promises made at the ‘front-end’.
  3. The importance of in-store – Don’t think that because there’s mobile and online, consumers have stopped caring about stores and their in-store experiences. People are likely to check online first before buying in store and vice versa. It seems that particularly for product categories such as furniture and fashion, bricks and mortar stores still have a significant role to play. A recent AT Kearney study among 3,200 UK and US consumers shows that about 40% of their shopping time is spent online and on mobile, leaving about 60% of shopping time spent in-store. It’s things like iBeacons, QR codes and in-store scanning that help retailers link the different channels that consumers interact with. A great example is John Lewis StyleMe, an in-store virtual mirror that lets consumers try on clothes from the online catalogue without having to undress.

In tandem with having the aforementioned basic elements in place, it’s important to have a clear proposition and to implement this proposition consistently across all your channels. In the course, Ian Jindal talked about the following three perspectives on selling, which impact the content and execution of one’s proposition:

  • Customer segment – Your proposition can be all about your customer; demonstrating an understanding of customer needs, what makes them tick (and why), etc. A good example is House of Fraser, who state that impact on customer satisfaction is a key driver for their development roadmap.
  • Product or service – You offer an excellent product or service, delivering the value for money that your (target) customers are looking for. For example, Aldi – who historically focused more on price than product – has been growing its product assortment over the last few years to meet its customers’ needs.
  • Operational capability – Your proposition is all about your operational excellence. A good example of this is Amazon, who clearly don’t have the most beautifully designed product pages but rely for a large part on their great service and delivery times.

I guess these are the two main things that I learned about having a retail proposition:

  1. Your proposition is always there – A proposition is something which, as a retailer, you don’t shout about all the time but which should underpin everything you do. As Ian Jindal put it: “proposition isn’t something you market, it’s something that you hear behind your back”. He mentioned Pret A Manger as a good example of a company that has a clear proposition, which is reflected in a lot of the things that it does, both in-store and in its communications (see Fig. 2 below).
  2. Build your proposition on at least 2 out of 3 elements – To build a successful and sustainable retail business you need to have at least two of these three elements in place: customer segment, product or service, and operational capability. John Lewis are a good example, with their focus on both product and operational capability (see Fig. 3 below).

Main learning point: Irrespective of the different channels that are relevant to a retailer and its customers, there are a number of key elements that need to be in place. I believe that there are three key lenses which are applicable: customer segmentation, product or service, and operational capability. It was important for me to learn how these lenses impact the retail proposition and how retailers position themselves. The next thing for me to learn about are the key metrics that play a role in retailers’ omni-channel approach to selling.

Fig. 1 – A graphic outline of “The consumer decision journey” by McKinsey & Company – Taken from: http://www.mckinsey.com/insights/marketing_sales/the_consumer_decision_journey




Fig. 2 – Example of Pret A Manger’s freshness proposition – Taken from: http://blog.kj.com/slow-fast-food/


Fig. 3 – Examples of Waitrose’s values with respect to product and operational capability

  • Product safety
  • Product provenance and integrity
  • Sustainable products
  • Local, regional and British sourcing


Related links for further learning:

  1. http://tech.co/6-top-dynamics-creating-perfect-ecommerce-website-business-2014-11
  2. https://www.thinkwithgoogle.com/tools/customer-journey-to-online-purchase.html
  3. http://www.fitch.com/think-article/dreaming-exploring-locating-understanding-the-new-customer-journey/
  4. http://vonbismark.com/todays-multichannel-consumer/
  5. https://econsultancy.com/blog/65530-q-a-selfridges-simon-forster-on-the-brand-s-multichannel-retail-strategy
  6. http://futurethinking.ee.co.uk/from-funnels-to-loops-how-mobile-has-disrupted-the-consumer-purchase-cycle/
  7. http://dauofu.blogspot.co.uk/2013/01/multi-channel-analytics-why-every-web.html
  8. http://dauofu.blogspot.co.uk/2013/01/8-simplest-ways-to-track-multi-channel.html
  9. http://www.getelastic.com/omnichannel-vs-multichannel-and-the-store-of-the-future/
  10. http://www.forbes.com/sites/jasonbloomberg/2014/09/30/omnichannel-more-than-a-digital-transformation-buzzword/
  11. http://www.mckinsey.com/insights/marketing_sales/the_consumer_decision_journey
  12. http://www.mca.org.uk/news/updates/omni-channel-in-2014-retail-at-a-crossroads
  13. https://www.udemy.com/blog/unique-selling-proposition-examples/

Learning about relational data models and databases

The other day I did a mini-course online about relational databases with Stanford Online. The course is taught by Stanford Professor Jennifer Widom and is aimed at relative database novices like me.

Jennifer started off the course by providing some useful characteristics of a Database Management System (‘DBMS’):

“Efficient, reliable, convenient and safe multi-user storage of and access to massive amounts of persistent data”

  • Massive scale – Jennifer talked about databases with the potential for terabytes of data.
  • Persistent – The database always outlives the programs that operate on its data.
  • Safe – Power or hardware failures, for example, shouldn’t affect the database.
  • Multi-user – Jennifer talked about concurrency control in this respect.
  • Convenient – The convenience of DBMS comes for a large part from ‘physical data independence’, i.e. operations on the data are independent of the way in which the data is laid out.
  • Efficient – This relates to database performance, e.g. thousands of queries/updates per second.
  • Reliable – Jennifer pointed out that with databases 99.999999% up-time is critical.

Jennifer then went on to break down the structure of a database into its most basic elements, something which I actually found quite helpful:

  • A database is a set of named relations (or tables)
  • Each relation has a set of named attributes (or columns)
  • Each tuple (or row) has a value for each attribute
  • Each attribute has a type (or domain)
  • A database schema is a structural description of relations in a database
  • An instance contains the actual contents at a given point in time
  • “Null” is a special value for “unknown” or “undefined”
  • A key is an attribute whose value is unique in each tuple (e.g. a student ID) or a set of attributes whose combined values are unique
  • Having unique keys helps to (1) identify specific tuples (rows), (2) enable fast indexing of the database and (3) reference tuples of another relation

In the session, Jennifer gave a first flavour of how to create relations (tables) in SQL, which is a commonly used database query language. I found the example below, which Jennifer gave, very helpful:

Create Table Student (ID, name, GPA, photo)

Create Table College (name string, state char, enrolment integer) – here, string, char and integer are the attribute types

She then went on to talk a bit about querying relational databases, outlining common steps in creating and using a (relational) database:

  1. Design schema – Create a schema which describes the relations within a database, using a Data Definition Language (‘DDL’)
  2. “Bulk load” initial data – Load the initial data into the database
  3. Repeat – Execute queries and modifications

Jennifer then finished off by giving an example of a SQL query that returns the following relation: “IDs of students with GPA > 3.7 applying to Stanford”. In SQL this query would look something like this:

Select Student.ID
From Student, Apply
Where Student.ID = Apply.ID
And GPA > 3.7 And college = ‘Stanford’
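For readers who want to try this end-to-end, the same kind of query can be run with Python’s built-in sqlite3 module. The column names and sample rows below are my own simplified assumptions, not the course’s exact tables:

```python
# Create two small relations, load sample data, and run the
# "GPA > 3.7 applying to Stanford" query against them.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Student (sID INTEGER, sName TEXT, GPA REAL);
    CREATE TABLE Apply (sID INTEGER, college TEXT);
    INSERT INTO Student VALUES (1, 'Amy', 3.9), (2, 'Bob', 3.6);
    INSERT INTO Apply VALUES (1, 'Stanford'), (2, 'Stanford');
""")

rows = conn.execute("""
    SELECT Student.sID
    FROM Student, Apply
    WHERE Student.sID = Apply.sID
      AND GPA > 3.7 AND college = 'Stanford'
""").fetchall()
print(rows)  # [(1,)]  (only Amy clears the 3.7 GPA bar)
```

Note how this mirrors the three steps above: design the schema (CREATE TABLE), bulk-load initial data (INSERT), then repeatedly execute queries.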

Main learning point: I’ve clearly got my work cut out for me, but I felt that this first mini-course on relational data models and databases was incredibly helpful and easy to understand. I’m hoping I can build on these first foundations and understand more about how to best structure SQL queries and interpret the relations that these queries return.