How you can get your Analytical Data Mart solution off the ground
ADM (Analytical Data Mart), SCV (Single Customer View), Sandbox, Data Warehouse; is there any real difference between them, and how can they be deployed?
Every company has its own unique data infrastructure, built to face different challenges, and therefore holds its data differently. All have (or should have) a need to achieve one view of the ‘truth’: that could be a single customer account for an online retailer, a customer policy for an insurer, or one customer record for a home shopping company.
Traditionally this linking has been done either through external data bureaux or by building match keys for internal databases. But now in comes ‘big digital data’, and the need to bring multiple sources of data, in varying formats, into one consistent form. The nature and sources of this data often change, and as a result companies are increasingly struggling to maintain one view of the truth.
This article explores routes to overcome this problem.
All companies will have their own unique data set-up, but below is a simple example of how many companies tend to arrange their customer data – or would like to!
Let’s start by level setting some common definitions:
- ADM – a table or set of tables in a fixed format, not held in a database structure, used to build and deploy reports and insight tools
- SCV – a database solution, often with multiple tables, used for holding customer data for selections and reporting
- Sandbox – an unstructured area on a network where analysts can access raw data easily
- Data warehouse – a structured repository for raw data
In terms of the data flow inside a company, a hypothetical solution could look like this:
However, there are many alternative process flows in common use.
As the data comes in from feeds (live or batch), it is generally stored in some form of database in a data warehouse. The term ‘data warehouse’ is used because there are often multiple tables representing the data in its raw form, which is then often manipulated in some way to make it easier to query.
For transactional data, such as that found in home shopping, this manipulated view is often an SCV, and could be held by a bureau, which would run name and address deduplication and store lists of emails and other data.
A Sandbox comes into play because SQL databases – where the SCV would normally be held – are great at reporting and at holding large volumes of data, but not so good at pure-play analytics. The Sandbox is an area that is usually less well developed, and it can be produced either from data in a data warehouse or from the raw data.
As discussed, SCVs are normally held within databases, with scripts and programs in place to feed the data through the system so that it can be easily reported on.
But consider the following:
- Incoming data formats change
- An unknown entry happens on a website
- A new source of data comes in
- A report is required that is not currently being produced
Each of these events would require agreement on a new (cross-team) data specification, with design stages, handovers and UAT. Each change would also require re-calibration of any deduplication and matching process, which only compounds the problem.
Due to the complexities of merging data, we often see that this redesign is simply not done. Recently we worked with a large global client who thought they had Xm active customers, when they actually had 20% fewer after we de-duplicated. This kind of difference is often caused by the SCV being maintained on one channel rather than across channels.
Now, there are benefits to the SCV-within-a-database approach – for instance back-ups, quality control and disk space control – but when this is getting in the way of creating value through developing customer insight, is it really worth it?
However, with storage being relatively cheap these days, back-ups can be automated and files can simply grow large without impacting the user. This leaves quality control as the only remaining argument in favour of the database structure, and even this is often made simpler by having a single department control the end-to-end data process.
Hence our case for an ADM solution, to be maintained without a database infrastructure.
Our definition of an ADM is a data table (or set of data tables) that can be derived directly from any source(s) of data. It can also be de-duplicated across match keys or other unique identifiers (please read our article “Are you being duped by your customers?” on our website for more details).
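As a minimal sketch of what de-duplicating across match keys can look like (assuming Python, with illustrative field names such as `surname` and `postcode` that are not from any specific system), a crude match key can be built by normalising a couple of fields and keeping one record per key:

```python
# Minimal match-key deduplication sketch. Field names and the key
# recipe (surname + postcode) are illustrative, not a real standard.

def match_key(record):
    """Build a crude match key: normalised surname plus postcode."""
    surname = record["surname"].strip().upper()
    postcode = record["postcode"].replace(" ", "").upper()
    return surname + "|" + postcode

def deduplicate(records):
    """Keep the first record seen for each match key."""
    seen = {}
    for rec in records:
        key = match_key(rec)
        if key not in seen:
            seen[key] = rec
    return list(seen.values())

customers = [
    {"customer_id": 1, "surname": "Smith", "postcode": "AB1 2CD"},
    {"customer_id": 2, "surname": "smith ", "postcode": "ab12cd"},
    {"customer_id": 3, "surname": "Jones", "postcode": "XY9 8ZW"},
]
unique = deduplicate(customers)
# Records 1 and 2 collapse into one customer; record 3 stays separate.
```

A production matching process would of course use fuzzier rules (name variants, address parsing), but the principle of reducing records to a shared key is the same.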
As we can involve multiple data sources, we can add fields for analytical purposes, and change layouts as required. We often infill missing data, for instance deriving gender from title, or region from postcode.
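The infill step can be sketched as simple lookups (again in Python; the title-to-gender and postcode-area-to-region mappings below are invented, heavily simplified examples):

```python
# Illustrative infill of missing fields. Both lookup tables are toy
# examples, far smaller than any real reference data.
TITLE_TO_GENDER = {"MR": "M", "MRS": "F", "MISS": "F", "MS": "F"}
POSTCODE_AREA_TO_REGION = {"AB": "Scotland", "B": "West Midlands", "M": "North West"}

def postcode_area(postcode):
    """Return the leading letters of the postcode (e.g. 'AB1 2CD' -> 'AB')."""
    area = ""
    for ch in postcode.upper():
        if not ch.isalpha():
            break
        area += ch
    return area

def infill(record):
    """Fill in gender from title and region from postcode where missing."""
    rec = dict(record)
    if not rec.get("gender"):
        rec["gender"] = TITLE_TO_GENDER.get(rec.get("title", "").strip().upper(), "U")
    if not rec.get("region"):
        rec["region"] = POSTCODE_AREA_TO_REGION.get(postcode_area(rec.get("postcode", "")), "Unknown")
    return rec

filled = infill({"title": "Mrs", "postcode": "AB1 2CD"})
# gender inferred as 'F', region as 'Scotland'
```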
From this point you have a consistent view, so the data can be used for:
- cross-channel reporting e.g. sales in response to outbound emails, through web browsing…
- developing statistical models
- making customer selections (with some post selection cleansing)
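Once each transaction in the ADM carries a channel tag, the cross-channel reporting above reduces to a simple aggregation. A sketch, assuming Python and invented field names (`channel`, `sales`):

```python
from collections import defaultdict

# Hypothetical unified transactions drawn from an ADM; the field
# names and figures are illustrative only.
transactions = [
    {"customer_id": 1, "channel": "email", "sales": 40.0},
    {"customer_id": 1, "channel": "web", "sales": 25.0},
    {"customer_id": 2, "channel": "email", "sales": 15.0},
]

def sales_by_channel(rows):
    """Total sales per channel across the unified customer view."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["channel"]] += row["sales"]
    return dict(totals)
```

Because every row comes from the same de-duplicated view, the channel totals reconcile with the overall customer counts by construction.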
And using this as the one source of customer data means that we can ‘unify’ definitions across teams.
All this may sound strangely familiar; is an ADM just an SCV in disguise?
In some respects it is, but what won’t be familiar, apart from the technical aspects, are the timescales: by working in a more flexible format, we can deliver some basic forms of these ADM solutions in days, not the weeks or months it takes to set up a full customer database.
And changes then become much easier, as we can adapt as soon as we have the new data feeds. The ADM can be refreshed as regularly as you like, and we can build in quality controls to ensure that it is up to date and fully populated – so no more erroneous customer counts!
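Those quality controls can be as simple as a completeness check on key fields plus a sanity check on volumes between refreshes. A sketch, assuming Python and illustrative field names and thresholds:

```python
# Illustrative ADM quality checks; the required fields and the 10%
# volume-drop threshold are assumptions, not fixed rules.

def quality_check(records, previous_count, required_fields=("customer_id", "postcode")):
    """Return a list of issues; an empty list means the refresh passes."""
    issues = []
    for i, rec in enumerate(records):
        for field in required_fields:
            if not rec.get(field):
                issues.append(f"row {i}: missing {field}")
    # Flag a suspicious drop in volume versus the last refresh
    if previous_count and len(records) < 0.9 * previous_count:
        issues.append("row count dropped by more than 10% since the last refresh")
    return issues
```

Running this after every rebuild gives an early warning before a partially populated ADM reaches reports or selections.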
Actioning insight tools, such as applying propensity model scores, can also be easier, as you are applying a model to the dataset you already have; URNs or customer IDs selected for an activity can be handed over easily for mailing purposes. And the file extracts (while large) are also a good way of keeping track of customer history.
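Applying a propensity score and handing over the selected IDs can be sketched as below (Python; the weights, threshold and field names are invented for illustration, not a real model):

```python
# Toy linear propensity score: weights and bias are illustrative only.
WEIGHTS = {"recency_days": -0.01, "frequency": 0.3}
BIAS = 0.5

def score(record):
    """Score a customer record with the (invented) linear model."""
    return BIAS + sum(w * record[f] for f, w in WEIGHTS.items())

def select_for_mailing(records, threshold=0.8):
    """Return customer IDs whose propensity score passes the threshold."""
    return [r["customer_id"] for r in records if score(r) >= threshold]

recent = {"customer_id": 1, "recency_days": 10, "frequency": 2}
lapsed = {"customer_id": 2, "recency_days": 200, "frequency": 1}
selected = select_for_mailing([recent, lapsed])
# Only the recent, frequent customer clears the threshold.
```

The handover file is then just the selected IDs plus whatever contact fields the mailing house needs, extracted from the same ADM.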