Avoid GIGO – three rules to keep your database clean and usable

In the late 1950s, US Army mathematicians coined the phrase “Garbage in, garbage out” (GIGO) when programming the world’s earliest computers.  The British equivalent – ‘Rubbish in, rubbish out” – is a well-known adage for anyone who works with a database. Fast forward more than half a century and unfortunately GIGO is still rife – and even more complicated given the huge volume of data we now manage and use in our daily working lives.

I know from experience how unwieldly a database can become if you don’t follow some simple rules to capture and categorise information in the right way.  At ToucanTech we’ve run more than 300 database clean-up, amalgamation and migration projects for schools in the UK and internationally – and although we’ve built mapping templates and scripts to make these data projects more efficient, there’s never a magic wand to instantly unpick a messy set of data.

But it’s also never too late to start following some good data practices!  Here are my top 3 rules – whether you’re managing a simple spreadsheet or a database software. Even if you don’t have time to sort out your legacy data, you can start following these rules for new data in your system.

Rule 1: Use unique IDs

Every person in your database should have a unique ID reference. If you want to add new information to someone’s database record (e.g. an import of updated contact details or email open rates) you want to be sure you’re appending the data to the right record – and the only sure-fire method is to match data using unique reference IDs.

Here’s a simple example of four individual records in a database. Imagine you’re creating a list of school parents for your termly newsletter. From the data below, can you ascertain if the two Li Jings and the two David Whites are the same person or different? You might be able to make an informed guess if you know your parent population, but there’s nothing in the data below to give you a categorical match.

If you add an ID number to each of the records you can see immediately that the two rows of data for Li Jing are stored with different ID numbers – meaning these are probably two different parents – whilst the two rows of data for David White share the same ID number, implying this is the same person.

When you’re creating a new record, or running a data import, it’s important to ensure individuals have unique IDs and that you only create one record for each person (no duplicates!) in your database.

If you export data out of your database (e.g. to an Excel spreadsheet) to manage data for an activity like an event you must always include the unique IDs for each record so that when you import the data back in to your database the system can unambiguously match the new information to the existing database records.

Rule 2: Don’t embed records together

Sometimes it can seem quick to store family members or married couples together in one database record, but you can build up all sorts of data problems if you do this!  Joint records make it difficult to ascertain which bit of information refers to whom and you won’t have a unique ID for each person (see Rule 1 above!)

Here’s an example of a joint record to illustrate the difficulty of identifying which person each piece of information relates to:

In the GDPR era it’s even more important to store your data accurately.  If you’re capturing consent options, you need to know specifically which person has opted-in/ out – which can be difficult to track if individuals are grouped together in a record.

Rule 3: Create pick-lists

Validating data entry using pre-defined pick-lists is a smart way to future-proof your database long-term.  One of the biggest issues we’ve seen across school databases is the wide range of data entry applied by different team members at different times. Forcing people to use pick-lists as much as possible makes it more likely that information will be added in a consistent, fixed, format. This consistency is critical if you want to run search filters in the future and segment people according to specific criteria.

Here’s an example of two data fields – one with a fixed pick list and one with free text entry. The pick list forces you to select a consistent input (and is faster to enter) – you’ll be able to filter using these inputs in the future and easily group records together.  The free text enables you to write more context, but you won’t be able to filter or group this data in a quantified way for analysis or segmented communications in the future. We’d recommend selecting from a pick-list and then adding extra free-text info as a note.

Kate Jillings is the co-founder of ToucanTech, a community database and website software for schools. She’s passionate about providing a practical and beautiful product and helping schools to run effective marketing, fundraising, alumni and careers activities.