Data science: the new skillset for learning technologists

For all the talk of big data being the next big thing in learning technology, few people mention that in workplace learning there just aren’t any examples of big data to speak of. The data collected just isn’t at the same scale. However, big data has led to an explosion in data analysis tools and techniques that learning technologists can use in their work. Throughout 2014 I’ve been dipping into data science MOOCs, learning the basics of R programming, and thinking about how to apply this within learning and development. These are some of my initial thoughts and notes.

Can understanding big data techniques help us to improve learning outcomes and performance?

Big data as a term started appearing following the success of online services such as Facebook, Google Search and Twitter, which gather data on hundreds of millions of people. That data includes their likes and dislikes, online behaviours, website usage patterns and shopping patterns; it all has value and can be sold to the highest bidder. Now that users can also register for other online services using their Facebook, Twitter or Google logins, they leave a trail of ‘digital exhaust’ behind them. This data is all collected and analysed on the assumption that it is valuable to someone, somewhere, or at least may be one day. The data gathered by just one service like Facebook amounts to over 500 terabytes per day! This is the scale that big data operates at, and the harvesting of personal data is BIG business. Jaron Lanier is not wrong in suggesting that next time you post a status update, they really should be paying YOU!

Edtech and learning technology entrepreneurs clearly want a slice of this action, hence the buzz. However, even the largest organisations only have relatively small amounts of learning-related data. Even an organisation with half a million employees will only have learning-related data measured in little old gigabytes. That’s not big data at all.

However, if there is one big takeaway from the big data world, it is the renewed focus on data analysis and data-driven insights. Take a look at any MOOC catalogue to see the popularity of data science courses.

Your first steps in building a data science skillset

We are fortunate to have a wealth of data science tools at our fingertips: statistical programming languages like R and data analysis tools like Hadoop, Pig and Tableau. Data analysis can ultimately be performed on any data set, large or small, so many of these tools and techniques can be applied in the context of learning and development. Many of the available tools are open source and free to download, with good tutorials and strong communities. Couple that with a data science MOOC and you’ll be off to a flying start.

Handily, data science already intersects with learning and development in disciplines such as Educational Data Mining (EDM) and Learning Analytics. EDM started life back in about 2000 when web-based training took off in academia, and hence is more research driven, while Learning Analytics is a more recent discipline, driven more by learning practitioners and vendors. There is over a decade of research and books on EDM and Learning Analytics, yet this extensive body of knowledge has largely been ignored by the corporate L&D world until relatively recently. If a MOOC is where you dip your toe into data science, then this body of knowledge is where you jump in, immerse yourself, and learn! And use the free tools to try out ideas: they are there to help you learn, and their communities will support you.

It’s worth remembering that data science is not something that a learning technologist can just pick up easily; you can’t just download a copy of Tableau and expect to gain amazing insights into learner performance. Learning technologists need to understand the basic principles of data science, but many organisations may be better off employing an experienced data scientist to really do an analytics programme justice. Or if you have a business intelligence team, tap into them.

Good data science always starts with a question. Why are our learners failing particular courses? How can we identify learners who need support? Who is at risk of failing? Where is engagement dropping off? What are the characteristics of our most successful learners? Which learning paths lead to the best outcomes? How do successful learners impact organisational performance?
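To make one of those questions a little more concrete, here is a minimal sketch of how “where is engagement dropping off?” might be explored in Python. Everything in it is an assumption made for illustration: the learner IDs, the five-module course, the data itself and the function names are all invented, and a real analysis would start from an export of your own LMS data and would need far more care than this.

```python
# Hypothetical data: the furthest module each learner reached in a
# five-module course. In practice this would come from an LMS export.
furthest_module = {
    "learner_01": 5, "learner_02": 5, "learner_03": 5,
    "learner_04": 4, "learner_05": 3, "learner_06": 2,
    "learner_07": 2, "learner_08": 2, "learner_09": 2,
    "learner_10": 1,
}


def learners_reaching(furthest, n_modules):
    """Count how many learners got at least as far as each module."""
    return [
        sum(1 for reached in furthest.values() if reached >= module)
        for module in range(1, n_modules + 1)
    ]


def drop_off_rates(reached):
    """Share of learners lost between each module and the next one."""
    rates = []
    for current, following in zip(reached, reached[1:]):
        rates.append((current - following) / current if current else 0.0)
    return rates


def biggest_drop(reached):
    """Return (module number, rate) for the largest drop-off.

    The module number is the module AFTER which most learners were lost.
    """
    rates = drop_off_rates(reached)
    index = max(range(len(rates)), key=rates.__getitem__)
    return index + 1, rates[index]


if __name__ == "__main__":
    reached = learners_reaching(furthest_module, n_modules=5)
    for module, count in enumerate(reached, start=1):
        print(f"Module {module}: {count} learners")
    module, rate = biggest_drop(reached)
    print(f"Largest drop-off comes after module {module} ({rate:.0%} lost)")
```

A table like this only tells you where learners stop, not why. That is where the follow-up questions, and ideally someone with real data science experience, come in.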

These are the types of questions that learning analytics and data science can help to answer. Even if you aren’t the person to do the analysis, an understanding of the principles of data science should be on every learning technologist’s CV.


8 Replies to “Data science: the new skillset for learning technologists”

  1. I like it Mark – particularly the point that not everyone will necessarily have these skills, but you have to outsource them, and still have a grip on the principles.

    Question: we’ve had data for a long time in one form or another. What’s to stop L&D obsessing about activity data (bums on seats / course completions) rather than trying to uncover the more difficult to find impact data (how learning affects workplace performance)?

    1. Thanks Donald. I’ve learned this year that data science is really hard to understand and do well, as you’d expect! But understanding some of the key principles has stood me in good stead when talking to clients. One of the messages we’ve been giving people is to stop obsessing over so-called ‘vanity data’ and think about what is really meaningful data instead. Good data science will start with a question. Why are our students dropping out? Where does engagement drop off in our online courses? That’s where you need to start if you are using data to drive improvements. Bums on seats and completion data will always have a place I guess, especially among compliance training box tickers 😉 Like you say, it’s simple to measure. But there seems to be an increasing acceptance that it’s not necessarily driving performance improvement, and that starts all manner of interesting conversations 🙂

  2. Interesting comments on Twitter in response to this article; my response is too wordy for Twitter though! Donald Taylor asked whether in-depth data skills are required, or just an understanding of the concepts. To which Donald Clark questioned whether data is just for software and algorithms, not trainers.

    My view is that an understanding of the concepts is definitely required by learning technologists. Software and algorithms have their place, and are heavily used in adaptive learning tools to drive personalised learning experiences. Cogbooks does this, which Donald C is involved with, so I understand this position.

    However, we live in a world where data is collected by a vast number of systems. For example, we did a project for a university recently which identified where learner-related data was stored across ten or more systems. The university wanted to improve student retention, and was trying to identify where learner engagement dropped off so that it could make early, targeted interventions. Data was the oil in making that work. But it would have been a futile exercise without an understanding of the data science basics.

    Maybe in a world where students are all using adaptive, personalised learning tools we wouldn’t need to do that. But we are not in that world yet; the tools are young and to my knowledge are not widely proven. There are also a variety of ‘analytics’ tools available to help interpret data, but even with these you cannot just install them and expect miracles. They are only as good as the data you feed them with, and an understanding of data science concepts will ensure you can identify good and bad data, and clean it up appropriately before using it to aid decision making.

    This stuff all goes back donkey’s years to be honest; it’s nothing new. There are legions of data warehousing people and business intelligence people who have been doing this stuff for years. But L&D folks are way behind the curve on this. It’s time we used the data at our fingertips to help us improve learner experiences and performance. If you want to put learning culture at the centre of organisational performance, then understanding learning-related data is the way to go about it.

  3. Great points made all round, both in the comments and on Twitter.

    I’m delighted that more L&D professionals are starting to get serious about making better use of data. My worry is that we’re going to make all the mistakes that other, more mature professions have already made and learned from and not take advantage of the fact that many of the problems we’ll encounter are well understood and, in many cases, solved.

    Take just one example: surrogate outcomes. It’s well known in medicine that surrogate outcomes (cholesterol levels) are less valuable than ‘real world’ outcomes (heart attacks). In L&D we might start patting ourselves on the back for getting better at working with outcomes like observed behaviour change resulting from an intervention, while still not appreciating that it’s the change in performance outcomes that matters (occasionally the behaviour change is the real desired outcome, but that’s rarely the case).

    It’s not just the ability to work with the data, merge data sets, clean them up, run regressions, identify the strength of correlations, hypothesise causal links etc, that’s important. It’s equally important to understand research methodologies and appreciate the relative strength of different types of evidence.

    How do I select my data sample so it’s truly representative? How do I control for bias? Which algorithms or statistical tests are most appropriate? What do these results really tell me? How do I verify an initial finding? These are not trivial questions and if we want to really make a difference, then L&D is going to have to dive in.

    Improving the awareness and skills of people currently working in L&D is a great start, but if we’re going to move things forward then people from different backgrounds to the current L&D profession are going to be needed and given a voice.

  4. Learning technologists, if they are to be more effective and influential, must become fluent with data. What data? Where is the data? Are we collecting it anew, as in most needs studies, or are we looking to existing stores of big or not so big data (as in usage figures or drop out numbers)? How do you get your hands on it all? How do you make sense of it? How do you communicate it?

    The question is how much to do on your own, how much to rely on partner data scientists?

    One more question: how much can we, should we, expect of the LMSs that are ever more omnipresent in most organizations?

    I was always quick to encourage learning professionals to rely on experts for video or Articulate or editing. But this is different.

    I think we need to know a lot, so we know what to ask of partners or the LMS. We need to know more than we do now, but not everything, certainly. Start with intense curiosity about our people, their needs, their tasks, their contexts. Seek data for revelations.

  5. Great debate. This is something I have really been thinking about since Tin Can, but you’ve now got me thinking in much larger terms. There are obviously possibilities for demonstrating ROI, provided you have business systems managing performance, KPIs etc. Have you any experience or thoughts on that?
