The final project at my last role involved extracting seven years of learner data for 80,000 learners from an LMS that was being decommissioned. The shock of our vendor’s £150,000 quote to both extract and host 3TB of learner data led me to explore a robotic process automation (RPA) solution using a low-cost browser plugin, UIVision. This article explores how I crafted and fine-tuned a suite of RPA scripts to automate the data extraction, and reflects on the challenges faced and lessons learned.
Extracting 7 years of learner data
I joined a training provider during the final year of a three-year IT rationalisation and LMS migration project. The final step was to extract and archive seven years of learner data from the old SaaS LMS platform, in line with the government data retention policy for apprenticeship learner data. The SaaS vendor we were moving away from had a data export function, but it was limited to formal learning data (achievements and progress) and did not cover more informal data such as learning journals and notes. Given that our tutors and coaches had been storing critical data in these areas too, we needed to export from all parts of the system. The upshot of the conversation that followed with the vendor was a request for £100,000 to provide a complete extract of seven years’ worth of learner data, plus more than £50,000 a year to host the 3TB of data. ‘Daylight robbery’ was one of the more polite terms used in our internal discussions that followed!
I was convinced there must be a better way, and didn’t buy the vendor’s explanation that the task could not be automated. I asked my team to demonstrate the data export process for a single user, and after trying it out myself I was convinced that a simple browser-based automation tool must be able to run this as a batch process. From my time in software testing earlier in my career I knew that such tools existed, and more recently I had interacted with robotic process automation (RPA) teams doing similar work. My belief was that an automation tool would be able to record a task that could then be repeatedly run at volume against a list of users. So I set about creating a proof of concept to test this out.
Creating the RPA solution with UIVision
I looked around at a number of free and commercial tools and settled on UIVision, a browser plugin that worked in Chrome and Edge. It had a recording feature which captured a sequence of user interactions in the browser and saved it as a JSON file that could then be manually refined as needed, including the ability to loop through a CSV file of user identifiers so we could run the process against each of the 80,000 users.
The free browser plugin was built on Selenium, an open source testing tool I remembered well from way back. There was a commercial option costing under £100 for a lifetime licence, and this added functionality such as capturing keyboard input and running scripts at high volume, which was exactly what I needed. The small payment was a no-brainer.
I then set about creating a series of scripts for the various tasks involved in downloading the learner portfolio (the formal learning data) and the informal notes and journal files. Creating the basic scripts was easy enough. They needed some fine-tuning to identify the right page elements to click on, which involved crafting xpath statements, a task made easy by Chrome’s built-in HTML inspection tools. I then had to set the scripts up to import user identifiers from a CSV file and write validation data back to another CSV file to confirm when the data export had been completed for each user.
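To give a flavour of the end result, here is a stripped-down sketch of what one of these macros looks like in UIVision’s JSON format. The URL, xpath selector, file names and pause length are all invented for illustration (the real scripts had many more steps), but the pattern is the same: read a learner identifier from the input CSV, open that learner’s export page, click the export control, pause so we stay at ‘human pace’, then log the outcome to an audit CSV.

```json
{
  "Name": "export_learner_portfolio_sketch",
  "Commands": [
    { "Command": "csvRead", "Target": "learner_ids.csv", "Value": "" },
    { "Command": "open", "Target": "https://lms.example.com/learners/${!COL1}/portfolio", "Value": "" },
    { "Command": "click", "Target": "xpath=//button[contains(text(),'Export')]", "Value": "" },
    { "Command": "pause", "Target": "30000", "Value": "" },
    { "Command": "store", "Target": "${!COL1}", "Value": "!csvLine" },
    { "Command": "store", "Target": "portfolio exported", "Value": "!csvLine" },
    { "Command": "csvSave", "Target": "export_log.csv", "Value": "" }
  ]
}
```

Running a macro like this with UIVision’s loop feature then steps through the CSV one learner at a time, and the pause keeps the requests down to a human pace.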
The routine would have to be run for 80,000 learners. Importantly, the scripts had to run at ‘human pace’ so as not to overload the vendor’s servers; if we ran them too fast, the traffic would look much like a denial-of-service attack. Despite my immense frustration at being asked to stump up £100,000 for a basic automation job, we strove to be a ‘good customer’ and reassured the vendor that they would barely notice the server load. Running the scripts at human pace would take a month, even with them running 24 hours a day, 7 days a week! But it is what it is, so we went ahead.
Of course it wasn’t just a case of running the scripts and watching the data download. As the scripts started running through batches of users, various hurdles had to be overcome: finding unexpected file types to download, or discovering that a new version of the vendor’s software had changed the page layouts, forcing script rewrites. The scripts went through numerous minor improvements, so version control became increasingly important. I started tracking the JSON changes in GitHub, which also safeguarded against script loss or deletion, as this work was all being done locally on my PC.
The other major consideration was data storage and security, which meant working with the IT team to ensure the data transfer and storage method was secure and that we had capacity for the expected 3TB of data. The data itself had to be saved in a file structure that was intuitive and accessible, so that the quality and audit teams would be able to find learner data at any point over the next seven years. This required creating Power Automate flows to move the downloaded learner data into a folder structure with an intuitive naming convention, so that a non-technical audience could easily find a specific record for a given learner ID or name from a certain year.
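Purely as an illustration (the exact convention we used isn’t reproduced here), the target layout looked something like this, so that anyone could drill down by year and learner without any tooling:

```
learner-archive/
  2019/
    12345_Jane_Smith/
      portfolio.pdf
      journal.pdf
      notes/
  2020/
    23456_John_Jones/
      ...
```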
The scripts took about a month to run, shared between myself and a colleague. They needed checking every few hours (while awake!) just to keep an eye on things. At times we had to slow the scripts down, working with the vendor to adjust the speed to an acceptable rate. At one point the vendor actually stepped in to execute the formal portfolio export directly on their servers: it turned out that all customer export requests joined a single processing queue, so our thousands of requests were causing other customers to wait days for a single learner export. We had to deal with these unexpected things as and when they cropped up throughout the month. Once the files were downloaded, other team members audited them, performing spot checks to ensure they matched what was in the system.
Reflections on an RPA project
The script creation and ongoing refinement in response to the challenges that constantly cropped up was a pretty big job. Given that it saved us substantially more than my annual salary, it was well worth the effort! But more than that, it just felt important. As a learning business and a customer of a SaaS LMS vendor, we didn’t have a big in-house technical team; that function was largely outsourced through SaaS. But it did leave us at the mercy of the vendor, as our data was locked into their system and they wanted £100,000 to release it! I refused to be held over a barrel, and I wanted to make this RPA project succeed as a point of principle as much as anything. In the spirit of empowering other customers to break free of vendor lock-in, these RPA scripts are all now freely available on GitHub for anyone who wishes to use them.
This project was my final piece of work for the training provider before moving on to my next job. Reflecting on those final couple of months, I loved every minute of it. My new boss suggested it must have been tedious as hell working on automation scripts for a month, but it wasn’t at all. Maybe it was that I was getting my hands dirty again: creating scripts, crafting xpath statements, understanding and navigating the browser DOM. I’d not done that stuff in years and it was an absolute pleasure to get stuck back into it.
It definitely appealed to the part of me that enjoys methodical, rigorous work and problem solving. That’s why I loved being a software tester for the first decade of my career and my love for that kind of work has never waned. I was lucky to have a colleague who was equally methodical and rigorous who shared the work of running the scripts and validating the results.
I learned some really useful RPA technical and process management skills and am already using UIVision again in new contexts. Most recently I’ve used it to create a script to run through a batch process that a customer wanted to perform in Thrive LMS (my new employer). I’ve also started using UIVision in my voluntary work with a youth cycling club, extracting rider results data from the clunky British Cycling events platform. UIVision is a great little browser plugin and I’m looking forward to putting it to work in all sorts of future scenarios.
Tips for RPA projects
Some final tips that I would refer back to if running a similar project in future…
- Consider decommissioning when buying. When procuring a new LMS, check the data export features, understand your exit strategy and quiz the vendor on hidden decommissioning costs.
- Search out open source and low-cost tools. Don’t let RPA consultants tell you that you need some expensive tool to do this stuff; it’s really not rocket science and basic tools are up to the job.
- Learn how to use the features of your browser’s HTML inspector, for example to find page elements and automatically generate xpath statements (see the example after this list).
- You don’t need to be a seasoned programmer, but do brush up on how loops work if you’re reading from or writing to CSV files. Use ChatGPT or similar for troubleshooting errors in your code.
- Use two monitors, one to display the scripting interface and one to watch the automation tasks running in the browser.
- Factor in time up front for preparation: script prototyping, technical feasibility work and learning how to use the tool.
- Be open with your vendor about your plans and intentions. They will have security tools that detect robotic processes and are probably within their rights to block access to your company if they detect such behaviour coming from your IP addresses. Play nicely.
- Put your RPA scripts under version control. This is really important: I lost count of the number of times I rolled back to prior versions of scripts while fine-tuning them.
- Work with your Infosec and IT teams to ensure you have robust and secure data transfer and storage.
- Ensure you have an audit trail of what was downloaded and when, using a CSV output file your scripts can write to.
- Do human spot checks of downloaded data to ensure it all looks as intended; this will give real peace of mind.
- Expect the unexpected. This project experienced lots of little curveballs and challenges from start to finish. Add plenty of contingency time to your schedule.
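On the browser inspector tip above: Chrome’s Elements panel will generate a selector for you (right-click the element, then Copy → Copy XPath), and the result can be pasted straight into a macro command’s Target field. The element id below is made up, but this is the shape of what you get:

```json
{ "Command": "click", "Target": "xpath=//*[@id=\"learner-export\"]/button[1]", "Value": "" }
```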