Jon McClure & Ben Welsh, Reuters, about automating hundreds of charts each week

Jon McClure and Ben Welsh from Reuters spoke at our Unwrapped conference about “Bare Facts First – how the reporters, editors and computer programmers at Reuters use Datawrapper to create, automate and disseminate hundreds of charts each week.”

Jon McClure is the EMEA editor for graphics at Reuters in London. He’s also the minder of Reuters’ custom-built pipelines that publish hundreds of Datawrapper charts every week to media clients around the world and to the global readership of Reuters dotcom. He'd rather be bagging a peak somewhere outside the city, but for a day job, it’s not bad work.

Ben Welsh is an Iowan living in New York City. At Reuters, he led the creation of a new software system for automating charts via the Datawrapper API. In a previous job at the Los Angeles Times, he partnered with Datawrapper to completely overhaul how the newspaper produces graphics for print and the web.

Watch their talk here:

00:00 – What's Reuters?
04:00 – The problems that Datawrapper solved
05:29 – Distributing Datawrapper charts
08:20 – 800 reporters onboarded
10:27 – Automating the whole data-to-charts pipeline
17:44 – 20,000 charts published
21:13 – Updates to the Python Datawrapper library
23:58 – Q: Teams integration?
25:43 – Q: Byline in automated charts?
26:39 – Q: Cleaning process for data?

Full transcript

Introduction

[00:00:04] Ben Welsh: Hey, thanks for having us. Thanks for coming everybody. It's a real honor to speak at this excellent conference. I've been tuning in as much as I can over the last couple of days, and it's been super impressive and inspirational to hear. Should we get into it, Jon?

[00:00:19] Jon McClure: Let's do it, Ben.

What's Reuters?

[00:00:21] Ben Welsh: Right. I think they already got the intros. They know who we are. We can buzz those. Jon, you want to tell people what Reuters is?

[00:00:29] Jon McClure: Reuters is a newswire service founded in 1851. We're a big place. So there's about 2,500 journalists in around 200 locations. What other factoids do you want me to pull out from all of that, Ben? Why don't you tell them?

[00:00:47] Ben Welsh: It's a huge place. I've only been here a year and I've learned so much.

And I think really maybe no one knows everything that happens at Reuters. But there's three kind of big pieces to the company that the outsider might not know.

One is that wire service that Jon talked about that's so famous that sends photos and breaking news to more than 2,000 newsrooms around the world in all different languages.

That's thing one. Thing two, which shouldn't be overlooked, is that Reuters is the exclusive news provider to one of the world's major financial terminal businesses, which has recently been rebranded as LSEG. It used to be known as Refinitiv. And that is in many ways the Pepsi to the Coca-Cola of the Bloomberg terminal, if you're familiar with that.

And then of course, there's reuters.com, which is where anyone can go, and for free read a selection of our breaking news and our excellent investigative special reports.

So those are the three big pieces and arms of the company to keep in mind as we move through this.

And at the core of all of it is really the lifeblood of the Reuters business as it has been since the very beginning, which is breaking news.

Literally, there are hundreds if not thousands of little itty bitty stories that are flying out from our offices around the world every day. It's been that way for more than a century. One of my favorite illustrations at this point is this very, very real 1883 memo sent out from the London office where Jon now works, to the correspondents in the field, urging them to deliver more breaking news. I won't read the whole thing to you, though it is an entertaining read in general. One thing to keep in mind is, back then the telegram was how breaking news was submitted and the author of this memo is really pushing down on the troops to deliver more breaking news. Near the end, they actually give an enumerated list of the type of news that they would like to hear more of. My favorite little bit here, if you slow down and squint and read, is a call for all "disturbances arising from strikes, duels between, and suicides of, persons of great note, social or political, and murders of a sensational or atrocious character".

Which maybe gives you a sense of how editors saw the news back then. It's still true to some degree today. But the key line in this memo, which is quoted around the office quite frequently, comes in that last paragraph, where it is requested that the "bare fact be first telegraphed with the utmost promptitude." And we say the line with a little bit of irony today given the sort of elevated British diction, the King's English in there. But it really is still true and really the essence of our business. Being fast, being first, being simple and clear is really the primary imperative of the company. We do so much more than that, but it really is at the heart and soul. And it's that kind of mission that we're going to talk about today and how Datawrapper helps us serve it better. And we're going to do that, one, by Jon giving an overview of how he and his team built this, I think, truly wonderful system for distributing Datawrapper charts across all those different arms we talked about. And then how I came in and tried to do even more with it.

Jon, you want to do part one? You want to take it, man?

The problems that Datawrapper solved

[00:04:00] Jon McClure: I'll take it, if you don't mind driving. So, like a lot of places, Datawrapper filled a pretty predictable need for us in our newsroom. We are, I suppose, one of the larger graphic teams, by comparison to folks who aren't working in some of the mainstay newsrooms, but still compared to the number of journalists that we work with, we're actually quite small.

And so for Datawrapper, we had all the same problems that I'm sure you've heard echoed throughout this conference: being very thinly spread, always seeming like we couldn't quite keep up with the reporting that the newsroom was doing, and really failing that first rung of our own company ethos to really be there for the big breaking news.

And behind the scenes, the reporters, and others that we work with would often reach out to, if they couldn't get us, to these really terrible, charting tools that would be scattered throughout dark corners of Reuters infrastructure. I've then found a very nice copy down there. I believe that is a real chart that's come out of one of the terrible systems. I think Icon is that one.

In a lot of ways, Datawrapper was going to fill exactly the need that we all have, in situations that we really need to scale out the number of people who can create charts in the newsroom.

Distributing Datawrapper charts

[00:05:29] Jon McClure: So as Ben mentioned, though, we're a little bit different because of our wire service business. And one of the main tenets of that work is that we have to service other media clients. And in addition to parking charts on a website, we actually also have to distribute them in a number of ways to other media who will then publish them on their websites, papers, television segments, whatever.

At Reuters, we publish charts in a lot of different flavors for our media clients in particular. And we call these different editions, some of which you'll be very familiar with coming straight out of Datawrapper. So, embeddable HTML pages, so the chart that you can just embed with an embed code.

We also make those so that our media clients can host them on their own servers. But we also make static and editable images, so PNGs, PDFs. We also make our source code for all of our graphics available to clients. We have clients who have developers in house who like to customize things themselves. And then we do the usual kind of light, dark, mode charts, little variations like that.

But what that all means for us is that that little publish now button is not the end for us. Basically, that's just the start of a process by which we have to take our charts and make them ready for media clients. And to do that, we use Datawrapper's publishing hooks and the API. And basically, what that does, is it gives us the ability to suck out those charts and the various flavors and formats that we need. We can break open the HTML, CSS, and JS to do various things in them, repackage them the way that we need them to be packaged for our clients, and ship them out. And that may sound like it's very specific to us, but I think it opens up a ton of options for anybody to do really cool things with Datawrapper charts.

We use AWS lambdas, so that this can really scale. We'll talk about how big it scales here in a second with Ben. But that post processing step is really key to us being able to service our clients. You can go ahead and get your notes from Ben. We also use custom fields in the chart editor. It's a really easy way for us to attach some of the vital metadata.

You'll see these root and wild slugs. These are some of the weirdest nomenclature at Reuters. I think these are basically real pieces of metadata, that allow us to categorize and make it easier for the clients to find their stuff. And then once we've done all that, we've tagged our things, we've repackaged them, we've pushed them all up.

We can ship them out to the world through a platform that we call Reuters Connect, which is where clients can come and purchase our charts enabling the formats that I mentioned.
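As a rough illustration of the post-processing step described above, the Python sketch below shows a Lambda-style handler that receives a publish notification, reads the chart's custom-field metadata, and lists the files to package for clients. The event shape, the `root_slug` and `wild_slug` field names, the assumption that custom fields sit under `metadata.custom`, and the edition list are all hypothetical; this is not Reuters' actual code.

```python
"""Sketch of a post-publish step. Payload shape and field names are assumptions."""

# Editions named in the talk; the file extensions are illustrative guesses.
EDITIONS = {
    "embed": "html",
    "static-image": "png",
    "editable-image": "pdf",
    "source": "zip",
}


def handler(event, context=None):
    """Lambda-style entry point: turn one publish event into a packaging plan."""
    chart = event["chart"]  # hypothetical payload key
    custom = chart.get("metadata", {}).get("custom", {})

    missing = [key for key in ("root_slug", "wild_slug") if not custom.get(key)]
    if missing:
        # Refuse to package a chart that clients could not find later.
        return {"status": "rejected", "missing": missing}

    base = f"{custom['root_slug']}/{custom['wild_slug']}/{chart['id']}"
    files = [
        f"{base}/{theme}/{name}.{ext}"
        for theme in ("light", "dark")
        for name, ext in EDITIONS.items()
    ]
    return {"status": "ok", "files": files}
```

Checking the metadata before any export work happens means an untagged chart is flagged early rather than shipped without the slugs clients search by.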

800 reporters onboarded

[00:08:20] Jon McClure: As part of all that process, while we're doing all this repackaging, we take the moment to go ahead and flag charts that are coming that our newsroom has made, or that our team has made, or that our bots have made.

And especially in the context of our newsroom, it's actually a really great moment. I think I've heard several people mention this. They deal with Slack or whatever. It's a really great moment for us to coach our newsroom and really have a feedback cycle about the choices those people are making. This is one such example from someone in the newsroom.

At the end of the day, what that means, is in the last 18 months, we've onboarded something north of 800 reporters, editors, and producers across our newsroom. They're publishing dozens of charts every day, and what that means is that the load is off of our, by comparison, small graphics team.

So we can focus on certain data sets that we're known for each step of the process. It also means that reporters can do more niche work in their individual beats. And our producers have actually become some of the most important chart makers we have. The first ones getting out maps, quick charts on the news as it's happening.

So once we have that sort of pipeline in place, one of the things that we started to try to do was do some of this automated chart building that you've heard some of the others here talk about. This is our first example. It was done by Trent Swerton in our graphics team. It's called Tremolo, which I think is a good name for it. It's a USGS-powered earthquake monitor. One of my favorite examples of this was last year when the Morocco quake struck. We had to shape maps out within minutes. And we continue to pivot on those throughout the day, add more annotations to them, stuff like that, continue to update them.

So we've now had this really great system by which we can, either by hand from the newsroom or through an automated process, create a chart and then publish it out into the world in all these various ways. And so we are ready to try scaling this thing way, way up, and that's where Ben comes in.

Ben?

Automating the whole data-to-charts pipeline

[00:10:27] Ben Welsh: Yeah, that's right. All this work was ready and waiting for me when I started at Reuters. And I was trying to figure out, what the heck am I going to do? Let's find something ambitious, a project to take on, something to chew on. And I found inspiration immediately from two places.

One is a system that was invented here at the company a few years ago called Lynx Insight. And so this system is basically templated text stories. So when new data comes in on the feeds, they're fit, like a Mad Lib, into a templated story and then sent directly out into the wire. And this is a type of system that exists not just at Reuters but at other news wires, and has been very effective at increasing kind of the speed and scale of the text operation.

The graphics editor, Matt Weber, approached me and he said that he and others had the longstanding idea of "Why can't we do the same thing for graphics", right? So if we can automate these mad libbed text stories, why can't we automate the graphics that would go with the same thing as well?

And I thought, man, he's really got a point. So let's look at it. Let's see if we can make it happen. And then, at that point, I learned all about Jon's system, which he just, I think, covered very well. And it blew my mind. It's really just a powerful end-to-end kind of assembly line that's just waiting for widgets to be pushed down it.

Someone who can claim to have worked for the entire tenure of the company, it reminded me in some ways in my lighter moments of the funnel, which is one of the more unintentionally hilarious metaphors in journalism history. If you haven't seen the video, the link's right there, give it a look.

And I just needed to figure out how I could get the data in there. On the one end is the Datawrapper API. If we could put charts into the Datawrapper API, the top of the funnel so to speak, we know they'll flow entirely down through Jon's system.

And as has been covered by other speakers in this panel, that's totally doable, right? And I'm a Python guy, not an R guy. I found, wow, there's this cool Python wrapper for the Datawrapper API that makes it easy with a couple of lines of Python code to pass in whatever chart you might want. Oh, okay, that's great. Started checking that out, started working with that. Sergio Sánchez Zavala is the inventor of that, by the way. Cool guy. And then I saw on the other end, where are we going to get the data to put into the charts, right? Well, we've got this terminal, okay? So this LSEG terminal system really is this kind of incredible database, that has limited access to a very small number of financial professionals who pay for it. But it updates literally within seconds of new market data being posted or macroeconomic indicators appearing on government websites. And learning about it has really been eyeopening and interesting to me as someone who's never worked at a terminal business before.

The truth of it is: behind a system like this are literally hundreds, if not thousands of web scrapers that are crawling government data sites and putting in new figures instantly when they arrive. And then a whole quality control layer of people who make sure they flow through. So we had this great delivery system for graphics on the one end, and we had this great data source on the other end. We just needed to connect them.

It turns out there's a Python API for the terminal as well, which would allow anyone with access to write code, to download the data from the terminal. And all we needed to do was to close the missing link between the data source, the terminal, and Datawrapper, our delivery device.

And that really was the work that needed to be done and was the job for me, if we were going to make it happen. And one way of thinking about a process like that that ferries data from one to the other, is what really gnarly, boring computer programmers call an ETL pipeline.

I think it's good we're just calling them pipelines now, it's a little more human. But that's an acronym for Extract, Transform, and Load. And what those typically do is: The extractor downloads the source data from wherever it comes from. In this case, our terminal, right?

There's some sort of computer programming layer that is going to transform, clean, summarize, reformat that data so that it's ready for delivery to its target.

And then the loading step is that process of then delivering it out, in this case, to, say, Datawrapper.

And I set out to write something like that. In real life, the actual pipeline is a little more complicated. But that's really the essence of it. We're extracting from the terminal, we're transforming it into the format for Datawrapper, and then we're posting to the API.
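The three stages can be sketched in a few lines of Python. This is a minimal illustration only: the terminal and the Datawrapper client are replaced by stand-ins (`fake_terminal` and a plain list), and every function name here is made up for the example.

```python
"""Minimal extract-transform-load sketch. The data source and chart API are stubbed out."""
import csv
import io


def extract(source):
    """Download step: here `source` is any callable returning raw rows."""
    return source()


def transform(rows):
    """Clean and reformat: drop missing values, sort by date, emit CSV text."""
    clean = [row for row in rows if row["value"] is not None]
    clean.sort(key=lambda row: row["date"])
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["date", "value"])
    for row in clean:
        writer.writerow([row["date"], row["value"]])
    return buffer.getvalue()


def load(csv_text, publish):
    """Delivery step: hand the CSV to whatever posts it to the charting API."""
    return publish(csv_text)


def run_pipeline(source, publish):
    """Chain the three stages in order."""
    return load(transform(extract(source)), publish)


# Stand-in for the real data source (hypothetical values).
def fake_terminal():
    return [
        {"date": "2024-02-01", "value": 3.9},
        {"date": "2024-01-01", "value": 3.7},
        {"date": "2024-03-01", "value": None},
    ]


published = []  # stand-in for the real API client
run_pipeline(fake_terminal, published.append)
print(published[0])
```

Keeping each stage a separate function makes it possible to swap the source or the destination without touching the cleaning logic.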

The main sort of additional thing is configuring each of your chart templates to behave in a certain way based on what the data source is, what type of chart you want, how you want to manage the layout, which we call the generation or the configuration of the template. Here is an example of one from real life.

This is a screenshot from our code base. At the end of the day, we basically have kind of a Python framework of classes designed for each of those stages in the ETL pipeline, that we then configure based on the template or chart. So this example, without belaboring it, is the unemployment chart that ran last Friday morning when the Bureau of Labor Statistics posted the new jobs report.

And again, if you were to squint and look carefully, you would see that it's really only about 20 lines of code. And most of it is really just the specific stuff related to what data series should I go get from the terminal? What should the headline be, and what the date range should be and some of the specifics of how the chart ought to be collected and rendered.
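The screenshot is not reproduced in this transcript, so the following is only a guess at what such a short, declarative template could look like in Python. The class, its fields, and the series code are invented for illustration; `d3-lines` is Datawrapper's identifier for a line chart.

```python
"""Sketch of a declarative chart template. All names and codes are made up."""
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ChartTemplate:
    series_code: str           # which data series to fetch (placeholder code)
    headline: str              # chart title
    chart_type: str            # Datawrapper chart type id
    source_name: str           # credited data source
    years_of_history: int = 5  # how far back the chart should reach

    def date_range(self, today):
        """Return the (start, end) window, starting on the first of the month."""
        start = date(today.year - self.years_of_history, today.month, 1)
        return start, today


us_unemployment = ChartTemplate(
    series_code="EXAMPLE.US.UNEMPLOYMENT",
    headline="U.S. unemployment rate",
    chart_type="d3-lines",
    source_name="LSEG",
)

print(us_unemployment.date_range(date(2024, 3, 8)))
```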

And that's really all it takes to do an individual chart. The system that executes that sort of pipeline process is run entirely in GitHub Actions. If you haven't checked out GitHub Actions yet, you should. This is a free task-running system within the GitHub repository universe, where your code repository has access for free to task-running jobs in data centers that will, on a schedule or based on triggers you set, run computer programming tasks.

We have our templates organized into groups around, say, revenue or jobs or inflation. And those groups of templates are then executed on a schedule very frequently. And when they detect that new data has been found in our source, the terminal, they make the chart and send it out to Datawrapper.

And that is the production environment. It works flawlessly. I recommend exploring things like this in your own work. Once they then deliver that to the Datawrapper API, Jon's system takes over.
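The talk does not say how new data is detected, so the Python sketch below shows just one plausible approach: remember the latest period already charted for each template and publish only when the source reports a newer one. The state file, function names, and template names are all assumptions.

```python
"""Sketch of a scheduled check that only publishes when the source has new data."""
import json


def read_state(state_file):
    """Load the record of what has already been charted, if any."""
    return json.loads(state_file.read_text()) if state_file.exists() else {}


def has_new_data(template_name, latest_period, state_file):
    """Compare the newest period in the source with the last one charted."""
    return read_state(state_file).get(template_name) != latest_period


def record_published(template_name, latest_period, state_file):
    """Remember what was charted so the next scheduled run can skip it."""
    state = read_state(state_file)
    state[template_name] = latest_period
    state_file.write_text(json.dumps(state, indent=2))


def run_group(latest_by_template, state_file, publish):
    """Run every template in a group; publish only those with fresh data."""
    published = []
    for name, latest_period in latest_by_template.items():
        if has_new_data(name, latest_period, state_file):
            publish(name)
            record_published(name, latest_period, state_file)
            published.append(name)
    return published
```

Here `state_file` is expected to be a `pathlib.Path`. Because a run with no new data does nothing, a scheduler can call `run_group` as often as it likes without producing duplicate charts.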

20,000 charts published

[00:17:44] Ben Welsh: And so this is the chart that got pushed out just one minute after the new numbers were posted to the BLS website last Friday morning. It flowed into the graphics pool, where an editor could find, select a graphic, and then embed it into a piece. It was there and ready. We're seeing these charts get delivered through the system faster than even the text stories get written and edited. And they're often there ahead of when the reporters and editors are even thinking about looking for art.

So these are obviously speed wins. That chart is then put right into the story, which was on the homepage of Reuters.com and out on the wire in the next few minutes, as were several other charts in the jobs report, because we have a whole kind of package of four or five. But it's not just for speed.

You also begin to benefit from scale. So that same class that I showed earlier, you may have noticed, had two entries, one for the United States and the other for Canada. So that same template can be adapted to a different data series with a slightly different configuration and boom, you now are doing two instead of one with the same amount of labor.

And that can multiply to really large numbers in the right circumstance. So for instance, we have a template for a bar chart of corporate earnings. So every quarter, a public corporation will report to its shareholders, how much money it made in the previous period. We can make the same bar chart, we are making the same bar chart for every single publicly listed company we choose to track, which allows us to make literally thousands of these charts within seconds or minutes after the data being discovered in the system. And that's where you begin to get to scale that no human's going to be able to do on their own.
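To make the scaling idea concrete, here is a small Python sketch of one template fanned out across many configurations. The class, tickers, and company names are placeholders invented for the example; `d3-bars` is Datawrapper's identifier for a bar chart.

```python
"""Sketch: reuse one template across many configurations. Tickers are placeholders."""
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class EarningsChart:
    ticker: str
    headline: str
    chart_type: str = "d3-bars"


# One base definition holds everything the charts have in common.
base = EarningsChart(ticker="", headline="")

companies = {"AAA": "Example Corp", "BBB": "Sample Inc"}  # made-up companies

# Each company gets a copy of the base template with only its own details changed.
charts = [
    replace(base, ticker=ticker, headline=f"{name} quarterly earnings")
    for ticker, name in companies.items()
]
```

Adding another company is one more dictionary entry, which is why the amount of labor stays flat as the number of charts grows.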

So it's not just faster, it can do quite a bit more. And the result of that is that the staff on Jon's team is freed up to do much higher value, higher skill work, which can also have much, much higher impact as well. So in real life, once a month, every Friday morning, Allie Levine, here in the New York office would have to wake up early and make that same batch of jobs reports every single time. Unless she missed her alarm clock. But now the robot does that for her, and she gets to sleep in and do things like this beautiful Walt Disney graphic they recently published instead. And of all the people and writers that I've talked to about this project, no one has been more excited about it than Allie, who sees it as really something that allows her to work on more rewarding and challenging work.

We have an internal system that we've built for monitoring the output. Thus far we have close to 60 templates that we've deployed, including even a few that branch out to different inputs and outputs. So for instance, we have some custom web scrapers that power charts that don't use the terminal.

And we actually are publishing some static images generated, not via Datawrapper, that get embedded into a morning email newsletter series so the pipeline system can also horizontally scale if you will, in that way.

Thus far, we've pushed through about 20,000 charts to Datawrapper in the system. That's according to my little profile row in our team page in Datawrapper. My hope is if we aren't already, we will have published more charts than anyone else on the platform, using this system.

Updates to the Python Datawrapper library

[00:21:13] Ben Welsh: Beyond the actual work here privately inside of Reuters, I partnered with Sergio, the maintainer of the Python Datawrapper library. And we recently have released a new version, which I think maybe we're announcing for the first time here today, which now covers 100 percent of the API endpoints on the Datawrapper API. So anything that's documented in the Datawrapper API, from creating folders and moving folders, organizing teams, creating maps, all of it should be possible in this system. If something isn't, tell me and I'll make sure we get it in. And Sergio's been really great to work with on that.

For anyone who's really into API stuff, here's just like a quick example of how the Python wrapper could make a really basic chart. For those familiar with the API, you'll see that metadata variable is hewing very closely to the sort of the JSON structure that Datawrapper prefers to be posted in. The result of that: It's just a little verbose and maybe not as friendly to beginners as it could be. So one of my personal kind of dreams or goals, and I would love to get feedback and collaborators on this if anyone is interested, would be to maybe pursue an upgrade to the Datawrapper API that is more, say, declarative. This is pseudocode that does not exist. But to me, this is maybe how I think the API ought to accept Python code. Where all of the attributes are sort of keyword inputs. That they all would be fully documented and listed out somewhere, which I'm not sure that they are currently. And just generally trying to move towards something that makes automation easier and easier over time. So if anyone has thoughts or interest in that, I would love to hear what people think. This is my little attempt of dreaming it.
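The slides are not included in this transcript, so the Python sketch below is only one way the keyword-style idea could work: flat keyword arguments are mapped onto the nested metadata structure. The function `build_chart_payload` and its keyword names are hypothetical and are not part of the Python Datawrapper library; the `describe` keys it writes to are standard chart metadata fields.

```python
"""Sketch of a keyword-style wrapper. This interface is hypothetical, not the library's API."""

# Map friendly keyword names to (section, key) locations in chart metadata.
# Only a few `describe` fields are covered in this sketch.
KEYWORD_PATHS = {
    "intro": ("describe", "intro"),
    "byline": ("describe", "byline"),
    "source_name": ("describe", "source-name"),
    "source_url": ("describe", "source-url"),
}


def build_chart_payload(title, chart_type, **options):
    """Turn flat keyword arguments into a nested JSON-style structure."""
    metadata = {}
    for name, value in options.items():
        if name not in KEYWORD_PATHS:
            # Fail loudly on typos instead of silently dropping the option.
            raise TypeError(f"unknown chart option: {name!r}")
        section, key = KEYWORD_PATHS[name]
        metadata.setdefault(section, {})[key] = value
    return {"title": title, "type": chart_type, "metadata": metadata}


payload = build_chart_payload(
    "U.S. unemployment rate",
    "d3-lines",
    intro="Seasonally adjusted",
    source_name="LSEG",
)
```

A single lookup table like `KEYWORD_PATHS` also doubles as the documented list of accepted attributes, which is one of the gaps mentioned above.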

And then finally, Sergio and I recently prepared a tutorial on how any beginner could pick up Python and try to begin automating Datawrapper charts with zero experience. We presented that one week ago today at the NICAR conference, the large data journalism conference in Baltimore. It was standing room only. There was so much enthusiasm from the data journalists we met to do more with Datawrapper, and to do more with automation. It is free to anyone, and every step in the class is explained and all of the code is included as well.

I think that's what we have. I just want to thank you all for tuning in. It's really appreciated. And we'd be happy to answer any and all questions.

Q: Teams integration?

[00:23:58] Shaylee (host): Thank you so much Ben and Jon. That was a fantastic speech. Very informative. Great to see behind the curtain and learn so much about the intricacies of what goes into pushing out 20,000 charts in about six months. That turnaround time that you mentioned is really remarkable, and very impressive. I have a couple of questions while we wait for other questions to come through in the chat. Things that came to me while I was listening to this presentation. You mentioned a Teams integration as a kind of plugin to make sure that things were being reviewed or to make sure that nothing was going wrong.

Was that an idea for this project from the get go? Or was that implemented after the core pipeline was built and created?

[00:24:50] Jon McClure: Yeah, no, that was part of the original project. We needed to feed back some information to reporters about where their charts went into the graphics system, and just decided that was a really great point to go ahead and pop an image of the chart since we had that in hand already, and that gave us a chance to see what people were making very easily throughout our day. And we could pop in some comments and stuff like that. It's actually been one of the funnest things about the whole process for me. Because you never know what kind of data you're going to see on any given day come through. And it's nice to talk back to the newsroom about what charts are breaking.

[00:25:33] Shaylee (host): True. It's also one of my favorite parts about this job: not knowing what kind of charts are going to come through on any given day, it's very exciting. Can agree with you there and great foresight on that.

Q: Byline in automated charts?

[00:25:44] Shaylee (host): We do have a question from Nora Gully: How do you byline automated charts?

[00:25:52] Ben Welsh: We don't put a byline in the Datawrapper field. We list the source. If the data comes from the terminal, we list LSEG as the source, of course. But no chart is published without a human editor reviewing it and, in the language of Reuters, packaging it with the story. And we view that as the editorial oversight on the piece. If you're interested in the quality control aspect further, we do have a committee of people that I review all the templates with before they're implemented. And so these are people who are experts in the domain that we're trying to make the chart for. And so they're involved in kind of a review process prior to that.

[00:26:35] Shaylee (host): Okay, that makes sense. Okay.

Q: Cleaning process for data?

[00:26:41] Shaylee (host): I do have one other question that came to me. You mentioned that a lot of your data comes from one data source. Is that always in the same consistent format? Cleaning that and preparing it for Datawrapper, I can imagine that would be a sticking point. Is that always the same cleaning process?

[00:27:00] Ben Welsh: It's mostly consistent. It's a truly massive database with a really wide array of data and so it is definitely not a hundred percent consistent. And across domains, there can be a little bit of quirks. Like for instance, commodities data is handled a little different from stock market data, which is handled a little different from the macroeconomic data, in terms of what the columns are called, or how you query it from the source API. And so that's where the Python pipeline comes in. And the approach we've taken is to try to write generic Python classes that handle the most common data variations. Hey, here's our monthly macroeconomic extractor and transformer, right? And that's gonna handle most of the macroeconomics data, but then you can always kind of override or tweak the individual implementations based on what the quirks might be.
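The generic-class-plus-override pattern Ben describes can be sketched in Python as follows. The class names, column names, and the thousands-separator quirk are invented for illustration and do not reflect the real data feeds.

```python
"""Sketch of a generic transformer with a domain-specific override. Column names are invented."""


class MonthlyMacroTransformer:
    """Handles the common case: one date column and one value column."""

    date_column = "Date"
    value_column = "Value"

    def transform(self, rows):
        """Rename columns to a standard shape and clean each value."""
        return [
            {"date": row[self.date_column], "value": self.clean_value(row)}
            for row in rows
        ]

    def clean_value(self, row):
        return float(row[self.value_column])


class CommoditiesTransformer(MonthlyMacroTransformer):
    """Same pipeline, but this feed names its columns differently."""

    date_column = "Timestamp"
    value_column = "Close"

    def clean_value(self, row):
        # Hypothetical quirk: prices arrive as strings with thousands separators.
        return float(row[self.value_column].replace(",", ""))
```

Only the parts that differ are overridden, so a fix to the shared `transform` method reaches every domain at once.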

[00:27:59] Shaylee (host): Yeah, makes sense. Having that kind of room to adjust things with human eyes if needed.

Fantastic. Okay. I think that we are done with any questions that might come through in the chat. I want to say thank you again so much for the time. It was an excellent speech, great to hear about it and now we can move on to our final talk of the day.

We asked Jon and Ben a few questions before their talk:

Jon, Ben, how do you use Datawrapper at Reuters?

Jon: On our graphics team, Datawrapper may not always be the tool that produces our most ambitious published work, but chances are it is the first tool we touched along the way. It's an invaluable prototyping platform in our reporting process, helping us test out initial ideas and explore data. But when the news gets hot, Datawrapper — especially its fabulous and ever expanding mapping capabilities — is the break-glass option to get vital information to our readers and clients at the speed of our reporting.

The war in Ukraine is fuelling a global food crisis, Reuters

How has it been integrating Datawrapper into your organization's workflow?

Jon: What's been most exciting is how Datawrapper has opened up avenues to evangelize graphics and data-first journalism across our newsroom. Having a tool with an easy user interface that we can collaborate on with reporters and editors has given us uncountable opportunities to spread The Knowledge™ and has fundamentally changed the idea of what any journalist’s core skills are, regardless of what they’re reporting on.

What's your guiding principle when working on data visualizations?

Ben: Every chart should tell a story. Even if that story is as stoopid simple as “number go up.”

What's your favorite Datawrapper feature?

Ben: Currently my favorite feature is the Datawrapper API, which is at the heart of the automation framework we’ll reveal for the first time at your conference. Once you learn how to use it, almost anything you can achieve with your mouse and the browser can be automated with your code and the API.

Thus far, we've pushed through about 20,000 charts to Datawrapper in the system. That's according to my little profile row on our team page in Datawrapper. My hope is that if we aren't already, we will have published more charts than anyone else on the platform, using this system. Ben Welsh, Reuters, in minute 20:51 of the talk at Unwrapped 2024

We loved Jon and Ben's talk at Unwrapped! You can find Jon on X, LinkedIn, and see his work on his Reuters page. To learn more about Ben, visit his website, X, or LinkedIn account.

Find out more about Unwrapped and hear from other great speakers on our blog.