The Organisation for Economic Co-operation and Development (OECD)[1], is easily one of the best sources for R&D statistics. So it’s no coincidence we looked into the OECD R&D statistics databases while doing research for our R&D investment section.
According to the OECD’s search page, they provide more than 2000 databases on a variety of topics.[2] And although that count of more than 2000 contains some duplicates, it’s indisputably a huge amount of data. We will mainly focus on the OECD R&D statistics databases which they list under “innovation and technology”.[3]
However, they also have databases on topics such as: agriculture, development, economy, education, energy, environment, finance, government, health, jobs, and society. What a wealth of information! And even within the topic of innovation and technology, the OECD features many different sets of data.
Member countries and non-members
The OECD is an international organization with currently 38 member countries. The member countries include many European countries, United States, Canada, Australia, New Zealand, Japan, Korea, Israel, Turkey, Mexico and several other Latin American countries. For a full list of countries, and a proper introduction to the OECD, see their about page[4]. The OECD R&D statistics databases typically feature data from the member countries but often also include data from selected non-member countries. China, Russia, Argentina, Singapore and South Africa are examples of such non-member countries.
OECD Main Science and Technology Indicators database
The OECD’s Main Science and Technology Indicators (MSTI) database has an incredible number of statistical indicators (called MSTI variables)[5]. At the time of writing there are 134 MSTI variables to choose from! These include R&D statistics such as:
- R&D investment (expenditure) statistics per country as well as totals for all OECD countries and the European Union. This includes many variables relating to gross domestic expenditure on R&D (GERD).
- R&D expenditure statistics specified according to source of funds and/or split by different sectors that are performing the R&D activities. Prepare yourself for terms like business enterprise expenditure on R&D (BERD), higher education expenditure on R&D (HERD), and government intramural expenditure on R&D (GOVERD).
- Government budget allocations (GBARD).
- Statistical indicators concerning research and development output such as patent applications and import/export figures of relevant industries.
- Statistics on researchers and other R&D personnel.
- Some general economic indicators like gross domestic product (GDP), population, employment, and more.
The MSTI database has a wealth of information and can serve as a one-stop-shop for many R&D statistics needs. Although shop is probably not the right word as most of the OECD’s data is free[6]. We can’t possibly describe the full extent of the MSTI database in this article. However, the OECD provides documentation[7] that does so. It’s worthwhile deep diving into that to appreciate the MSTI database in its full glory. First, please read the rest of this article though. You’ll find out about OECD data exports and the OECD RDS databases, which contain even more data!
Exporting data from the OECD MSTI database
The OECD offers reasonable tools for visualizing R&D data[8][9], but you may want to export OECD R&D statistics data instead. Perhaps you want to run your own custom calculations using some of the stats. Or maybe you prefer to create your own visualizations, and/or integrate OECD data in your own application. Whatever the reason, there are several options to export data from the OECD MSTI database. Just go to the MSTI section in their “OECD.Stat” environment[10] and click “Export” to find the following options:
Excel export
Very straightforward so you don’t have to be an IT genius for this. This option exports data for only one statistical indicator at a time though. Thus, it’s not great if you want to do a bulk export. Also, the data comes in a table layout. This may not be the preferred way if you want to import it into another system. If you want all MSTI variables in Excel format, we recommend using the “Related files” option instead, see next item.
Related files
Open the related files tab in the export popup, and click the “Full MSTI database in Excel” link. This downloads a zip file with the full MSTI database in Excel format! The Excel file contains an overview tab which lists all the indicators with links to the individual data tabs. That’s a very welcome bonus as it can be difficult to navigate all those tabs. This Excel file is great if you want to manually browse through the data. However, it’s still not the best format if you are looking to do a bulk import into other applications.
Text file (CSV) export
A great option to quickly export the full database in a flat file format. Only use this option if you are familiar with CSV files though. You need to import this file into another application and transform it in accordance with your statistical needs. Business intelligence tools like Power BI, Tableau, and many others make this reasonably straightforward but they all come with a learning curve. Also, we were not able to find a direct link to the CSV export function which would allow automating the process of periodically obtaining updated data. With this particular statistics database, a manual download may not be the biggest problem though. The OECD MSTI database is published only twice per year (in March and September).
PC-Axis
Frankly, we had never heard of this before. But at Beating Gravity we are curious folks, so we decided to Google a bit. We landed on a page of the SCB (Statistics Sweden) website which mentions PC-Axis 2008[11]. That sounds ancient! However, they also explain that PC-Axis was replaced by something called PxWin which can handle px files. It sounds like PxWin might be able to import the PC-Axis file format you can download from the OECD website but we have not tested this. We are also not sure how much this file format is used still in general. Feel free though to enlighten us in the comments section if you are using PxWin.🙂
Developer API
If the word developer didn’t immediately put you off, let us warn you: technical knowledge is a must for this option! It can be a great option though if you are looking to automate data updates. The developer API allows you to generate API query URLs. You can use these in your own applications to automatically obtain fresh data periodically and format it as required. This API uses SDMX-JSON format which in our humble opinion is not the most friendly format ever invented. Nevertheless, SDMX is a standard used by many organizations. It might be great if you use systems that support it, or if you don’t mind a technical implementation. Also, see the section below about Power BI if you are looking to use OECD data in Power BI.
SDMX
This option is somewhat similar to the developer API and requires similar technical skills. The format is different though, it gives you SDMX-ML rather than SDMX-JSON.
That’s a lot of export options! And while there is always room for improvement, the options above should make a lot of statistically minded people happy.
Beating Gravity tip: before you open the export menu, first make sure to select all the data you require! Check your selection via Customize > Selection. Notably, the system doesn’t select all years by default. So if you want more historical data, make sure to select a longer date range before exporting your data.😉
OECD Research and Development Statistics databases
The title of this article is “OECD R&D statistics databases”. This seeks to cover all R&D related statistics provided by the OECD. However, the OECD actually features a series of databases that they refer to as “Research and Development Statistics (RDS)”[12]. And that doesn’t include for example their MSTI database discussed above, how confusing! Why does the OECD have a Main Science and Technology Indicators database and feature their RDS databases separately? The MSTI database certainly contains R&D statistics too, so what is going on here?
The answer is that RDS databases are the main basis for the MSTI database. Therefore, these two sets overlap. Furthermore, you might guess that the RDS databases contain additional data that is not in the MSTI database. That is indeed the case. Here is a high level overview of what the OECD’s RDS databases include:
- R&D expenditure statistics[13], but with additional detail not available in the MSTI database. Some examples are views by industry activities and organization size.
- Government budget allocations for R&D (GBARD)[14] by a range of “socio-economic objectives”. Some of these are not available in the MSTI database.
- R&D personnel statistics[15], with certain filter selections that may not be available when using the MSTI database.
- A long list of historical series[16].
RDS or MSTI, which is better?
You might wonder, as the RDS databases have more detail than the MSTI database, why use the MSTI database at all? It really depends on your needs and the amount of time you want to invest. When using the MSTI database, you can select a statistical indicator and you immediately get an overview. On the other hand, the RDS databases offer more filtering options. This allows for more flexibility but also requires the user to think about which selections make most sense. We recommend having a look at both to see which approach suits you best. And, as we’ll explain now, there is also an important difference when considering data exports.
Exporting data from the OECD RDS databases
There is a reason why we are referring to RDS databases (plural) and MSTI database (single). In the OECD’s system, the Research and Development Statistics (RDS) consist of quite a few separate databases. This makes your life harder if you want to export all that data for use in other applications.
If you want to export OECD RDS data, there are some key points to be aware of:
- You can export RDS data from the “OECD.Stat” environment.[17]
- The export options (Excel, CSV, API, etc.) are similar to what we discussed above in the MSTI database section.
- Because the RDS data consists of separate databases, you can’t download it all in one go! That’s an important downside to consider when comparing with the MSTI database. If you are doing an API integration you may be less concerned about this limitation. However, even if you automate using the API, you will need to generate and maintain multiple data queries. And you will need to make sure to parse the data correctly for all those queries.
- If you are using Power BI, make sure to check the next section.
Importing OECD R&D statistics in Power BI
At Beating Gravity, we use Power BI[18]. Hence, we looked into the possibilities of importing OECD R&D statistics in Power BI. We will discuss the options but assume some basic Power BI knowledge. If you need help with the basics, we recommend the Power BI documentation[19] or the Microsoft Power BI Community[20].
As mentioned earlier, there are several options for exporting OECD statistics data. We played around with most of these formats and found the following when importing the data in Power BI.
Excel
OECD data exports in Excel format tend to have rows with text and empty rows that you need to get rid of when importing in Power BI. Usually, Power BI easily detects where the actual data tables are and can handle the cleanup without your assistance. More troublesome is the fact that the Excel exports tend to have the years as columns, and often contain multiple rows with column headings, some including merged cells. The OECD interface allows you to customize your layout before you do an export though. You can move the years from columns to rows, and set some table options. Nevertheless, you may still struggle to get the format you need.
Power BI allows you to transpose tables, and apply other transformation to get the data in the desired format, but that can be time consuming. Moreover, the Excel exports tend to come as one table for each statistical indicator. This means that you have to carry out some steps several times if you need more than one indicator. Excel is not the best choice if you are looking to import lots of OECD statistics data!
Text file (CSV) import of the OECD MSTI database
This is much better already. Especially with the MSTI database, which allows you to export all the statistical indicators to CSV in one go! You can import this CSV in Power BI and apply some basic transformations as needed. After “promoting headers”, changing some types and renaming some columns you already have a format you can work with.
We recommend also adding a custom column which multiplies the “value” column with an expression like “Number.Power(10,[PowerCode Code])”. The new column then has the actual values without having to refer to other columns to find out whether the values are in units, thousands or millions.
Text file (CSV) import of the Research and Development Statistics databases
Pretty much the same considerations apply here as just described for the MSTI database. Additionally though, keep the following in mind:
- As discussed, the OECD RDS databases are typically exported as separate files. This means you can’t import them all in one go without doing some additional work. Fortunately though, many of the files have a very similar structure. We first exported all the RDS databases in the subcategory “expenditure”. Then we imported and transformed just one of the files. Next, we were able to duplicate the query in the Power Query Editor, and simply change the file name in the source step. We then changed a few column names in some of the other steps. And that finalized the query for the second file. You can repeat this procedure for the other files, and save lots of time like that.
- Building on the previous point, you may want to append the queries of the separate files. This will allow you to load data from separate files into one table in your Power BI data model. Just make sure to add custom column(s) with the code and/or the label of the statistical indicator of each file. Like that, you’ll have a field that allows you to filter/select the required statistical indicators in your reports.
- Some RDS exports give you a column named “Sector” while this is actually the “performing sector”. That is the sector carrying out the R&D activities. We recommend renaming this column accordingly. Like that, you avoid later confusion with for example the sector providing the funds for R&D (source of funds).
OECD developer API / SDMX and Power BI
While CSV exports work fine for many use cases, you may require automatic data downloads and refreshes. In that case, you are out of luck with CSV as the OECD doesn’t provide any direct CSV export links. The developer API (SDMX-JSON) or the SDMX-ML file export option may come to the rescue. However, both are not so straightforward to use in combination with Power BI:
- You can make it work with some custom coding. Have a look at solutions from others in the Power BI Community if you are comfortable going down this route.[21]
- In 2021 an SDMX connector for Power BI was introduced.[22][23] Great, problem solved?! Unfortunately not yet, as this connector requires the SDMX-CSV format. It doesn’t support SDMX-JSON or SDMX-ML. Should the OECD introduce the SDMX-CSV format for exports though, this connector might quickly become your preferred solution.
- YouTube features some interesting videos on the two points above.[24][25] The speaker suggests that the OECD is working on making the SDMX-CSV format available in the future. However, she doesn’t mention a timeline.
- You may come across OECD R&D statistics databases that are not available via the API. Therefore, it can occur that a statistical indicator is available on the OECD website but not via the API.
Other R&D statistics databases offered by the OECD
We discussed the most important ones in this article, but the OECD has even more databases with R&D statistics! Consider the following:
- The ANalytical Business Expenditure on Research and Development database (ANBERD)[26][27] That’s a long name!🙂
- Various patents statistics databases.[28]
- R&D Tax Incentive Indicators, for example R&D tax credits statistics.[29]
This article is arguable already too long, so we won’t discuss these additional databases. Just click the relevant links to the reference section to find out more if you are interested.
Concluding remarks regarding OECD R&D statistics databases
Hopefully, you learned something useful from Beating Gravity‘s experiences with the OECD R&D statistics databases, shared in this article. Overall, we are definitely pleased with the huge range of data the OECD makes available. However, we think it’s fair to say that the OECD website can be hard to navigate. All the different sections and subdomains don’t always make it easy to find the data you are looking for. If you find yourself in that position, we hope this article helps you on your statistical adventure!
Did you enjoy this article? Please let us know in the comments section.
Check our homepage for more amazing articles!