Chapter 6 Our Approach for Creating a Music Observatory

An observatory is a location used for observing terrestrial or celestial events. The European Commission and the Council of Europe are supporting numerous data observatories to support research and development and evidence-based policymaking. We are creating automated observatories following the best practices of reproducible research.

  • A private observatory is a data integration system that automatically collects external information, processes it and professionally joins it with internal data resources. We offer this to business, scientific research, think tank, NGO and journalism partners.

  • A collaborative observatory is a data integration system that has a map of the collaborating institutions’ data resources and can exploit their synergies by automatically combining their data once all involved parties have authorized it. We offer this to consortia of various entities. Daniel’s CEEMID project, developed since 2014, is a good example of a collaborative observatory.

  • A public observatory is a collaborative observatory that intends to make at least some of its data assets available as open data. We offer this for the European Union and its consortia.

6.1 Evidence-based, Open Policy Analysis

In the last two decades, governments and researchers have placed a growing emphasis on the value of evidence-based policy. However, while the evidence generated through research to inform policy has become more rigorous and transparent, policy analysis–the process of contextualizing evidence to inform specific policy decisions–remains opaque.

We believe that a modern data observatory must improve how evidence is created and used in policy reports, and pass on the efficiency gains from increasing reproducibility and automation. Therefore, we pledge that the observatory will comply with the Open Policy Analysis standards developed by the Berkeley Initiative for Transparency in the Social Sciences & Center for Effective Global Action. These standards are applied by the World Bank.

6.1.1 Reproducible research

Reproducible research is a scientific concept that can be applied to a wide range of professional fields, for example, reproducible finance in the investment process or reproducible impact assessment in policy consulting. Building on the concept of computational reproducibility, we believe that the following principles should be followed.

  • Reviewability means that our application’s results can be assessed and judged by our users’ experts, or experts they trust. We support reviewability with full transparency: we publish the software code that created the indicators, our methodology, and a statistical description of each indicator that refreshes automatically every day the indicator receives new data or corrections from the original source.

  • Reproducibility means that we provide data products and tools that allow the exact duplication of our results during assessments. This ensures that all logical steps can be verified. Reproducibility also ensures that there is no lock-in to our applications. You can always choose a different data and software vendor, or compare our results with theirs.

  • Confirmability means that using our applications leads to the same professional results as other available software and information. Our data products use the open-source statistical programming language R. We provide details about our algorithms and methodology so that our results can be confirmed in SPSS or Stata, or sometimes even in Excel.

  • Auditability means that our data and software are archived in a way that external auditors can later review, reproduce and confirm our findings. This is a stricter form of data retention than most organizations apply, because we archive not only the step-by-step results but all computational steps, as if your colleagues saved not only every step in Excel but also their keystrokes. While auditability is a requirement in accounting, we are extending this approach to all the quantitative work of a professional organization in an advisory or consulting capacity.

  • Reviewable findings: The descriptions of the methods can be independently assessed, and the results judged credible. In our view, this is a fundamental requirement for all professional applications. CEEMID’s music data is used to settle royalty disputes in judicial procedures, and in grant and policy design. We believe that the future European Music Observatory should aim for the same bar, making its data and research products open to challenge in public scientific debate, in courts, and by professional peers.

  • Replicable findings: We present our findings and provide tools so that our users, auditors or external authorities can duplicate our results.

  • Confirmable findings: The main conclusions of the research can be obtained independently without our software, because we describe the algorithms and methodology in detail in supplementary materials. We believe that other organizations, analysts and statisticians must be able to arrive at the same findings with their own methods and software. This avoids lock-in and allows independent cross-examination.

  • Auditable findings: Sufficient records (including data and software) have been archived so that the research can be defended later if necessary or differences between independent confirmations resolved. The archive might be private, as with traditional laboratory notebooks. See Open collaboration with academia, auditors, and industry.

These computational requirements call for a data workflow that relies on further principles.

  • Record retention: all aspects of reproducibility require a high level of standardized documentation. Standardizing the documentation requires the use of standardized metadata, metadata structures, taxonomies and vocabularies.

  • Best available information / data universe: the quality of the findings, their confirmation and auditing success will improve with better data and facts used.

  • Data validations: The quality of the findings depends greatly on the factual inputs. Even a fully reproducible workflow will likely lead to wrong conclusions if it is fed erroneous data or faulty information, and such inputs make confirmation and auditing impossible. Especially when organizations use large and heterogeneous data sources, even small errors, such as erroneous currency translations or the accidental misuse of decimals or units, can cause results that will not pass confirmation or auditing.
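The kind of input validation described in the last point can be sketched in a few lines. The example below is only an illustration written in Python (the observatory's own tooling is written in R); the record layout, the function names, the unit multipliers and the exchange rates are all hypothetical and the numbers are made up. It rejects rows with unknown units or missing exchange rates before any conversion happens, and flags sudden order-of-magnitude jumps that often signal a misplaced decimal or a unit mix-up.

```python
from typing import NamedTuple


class Observation(NamedTuple):
    geo: str        # country code
    year: int
    value: float
    currency: str   # ISO 4217 code
    unit: str       # "units", "thousands" or "millions"


# Illustrative lookup tables with made-up numbers, not real data.
UNIT_MULTIPLIER = {"units": 1, "thousands": 1_000, "millions": 1_000_000}
EUR_PER_LCU = {"EUR": 1.0, "HUF": 0.0025, "CZK": 0.04}


def validate(obs):
    """Return a list of problems; an empty list means the row passed."""
    problems = []
    if obs.unit not in UNIT_MULTIPLIER:
        problems.append(f"unknown unit {obs.unit!r}")
    if obs.currency not in EUR_PER_LCU:
        problems.append(f"no exchange rate for {obs.currency!r}")
    if obs.value < 0:
        problems.append("negative value")
    return problems


def to_eur(obs):
    """Convert a validated observation to euros; refuse to convert bad rows."""
    problems = validate(obs)
    if problems:
        raise ValueError("; ".join(problems))
    return obs.value * UNIT_MULTIPLIER[obs.unit] * EUR_PER_LCU[obs.currency]


def magnitude_jumps(values, factor=100.0):
    """Indices where a value differs from its predecessor by more than
    `factor` times, a typical symptom of a decimal or unit error."""
    flagged = []
    for i in range(1, len(values)):
        prev, cur = values[i - 1], values[i]
        if prev > 0 and cur > 0 and max(prev, cur) / min(prev, cur) > factor:
            flagged.append(i)
    return flagged
```

A row that fails `validate` never reaches the conversion step, so an error is caught where it enters the workflow rather than at confirmation or audit time. The threshold of 100 in `magnitude_jumps` is an arbitrary choice for the sketch and would need tuning per indicator.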

6.1.2 Indicator design

We are committing ourselves in the final deliverable to follow the indicator design principles set out by Eurostat (Eurostat 2014; Kotzeva et al. 2017) to create high-quality, validated indicators that receive appropriate feedback from users, i.e. music businesses, their trade associations and policy-makers.

  • Indicators that were used with all known royalty valuation methods (PwC 2008), for both authors’ and neighbouring rights, and fulfil the IFRS fair value standards, incorporated in EU law and the recent EU jurisprudence (InfoCuria 2014, 2017).

  • Indicators that can be used for calculating damages, or calculating the value of the value gap (Antal 2019a, 2019c).

  • Indicators that quantify the development needs of musicians, and can set objective granting aims and grant evaluations (Antal 2015).

  • Intelligent, AI-based applications, including machine learning, to predict the best scheduling or likely audience.

  • Understanding how music is taxed, how music contributes to the local and national GDP, and how music creates jobs directly, indirectly and with induced effects (Antal 2019b).

  • Providing detailed comparisons of music audiences across countries.

  • Measuring exporting success on streaming platforms, and preparing better targeting tools.

6.2 Data sources

In line with existing EU policies and regulations, we believe that any future data observatory, and the European Music Observatory is no exception to this rule, should rely as much as possible on open data, because this avoids the duplication of investment and offers a better return on taxpayer money when public funds are used.

6.2.1 Open Data

In the EU, open data is governed by the Directive on open data and the re-use of public sector information, in short the Open Data Directive (EU) 2019/1024. It entered into force on 16 July 2019. It replaces the Public Sector Information Directive, also known as the PSI Directive, which dated from 2003 and was amended in 2013.

The aim of the EU open data regime is to give the music industry access to all non-confidential information that has already been collected by a public body at the taxpayer’s expense.

Europe has a lot of open data that is highly relevant for the Music Economy, Music Diversity & Circulation, Music, Society and Citizenship, and can be a creative source of Innovation — i.e. all pillars of the planned European Music Observatory.

Open data is not a panacea. While various EU research programs, tax authorities and even transport authorities hold data that may be highly relevant for the music industry, this data sits in databases that are processed for a very different purpose. The most important heritage of CEEMID is that over many years we learned how to get access to and reprocess these otherwise free data sources for the benefit of the music industry (See Annex).

The data retrieval is done by the musicobservatory R package in the case of public data sources. We will treat private data sources with the same care, but obviously do not publish sensitive access code or data.

  • Since 2014, CEEMID has been creating thousands of music and audiovisual industry-related indicators from raw data collected in the EU for other purposes, such as inflation measurement or public policy assessment, using our proprietary software and some open source software.

6.2.2 Industry & Partner Data

CEEMID has worked with all parts of the music industry: record labels and distributors and their collective management organizations; performers, composers and publishers and their collective management organizations; granting authorities, trade associations and export offices. We helped them professionally join and integrate public and private data. This means that we have a detailed data map of the industry and its main organizations in recordings, publishing and live music.

We are not a data vendor or re-selling organization. We believe that our know-how and added value lie in integrating many data sources into more complete information and providing state-of-the-art predictions, forecasts, valuation and other professional uses of the data. We will never publish either data maps or data from these sources, but we will seek partnerships with industry organizations to make some of their data visible or available for the observatory, in a form, at a frequency and under terms comfortable for them.

In this demonstration, we will show some visualizations of publicly available data from various industry associations. We do not have the right to re-publish the data, but we have the know-how to enhance, improve and join these data into more useful information. Our users who have legal access to the databases of CISAC, IFPI, GESAC or other industry associations can ask to join these private sources with our observatory for analysis. We provide this service only for the benefit of legal users of such data.

6.2.3 Survey-based data

Currently many Eurostat data products are made of primary data that is collected in distinct, and not fully harmonized, frameworks. For example, Cultural Access & Participation (CAP) data, which is survey-based, is sometimes included in the AES, sometimes in the EU-SILC, and sometimes in the European Commission’s Eurobarometer program. (See more about the CAP surveys in the Annex: Cultural Access & Participation Surveys.)

We are both commissioning CAP surveys with our partners and retrospectively harmonizing existing EU CAP surveys. We will also publish retrospectively harmonized data from individual responses to pan-European surveys concerning music, willingness to pay and the use of entertainment technology. An early version of this work already created some unique indicators for our work with IVIR colleagues, “Open Access is not a Panacea, even if it’s Radical – an Empirical Study on the Role of Shadow Libraries in Closing the Inequality of Knowledge Access” (see Metadata: Regional Eurostat Variables For Understanding Piracy Of Books).

Our more general retroharmonize and more specific eurobarometer software packages allow us to create statistical indicators from the usually unpublished, unused questionnaire data of Eurobarometer, and even to create comparative indicators with Latin America, Africa and the Arab world.

We placed two use cases of the eurobarometer package in Ownership of CD players and Ownership of smartphones. They show how questionnaires that were not processed for the purposes of a music observatory can be turned into indicators with open data and our open source software.

The European statistical framework covers the demand/society side of music, where some data, although not very detailed data, is available for the entire EU. This is not the case with the supply side of the industry. Because the music industry is dominated by micro- and small enterprises, its businesses usually do not participate in the structural business statistics framework of the EU or the member states. Currently, there is no other way to understand the investments, costs and value added of the music industry than to collect data within the industry, outside of the governmental statistical programs.

CISAC, on the authors’ and publishers’ side, and IFPI, on the recording side, collect plenty of important market data, and these data sources are more or less global (i.e. they cover all mature markets and some emerging markets). Live DMA has a fledgling data program for some aspects of European live music. Most of these data sources are not public, but our private data integration allows their users to get far more out of them, because we can add the critically missing information needed to use them for royalty valuation or market forecasting.

Our own survey program (see Annex: Music Professional Surveys) is designed to collect the information that is missing from these industry data collections and from the EU statistical frameworks. We aim to globalize the surveys, starting with the most important North and Latin American markets and Australia, to best complement existing data sources. We are well aware that Asia is in many ways the biggest future market, and in 2021 we will try to reach out to the most populous continent, too.

6.2.4 Experimental estimation techniques

In section 4.2.1, Central European Music Industry Report 2020, we showed another approach. Together with Consolidated Independent, a music distributor of about 3.3 million sound recordings, we created a very large sample from 700 million individual, anonymized royalty statements, and created experimental indicators for royalties, distribution components and advertising-based revenues for 20 countries.

Furthermore, we created music export indicators using the Spotify API and demand seasonality indicators via the Google Trends API, and compared the latter with actual ticket sales data in one country. We believe that this approach is in line with many aspects of the methodological work carried out in the ESSnet Big Data II project.

6.2.5 Other Proprietary Data

CEEMID and its successor, Reprex B.V. (short introduction in the Annex), have been collecting data from primary sources, mainly via harmonized surveys, for 7 years. Over this period CEEMID has surveyed music and film professionals, and has probably conducted the most Cultural Access & Participation Surveys in Europe.

Probably the most comprehensive and fully reproducible report that CEEMID has produced is the Central European Music Industry Report 2020. It was presented as a best case for evidence-based policy design in the cultural and creative sectors at the CCS Ecosystems: FLIPPING THE ODDS Conference, a two-day high-level stakeholder event jointly organized by the Goethe-Institut and the DG Education, Youth, Sport and Culture of the European Commission within the Creative FLIP programme.

Some of this data is the asset of CEEMID and we will release indicators from those data assets. Some of them belong to CEEMID partners and we will seek their permission to release examples and seek funding to make the relevant, high-value data open in the future.

The Demo Music Observatory will follow the guidelines of Eurostat’s Towards a Harmonised Methodology for Statistical Indicators series (Eurostat 2014; Kotzeva et al. 2017) to create high-quality, validated indicators that receive appropriate feedback from users, i.e. music businesses, their trade associations and policy-makers.

Because music is often a very local business, where artists often have a local or regional fan base and are helped by local policies, we will show how indicators can be produced from our rich data assets at regional and city level, following the best practices and guidelines set out by Eurostat and OECD (Münnich et al. 2019).

6.3 Data Integration Principle

Instead of creating expensive and unproven new data assets, we believe that the future European Music Observatory, like our Demo Music Observatory, should rely on proven industry data assets, and it should put efforts into making the existing data well-documented, validated, and easy to build upon in a statistically ‘tidy’ format that allows quick automated data joins.

We believe that more insights can be gained from joining existing, known, proven data assets than from increasing the size of new ones. For example, both CISAC and IFPI help the authors’ and neighbouring rights societies with data assets to fulfil their obligations to their members and regulators, bearing in mind the often restrictive conditions set in the jurisprudence (InfoCuria 2013). However, the increased activity of licensees and competition authorities has significantly increased the burden of proof required to justify collectively managed royalties and private copying compensation (InfoCuria 2014, 2017). Collective management organizations must be able to professionally join data from each other with data about the market demand and macroeconomic conditions of the entire Single Market, with its many currencies and reporting standards. CEEMID is providing them with hundreds of indicators that comply with this jurisprudence, and automates correct currency and unit conversion, data processing and other tasks for which most CMOs lack data science competences.
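A minimal sketch can show why a tidy layout makes automated joins quick. The example is written in Python purely for illustration (the observatory's own software is written in R), and every table, column name and number in it is hypothetical. Each table has one row per country and year, so two sources can be joined on the shared `geo` and `year` keys, after which a currency-converted, per-capita indicator is a single expression. Rows that find no partner are reported rather than silently dropped.

```python
def tidy_join(left, right, keys=("geo", "year")):
    """Inner join of two tidy tables (lists of dicts) on shared key columns.

    Returns the joined rows and the left-hand rows without a match, so that
    gaps in coverage stay visible.
    """
    index = {tuple(row[k] for k in keys): row for row in right}
    joined, unmatched = [], []
    for row in left:
        match = index.get(tuple(row[k] for k in keys))
        if match is None:
            unmatched.append(row)
        else:
            joined.append({**row, **match})
    return joined, unmatched


# Hypothetical industry data: royalties in local currency units (LCU).
royalties = [
    {"geo": "AA", "year": 2019, "royalty_lcu": 1000.0},
    {"geo": "BB", "year": 2019, "royalty_lcu": 500.0},
    {"geo": "CC", "year": 2019, "royalty_lcu": 10.0},
]

# Hypothetical open data: exchange rate and population (made-up numbers).
macro = [
    {"geo": "AA", "year": 2019, "eur_per_lcu": 0.5, "population": 10.0},
    {"geo": "BB", "year": 2019, "eur_per_lcu": 0.25, "population": 10.0},
]

joined, unmatched = tidy_join(royalties, macro)

# Royalty per capita in euros for every country-year present in both sources.
indicator = {
    (row["geo"], row["year"]): row["royalty_lcu"] * row["eur_per_lcu"] / row["population"]
    for row in joined
}
```

Here `unmatched` contains the row for `CC`, which has no exchange rate or population in the second table. Surfacing such rows is a design choice: a missing conversion factor is reported as a data gap instead of producing a silently incomplete indicator.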

  • We would like to introduce the achievements of CEEMID in integrating numerous data sources of the European music industry, i.e. building programmatic interfaces based on a thorough understanding of data and accompanying metadata into more advanced data products and services.

  • We would like to address owners and managers of known, high-quality data resources to provide at least a minimal, valuable sample to the Demo Music Observatory and elaborate on the conditions of providing more data for the future European Music Observatory.

6.3.1 Public Data Integration

In the Demo Music Observatory we would like to demonstrate that the innovative and professional combination of open data can create highly valuable new business key performance indicators, ex ante or ex post grant indicators, or public policy indicators.

6.3.2 Collaborative Data Integration

CEEMID was founded on the idea of helping music organizations with small research capacities and small research budgets to pool those capacities and assets. One of the most important aspects of our value proposition, and of our hoped-for sustainable model, is to help participants in our open collaboration to privately exploit the benefits of sharing data with each other, using us as a secure, trusted third party.

6.3.3 Private Data Integration

We mainly seek to finance our activities sustainably by offering private data integration. In this case, we offer industry, research and policy partners a highly automated tool that brings open data into their research teams and combines it with their confidential, proprietary data.

In chapter 4, Innovation, you can read about how we approach forecasting, AI & machine learning, royalty valuation and copyright infringement compensation calculation, and how research costs can be decreased significantly by automated reporting and documentation.

We believe that we do not create a conflict of interest, because paying users of Private Data Integration do not get different data than the users of our Collaborative or Public observatories. Rather, we work with them to join their private data with our data and that of public sources, analyzing the resulting data to its full potential.

6.4 Open collaboration with the music industry, music researchers and cultural policymakers

We believe that the future European Music Observatory, like the Demo Music Observatory, must rely on open-source statistical software written in the R statistical language, and it must be founded on the principle of open collaboration with the industry, statisticians and academia, employing best statistics, data science and AI practices. The Observatory uses many data sources about the audience, the creators of music, music works and recordings, its global circulation and its economy. CEEMID has created thousands of high-value, hard music industry indicators by integrating open data sources, industry data sources, surveys and various APIs to other relevant data sources.

CEEMID is aiming to transfer thousands of indicators that are reproducible and verifiable, together with the open-source statistical software that creates them to the European Music Observatory in order to give Europe-wide access to timely, reliable and actionable statistics and indicators for the music industry, policy-makers and music professionals.

6.4.1 Open Source Code

Open-source software (OSS) is a type of computer software in which source code is released under a license in which the copyright holder grants users the rights to study, change, and distribute the software to anyone and for any purpose. Open-source software is often developed in a collaborative public manner, which makes it a prominent example of open collaboration.

Generally, the use of open source software in the national statistical offices, including the open source R language and its software packages or libraries, is encouraged by four important considerations:

  1. lower cost,

  2. higher level of security,

  3. no vendor ‘lock-in’ and

  4. better data quality; as all data manipulations can be revised by expert statisticians and programmers.

We believe that the number of data scientists in the music domain is so small that only an open collaboration can guarantee adequate data quality.

We believe that most of the software code producing the indicators of the future European Music Observatory must be made public. We have not only made most of our critical software code open source, but we also send it to peer review, which requires a very high level of documentation. The high level of documentation (see, for example, our iotables package, which calculates creative industry GVA/employment/tax multipliers) also ensures that our partners’ experts, or independent experts, can validate and even modify or improve our code. You can read more about our open source, open collaboration method in the Data integration and Data processing parts of the Annex.


The software code that produces our indicators is:

  • fully open source and available for review or modification on GitHub with full source; contributions and pull requests are welcome and will be credited;

  • unit-tested, i.e. the indicator results are validated automatically;

  • going to be peer-reviewed;

  • fully documented on

This means that the software code that produces the indicators stored in the observatory will be fully available for cultural statisticians, data scientists of music organizations, researchers and all interested parties. This is the software code that will, for example, check every day whether Eurostat has published any new tables, data points or corrections in its data warehouse that are relevant for the observatory.
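One simple way to implement such a daily check is to compare a fingerprint of the freshly downloaded table with the fingerprint stored after the previous run. The sketch below is an assumption about how this could be done, not a description of the observatory's actual code; it is written in Python for illustration, and the function names, the table identifier and the in-memory store of known digests are hypothetical. The download step is left out so that the example stays self-contained.

```python
import hashlib
import json


def fingerprint(table):
    """SHA-256 digest of a table (list of dicts) that does not depend on
    the order of rows or of keys within a row."""
    rows = sorted(json.dumps(row, sort_keys=True) for row in table)
    canonical = "\n".join(rows)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def needs_refresh(table_id, new_table, known_digests):
    """Return True when the table is new or differs from the last seen
    version, and remember the new digest for the next run."""
    digest = fingerprint(new_table)
    changed = known_digests.get(table_id) != digest
    known_digests[table_id] = digest
    return changed


# Hypothetical usage with made-up data; a real run would persist the digests.
known = {}
first = [{"geo": "AA", "value": 1.0}, {"geo": "BB", "value": 2.0}]
reordered = [{"geo": "BB", "value": 2.0}, {"geo": "AA", "value": 1.0}]
corrected = [{"geo": "AA", "value": 1.5}, {"geo": "BB", "value": 2.0}]

results = [
    needs_refresh("demo_table", first, known),      # first sighting
    needs_refresh("demo_table", reordered, known),  # same content, new order
    needs_refresh("demo_table", corrected, known),  # one corrected data point
]
```

Because the fingerprint ignores row order, a re-sorted download does not trigger a recalculation, while a single corrected data point does. Only the indicators that depend on a changed table would then need to be recomputed and re-documented.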

6.4.2 Licensing Policy

Our current licenses are usually GPL-3 licenses, which allow the use and even the modification of our code for free, and provide the highest level of transparency, because all modifications must remain open, too.

We believe that this is not the best policy for fostering innovation, an important goal of the future European Music Observatory, because the requirement to keep all modifications open makes it difficult for music tech companies and new startups to use our code for commercial purposes.

Our code was developed over years with a growing team, and we have invested very heavily into this. We are looking for ways to recover some of this investment, and give proper remuneration to our team while keeping the benefits of our open-source approach.

One possibility would be to obtain a large enough grant to compensate for the investments into the code, and to change our licenses to a more permissive license. This means that while the Demo Music Observatory code would remain open source, music tech companies and startups could also use it for commercial purposes. The grant would ensure that our team’s (currently unremunerated) work does not simply profit other stakeholders without compensation. This option would probably have the greatest innovation potential, because it would allow open and private investment into music analytics to work together.

Another possibility that we foresee is that we keep using some open source license, and grant a different license to some of our technology for commercial enterprises.

We believe that these issues should be broadly discussed during the creation of all data observatories in Europe, including the European Data Observatory, because they are critical from an innovation point of view.

6.5 History

The Demo Music Observatory grew out of a collaborative observatory, CEEMID, and from its open-source, open data-based automation technology. CEEMID is aiming to transfer thousands of indicators, and the verifiable, open-source software that creates them, to the European Music Observatory to give Europe-wide access to timely, reliable, actionable statistics and indicators for the music industry, policy-makers and music professionals. (Read more about our data coverage)

Over 7 years, CEEMID became a logical starting point of the planned European Music Observatory, because it is a pan-European music data integration system based on open data, open-source software using best statistics, data science and AI practices. CEEMID has created thousands of high-value, hard music industry indicators using open data sources, industry data sources, surveys and various APIs to relevant other data sources.

We believe that this could be a very logical continuation of the work of CEEMID, which came into existence in 2014 in some of the less data-rich countries of the EU with the same purpose. Our work was also showcased as a good example of evidence-based policy making at the CCS Ecosystems: FLIPPING THE ODDS Conference, a two-day high-level stakeholder event jointly organized by the Goethe-Institut and the DG Education, Youth, Sport and Culture of the European Commission with the Creative FLIP project. (See a brief summary of the presentation and our use case, the reproducible research document Central & Eastern European Music Industry Report 2020.)

6.6 Future & The European Music Observatory

Our start-up grew out of the technology, know-how and some data assets of CEEMID. Reprex B.V. was only founded on 1 September 2020 (a short introduction is available in the Annex). The creators of the Demo Music Observatory have applied for the Artificial Intelligence Validation Lab of Yes!Delft, which is considered to be the second best university-backed high-tech startup incubator program in the world. Our aim is to ask for their help to find a successful business model for an open source, open data, open collaboration-based creation of creative observatories that can finance their operations mainly from exclusive data services to participants, and to some extent from public research or policy funds.

We are treating the Demo Music Observatory as our flagship project, and we hope that we can find the best, quickest and most valuable path for European music stakeholders to build up their own European Music Observatory. We are inviting all former CEEMID partners and other interested parties to build a system of Creative Observatories. While we hope to keep serving their individual needs, as CEEMID has been serving many creative organizations in the last 7 years, we believe that creating “data republics” among non-competing creative organizations can create much value for all of them.

We would like to form a consortium to build a Demo Music Observatory and offer it to the European Commission & the Music Moves Europe Programme as a foundation of the European Music Observatory. We believe that our approach offers the highest data quality, access to the most innovative tools and the quickest and cheapest route to building up such an observatory.

We will launch very soon a twin observatory for creating an open source, open data, open collaboration based methodology-oriented framework for creating data for all creative and cultural industries in Europe. This observatory is not yet functional, but it is already open for consultation in its very early skeleton phase on

CEEMID has about 2000 indicators that are highly relevant for all the pillars of the planned European Music Observatory, i.e. the Music Economy, Music Diversity & Circulation, Music, Society and Citizenship, and Innovation.


Antal, Daniel. 2019b. “Správa o slovenskom hudobnom priemysle.”

Antal, Daniel. 2019c. “The Competition of Unlicensed, Licensed and Illegal Uses on the Markets of Music and Audiovisual Works [A szabad felhasználások, a jogosított tartalmak és az illegális felhasználások versenye a zenék és audiovizuális alkotások hazai piacán].” Artisjus - not public.

Antal, Dániel. 2015. “Javaslatok a Cseh Tamás Program pályázatainak fejlesztésére. A magyar könnyűzene tartós jogdíjnövelésének lehetőségei. [Proposals for the Development of the Cseh Tamas Program Grants. The Possibilities of Long-Term Royalty Growth in Hungarian Popular Music].” manuscript.

Eurostat. 2014. Towards a Harmonised Methodology for Statistical Indicators — Part 1: Indicator Typologies and Terminologies. 2014th ed. Vol. 1. Towards a Harmonised Methodology for Statistical Indicators 1. Luxembourg: Publications Office of the European Union.

InfoCuria. 2014. “OSA – Ochranný svaz autorský pro práva k dílům hudebním o.s. v Léčebné lázně Mariánské Lázně a.s. Case C‑351/12.”

InfoCuria. 2017. “Autortiesību un komunicēšanās konsultāciju aģentūra /Latvijas Autoru apvienība v Konkurences padome.”

InfoCuria. 2013. “T-442/08 CISAC V Commission.”

Kotzeva, Mariana, Anton Steurer, Nicola Massarelli, and Mariana Popova, eds. 2017. Towards a Harmonised Methodology for Statistical Indicators — Part 2: Communicating Through Indicators. 2017th ed. Vol. 2. Towards a Harmonised Methodology for Statistical Indicators 1. Luxembourg: Publications Office of the European Union.

Münnich, Ralf, Juan Pablo Burgard, Florian Ertz, Simon Lenau, Julia Manecke, and Harolf Merkle. 2019. Small Area Estimation for City Statistics and Other Functional Geographies — 2019 Edition. 2019th ed. Luxembourg: Publications Office of the European Union.

PwC. 2008. “Valuing the Use of Recorded Music.” IFPI PricewaterhouseCoopers.