LinkedBrainz
Current Status (2021)
LinkedBrainz is currently unsupported and out of date as a source of MusicBrainz data. However, the MusicBrainz website does output complete metadata in JSON-LD format:
curl -H "Accept: application/ld+json" https://musicbrainz.org/artist/20ff3303-4fe2-4a47-a1b6-291e26aa3438
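The same request can be made from a script. The following is a minimal Python sketch using only the standard library; the helper names `build_jsonld_request` and `fetch_jsonld` are made up for illustration, and nothing is assumed about the returned document beyond it being JSON.

```python
import json
import urllib.request

MB_ROOT = "https://musicbrainz.org"


def build_jsonld_request(entity_type: str, mbid: str) -> urllib.request.Request:
    """Build a GET request asking for the JSON-LD form of an entity page."""
    url = f"{MB_ROOT}/{entity_type}/{mbid}"
    return urllib.request.Request(url, headers={"Accept": "application/ld+json"})


def fetch_jsonld(entity_type: str, mbid: str, timeout: float = 10.0):
    """Send the request and decode the body as JSON (needs network access)."""
    request = build_jsonld_request(entity_type, mbid)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_jsonld("artist", "20ff3303-4fe2-4a47-a1b6-291e26aa3438")` should be equivalent to the curl command above.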
LinkedBrainz
The LinkedBrainz project is intended to help MusicBrainz publish its database as Linked Data. Linked Data is simply a method for publishing structured data on the web using semantic web technologies. Because MusicBrainz does such an awesome job of providing unique identifiers for music artists, albums, and tracks, it is already widely used as a source for music-related URIs in the Linked Data community. However, Linked Data people tend to mint new URIs based on MBIDs because MusicBrainz does not serve Linked Data directly. We hope to reduce the need for this duplication by providing Linked Data directly from MusicBrainz while having no negative impact on other aspects of MusicBrainz's functionality.
Goals
LinkedBrainz will provide:
- a mapping of the MusicBrainz Next Generation Schema (NGS) and relationships to RDF
- integration with MusicBrainz server code to provide dereferenceable URIs
- a SPARQL endpoint for querying MusicBrainz data
Implementation
These additional pages relate to ideas for implementing LinkedBrainz.
Mapping
Entity concepts in MusicBrainz will be mapped to concepts in the Music Ontology and other appropriate ontologies.
There's been some activity related to this lately on the Music Ontology Specification list.
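To make the idea of a mapping concrete, here is a small Python sketch that turns one entity into N-Triples lines. The entity-to-class table is only an illustration built from classes that exist in the Music Ontology; it is not the mapping agreed by the project, and the use of `rdfs:label` for the name is an assumption chosen because it applies to any resource.

```python
MO = "http://purl.org/ontology/mo/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Illustrative table only; not the project's agreed mapping.
ENTITY_CLASS = {
    "artist": MO + "MusicArtist",
    "release": MO + "Release",
    "label": MO + "Label",
    "work": MO + "MusicalWork",
}


def escape_literal(text: str) -> str:
    """Escape a string so it can sit inside an N-Triples literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def entity_triples(entity_type: str, mbid: str, name: str) -> list:
    """Return a type triple and a label triple for one MusicBrainz entity."""
    subject = f"<http://musicbrainz.org/{entity_type}/{mbid}>"
    return [
        f"{subject} <{RDF_TYPE}> <{ENTITY_CLASS[entity_type]}> .",
        f'{subject} <{RDFS_LABEL}> "{escape_literal(name)}" .',
    ]
```

A real mapping would also need to cover relationships between entities, which this sketch leaves out.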
Old RDF Mappings
At least 4 RDF mappings of the MusicBrainz database exist:
- the original RDF service used by MusicBrainz back in the day
- the Zitgist mappings
- the DBTune mappings
- the new Talis dataincubator mappings (work in progress)
None of these tackle NGS, but they should serve as a good starting point. The old RDF seems to predate Linked Data ideas and Music Ontology modeling.
Proposed Mappings
Ideas for mapping MusicBrainz NGS and Relationships to RDF should be discussed on
Dereferenceable URIs
There are essentially two approaches to providing dereferenceable URIs (in the Linked Data sense) of the form
http://musicbrainz.org/<type>/<mbid>
These approaches are RDFa and Content Negotiation. NOTE: These approaches are not mutually exclusive and applying both (or one and then the other later on) is a viable option.
RDFa
RDFa is a syntax for embedding RDF into HTML documents. The RDF modeling of a particular MusicBrainz entity could be embedded alongside the normal HTML. Web browsers and RDF consumers would use the exact same content.
pros
- only small changes to the code base required
- most parsers read RDFa these days
- no change to the DB server load as the exact same queries to render the HTML render the RDFa
cons
- HTML page sizes would get bigger: initial hand-annotated tests grew by roughly 5% to 30% compared with non-RDFa NGS pages
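As a rough illustration of the RDFa approach, the Python sketch below renders the same artist heading with and without RDFa attributes. The markup is hypothetical and far simpler than a real MusicBrainz page; it uses the RDFa 1.1 `prefix`, `about`, `typeof` and `property` attributes, and the choice of `mo:MusicArtist` and `foaf:name` is an assumption.

```python
from html import escape

PREFIXES = "mo: http://purl.org/ontology/mo/ foaf: http://xmlns.com/foaf/0.1/"


def artist_heading_plain(name: str) -> str:
    """Plain HTML heading, as a browser-only page might render it."""
    return f"<div><h1>{escape(name)}</h1></div>"


def artist_heading_rdfa(mbid: str, name: str) -> str:
    """Same heading, with RDFa attributes describing the artist."""
    uri = escape(f"http://musicbrainz.org/artist/{mbid}", quote=True)
    return (
        f'<div prefix="{PREFIXES}" about="{uri}" typeof="mo:MusicArtist">'
        f'<h1 property="foaf:name">{escape(name)}</h1></div>'
    )
```

The visible text is identical in both versions; only the attributes are added, which is where the extra page size comes from.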
RDFa bloating test
The artist page for James Brown was annotated by hand with RDFa. The hand RDFa-fied pages grew by 5.6% for a sparse page and 29.2% for a dense page with lots of releases. These are just estimates; the example RDFa pages do not validate yet.
| density | page | size |
|---|---|---|
| dense | James Brown (old page) | 80.6k |
| dense | James Brown (NGS page) | 37.3k |
| dense | James Brown (RDFa page) | 48.1k |
| sparse | Cycles (old page) | 25.6k |
| sparse | Cycles (new page) | 13.0k |
| sparse | Cycles (RDFa page) | 13.7k |
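The growth figures can be rechecked from the sizes in the table. The short Python calculation below uses the rounded values shown there, so it gives about 29.0% and 5.4% rather than the quoted 29.2% and 5.6%; the difference presumably comes from the quoted figures being based on unrounded sizes.

```python
def growth_percent(before_kb: float, after_kb: float) -> float:
    """Relative size increase from the non-RDFa page to the RDFa page, in percent."""
    return (after_kb - before_kb) / before_kb * 100.0


dense = growth_percent(37.3, 48.1)   # James Brown: NGS page vs RDFa page
sparse = growth_percent(13.0, 13.7)  # Cycles: new page vs RDFa page
print(f"dense: {dense:.1f}%  sparse: {sparse:.1f}%")
# prints "dense: 29.0%  sparse: 5.4%"
```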
Content Negotiation
With the content negotiation approach, the HTTP Accept header is examined on each request. If it asks for RDF, for example "Accept: application/rdf+xml", an RDF/XML document is returned. Otherwise a normal HTML page is returned.
pros
- the "classic" linked data approach
- most widely supported by RDF consumers
- no HTML bloating
cons
- must modify code base and muck around with the request cycle a bit
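The decision made during content negotiation can be sketched in a few lines of Python. This is a simplified illustration, not MusicBrainz server code: the function names are invented, only `application/rdf+xml` and `text/html` are considered, and wildcards such as `*/*` simply fall through to HTML.

```python
def parse_accept(header: str) -> dict:
    """Parse an Accept header into {media_type: q}, defaulting q to 1.0."""
    prefs = {}
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0
        for param in fields[1:]:
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        prefs[media_type] = q
    return prefs


def choose_representation(accept_header: str) -> str:
    """Return RDF/XML only when the client asks for it at least as strongly as HTML."""
    prefs = parse_accept(accept_header or "")
    rdf_q = prefs.get("application/rdf+xml", 0.0)
    html_q = prefs.get("text/html", 0.0)
    if rdf_q > 0.0 and rdf_q >= html_q:
        return "application/rdf+xml"
    return "text/html"
```

A typical browser header lists `text/html` first and never mentions RDF, so it keeps getting the normal HTML page, while an RDF consumer sending `Accept: application/rdf+xml` gets the RDF/XML document.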
SPARQL Endpoint
The SPARQL endpoint will allow users to query the MusicBrainz RDF graph using all the rich expressiveness of the SPARQL Query Language.
The DBTune SPARQL endpoint provides an interface to pre-NGS data.
There are two technical approaches for implementing a SPARQL endpoint, which we will refer to as "Replicate and Swallow" and "Wrapping".
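As an example of the kind of query such an endpoint could answer, the Python sketch below builds a SPARQL query for artists with a given name and encodes it as a GET request URL using the `query` parameter of the SPARQL protocol. The endpoint address is a placeholder, and the use of `mo:MusicArtist` and `foaf:name` assumes a mapping that has not been fixed here.

```python
from urllib.parse import urlencode

ENDPOINT = "http://example.org/sparql"  # placeholder, not a real LinkedBrainz URL


def artists_named_query(name: str, limit: int = 10) -> str:
    """Build a SPARQL SELECT query for artists with exactly this name."""
    escaped = name.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "PREFIX mo: <http://purl.org/ontology/mo/>\n"
        "PREFIX foaf: <http://xmlns.com/foaf/0.1/>\n"
        "SELECT ?artist WHERE {\n"
        "  ?artist a mo:MusicArtist ;\n"
        f'          foaf:name "{escaped}" .\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )


def query_url(endpoint: str, query: str) -> str:
    """Encode the query as a SPARQL protocol GET request URL."""
    return endpoint + "?" + urlencode({"query": query})
```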
Replicate and swallow
In this approach a script runs on the MusicBrainz DB and creates an RDF dump using our mapping. We then load this dump into a purpose-built triple store like Virtuoso or 4Store.
Obviously this is a hack and not what we want to do in the long run. However, it's the fastest way to get a SPARQL endpoint up and running. This has been done "on the side" separate from the MusicBrainz code base on C4DM's server for testing out mappings, even though it is not our final solution. See the Snorql interface.
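The dump step of this approach might look roughly like the Python sketch below. It is an illustration under several assumptions: an in-memory SQLite database stands in for the MusicBrainz Postgres DB, the `artist` table with `gid` and `name` columns is a simplified, hypothetical schema, and the two predicates stand in for whatever the real mapping specifies. Loading the resulting file into a triple store is not shown.

```python
import io
import sqlite3

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
MO_ARTIST = "http://purl.org/ontology/mo/MusicArtist"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"


def escape_literal(text: str) -> str:
    """Escape a string for use inside an N-Triples literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def dump_artists(conn, out) -> int:
    """Write two N-Triples lines per artist row; return the number of triples."""
    count = 0
    for gid, name in conn.execute("SELECT gid, name FROM artist ORDER BY gid"):
        subject = f"<http://musicbrainz.org/artist/{gid}>"
        out.write(f"{subject} <{RDF_TYPE}> <{MO_ARTIST}> .\n")
        out.write(f'{subject} <{FOAF_NAME}> "{escape_literal(name)}" .\n')
        count += 2
    return count


def demo_connection() -> sqlite3.Connection:
    """Create a tiny stand-in database with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE artist (gid TEXT PRIMARY KEY, name TEXT NOT NULL)")
    conn.executemany(
        "INSERT INTO artist (gid, name) VALUES (?, ?)",
        [
            ("00000000-0000-0000-0000-000000000001", "Example Artist"),
            ("00000000-0000-0000-0000-000000000002", 'The "Quoted" Band'),
        ],
    )
    return conn
```

Because the dump is a snapshot, it has to be regenerated and reloaded whenever the database changes, which is part of why this approach is described as a hack.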
Wrapping
In this approach some software wraps the Postgres DB and translates SPARQL queries into SQL. The DBTune endpoint actually does this using the D2R Server software and a declarative mapping file. However, the DBTune endpoint operates on a remote server using a MusicBrainz DB dump. The D2R Server software is based on Java. We might implement something similar that is lighter weight and based on Python (or Perl if that's what it is) to run natively on the MusicBrainz servers.
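The core idea of wrapping, a declarative mapping that drives the translation of graph patterns into SQL, can be shown with a toy Python sketch. It handles only one kind of pattern (variable subject, known predicate, literal object), and the table and column names are hypothetical; it is not how D2R Server is implemented, and a real wrapper must handle joins, filters and much more.

```python
# Declarative mapping: predicate URI -> (table, key column, value column).
# Table and column names are hypothetical.
PREDICATE_COLUMNS = {
    "http://xmlns.com/foaf/0.1/name": ("artist", "gid", "name"),
}


def pattern_to_sql(predicate: str, literal: str):
    """Translate the pattern (?s <predicate> "literal") into parameterized SQL."""
    table, key_col, value_col = PREDICATE_COLUMNS[predicate]
    # "?" is the SQLite placeholder style; a Postgres driver may expect "%s".
    sql = f"SELECT {key_col} FROM {table} WHERE {value_col} = ?"
    return sql, (literal,)


def rows_to_uris(entity_type: str, rows) -> list:
    """Turn result rows of MBIDs back into URIs for the SPARQL result set."""
    return [f"http://musicbrainz.org/{entity_type}/{row[0]}" for row in rows]
```

The literal is passed as a query parameter rather than pasted into the SQL string, so user-supplied SPARQL values cannot alter the generated statement.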
Project Details
people
Simon Dixon of the Centre for Digital Music is the Principal Investigator on the project and author of the grant proposal.
Kurt Jacobson (kurtjx) also helped write the proposal and was employed on a consulting basis for 3 months.
Cedric Mesnage joined the project on 1 March 2011. He was formerly a PhD student at the University of Lugano and worked on the EU FP7 project NEPOMUK.
Dr Barry Norton joined LinkedBrainz on 1 April 2011 from the Karlsruhe Institute of Technology where he worked on the EU FP7 SOA4All and PlanetData projects.
zazi joined LinkedBrainz on 1 July 2011 to continue the mapping of MusicBrainz NGS to the Music Ontology and other ontologies.
funding
Funding for LinkedBrainz comes from JISC.
period of time
The LinkedBrainz project finished at the end of July 2011.