What is the best media metadata format?
People ask us this all the time. I just got off the phone with some smart, smart people who are asking all the right questions about metadata formats for exchanging media content info between services.
The wisest thing I've ever heard on this topic was from a veteran of umpteen standardization committees in the audience of a conference panel. Unfortunately I did not catch her name, but she said something along the lines of:
"Deciding on a metadata format is like picking a database. The hard part comes next, in defining the fields/data model".
Ideally we'd have a small number of well standardized formats, each with a range of wonderful libraries and tools. In the early days of MetaBroadcast we thought it might be possible to get there quickly. We've stopped believing that.
The truth is that it's really easy to convert between metadata formats. URIplay converts between dozens, and we're pretty agnostic. The format is an operational issue, just like choosing a database. The main criterion is the ease with which a fresh developer can understand what's required.
The really difficult bit is getting the fields right. Building a user-friendly product is going to require a set of compulsory fields. In the case of VoD or AoD products, compulsory fields will typically include titles, description, pictures and a way to access the content. Unless you can deal with lots of gaps in your product, you have to specify this stuff, and content providers have to deliver. (Incidentally, this is one of the spots where RDF and the semantic web tend to fail. There's normally a lack of agreement on the compulsory fields, sidestepping the hard work of building an application from distributed data.)
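As a rough illustration of what specifying compulsory fields can mean in practice, here is a minimal sketch of a check that a consumer might run on incoming items. The field names (`title`, `description`, `image`, `location`) are assumptions made for this example, not part of any standard or of URIplay.

```python
# Hypothetical compulsory fields for a VoD/AoD item; names are illustrative only.
REQUIRED_FIELDS = ("title", "description", "image", "location")


def missing_fields(item):
    """Return the compulsory fields that are missing, None or empty in an item."""
    return [field for field in REQUIRED_FIELDS if not item.get(field)]


complete = {
    "title": "Example Programme",
    "description": "A short synopsis.",
    "image": "http://example.org/images/b0001.jpg",
    "location": "http://example.org/play/b0001",
}
partial = {"title": "Example Programme", "image": "http://example.org/images/b0001.jpg"}

print(missing_fields(complete))
print(missing_fields(partial))
```

A check like this only reports the gaps. Whether to reject the item, or show it with holes, is still a product decision.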
The hardest fields to get right are those that allow the data consumer to identify an item of content. This is vital if you are going to figure out what's unchanged, old, or new as you update the content. Some standards do a poor job of identifying content. As a consumer of data you can't afford to get this wrong, and it can be really hard to explain to data producers.
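To show why a stable identifier matters, here is a minimal sketch of comparing two deliveries of the same feed. It assumes every item carries an `id` field that stays the same between updates. That field name and the example data are hypothetical.

```python
def diff_by_id(previous, current):
    """Classify items as new, removed, changed or unchanged, keyed on a stable id."""
    prev = {item["id"]: item for item in previous}
    curr = {item["id"]: item for item in current}
    shared = prev.keys() & curr.keys()
    return {
        "new": sorted(curr.keys() - prev.keys()),
        "removed": sorted(prev.keys() - curr.keys()),
        "changed": sorted(i for i in shared if prev[i] != curr[i]),
        "unchanged": sorted(i for i in shared if prev[i] == curr[i]),
    }


yesterday = [
    {"id": "ep-1", "title": "Episode One"},
    {"id": "ep-2", "title": "Episode Two"},
    {"id": "ep-3", "title": "Episode Three"},
]
today = [
    {"id": "ep-2", "title": "Episode Two"},
    {"id": "ep-3", "title": "Episode Three (repeat)"},
    {"id": "ep-4", "title": "Episode Four"},
]

result = diff_by_id(yesterday, today)
print(result)
```

Without a reliable `id`, the retitled item would look like one removal plus one unrelated addition, which is exactly the confusion described above.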
So, back to the question. What is the best media metadata format? Well, there are three broad options:
- An "industry-strength" traditional standard – TV-Anytime, MPEG-7, DAB EPG etc. Easy to understand they are not! But people expect profiles and rules to be applied. They look hard to handle, and they are hard to handle. That's OK if you're building something big and permanent, and telecoms/broadcast standards people will approve of your choice. But the world of the web thinks differently.
- Web standards – Atom, Media RSS, or more commonly a bastard cross-breed. People are familiar with these, so many will figure they're easy to handle. But they're not familiar with your set of compulsory fields. So their standard feeds probably won't work, and their feed creation tools might struggle, too. Looks easy, probably has a sting in the tail. But at least you're building on something standard. You'll get some great generic tool support, but you're on your own for anything application-specific, and you will probably have to define much more than if you started with the first option above. Still, your feeds will work in lots of generic readers too, and maybe your conventions will be adopted some day.
- Roll your own – pick a simple base format that's easy to understand, like JSON. Reuse namespaces and field names where you can. You'll get basic tool support, ease of understanding, and clarity that this format requires special thought and development. Development effort for creator and consumer is probably similar to the second option above, but many people will accuse you of reinventing the wheel.
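For the roll-your-own option, here is a minimal sketch of what a JSON item might look like when it borrows field names from existing vocabularies. The `dc` and `media` prefixes point at the Dublin Core and Media RSS namespaces. The overall document shape, the `id` field and the example URLs are assumptions made for this illustration, not an established format.

```python
import json

# Namespaces reused from existing vocabularies rather than invented.
NAMESPACES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "media": "http://search.yahoo.com/mrss/",
}

# Hypothetical item: prefixed keys borrow their names from the vocabularies above.
item = {
    "id": "http://example.org/programmes/b0001",
    "dc:title": "Example Programme",
    "dc:description": "A short synopsis.",
    "media:thumbnail": "http://example.org/images/b0001.jpg",
    "media:content": "http://example.org/play/b0001",
}

document = {"namespaces": NAMESPACES, "items": [item]}

# Serialise for delivery, then parse again as a consumer would.
encoded = json.dumps(document, indent=2, sort_keys=True)
decoded = json.loads(encoded)

print(encoded)
```

Borrowing names this way costs nothing, and it gives a fresh developer a hint about what each field means before they read your documentation.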
Maybe RDF and the semantic web will be added as a fourth option here, one day. These offer a possibility of really well described data that can be interchanged easily between many types of applications. But these bold aims require amazing tools, and levels of standardization that have not yet been achieved, and still seem a long way off. We consume and produce RDF from our systems, but most of our effort still goes elsewhere.
At MetaBroadcast we follow all three of the approaches set out above. None is perfect, and each is right in some situations. Big players should probably support several options.
Which would you choose, and why?
This was cross-posted from the MetaBroadcast blog.
Posted at 13:31 UTC, 16th April 2010.