Metadata 101

by Ben Zamora-Weiss

At Aldis, we work chiefly with technical solutions that help our clients manage digital media, so some people are surprised to learn that our team includes several librarians with degrees in Library and Information Science.

A DAM wouldn’t be worth a damn if you didn’t have quality metadata to help you find and use what you need! Metadata can be a complex topic for newcomers, so this is the first in a short series of posts intended as a primer.

Any content tracking system is only as good as the metadata it holds about its contents. Think about the documents on your computer: it would be frustrating if everything were simply titled “Document 1”, “Document 2”, and “Untitled Document”, all living in folders with similarly non-descriptive names; to find *anything*, you’d have to open items one at a time. Now imagine the files have *okay* names, but everything still lives in a few giant folders. Having information *about* each file (like the date it was last modified) lets you sort a pile of similarly titled “Expense_Report_x.pdf” files and narrow down to the one you need. Multiply that by a thousand, add many different types of files, and you have the nightmare a digital asset management (DAM) system can become without good metadata. In this brief introduction, we’ll look at the main types of metadata and how an asset management system might use them, and introduce some vocabulary that often comes up in metadata discussions.

Metadata: a set of data that describes and gives information about other data.

The Three Paths

We tend to talk about metadata in three primary ways, and we’ll look at them in order of increasing complexity: Structure, Function, and Theory.

Structure

The first is simply asking whether the metadata we’re collecting is structured or unstructured: are there rules for how information should be entered, and if so, what are they? Free text fields, for example, are very much unstructured, while a checkbox is a very structured True/False item. A list of names? It takes time to create predefined (structured) lists to pick from, but doing so can prevent the same person from being tagged multiple ways (Bob Smith and Robert Smith). Will you use “First_Name Last_Name” or “Last, First”...and what about titles and suffixes (Dr. or Jr.) and middle initials? Similarly, how will you enter dates? If a user types “June 24, 2020”, some MAM systems can translate it to “06/24/2020” (if that’s the date format set up in the system), while others might require dates to be entered in a specific format.

Even defining a structure for file names can be helpful in some DAMs. For example, if you expect your photographers to name all their files in an agreed format before delivering them to the DAM, you might be able to use automations in your system that read the file names and extract useful metadata from that string of text, using the underscore as the break point.
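
To make these structuring rules concrete, here is a minimal Python sketch of three of them: a controlled name list, date normalization, and file-name parsing on underscores. The name list, the accepted date formats, and the `YYYYMMDD_Photographer_Project_Sequence.ext` naming convention are all hypothetical examples, not features of any particular DAM or MAM.

```python
from datetime import datetime

# Hypothetical controlled list: each canonical "Last, First" entry maps to
# the lowercase variants people might type for that person.
CONTROLLED_NAMES = {
    "Smith, Robert": {"bob smith", "robert smith", "smith, bob", "smith, robert"},
}


def normalize_name(raw: str) -> str:
    """Return the canonical entry for a typed name, or raise if it isn't listed."""
    key = raw.strip().lower()
    for canonical, variants in CONTROLLED_NAMES.items():
        if key in variants:
            return canonical
    raise ValueError(f"{raw!r} is not in the controlled list")


def normalize_date(raw: str) -> str:
    """Accept a few common date spellings and store them all as MM/DD/YYYY."""
    for fmt in ("%B %d, %Y", "%Y-%m-%d", "%m/%d/%Y"):
        try:
            return datetime.strptime(raw.strip(), fmt).strftime("%m/%d/%Y")
        except ValueError:
            continue  # try the next accepted format
    raise ValueError(f"unrecognized date: {raw!r}")


def parse_filename(filename: str) -> dict:
    """Split a name following the hypothetical convention
    YYYYMMDD_Photographer_Project_Sequence.ext on its underscores.

    Raises ValueError if the name doesn't have exactly four fields.
    """
    stem, _, extension = filename.rpartition(".")
    date_part, photographer, project, sequence = stem.split("_")
    return {
        "shoot_date": datetime.strptime(date_part, "%Y%m%d").strftime("%m/%d/%Y"),
        "photographer": photographer,
        "project": project,
        "sequence": int(sequence),
        "extension": extension.lower(),
    }
```

For example, `parse_filename("20200624_RSmith_AnnualMeeting_001.jpg")` returns a shoot date of `"06/24/2020"`, photographer `"RSmith"`, project `"AnnualMeeting"`, and sequence `1`, while a name like `IMG_1234.jpg` is rejected because it doesn't follow the convention.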

Function

The next way of thinking about metadata adds complexity by splitting ideas into three main buckets based on their function: Technical, Administrative, and Descriptive.  There may well be some overlap between these for your DAM, and that’s okay!

The first set, Technical metadata, is usually the easiest to collect because it simply describes the attributes of an item, such as the file’s type, size, creation date, codec info, aspect ratio, color space, and data rate. Most DAM systems can collect a wealth of technical metadata during the ingest (import) process.
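
As a rough illustration of what an ingest step can read automatically, the Python sketch below collects a few technical fields using only the standard library. The field names are hypothetical, the MIME type is guessed from the file extension alone, and richer details such as codec, aspect ratio, or color space would normally come from a media-inspection tool (ffprobe or ExifTool, for example) rather than from this code.

```python
import mimetypes
import os
from datetime import datetime, timezone


def technical_metadata(path: str) -> dict:
    """Collect basic technical metadata that the file system already knows."""
    stats = os.stat(path)
    # guess_type only looks at the extension; it does not inspect file contents.
    mime_type, _encoding = mimetypes.guess_type(path)
    return {
        "file_name": os.path.basename(path),
        "mime_type": mime_type or "application/octet-stream",
        "size_bytes": stats.st_size,
        # st_mtime is the last-modified time; a true creation time is not
        # available portably across operating systems.
        "modified_utc": datetime.fromtimestamp(stats.st_mtime, tz=timezone.utc).isoformat(),
    }
```

Run against a file such as `Expense_Report_3.pdf`, this reports a MIME type of `application/pdf` along with the size in bytes and the last-modified time.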

Administrative metadata is the second bucket; it helps define how media might be used inside and outside the DAM. Fields might be set up to trigger certain actions for your media, such as creating proxies, sending notifications, or transcoding into another format. It might also be useful to track information like the producers or editors attached to a given project for easy reference. Detailed, clear rights management and embargo data is another very common type of administrative metadata, noting any usage restrictions by platform, expiry dates, or fees. User permissions define who can access certain sets of media within the DAM, how much metadata they can see and whether they can edit it, and whether they can download and share content.
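
Rights, embargo, and permission fields are often checked together before an asset goes out the door. The Python sketch below shows one way such a check could work; the `RightsInfo` fields, role names, and platform names are hypothetical, and a real DAM would define its own fields and enforce permissions through its own configuration.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional, Set


@dataclass
class RightsInfo:
    """Hypothetical administrative fields attached to one asset."""
    embargo_until: Optional[date] = None    # not usable before this date
    license_expires: Optional[date] = None  # not usable on or after this date
    allowed_platforms: Set[str] = field(default_factory=set)  # empty = any platform
    allowed_roles: Set[str] = field(default_factory=lambda: {"admin"})


def can_publish(rights: RightsInfo, platform: str, role: str, on: date) -> bool:
    """Return True only if every permission, embargo, and rights rule passes."""
    if role not in rights.allowed_roles:
        return False  # user permissions
    if rights.embargo_until is not None and on < rights.embargo_until:
        return False  # still under embargo
    if rights.license_expires is not None and on >= rights.license_expires:
        return False  # license has expired
    if rights.allowed_platforms and platform not in rights.allowed_platforms:
        return False  # platform restriction
    return True
```

For example, an asset with `embargo_until=date(2020, 6, 24)` is rejected for any request dated before June 24, 2020, whatever the user’s role.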

The final bucket is Descriptive metadata, where we detail the contents of a project or specific file. This might include general information like a description of the overall project and what it was for, but it can be as granular as putting a name to every face that appears in a group photo. After all, when people search the DAM, they’re often looking for a specific file or type of content to reuse rather than a full project. With this in mind, do you want to separate audio files into music, sound effects, voice-over, and final mixes? Similarly, think about the different types of video you might want to specify; if you capture interviews, are there subsets you want to call out quickly? Going a step further, what information do you want to capture about the content of the media? Are specific people, brands, products, or actions featured? Is there a transcript you want to create and include? The crux of Descriptive metadata is asking “How is someone going to find this again?” This question can help rein in the impulse to describe every aspect of every file in minute detail. Descriptive metadata doesn’t have to be comprehensive to be useful!
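
To show how a few descriptive fields answer “How is someone going to find this again?”, here is a minimal Python sketch of a search over them. The `Asset` record, the `category/subcategory` labels (which are not MIME types), and the simple substring matching are assumptions for illustration; a production DAM would typically rely on a search index instead.

```python
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class Asset:
    """Hypothetical asset record holding a few descriptive fields."""
    file_name: str
    category: str  # e.g. "audio/music", "video/interview" (hypothetical labels)
    people: List[str] = field(default_factory=list)
    transcript: str = ""


def find(assets: Iterable[Asset], category: str = "", person: str = "", phrase: str = "") -> List[Asset]:
    """Return the assets that match every criterion supplied; blank criteria are ignored."""
    results = []
    for asset in assets:
        if not asset.category.startswith(category):
            continue  # wrong kind of media
        if person and person not in asset.people:
            continue  # person not tagged
        if phrase and phrase.lower() not in asset.transcript.lower():
            continue  # phrase not in transcript
        results.append(asset)
    return results
```

Given a list of `Asset` records, `find(library, category="audio/")` returns every audio file, while `find(library, category="audio/sound-effect")` narrows the results to sound effects only.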

Theory

Now let’s get a little philosophical about metadata; the third way we approach these conversations looks at the theory behind the metadata captured about an item. We again use a three-pronged approach, but this time we give each item (or set of items) three levels of depth and talk about the ‘Is-ness’, ‘Of-ness’, and ‘About-ness’ of what we’re tagging.

‘Is-ness’ is usually the simplest and often based around a technical or structural way of describing what an item is: A PDF.  An MP4 final.  A PSD image.

The next level is ‘Of-ness’, commonly just a literal description of the content we’re tagging. The PDF is a copy of the final shooting script. The MP4 final is the CEO’s address for the upcoming annual meeting. The PSD shows headshots of the CEO, CFO, and CHRO with text and logos.

Finally, we get to ‘About-ness’, which goes a bit further and looks at the intention behind the item. Now the PDF script is for the CEO to read at the recording and includes stage directions and areas of emphasis. The MP4 final of the CEO’s message was exported to follow the guidelines for the internal video distribution network, serving as a ramp-up and invitation to the annual meeting. And the PSD of headshots and text advertises some of the key presenters and will be used in an email blast tailored to key stakeholders.
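
One way to keep the three levels distinct in practice is to store them as separate fields. The Python sketch below records this section’s three examples that way; the field names are hypothetical, and most DAMs would map these levels onto their own descriptive fields rather than use these exact names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LayeredDescription:
    """One item described at the three theoretical levels."""
    is_ness: str     # what the item is (technical or structural)
    of_ness: str     # what the item literally contains or depicts
    about_ness: str  # the intention behind the item and how it will be used


EXAMPLES = [
    LayeredDescription(
        is_ness="PDF",
        of_ness="Final shooting script",
        about_ness="For the CEO to read at the recording; includes stage directions and areas of emphasis",
    ),
    LayeredDescription(
        is_ness="MP4 final",
        of_ness="CEO address for the upcoming annual meeting",
        about_ness="Exported for the internal video distribution network as a ramp-up and invitation to the annual meeting",
    ),
    LayeredDescription(
        is_ness="PSD image",
        of_ness="Headshots of the CEO, CFO, and CHRO with text and logos",
        about_ness="Advertises key presenters in an email blast tailored to key stakeholders",
    ),
]
```

Keeping each level in its own field could let an ingest form require the ‘Is-ness’ up front while leaving the ‘About-ness’ to be filled in later by someone who knows why the item exists.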

Next Up...

We’ve covered a lot, and this isn’t a textbook.  Next time, we’ll take a closer look at ways to organize this metadata for your DAM: outline the common field types, talk more about controlled vocabularies, and dive into taxonomies.