The Shatzkin Files

Metadata is the new most important thing to know about


Several very recent conversations have come together for me.

1. Joe Esposito, the new CEO of GiantChair, says metadata is the key to publishing in the future; he describes metadata as the modern equivalent of Allen Lane’s discovery that cheaper paperback books sold in mass merchant locations could boost book sales. Of course, GiantChair is very much involved in metadata as a way to help publishers find marketers and customers.

2. F+W and Ingram have come together to make a deal enabling niche web sites to sell the full range of applicable ebooks to their community. Of course, finding “applicable” ebooks will be dependent on the quality of the metadata that publishers provide to Ingram. I really liked seeing this happen, because it is the first significant example of something I’ve predicted and advocated: that publishers who want to go after communities should sell the books of their competitors and that all web sites should deliver curated ebook stores of the titles of interest to their site visitors.

3. A list discusses whether the publisher has a role in the future, what it is, and how the spoils in a new world should be divided between the publisher and the author. One observer points to the nuances in royalty rates: the royalty implications of the wholesale model versus the agency model, whether the commission paid to the agent is deducted from “receipts” for purposes of calculating royalties, and what the competitive implications are for publishers going after authors. This gives rise to the next question: are publishers differentiated on royalty rates alone, as though each publisher would sell the same number of books? And that gives rise to the next point: the understanding, quality, and richness of metadata can determine how successfully a publisher sells a book.

4. One of the biggest issues for publishers in managing and providing quality metadata is associating every work, and every edition of it, with the right author. That challenge intensifies when they look at the author’s books published by others, but the fact is that most current royalty systems have plenty of problems keeping track of the multiple titles and editions of the authors they themselves have published.

5. Filedby, the directory of author web sites I co-founded with Peter Clifton, has a new metadata clean-up service called Author Data Advantage that makes it simple and economical for publishers to organize their works and edition data properly tied to each author and to keep it that way as new works and editions are created. Filedby’s service, which any publisher can avail themselves of, can tie all the editions of a work together, relate them accurately to each author or other contributor, and provide each of the authors with a unique ID. That allows the publisher to tie the marketing, reviews, conversation, community, rights, and digital promotions back to the right work and the right author.
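To make the idea in points 4 and 5 concrete, here is a minimal sketch of what "editions tied to works, works tied to authors with unique IDs" could look like as data. It is not Filedby's actual data model; the class names, field names, and the placeholder IDs and ISBN strings are all assumptions made up for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Author:
    author_id: str  # hypothetical unique ID assigned to each contributor
    name: str


@dataclass(frozen=True)
class Edition:
    isbn: str       # placeholder identifier, not a real ISBN
    format: str     # e.g. "hardcover", "paperback", "ebook"


@dataclass
class Work:
    work_id: str
    title: str
    author_ids: List[str]                      # links the work to its authors
    editions: List[Edition] = field(default_factory=list)


def works_by_author(works: List[Work]) -> Dict[str, List[Work]]:
    """Index works by author ID so each author maps to all of their works."""
    index = defaultdict(list)
    for work in works:
        for author_id in work.author_ids:
            index[author_id].append(work)
    return dict(index)


def isbns_for_author(works: List[Work], author_id: str) -> List[str]:
    """Collect the identifier of every edition of every work by one author."""
    return sorted(
        edition.isbn
        for work in works_by_author(works).get(author_id, [])
        for edition in work.editions
    )


# Illustrative data only: one author with two works and three editions.
author = Author("A1", "Example Author")
catalog = [
    Work("W1", "First Book", ["A1"],
         [Edition("isbn-0001", "hardcover"), Edition("isbn-0002", "ebook")]),
    Work("W2", "Second Book", ["A1", "A2"],
         [Edition("isbn-0003", "paperback")]),
]
print(isbns_for_author(catalog, author.author_id))
```

Once every edition hangs off a work and every work hangs off an author ID, anything attached to one edition (a review, a promotion, a rights record) can be traced back to the right work and the right author, which is the point of the clean-up.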

Metadata work for publishers is, really, a bottomless pit, since it is, in effect, “information about the book” and there is no limit to that. There will be no end to the categories of quality, interest, and association each book can have attached to it. How many books published in years past, for example, should now be associated with “Gulf oil spill”? If you published one discussing whether using chemical dispersants is a good idea or not, I think you’d probably want somebody googling “Gulf oil spill” to find it, wouldn’t you?

The list conversation referred to above was really about the difference in royalty rates offered by publishers and how the author’s cents-per-copy is affected by the agency versus wholesale model. My own hunch is that this won’t matter much in the short run because dollars offered in the advance will still be far more important to the authors’ and agents’ decision than selling policies that can change between signing and publication. In the longer run, differences in the ways publishers handle metadata might be relatively more important because they will affect how many copies get sold.

In an earlier post, I made the point that we’re approaching the day that half the sales of new books will be made online. All the sales of books online are highly dependent on metadata. Very robust metadata can enable a book and author to get discovered when more minimal, even though correct, metadata would omit it from the conversation. Incorrect metadata can prevent a book from being found even if the customer knows pretty much what they’re looking for.

Metadata, what it is and how it affects discovery and sales, is a subject that every book professional will find increasingly important to understand and master in the days to come.

Last year I wrote a post suggesting that one way publishers might deal with piracy is by posting sabotaged files on offending sites, rather than just playing whack-a-mole. This triggered more than a few hostile reactions. I found it ironic to see yesterday that the new Stephenie Meyer ebook could be the occasion for software mischief-makers to come into conflict with copyright mischief-makers, using infected PDFs of a book many people want as a way to gain entree into people’s computers with malware. So now the hackers who want to attack your operating system are the allies of the publishers who want to discourage people from downloading ebooks from anything but clearly-authorized sites.


  • Chris

    Mike, the sabotaging of files is not really a problem for anyone who downloads from torrents. There is a trust rating associated with the uploader and, of course, there are user comments relating to the download.

    • Thanks for the education, Chris. So I guess the publishers have to depend on
      crowd-sourced saboteurs.


      • Chris

        Or perhaps stop worrying about it.

        No one will ever stop it.

        Even if torrents weren't available, people would simply email book files to their friends. We're not talking about GBs of info here. It takes two seconds of your time.

        As an aside: 16 of Amazon's top 100 Kindle titles are 'Pride and Prejudice', a work that is in the public domain. One of those files is a free ebook, the other fifteen are paid ebooks. We're talking thousands of sales here … paid sales. For a free work.

      • Actually, it is the casual emailing of files that I think is the most
        sensible sort of trafficking to try to stop with DRM. I know they can be
        hacked, but only by a small percentage of the overall population. I think if
        we reach the point where every purchased ebook can be easily sent from one
        person to another for free, there will be a dramatic drop in both sales and

        But I agree, overall, that piracy is sort-of like used books in the print
        market. Publishers will hate it and try to figure out how to stop it, but a
        certain level of activity will just be part of the landscape.

        When we get to a true cloud situation, of course, this all goes away. Google
        will only have to do as good a job of keeping people out of your files as
        they do at keeping people out of your email.


      • Chris

        The beauty of using Amazon, the iBookstore, et al. is that it is far easier for the majority of people to simply buy a title and have it appear on their readers. Most people don't want to stuff around with converting, transferring etc.

        So I guess we get back to the old chestnut of pricing titles low enough not to be a barrier to selling or an inducement to pirating.

        Take the Pride and Prejudice example. Why are people buying those titles? I bet it is because of the cover image (the freebie has no cover image), product description, easy access and negligible price point. If publishers price an ebook too high, they should assume people will pirate it. This goes back to a post you made several days ago … publishers need to experiment on pricing. Same as any other industry.

        Once again it seems that the pub industry is trying to protect an outdated business model. Most worthy new media content is free. You don't even need to pirate it. In fact, I can't think of a single worthwhile blog run by a professional (from any industry) that has a paid subscription. Unless it is a 'path to riches' scam.

        What do I read? Umm… Jarvis is free, Petis is free, you're free, Fallows is free, Perkowski is free, GalleyCat, Arrington, Fake Steve Jobs 🙂

        'Shit my Dad says' is also free … and now a successful book (#9 HC @ Amazon).

        Sure, there are premium subscription sites but they are mostly quite specific. And at any rate, what is stopping someone reporting, for example, Cader's pay-walled content on a free blog? It's just aggregation. Happens to WSJ all the time.

        My apologies for spearing this somewhat off topic to the bulk of the content in your post. I'm just sick of pirating being an excuse for lost revenue. Most people don't pirate books, in fact, most people don't even read books. Talk about revenue loss … as a publisher, that would be my biggest worry right there!

      • Chris, I really agree with you.

        The big publishers really break into two camps. There are some — and some
        are pretty public about it — who see piracy as an existential threat to the
        industry and spend real money and bandwidth combating it. Then there are
        others — I know of two for sure — that see it as we've been talking about
        it: part of the landscape, not cause for celebration but not cause for any
        sort of obsession either. The politics of agent relationships absolutely
        require that publishers do certain minimal things: apply DRM, for example,
        and issue takedown notices when they see clear instances of copyright

        Brian O'Leary of Magellan Media, who is one of very few people who have
        really tried to *measure* the impact of piracy on sales, feels that it
        stimulates sales much, if not most, of the time. That's based on sketchy
        data, but, to my knowledge, nobody has ever proven that pirated copies hurt
        sales. You're expected to take a high number of pirate downloads as prima
        facie evidence that some proportion of them constitute lost sales and that
        whatever stimulative effect came from word-of-mouth didn't compensate for
        them. But on that one, I'd say your guess is no better than mine and mine
        would be that piracy is more of a catalyst than a cannibal.

        And I thank you for the articulation of the central point: that there is
        more and more great content available free and, if you believe in the laws
        of supply and demand, that's pretty likely to drive down the prices people
        can charge. I also agree that quality content can be exploited in a variety
        of formats, and that the existence of “free” doesn't eliminate the
        possibility of “paid.” It might even lead to it.


      • Chris

        All authors would do well to think about how they want to 'sell' their work. Especially non-fiction writers.

        Perhaps, in this day and age, a book isn't the best format when it comes to monetising their work.

        A low-cost or free ebook could be the teaser to the ever-changing, currently relevant, fully interactive blog.

        I would assume that the metadata in that blog would be far more powerful than that contained in any book.

        The Khan Academy is a great example. If he can successfully monetise that site he will be a new media winner.

      • To the extent that the identifying metadata are the tags, those on a blog
        wouldn't necessarily be better than in a book. I create the tags for each
        post as I do them. I would need a more sophisticated system by which I went
        back in light of new events to make the tags any better than
        well-thought-through tags would be in a book.

        Very impressive site. I wish I had known about it yesterday when I was
        talking to educational publishers about “Hidden Competitors.” This was spot


      • Chris

        Sorry, Mike. Last reply … I promise!

        I think relevance is the keyword here. Yes, you add your own metadata for each post (I assume you only use tags to save time?), but would you redo metadata for a book you wrote 2 years ago?

        You cite the Gulf Oil example.

        And here, today, BP's spokesperson said: “We have bought search terms on search engines like Google to make it easier for people to find out more about our efforts in the Gulf and make it easier for people to find key links to information on filing claims, reporting oil on the beach and signing up to volunteer.”

        Time relevant data.

        As for Khan's site: it's brilliant. My children are in for one hell of an interesting education with guys like this around. All for free.


      • Chris

        BTW: Jobs says 1 iPad selling every 3 seconds.

        iPad + Khan + YouTube HTML5 = time relevant education in your hand.

        No wonder it is a frightening new world for publishers.

      • No apology necessary, Chris. I generally grit my teeth at lengthy comments.
        “Concise” is a value for me.

        But you delivered real value with your ideas and information. I am sure the
        readers who get down this far will have appreciated it.


  • brianoleary

    Certainly metadata and its integrity have become only more important as both the sale and provision of content have grown increasingly digital. I've been impressed by the attention paid to things like Laura Dawson's weekly #ISBNhour, hosted on Twitter. Using social media as well as e-mail blasts and the like, she has taken the work that has been often locked inside BISG subcommittees and made it accessible and relevant to publishing folks. It's a sign that both awareness of and support for metadata quality are increasing, and it gives me hope that a more common-sense approach (that is, not just about systems or standards, but how metadata is “owned” and maintained) will carry the industry through this rough patch.

    • Thanks for the pointer to Laura's Twitter work. I think this is a
      never-ending challenge, on some levels. The first task is to get the core
      metadata right, which not everybody has. I was hoping that the Gulf disaster
      example would underscore that the job never ends and it is up to the
      publisher (or whoever wants to make money from getting the content
      discovered) to be thinking about it every day.


  • Ashford

    Not long ago I was in a bookstore (yes, an actual retail store!) and remembered that I wanted to find a copy of Tracy Kidder's early bestseller, “House”. It was not easy. With help from a clerk, I finally located a copy in the Reference section under the heading Home and Garden. It was on the bottom shelf next to a book called “Great Garages, Sheds and Outdoor Buildings”. It was exactly where the computer said it would be, and yet I'm sure this wonderful work of general nonfiction lived there in quiet obscurity. One must be careful with metadata, including BISAC codes.

    • Thanks for the cautionary tale. This is an example of why book merchandising
      is so difficult. It is really complicated.


  • How well do you think books on Search Engine Optimization (S.E.O) will compare or be able to help writers/organizations to navigate the idea of metadata?

    • Metadata definitely plays into SEO, but in the B2B world of book publishing
      metadata is important in ways that have nothing to do with SEO. So my hunch
      would be that a book of that kind would be helpful, but that it wouldn't
      provide all the help you'd need.

