cleaning up after a messy website migration: how to start fresh when you can't start over
Presented at Confab Higher Ed, New Orleans, 11/5/2015.
CMS migrations, reorgs, changed administrative priorities, and other events can leave your governance and content out of control. We’ll use the IU Libraries’ 8000-page site migration and ongoing post-migration cleanup as a case study to talk about ways to dig out of holes you may have fallen (or even jumped headlong) into. You might make different choices next time, but you still have to deal with the aftermath of this time. The old cliché warns against “closing the barn door after the horse has bolted,” but with luck and perseverance, you can have a tidier barn and happier horses!
In this session you will learn about:
Breaking your website governance free of the organizational chart without frightening the HR office
Moving from a highly distributed content creation free-for-all to a more centralized governance model without making people feel like you took away their candy
Writing a new scope or mission statement for an existing website and dealing with content that no longer fits the revised mission
In our content strategy fantasyland, our stakeholders are 100% on board, everyone’s a great communicator, analytics tell us what we want them to, technology never fails us, we get to implement our favorite best practices from the outset, our administrators’ priorities don’t change without consulting us first… it’s a nice dream, right?
True story: as an undergrad, I worked in my dorm dishroom. One night we had “too many dishes to even do things efficiently.” It had never occurred to me that efficiency might depend on circumstances. Our web work gets like a pile of dirty dishes sometimes; you can’t employ your Best Practices even if they’re the Right Thing To Do. Sometimes you just have to wrangle a pile of crusty, gross silverware until you break it down to the point where you can cope with it.
There’s a saying, “you’re trying to shut the barn door after the horses have bolted.” You’re trying to prevent something that’s already happened. Sometimes when you get into a horrible tangle of crusty gross… something…it can feel like it’s too late to do anything about it. But I like to think it’s never too late. These messes can happen in a lot of ways.
Content strategist roles vary a lot. We have different environments (academic units, marketing, admissions), different institutions, contexts. Take a moment here to think about what your own mess might be, or a mess that you might anticipate. What are you digging out from? Or hiding your head in the sand from because digging out is going to be too hard? Think about your own context as I talk about OUR mess.
“The Winchester Mystery House is a mansion in San Jose, CA … renowned for its size, its architectural curiosities, and its lack of any master building plan.” It sounds like our old site. It was built on a homegrown CMS and had been up essentially unchanged for over a decade when we finally migrated. It was like someone whose job description was so out of date that the only thing that still applied was “other duties as assigned.” It was so disorganized we gave up on trying to do a content inventory. We migrated about 8000 pages, and that was after an initial round of deletions (class pages) – we had so much stuff we had no idea what we had.
We had feedback from users and from content contributors. The site no longer met expectations they’d developed from using other sites & other tools. Also, the tools had aged out (our CMS was programmed to work in IE only … and then Microsoft released a new version of IE and it didn’t really work there either). WE HAD TO GET OUT OF THERE.
We were running around moving the leak buckets and so we didn’t have time to fix the leaky roof. How did it get that way? The web team that managed our old site was made up of librarians/staff who didn’t have “web” in their job descriptions; they were selected to get input from each sector of the org chart. (I KNOW.) Also, there was not much metadata, & what was there was poorly structured, which caused the content to migrate like a blobby mess. I never realized how important metadata was until I had to deal with piles of content that didn’t have it.
It can be useful to see How It Got That Way, avoid repeating mistakes, recognize how the context has changed. But people do the best they can at the time with the skills & support they have. Blame is not productive.
We had to make a decision. Either direction was the Great Unknown. We chose to migrate the legacy content into the new platform rather than blow it all away. We might do it differently now, but hopefully we’ll never again be moving out of a 12-year-old CMS.
So once we’d actually migrated, we had an architecture in place, but old content was sort of shoehorned in. (& because of aforementioned crappy metadata, we had to do a lot of manual hooking-up.) The only way to tackle it is to break it down into phases or areas. So let’s break down the areas we had to deal with in our post-migration mess.
Our governance was … um I think we’d heard of the word before maybe? Our Intranet used the same platform as our old CMS. So because everyone needed Intranet access (even though they hated it…), everyone had an account in the CMS. When we migrated we also took down the old intranet & moved to SharePoint, but all those user accounts migrated.
We had crazy weird permissions nobody understood. And we had hundreds of accounts. And we had NO procedure for weeding out accounts. Students who’d long since graduated. Librarians who’d retired, but still owned web content (sigh).
We had dead people in our CMS!
Post migration, we did a major user list purge, + created an account policy posted on Intranet that provides for regular deletion of unused accounts. (Some political exceptions e.g. associate deans….) We also started offering to make updates for some of our authors, so those who had 1 update a year didn’t have to sit through training refresher every year. Also, we assigned content roles not based on the org chart, not based on the sitemap, but on what people actually need to do. “Who needs to be able to post news items on the homepage?” Create a permissions role that allows them to do that & assign as needed. Ask people what they need to be able to do! Ask them how their web work fits into their workday.
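The task-based roles and the regular account purge described above can be sketched in a few lines. This is only an illustration, not how our CMS actually stores permissions: the role names, the `Account` fields, the `exempt` flag (standing in for the political exceptions), and the 365-day idle threshold are all assumptions.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta

# Hypothetical roles, each named for a task someone needs to do
# rather than for a department or a branch of the sitemap.
ROLES = {
    "homepage_news_poster": {"create_news", "publish_to_homepage"},
    "hours_editor": {"edit_hours"},
}


@dataclass
class Account:
    username: str
    last_login: date
    roles: set = field(default_factory=set)
    exempt: bool = False  # stands in for a negotiated exception to the purge


def can(account: Account, permission: str) -> bool:
    """True if any of the account's roles grants the permission."""
    return any(permission in ROLES.get(role, set()) for role in account.roles)


def accounts_to_purge(accounts, today: date, max_idle_days: int = 365):
    """Usernames of non-exempt accounts idle for longer than max_idle_days."""
    cutoff = today - timedelta(days=max_idle_days)
    return [a.username for a in accounts if not a.exempt and a.last_login < cutoff]
```

The point of the sketch is the shape of the question: "who needs to be able to do this?" maps to a role, and "who hasn't logged in for a long time?" maps to a list you can review before deleting anything.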
A lot of it boils down to empathy. Understand what people’s workday is like, what their goals are. This applied especially to working with library administration & human resources. Getting us OUT of the intranet business let THEM manage that part. And we let them manage the staff directory, which is pulled out separate from site permissions (it was pretty inextricable before). Unhooking intranet from CMS also let them keep closer tabs on confidential info.
The way our old site was structured not only affected the messy migration, it also affected how ongoing content contributors understood their content & the site. The existence of a mess changes how you perceive things. And people’s perceptions are changed by the tools that they have.
Our old site was based on self-contained mini-sites which did not permit sharing/reuse of pages. This also affected how authors approached the site. No sense of overall context. People mean well…they just don’t think outside their own context. We realized so many of our content contributors had never used anything but our old CMS. Understanding how they were perceiving the author experience required talking to them, asking them questions, realizing why they saw things the way they did.
Everything HAD to be a library, collection, department, or subject. If it didn’t fit into one of these categories, we MADE it fit. As you can imagine, all this non-fitting stuff with kludged metadata migrated awkwardly and had to be cleaned up. We did a relatively thorough audit of this content, but it was basically the junk drawer, and we kept discovering stuff. A lot of the non-collection collection stuff ended up being out of scope; we’ll talk about that in just a bit.
We added admin tools that made it easier for people to see the scope of content across the site. If you’re adding a news item, you can see what’s already there. We made better use of content types. People’s behavior is shaped by the tools they’re given. Create tools that encourage the behavior you want to see, when possible.
In addition to ROT (redundant/outdated/trivial) content, we began encountering content that made it clear that the scope of the website had to be redefined. Or, well… defined. When all you have is a website, everything looks like web content…
What do you get if you cross a fox with a chicken? A FOX. From The Stranger’s Long Neck. “The chicken is the current content; the content you need today. … The fox is the archive, the library, the place where we store all the stuff we might need at some future date.” The fox will eat the chicken if you don’t keep them separate. All your out of scope content will overwhelm your website if you keep everything in the same place. Let’s talk about how to find places to move the stuff that you can’t just delete for whatever reason.
*Review/rehome process. Consider audience (IS there one?) & where they go. Do you need to create new places or is there a service or platform you can use? You can point to stuff outside your own website! That’s how the web works! Maybe tweak search to index selected external sites, if users rely on site search.
*Semester-specific content. Can you set this to expire? Or manually delete on schedule?
*Internal-facing: Intranet!
*Fixed-form: Publications, no updates expected. Institutional Repositories are great for faculty publications, the product of research, even department newsletters. Ask your librarians about yours, if your institution has one. If IR is too heavy-duty or you don’t have one, see if your uni has a Box- or Dropbox-type solution.
*Digital collections of research value to our users: We have a department with the expertise to bring these up to archival standards, and it has an intake process. We can point to some of these from the website by making them resources (which is a content type).
*Just in case… the Wayback Machine exists! Also Archive-It, if your institution participates, which is not going away. Also, talk to your university archives; they archive digital materials nowadays, not just dusty boxes of papers!
After a year, we mass-deleted ~73% of pages. Most were drafts that hadn’t been touched since migration. Had anyone even missed them? We helped content contributors find backups in Archive-It on the few occasions that they needed their stuff back. And we created a clearly communicated policy to purge on an ongoing basis. We’re not a storage bin, and web pages can’t get tenure just by functioning long enough. (+ our web pages aren’t scholarly output – nobody will get tenure because they authored hundreds of web pages on our site!)
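If your CMS can export a list of pages, finding purge candidates like these is a small filtering job. A minimal sketch, assuming a hypothetical export where each page is a dict with `path`, `status`, and `last_modified` fields (your CMS will name these differently):

```python
from datetime import date


def purge_candidates(pages, migration_date: date):
    """Paths of draft pages with no edits after the migration date."""
    return [
        p["path"]
        for p in pages
        if p["status"] == "draft" and p["last_modified"] <= migration_date
    ]


def share_flagged(pages, candidates) -> float:
    """Fraction of all pages that the purge would remove."""
    return len(candidates) / len(pages) if pages else 0.0
```

Reviewing the resulting list (and confirming a backup exists) before deleting is the part that matters; the code only builds the list.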
So what DOES belong on the site? I realized we’d never made this clear to our content contributors. The website is for THIS, not THAT. Like so many things, it’s a communication issue. In higher ed, we value the documented, the written word. It helps to be up front about your goals. So think big thoughts – we go on periodic retreats; going offsite helps!
We searched around for similar scope statements for websites, without finding much (maybe they’re internal). It became more than that; it’s somewhere between a scope statement and a Core Strategy. But we like to call it a manifesto. We tried to situate it clearly within the way we see the mission of academic libraries, and we tried to write something that could guide us as we continue to respond to changes in technology, user expectations, research needs, etc. We tried to clearly outline the need for expertise in managing the library website, not just a web team.
We didn’t just enumerate specific types of content that belonged on the site. That would go out of date + people would just shoehorn stuff into the “approved types” (non-collection collections all over again). Providing access to the library catalog is clearly a core function of the website; people still look for books! Also we provide access to vended resources (databases, e-books, journals) even though that sometimes amounts to being the portal that leads people into a plethora of silos that we don’t control. So our manifesto comes from a place of deeply understanding the mission of our organization, and that’s key.
Everything we do on the website should come from a place of fulfilling the mission of the library and the ACADEMIC (research/teaching) mission of the university. If it doesn’t do that, it doesn’t belong. It’s also important that our content is continuously updated, not archival. We knew this instinctively, but it had never been enumerated for our content contributors.
“For the benefit of the user” seems like common sense, but we keep coming back to that. When specific content is in question: how does this benefit our users? Emphasizing consumption rather than production drives the metrics we use to measure success. Although traditionally “the best library” means “the one with the most books,” that doesn’t work on the web. The full document outlines a lot of our thoughts about governance – it should be in the hands of people with expertise, not a consensus of people chosen to represent the org chart, among other things.
To wrap up, a bit of a pep talk. Maybe you made a wrong turn, maybe you had lousy directional signage, maybe you did make a bad decision. Many of us have been in that position. So don’t beat up on yourself.
Don’t fret about the past or how you got here (though understanding causes can be helpful). Just dig in & start cleaning up. You can’t wait to start digging out until the new dean is hired or you get that budget line for a new staff member. Just start.
Don’t wait to launch (relaunch) until everything is perfect. IT WILL NEVER BE PERFECT. Plan to release in stages then cycle back around with continuous improvements. Know that improvement will be ongoing. Help your stakeholders understand this.
Your minimum viable product (MVP) is the least amount of content that you can have ready to go so that you can launch, then iterate. For us: Our website won’t lie to users. Library hours/contact info will be correct. Access to subscription databases must work well; catalog; help. What’s the core stuff? Set priorities. Which means you have to know what your website is for. Thinking back to that pile of silverware: your MVP might be “clean service for 12 by 5 pm for the dinner party.” We all live & die by the academic calendar. Maybe the most important thing is your admissions application, and if that’s ready on time, everything else can come second. Maybe it’s your instruction manuals for student workers. You have to know what’s most important to your mission and to how your organization operates.
The content that’s used most often by the most people has got to be your priority. (Don’t design for the tiny tasks. Read Gerry McGovern on “top tasks.”) Which means you have to assess what your users need! Stuff that’s seldom used may just have to wait. Spoiler: sometimes you’ll realize you didn’t really need some of that at all!
Understand your mission-level goals (our users need to be able to do x); set long-range + sprint goals. Don’t obsess over the long-range goals, just keep them in your peripheral vision. Reexamine the long-range goals periodically and let them drive your sprint goals.
If you have badly written content everywhere & it’s disheartening, pick one thing. Get rid of all the “click here” or clean up the pronouns so the user you’re addressing is always “you.” Make a chart of discrete mini-projects: Resolve duplicate page titles. Clean up metadata in one content type. Every now & then, not too often, go to Wayback Machine; realize your site really is better than it used to be.
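Two of those mini-projects are easy to scope with a short script before you commit to them. A rough sketch, assuming a hypothetical page export where each page is a dict with `path`, `title`, and HTML `body` fields; the regular expression only catches links whose entire text is "click here," so treat the output as a starting list, not a complete audit.

```python
import re
from collections import defaultdict

# Matches a link whose whole visible text is "click here" (any capitalization).
CLICK_HERE = re.compile(r"<a\b[^>]*>\s*click here\s*</a>", re.IGNORECASE)


def duplicate_titles(pages):
    """Map each normalized title used more than once to the paths using it."""
    by_title = defaultdict(list)
    for p in pages:
        by_title[p["title"].strip().lower()].append(p["path"])
    return {title: paths for title, paths in by_title.items() if len(paths) > 1}


def click_here_pages(pages):
    """Paths of pages containing at least one 'click here' link."""
    return [p["path"] for p in pages if CLICK_HERE.search(p["body"])]
```

Knowing there are, say, forty pages to fix rather than "badly written content everywhere" makes it much easier to pick one thing and finish it.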
cc: Joccay - https://www.flickr.com/photos/57555837@N00
Anne Haines @annehaines #ConfabEDU
cc: stevendepolo - https://www.flickr.com/photos/10506540@N07
cc: SeeMidTN.com (aka Brent) - https://www.flickr.com/photos/94502827@N00
trying to shut the barn door after the horses have bolted
cc: Richard Masoner / Cyclelicious - https://www.flickr.com/photos/99247795@N00
what’s your mess?
cc: Jenn and Tony Bot - https://www.flickr.com/photos/7315825@N04
collections of stuff
“just in case”
hey, you gonna use that?
cc: cobalt123 - https://www.flickr.com/photos/66606673@N00
cc: John Lemieux - https://www.flickr.com/photos/21051229@N06
a written statement declaring publicly the intentions, motives, or views of its issuer
cc: Trevor Pritchard - https://www.flickr.com/photos/11451700@N00
In some cases it has been easier to say what the library website is not – a catalog, a fixed-form document, a repository – although it facilitates access to these things, and perhaps makes them discoverable.
The library website is an integrated representation of the library, providing content and tools to engage with the academic mission of the university.
It is constructed and maintained for the benefit of the user.
Value is placed on consumption of content by the user rather than production of content by staff.
From consensus to expertise: rethinking library web governance
you are not alone
cc: Izkophoto - https://www.flickr.com/photos/88414062@N08