Q. Where does my time go?

A. Not blogging.

In the past month or so, I’ve been to two incredible conferences, started a new project for DLS, and almost finished my last one. So why can’t I just sit down and write about it already? Every time I get on the computer, I get sucked into reading blogs (professional and trashy, but mostly trashy), watching YouTube, or wasting time on Facebook. Yes, it’s summer, but I’d hoped to be a little more productive than this. So, the other day I signed up for RescueTime, a free application that tracks your computer usage by website or program. You can tag websites you frequent to see just how much time per day, week, month, etc. you spend on school, work, Scrabulous, whatever. I’m always on the lookout for promising apps to help me manage my information online, but I haven’t found many that manage my time for me. Hopefully after a week of seeing how I really spend my time online, my guilt will write my conference reports for me…

[Image: rescue me]

(weeks 12, etc., etc. and beyond)

Well, this project isn’t over yet, but since the semester is, I should probably sum up a little before my summer blogging hiatus. The status of the Chautauqua migration is the same: we’re still waiting for a solution to our software problems before we can start uploading the brochures. Hopefully either this summer or the newest version of ContentDM will bring us a remedy. Last week I gave a presentation on my project experience to SLIS faculty and the project mentors along with the other nine digital fellows. Because I’m now in full summer mode and just don’t have the energy to write up a decent summary of the semester, here’s a link to my PowerPoint presentation. Maybe I’ll annotate it later on, maybe not. I personally enjoy viewing graphics-heavy online PowerPoints and making up my own narrative to go along…

This summer, in between taking a class and working for Digital Library Services, I’ll be attending two conferences, the Digital Humanities Summer Institute in Victoria, BC in May, and The International Conference in Electronic Publishing (ELPUB) in Toronto in June. I’m also planning to use the web skills I learned this semester to renovate my homepage and this blog sometime this summer, and if something exciting happens or if I feel so inclined, I may just post a little update or two.

(weeks 9,10,11)

Progress on the Chautauqua collection migration has come to a screeching halt, as we’ve encountered major problems batch uploading into ContentDM on a larger scale. I discovered early on in the migrations that uploading more than 50 compound objects with metadata, images, and associated full-text transcription files all at one time will crash the program. Over the course of about a week, Mark, Wendy, and I tried many different combinations of batch uploads (some without metadata, some without text files, some to different ContentDM collections, etc.) to try to narrow down the problem. I’ve realized that I don’t really like this aspect of carrying out a project. (Who does?) It would be so much easier to just pass it off to ITS or ContentDM. But, alas, it’s part of the process, and there’s probably some great learning experience in there somewhere. So, after trying everything, we think we’ve identified the full-text transcription files as the culprit. Having a better grasp on the problem, we’ve sent it out to ContentDM support for help. (One of the perks of proprietary software…)

In the meantime, I’ve been scanning and editing photos for the Iowa Women’s Archives UI Women’s Physical Education collection. It’s been a nice break doing semi-mindless work and working with physical artifacts again. Hopefully this coming week will bring a solution to our problem, so we can get migration back into full swing.

(week 8)

With the metadata now finished, we are so close to having the fully migrated test batch ready for Special Collections’ approval. This past week was spent fine-tuning the metadata and image display with Jen and Mark and test migrating about 25 of the brochures into ContentDM. Later in the week we met with John O. in ITS to discuss the feasibility of renaming and stripping the TEI markup from the full-text files for import into the transcription field in the metadata. John wrote up a quick script and got back to us mere hours later with a few sample files for us to play around with. This week I’ll try uploading the images, metadata, and transcriptions all together into ContentDM. The limitations of this software, however, are really starting to get to me. For instance, to add an image rights band to the bottom of an uploaded image, ContentDM will only let you choose a standard font size rather than allowing you to set font size relative to image size. When batch uploading multiple objects of different sizes, some images end up with excessively large banding text while others have tiny illegible banding text. We discussed resizing all of the images to make the banding text more uniform, but decided against it in the end. I personally feel that it’s a bad idea in the long run to change our digital objects just to conform to immature technology.

I’ve also been thinking a lot about the issues of permanence involved in conducting a migration like this. In my research for a paper I’m writing for my electronic publishing class, I’ve been reading a lot about institutional repositories and the need to provide permanent access (read: URLs) to digital objects in order to guarantee authoritativeness to the sources and support the credibility of the researcher referencing those objects in their scholarly work. How will we guarantee this permanent access for researchers who have already linked to brochures in the old system? Will we leave the old system in place? Will we redirect the URLs? If we redirect, will we redirect to the object referenced or just to the new homepage of the digital collection? These are questions I will have to bring up in the weeks ahead.

(weeks 6,7)

Lesson learned: metadata manipulation will always take you longer than planned. After another two weeks of subject metadata massaging and reformatting, the bulk of our metadata is finally ready for the migration into ContentDM. Why has this taken so long? Answer: Humans rule, computers drool. While much of our metadata reformatting could be automated to some extent, subject metadata is finicky and requires human brains to manipulate it semantically. Our original subject metadata was a mishmash of Library of Congress and locally created subject headings with a little LCTGM (Library of Congress Thesaurus for Graphic Materials) thrown in. Headings and LCSH/local subdivisions were combined syntactically into ‘heading -- subdivision’ strings a la LCSH rules (e.g., ‘Puppet theater -- History and criticism’). This method of cataloging was good practice eight years ago when the collection was first digitized, but as digital libraries have matured, it’s become apparent that subject metadata can give us more precise search results when it’s split out into its narrower subject types. So, the decision was made to add ‘Geographic subject,’ ‘Personal name subject,’ ‘Corporate name subject,’ ‘Chronological subject,’ ‘LCSH,’ and ‘LCTGM’ fields to our metadata (all mapped to Dublin Core ‘Subject’) and weed through our original subject headings to populate these fields. It was also important to preserve this ‘legacy’ cataloging, so the original subject heading strings are being left mostly as-is in a field called ‘Local subject.’ Creating these new fields will also help future cataloging efforts when new Chautauqua material is added to the digital collection, as the controlled vocabularies will already be in place. The major time suck occurred when I had to filter through all of these subject headings to find the geographic headings and manually check every unique heading against the current LCSH authorities list. But seeing those super-rich metadata fields in ContentDM? Totally worth it, right?
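For the curious, here is a rough Python sketch of the kind of pre-sorting that could be automated before the human checking starts. It is not the workflow we actually used: the function names and the tiny `GEOGRAPHIC_HEADINGS` set are made up for illustration, and a real list would have to come from checking headings against the LCSH authorities by hand.

```python
import re

# Hypothetical stand-in for a list of geographic headings that a person
# has already verified against the LCSH authorities.
GEOGRAPHIC_HEADINGS = {"Iowa", "Illinois"}


def split_heading(heading: str) -> list[str]:
    """Split a 'heading -- subdivision' string into its parts.

    Accepts either a double hyphen or a long dash as the separator.
    """
    parts = re.split(r"\s*(?:--|—)\s*", heading)
    return [part.strip() for part in parts if part.strip()]


def sort_subjects(headings: list[str]) -> dict[str, list[str]]:
    """Keep each legacy string and pull out any known geographic parts."""
    record = {"Local subject": [], "Geographic subject": []}
    for heading in headings:
        # The original string is preserved untouched as legacy cataloging.
        record["Local subject"].append(heading)
        for part in split_heading(heading):
            if part in GEOGRAPHIC_HEADINGS and part not in record["Geographic subject"]:
                record["Geographic subject"].append(part)
    return record
```

Something like this only handles the easy cases. Anything not on the verified list still needs a person to look at it, which is where the two weeks went.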

Okay, so I wasn’t being entirely truthful when I said I spent two weeks on this metadata. For a few days while our metadata decisions were stuck in workflow, I checked out the TEI markup that was done for each brochure. One of the requirements for inclusion in the Library of Congress American Memory project was that text of all items be marked up in TEI-lite. Now I do love text encoding (in fact, I just found out I get to go here this summer to learn some more TEI), but LC’s decision to require their American Memory partners to do this may have been a bit premature and ambitious, not to mention an unnecessary expenditure for the participating institutions (and essentially, the American taxpayers). I wonder if LC ever did anything with these TEI files because, looking at the markup, I can’t really see potential for any useful text mining. Seeing how much time and money went into the manual keying (the brochures’ presentation did not allow for accurate OCR) and text encoding, I’d be interested to find out if any participating institutions did anything with their encoded files. Anyone?

We will, however, be looking this week at the possibility of mining our TEI files to get full-text transcriptions into the ContentDM metadata. I’ll be excited if this works out, since I hate to see investments go to waste. I’ll also be importing the controlled vocabularies into ContentDM in preparation for the migration. If all goes smoothly, we might be ready for the test migration batch by the end of the week.
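To give a sense of what mining the TEI files might involve, here is a minimal Python sketch that pulls plain text out of a TEI document and leaves the header behind. It makes a few assumptions I haven’t verified against our files: that they parse as well-formed XML, that the transcription lives inside a `text` element, and that block elements are separated by whitespace. The function name `tei_plain_text` is made up for illustration.

```python
import xml.etree.ElementTree as ET


def tei_plain_text(tei_xml: str) -> str:
    """Return the transcription text of a TEI document with markup removed."""
    root = ET.fromstring(tei_xml)
    # Find the first <text> element, ignoring any namespace prefix,
    # so that the contents of the TEI header are left out.
    body = next(
        (el for el in root.iter() if el.tag.rsplit("}", 1)[-1] == "text"),
        root,
    )
    # Concatenate every text node, then collapse runs of whitespace.
    return " ".join("".join(body.itertext()).split())
```

The output is one long string per brochure, which is roughly the shape a transcription field would want. If block elements sit directly against each other with no whitespace, words at the boundaries will run together, so that would need checking on real files.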

(weeks 3, 4, 5)

It appears I have some catching up to do. Unfortunately, I don’t have much time for that. So, this installment will be short and sweet and told through screenshots.

After the public launch of the African American Women Students at the University of Iowa collection, I was surprised at all of the press it received, including “featured digital library” in American Libraries Direct.

[Screenshot: aldirect]

Two weekends ago, I attended an intensive TEI XML (text-encoding) workshop at the University of Illinois Urbana-Champaign. Here’s a screenshot of the Dorothy Parker poem I worked on.
[Screenshot: tei]

I was very intrigued by the idea of “personography”: adding biographical information to the TEI “header” about any persons referenced in the text. I’m hoping to learn a lot more about TEI, since this workshop, while very informative, gave me only a basic grasp of TEI’s potential.

The metadata is unfortunately still under construction, but almost completed.

[Screenshot: metadata]

This week we’ll hopefully finish up with the subject fields and have the metadata ready for migration.

(week 2)

This week was almost entirely devoted to metadata, as Wendy had me investigate the pros and cons of the various files from which we could harvest. Searching through the old files, I found 3 different tab-delimited .txt files and the XML file that was required by Library of Congress (LC) for the American Memory project. After identifying and examining the different text formats for each of the files, I met with Mark A. to get a better idea of what our final metadata record would need to look like for import into ContentDM. After sharing a summary of my findings with Wendy, we decided to go with the pre-XML-marked-up Library of Congress text file because it contained both a directory path as a unique identifier for each record and the full text of each brochure (minus stopwords; more about this in a later post). Later in the week, we met with metadata librarian Jen W. to talk about which metadata fields we would need to add or update to keep consistent with Iowa Digital Library standards and also the Redpath Chautauqua audio collection (in progress). Since the original metadata records were designed for a library catalog, we’ll have to edit some punctuation and split out some fields in Excel before converting the file for import. The trick will be finding a way to automate as many steps as possible, since manually editing 8,000 records is not really how I’d like to spend my semester.

Next week I’ll continue reformatting the metadata and hopefully by Friday have a file ready for import. We are also dealing with the issue of file renaming. While IT is searching for an easy method of renaming our 28,000 image files, we are trying to determine the most logical naming scheme. Jill, who is working on the migration of the Dada collection, is having the same issues and wrote up a great report on current standards and best practices. While it is definitely important to have consistent file naming schemes within a large repository, there are times when it is necessary to grandfather in an older scheme. In our case, the directory path of each digitized brochure reflects the shelf location of the physical object in Special Collections. In fact, when Special Collections receives queries from LC American Memory users, they can use the URL of the online brochure to quickly find the item on the shelf. When the collection is moved to ContentDM, the only part of the directory path visible to the user will be the file name, so it seems likely that the new file names will be concatenations of the old file path. For example, “abbott/2/1.jpg” (performer Abbott, second brochure, page 1) will become something like “abbott/2/abbott0201.jpg”.
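Here is a small Python sketch of what that renaming rule might look like, based only on the one example above. The naming scheme isn’t final, so treat the details as assumptions: the function name `new_image_name` is hypothetical, and I’m guessing that brochure and page numbers are always numeric and padded to two digits.

```python
from pathlib import PurePosixPath


def new_image_name(old_path: str) -> str:
    """Fold the performer folder and brochure number into the file name.

    Example: 'abbott/2/1.jpg' becomes 'abbott/2/abbott0201.jpg'.
    """
    path = PurePosixPath(old_path)
    # The last three path parts are performer / brochure / page file.
    performer, brochure = path.parts[-3], path.parts[-2]
    page = path.stem
    # Assumption: two-digit zero padding for brochure and page numbers.
    new_name = f"{performer}{int(brochure):02d}{int(page):02d}{path.suffix}"
    return str(path.with_name(new_name))
```

The point of keeping the performer and brochure number in the file name is that the shelf location would still be readable once the directory path is hidden from users. A performer with more than 99 brochures, or a non-numeric page name, would need a different rule.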

Changing the subject, this week also saw the launch of my previous semester’s project, “African American Women Students and the University of Iowa: 1910-1960.” You can read the press release here.

(week 1)

As I mentioned in my last post, I’ll be working on migrating the “Traveling Culture: Circuit Chautauqua in the Twentieth Century” collection, which is currently delivered through the Library of Congress American Memory site, to the Iowa Digital Library (IDL). The collection will still be searchable at the Library of Congress site, but users will be redirected to an IDL ContentDM site rather than the current static HTML pages housed on the UI Libraries website. The main benefits of this migration will be increased search and browse capabilities, and better image navigation. Another major benefit will be that all of the objects in the digital collection will also be searchable by metadata and full text through the federated Smart Search (Ex Libris’ Primo). Not only is this an important step in bringing the Libraries’ varied collections together under one umbrella, it is a first step towards providing wider and enhanced access to Special Collections’ incredible Redpath Chautauqua collection. Further addition of audio recordings, programs, photos, and postcards to the digital collection combined with EAD-enhanced finding aids will allow researchers to draw new connections between artifacts and subjects, a task that is difficult and time-consuming when working with only a finding aid and an immense physical collection. (See fellows Joanna’s and Jane’s blogs for updates on the UI Libraries’ EAD finding aid project.)

My mentor, Wendy in Digital Library Services (DLS), and I decided that I should first familiarize myself with the original digitization project carried out in 1998-2000 and the associated image and metadata files. We also had an initial project planning meeting with DLS and Sid and Kathy from Special Collections. In planning for the migration, we are beginning to identify issues and potential problems that will need to be addressed, such as file renaming and consistent image display, much of which will be discussed here in the weeks to come. It’s immediately apparent that a “measure twice, cut once” approach will be imperative to a successful migration. When dealing with objects numbering in the thousands (and files in the tens of thousands), any extra steps added to the process will cost us a lot of time, time that could be spent planning out workflows for digitizing more of the physical collection.

Next week will be spent creating a few compound object prototypes in ContentDM. Because the brochures in the collection vary in size and layout, we will need to find either the settings that will be optimal for all of the objects, or identify a method of arranging objects of similar dimensions for batch uploads. I also plan to spend a good deal of time poring over the ContentDM help files and picking Mark’s brain for ContentDM wisdom (sorry, Mark) in order to learn the best ways of manipulating the software.

Fall 2007 — epilogue, preview

Well, after working over winter break, scrambling to get this thing finished, the site is finally up. The “African American Women Students at The University of Iowa: 1910-1960” digital collection, to be launched in February, can be found here.

Spring semester I’ll be working on the migration of a UI legacy collection, “Traveling Culture: Circuit Chautauqua in the Twentieth Century,” which is currently delivered through the Library of Congress American Memory site. This project will involve migrating approximately 8,000 digitized publicity brochures, mostly compound, and their associated metadata to ContentDM. I’ll also be setting up workflows for digitizing more of the physical collection. This is quite a switch from my first semester assignment, but I’m super excited to learn some new skills and to work on a project at a completely different level of granularity.

(week 14) the death of perfect

I spent way more time scanning microfilmed newspaper articles this week than I had intended. It was a bit of a struggle: part of me kept wanting to call it “good enough” and quit with what I had found, while the other part, knowing this newspaper may not be digitized for a very long time, wanted to be thorough and get every relevant article from the entire two reels. All of me was thinking the entire time, “This is NOT the way to aggregate content…”

In and outside of my courses, I’ve been reading so much about “the future”–the web as an (open) database, small apps and dynamic webpages that harvest and display data to your liking, etc., that it’s been difficult trying to reconcile possibility with reality. In the ideal situation, the Bystander newspaper would be digitized, full-text searchable, maybe even a little more machine-readable (some automated TEI to distinguish headlines from text, etc.). Instead of spending hours visually scanning, changing out lenses, focusing and refocusing, I type in a few search terms and get what I’m looking for in a few minutes. Cut and paste into new image for display on boutique website. Link back to original digital object for those interested in context. Done. Granted, there’s going to be a certain amount of noise and lack of precision involved in the search, but I know my (human) precision isn’t perfect when I’ve been staring at the microfilm reader screen for hours, getting distracted by headlines like “Man Has Unusual Melon” and “Prize Healthy Baby Pageant.” There’s also the question of quality–mass digitization is not going to give you the great results you’ll get from me painstakingly adjusting the brightness and shadow removal for every article (you’re welcome). But, I’m starting to be swayed into the “good enough” camp where access trumps quality. I think there’s some saying about not letting the perfect be the enemy of the good? Enemy of something, anyway. Yeah, that about sums it up.

I meant to write a little about the website progress (which is also becoming a bit of a content aggregation nightmare), but I think I rambled on too long about dead media. So, here’s a sneak peek at one of the website subpages. All “related artifacts” and proper names will link out to their respective digital objects in ContentDM. I’ll let you guess at what the problem is.

Next week I need to get a good variety of content uploaded to ContentDM and linked to my website for a functional demo. (The oral histories from Iowa Women’s Archives have finally been digitized, so I’d like to get them excerpted and uploaded in time.) I also have to start reflecting on what I’ve learned and achieved on this project for the IMLS fellow project presentations the following week.

(And in case you were wondering, my obsessive, thorough, perfectionist side won out and I now have 75 Bystander articles waiting for metadata…)
