Thursday, April 18, 2024

Transcription by Crowdsourcing

Since my last post, I've joined over 40,000 others as a volunteer for the Library of Congress. No, it didn't involve moving, except within my own home. My involvement started after attending an intro session at the Nashua Public Library last week by someone who's been doing this a while. 
The Library of Congress is the oldest federal cultural institution in the U.S. 
Located in Washington, D.C., the Library of Congress is regarded as America's national library, and it's also the largest library in the world, with a collection of 160 million items: over 37 million books and other printed materials, 14 million photographs, 5.5 million maps, and 3.5 million recordings. Daily, over 10,000 items are added to the collection.

My volunteer transcribing is done at home by accessing the Library's By the People program (crowd.loc.gov). It was created in October 2018, and its name comes from the last line of Abraham Lincoln’s Gettysburg Address: “...government of the people, by the people, for the people, shall not perish from the earth.”
Other volunteers and I are crowdsourcing the work of transcribing, reviewing, and tagging digitized images of manuscripts and typed materials from the Library's collections. At some point, machines will take over and optical character recognition (OCR) will be able to do much of the job of transcription, but that could be years away.

Crowdsourcing describes the process of obtaining information for a task or project by enlisting the services of many people, paid or unpaid, typically via the internet. By the People invites the public, nonspecialists and specialists alike, to engage with collections and process information.

Volunteering for By the People is easy for anyone with an internet connection. You can transcribe, review, and tag digitized images of manuscripts and typed materials from the Library’s collections. No account is needed to transcribe. I created one because volunteers with registered accounts can also review and tag transcriptions. That, in itself, has been a learning process. An account also lets them track contributions on a profile page. As of April 2024, there are 40,000 registered users.

What is Transcribing?
Transcribing means typing exactly what's seen on a page to convert the material into a readable text document. Volunteers transcribe (type) digitized images of text materials from the Library’s collections, traveling through history first-hand and gaining new skills, like learning how to analyze primary sources or read cursive, which I have found quite challenging.

How is Transcribing Done?
The Library's By the People team works with a range of technical and curatorial staff across the Library to import digitized items from the main library website (loc.gov) into the crowd.loc.gov website. Volunteers type what they see in an image; those registered for an account can also review transcripts created by others.

By the People is a stand-alone website not directly tied to the Library’s main website. In the past 6 years, virtual volunteers have completed over 500,000 transcriptions to improve search, accessibility and discovery for the papers of Theodore Roosevelt, Rosa Parks, Walt Whitman, Susan B. Anthony and more.

The first set of publicly transcribed materials was released in early 2019. As of April 2024, over 421,300 completed transcriptions have been integrated back into the Library's online catalog, making them word-searchable and readable by anyone.

Why is Transcribing Vital?
Since I wondered as well, here's what I learned. Documents used for historical research tell an invaluable story. Transcribing and digitizing historical records helps ensure wider accessibility of crucial items of history, enabling anyone to read them and better understand their history.

It seems that computers can't yet accurately transcribe these documents without human intervention, so the volunteer transcriptions improve search, readability and access to handwritten and typed documents. Enhanced access provides better readability and keyword searching of documents for everyone.

For example, transcriptions allow universities, research scholars, historians, analysts and others reviewing historical documents to examine past events, looking for context to better understand the impact on modern society.

The Library has released over 1,056,000 pages for transcription across 41 campaigns; over 780,000 pages have completed transcriptions. There are 19 cataloged, full-text datasets of completed campaigns now available online, with more to be done.
Transcriptions can be done and/or reviewed for these notables on By the People.
Current campaigns on the By the People website include: 
  • Clara Barton, Angel of the Battlefield
  • Yours truly, Frederick Douglass
  • To Be Preserved, Correspondence of James A. Garfield
  • Leonard Bernstein, Writings By, From, and To
  • Sheet Music of the Musical Theater
  • American Federation of Labor (A.F.L.) Letters in the Progressive Era
  • My Great Mass of Papers, Correspondence of Theodore Roosevelt
  • Walt Whitman, projects devoted to his poetry, letters, speeches, and other writings
  • Woman of the World: Political Thinker Hannah Arendt
  • Herencia: Centuries of Spanish Legal Documents
It takes at least one volunteer to transcribe a page and another to review it for completeness and mark it complete. Complex documents can pass through transcription and review several times before being completed and then published on loc.gov. If you don't complete a transcription, just save it for completion by another volunteer. It's important to save, and to save often.

Some Basic Transcription Rules
  • Text order - Transcribe text in the order it appears on the page.
  • Spelling, grammar, and punctuation - Preserve the original; transcribe as seen.
  • Line breaks - Preserve them; line breaks make it easier for someone to review a transcription.
  • Page breaks - If a word breaks across two pages, transcribe the whole word on the first page.
  • Illegible or unclear text - Type a question mark inside a pair of square brackets: [?]
  • Blank pages - Don't transcribe anything. Check the Nothing to Transcribe box and Save.

Accurate transcription of documents is essential. Careful transcription lets researchers search for specific words or sections of a text and helps build a narrative of the past. Volunteers can pick and choose a document to transcribe (or not). After checking out the Clara Barton collection, I took a pass as there are countless tables of data to transcribe. Yes, there are How-Tos for table transcriptions.

Transcribing Can be Challenging and Fun
It's definitely a learning process to explore an era as seen through the document. To preserve the document's integrity, data must be transcribed as in the original record, including dates, abbreviations, names, punctuation and misspellings. Mistakes can distort comprehension, so the transcript must mirror the original text.
A handwritten document to transcribe (left) and the resulting transcription (right)
To date, I've done some transcribing from the James Garfield campaign. Decoding documents can be challenging and often frustrating due to factors such as handwriting, spellings and abbreviations, obsolete letters and punctuation.

Spencerian Handwriting
Handwriting: Many written documents are in a cursive that's very different from today's cursive. Typical flourishes in the letters of the alphabet can pose a challenge to deciphering the correct letter. Some texts can seem illegible and require a lot of concentration for accurate transcription. If a word or phrase defies understanding, square brackets are used to enclose it.
Spencerian handwriting, shown in the example at the right, dominated American correspondence until the Palmer method displaced it by the 1920s.

If a word or phrase can't be deciphered, another volunteer may be able to figure it out. And, when you can't read much of a page, save and look for another.

Spellings and Abbreviations: Variations in spelling present an issue, since many words were spelled differently years ago. Understanding them in the right context is essential. Spellings are always transcribed as-is. Abbreviations are copied as written and not interpreted.

Obsolete Letters and Punctuation: Historical records often use letters and words that are obsolete today. Punctuation marks like long dashes and tildes were used differently in the past and can be hard to interpret without the right context.

Transcribing historical documents requires adherence to best practices. Before starting, I read and made copies of the How-Tos on the By the People website. This took time, but they were very helpful, especially since, when looking at some completed transcriptions, it appeared that not all volunteers had followed the guidelines.

Interested in Becoming a By the People Volunteer?
There are still plenty of documents awaiting transcription, review or tagging. Some 138,000+ pages have transcriptions currently awaiting a reviewer to check for completeness. To start helping make historical documents more available, follow this link or copy and paste it into your browser: http://crowd.loc.gov

Before attending the library information session, I had no idea about this process, which, so far, has been a very interesting experience. Other sites where volunteers can transcribe documents include the Smithsonian and the National Archives.

20 comments:

Barbara Rogers said...

This sounds like a great volunteer opportunity for folks who are home looking at computers much of the time. Or just dedicating an hour here or there to this. So glad to hear about it, and I'll definitely mention it in some group conversations with seniors. Not all of my friends have computers, but a few do.

Rita said...

Sounds fascinating! I couldn't do anything like that anymore for various physical reasons. I would have loved to years ago, but years ago I didn't have internet--lol! Wouldn't have been available back when I was available. ;) Enjoy!

Tom said...

...thanks for this info, it's news for me.

Emma Springfield said...

It sounds like a fun and worthwhile project. My typing skills were poor to begin with. Now my ability to control my fingers is difficult. Have a good time.

photowannabe said...

Totally fascinating and right up your "alley".
I don't think it's for me but I'm grateful that someone does this.
Looking at old handwritten documents can really be challenging.
Kudos to you!!!
Sue

Ludwig said...

Congratulations for participating in such a wonderful endeavor.

Salty Pumpkin Studio said...


Wow! That is fascinating!

Cataloguing books, I often use the LOC to research. The transcriptions are so important for many reasons, mostly because what is written is part of the history of who or whatever is being researched.

Ginny Hartzler said...

Good for you! A relative of ours is a transcriber for a hospital. Do you know how long this project will last? Seems like years for sure.

My name is Erika. said...

That sounds like an interesting activity. I don't think it would be for me, but I think it's a very valuable thing to do. Have fun, or maybe I should say happy reading and transcribing. hugs-Erika

Pamela M. Steiner said...

That sounds like an interesting hobby! Thank you for being willing to do this! With your interest in historical facts, etc., I believe you are a wonderful candidate for this endeavor. I don't know that I would be able to sit and do that for very long with my back and neck and arm issues, but I'm glad there are so many volunteers out there who are willing and able to do this. Wonderful information I never knew about! Thank you for sharing it with us.

Red said...

That's a very interesting project to be involved in.

Veronica Lee said...

Bravo for being part of such a fantastic initiative, Dorothy!

Hugs and blessings

MARY G said...

Fascinating and vital, if our history is not to disappear. Or at least warp. Good for you. I wish Canada would do something similar. But ... maybe they do and I have just never been lucky enough to find out.
Good for you!
Jealous. Or I would be if I were not stuck writing advertising copy just now.

Bijoux said...

I think I originally heard about this from all the genealogy work I’ve done. Reading the old script is extremely difficult. Thanks for volunteering!

MadSnapper said...

a worthy project, but knowing that not everyone should be transcribing I am wondering how accurate the transcribed is. looking at the cursive for instance, who knows what someone say it says... and what if someone wanted to mess it up, but making it say what they want it to say. do they have a verification system?
that said, this is not something I would do but it does need to be done.

Jeanie said...

I heard about this on CBS Sunday Morning and thought how when I'm done with my book, this might be something I'd like to explore. I've done a lot of transcription from video interviews (where you put in every "uh" and "um") and I think I might be quite good at it -- or at least good enough to give it a try and see if my work makes the cut. You are a natural for this project with your passion for history.

When life chills down a bit, I might pick your brain a bit more on this project!

Anvilcloud said...

I had no idea, and I don't think many have. It sounds like a good task for you, for you enjoy research and detail.

David said...

Beatrice, Wow, you are much more dedicated than I am. I lack the patience for transcribing but I certainly do appreciate those who take the time and make the effort to preserve the past. I was unaware of this program so I did learn something new. Take Care, Big Daddy Dave

Rob Lenihan said...

What a fabulous project!.

baili said...

wow this is truly an amazing process for those who have time and interest dear Dorothy

i am shocked to learn how much library offers wow

thanks for explaining the transcribing method . sounds like opening the data and it's details to read .

next thing that sounds great that volunteer can access many serious govt stuff online . seems quite a facility for public and can be helpful for them in many ways
thanks for very nice post my friend
blessings