Computer recognition of old handwriting is indeed a game changer for an industry that still relies on armies of volunteer indexers to make records searchable online. Even if the technology only semiautomates the indexing process, it’ll unlock billions of records much faster than is now possible.
A glance back
Our digital data diet began less than 20 years ago. Ancestry.com launched in 1996 and within five years hosted a billion indexed records (not images). The Church of Jesus Christ of Latter-day Saints began digitizing records in 1998 and launched FamilySearch.org the following May to a site-crashing response. Within six months, the site held more than 640 million indexed names and had received 1.5 billion hits.
In recent years, optical character recognition (OCR) technology has enabled genealogists to keyword-search county histories, compiled genealogies and other digitized books. OCR also works with historical newspapers, though not as accurately because of poor print quality, varying typeface and other issues.
But the real data bottleneck is handwritten records, which require manual indexing. Hundreds of thousands of people volunteer through FamilySearch Indexing, Ancestry.com’s World Archives Project and other programs. Even so, at FamilySearch’s digitization rate alone, 12 more records await indexing for every record indexed so far. And that doesn’t count the millions of digitized images now in browse-only format on FamilySearch.org.
ICR: the next step
Handwriting recognition technology promises to break that logjam with OCR’s little sister, ICR (intelligent character recognition). ICR recognizes handwriting in different languages, despite style variations and even the quirks unique to one person’s writing.
“Several government and academic institutions are investing in [this] new technology … and it’s improving,” says Scott Flinders of FamilySearch. “We’re watching this very closely and are trying to contribute.” Stakeholders that have gone public include Mocavo, A2iA and Brigham Young University.
Will ICR replace the need for human indexers? “The idea here is that handwriting recognition technology will reduce the burden on transcribers initially,” says Mocavo’s chief operating officer, Ryan Hunter. “As we refine the technology, our objective is to reach a point where manual transcription is no longer required for most documents.”
FamilySearch officials say indexers will be needed for the foreseeable future, but that ICR will make their efforts more productive. In fact, FamilySearch’s recently overhauled indexing system was designed to work with ICR technology as it matures. “We may have the computer index certain fields, like numbers and dates—even guess at more difficult fields like surname and places—and have a human review what the computer has done,” Flinders says. Reviewing the computer’s work will be faster than having an indexer key it in. “As the accuracy improves, maybe there will be less human review needed.”
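For the technically curious, the review workflow Flinders describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not FamilySearch’s actual system: the field names, the confidence scores and the 0.95 cutoff are all hypothetical, since the article gives no figures. The idea is that the computer’s guess is accepted automatically only for “easy” fields (numbers and dates) where it reports high confidence, and everything else goes to a human reviewer.

```python
from dataclasses import dataclass

# Hypothetical values for illustration; none of these come from FamilySearch.
AUTO_ACCEPT_THRESHOLD = 0.95
EASY_FIELDS = {"birth_year", "event_date", "age"}  # numbers and dates


@dataclass
class FieldGuess:
    name: str          # which index field this is, e.g. "surname"
    value: str         # the text the recognizer read from the image
    confidence: float  # 0.0 to 1.0, as reported by an assumed recognizer


def route(guesses, threshold=AUTO_ACCEPT_THRESHOLD):
    """Split guesses into auto-accepted fields and fields for human review."""
    accepted, review = [], []
    for guess in guesses:
        # Accept only easy fields the recognizer is confident about;
        # harder fields such as surnames and places always get a human look.
        if guess.name in EASY_FIELDS and guess.confidence >= threshold:
            accepted.append(guess)
        else:
            review.append(guess)
    return accepted, review


record = [
    FieldGuess("birth_year", "1874", 0.98),
    FieldGuess("event_date", "12 Mar 1901", 0.81),
    FieldGuess("surname", "Flinders", 0.97),
]
accepted, review = route(record)
print("accepted:", [g.name for g in accepted])
print("needs review:", [g.name for g in review])
```

In this sketch, lowering the threshold or adding field names to the easy set would send fewer fields to a person, which mirrors the expectation that less human review may be needed as accuracy improves.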
5 Questions with Matt Garner of Mocavo
1. You’re the chief scientist at Mocavo now. What’s your lab like?
My “laboratory” is pretty amazing: a supercomputer, containing over 2,000 high-end CPUs. At the helm, my desk rivals NASA’s mission control. My walls are covered with additional screens displaying up-to-the-minute data, surrounded by oversized white boards containing copious amounts of detailed scribbling from our most recent brainstorm.
2. How did you land in the genealogy industry?
I remember spending full days alone in the Family History Library in Salt Lake City when I was only 9 years old. Every time I have left the family history industry, my heart finds its way back. I’m just as passionate about a document that contains hundreds of names as I am about, say, a handwritten letter that may only relate to a single individual. I know that to someone, somewhere, that document has great value.
3. What historical writing style just about drives you—and the computer—crazy?
Interestingly, it’s modern handwriting that is disastrous. The advent of the typewriter (and subsequently the computer) has lowered the standard of handwriting beyond recognition and utility. Centuries-old handwriting, with a bit of practice, is still largely legible by both man and machine.
4. What do you do when you’re not at your computer?
I pretty much spend all my spare time entertaining my twin 3-year-old daughters, which is undoubtedly the highlight of my day. Other than that, you might run into me at the local home improvement store. I’m always in the middle of two or three DIY projects around the house.
5. You’ve flip-flopped between leading companies and providing brainpower behind the scenes. What role suits you best?
Much to my wife’s chagrin, I think I really am an entrepreneur at heart. I prefer small, nimble teams and am always on the lookout for the next big thing in the industry.