Genealogy Insider: Intelligent Character Recognition Software

A new technology could bring you more searchable genealogy records, faster than ever.

Last fall, genealogy website Mocavo announced that its handwriting recognition technology was working at 90 to 95 percent accuracy. Cliff Shaw, Mocavo’s CEO, called this the “Holy Grail” of the genealogy industry, “the single largest technological advancement,” except perhaps the internet, to “enable more content to become accessible online.”

Computer recognition of old handwriting is indeed a game changer for an industry that still relies on armies of volunteer indexers to make records searchable online. Even if the technology only semiautomates the indexing process, it’ll unlock billions of records much faster than is now possible.

A glance back
Our digital data diet began less than 20 years ago. Ancestry.com launched in 1996 and within five years hosted a billion indexed records (not images). The Church of Jesus Christ of Latter-day Saints began digitizing records in 1998 and launched FamilySearch.org the following May to a site-crashing response. Within six months, the site held more than 640 million indexed names and had received 1.5 billion hits.

Family history buffs have shown insatiable appetites for online records. To give two examples, Ancestry.com sites attract 2.7 million-plus subscribers with 12 billion-plus records. FamilySearch.org adds 33 million images per month to a collection that draws more than 5 million page views per day.

In recent years, optical character recognition (OCR) technology has enabled genealogists to keyword-search county histories, compiled genealogies and other digitized books. OCR also works with historical newspapers, though not as accurately because of poor print quality, varying typefaces and other issues.

But the real data bottleneck is handwritten records, which require manual indexing. Hundreds of thousands of people volunteer through FamilySearch Indexing, Ancestry.com’s World Archives Project and other programs. But at FamilySearch’s record digitization rate alone, for every record currently indexed, 12 more await indexing. And that doesn’t touch the millions of digitized images now in browse-only format on FamilySearch.org.

ICR: the next step
Handwriting recognition technology promises to break that logjam with OCR’s little sister: intelligent character recognition (ICR). ICR recognizes handwriting in different languages, despite style variations and even the quirks unique to one person’s writing.

“Several government and academic institutions are investing in [this] new technology … and it’s improving,” says Scott Flinders of FamilySearch. “We’re watching this very closely and are trying to contribute.” Organizations that have publicly announced their work on the technology include Mocavo, A2iA and Brigham Young University.

Will ICR replace the need for human indexers? “The idea here is that handwriting recognition technology will reduce the burden on transcribers initially,” says Mocavo’s chief operating officer, Ryan Hunter. “As we refine the technology, our objective is to reach a point where manual transcription is no longer required for most documents.”

FamilySearch officials say indexers will be needed for the foreseeable future, but that ICR will make their efforts more productive. In fact, FamilySearch’s recently overhauled indexing system was designed to work with ICR technology as it matures. “We may have the computer index certain fields, like numbers and dates—even guess at more difficult fields like surname and places—and have a human review what the computer has done,” Flinders says. Reviewing the computer’s work will be faster than having an indexer key it in. “As the accuracy improves, maybe there will be less human review needed.”
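
For technically minded readers, the review workflow Flinders describes could look roughly like the Python sketch below. It is only an illustration, not FamilySearch's or Mocavo's actual system: it assumes the recognizer reports a confidence score for each field, which the article does not state, and the names FieldGuess, split_for_review and REVIEW_THRESHOLD, along with the sample record, are all made up for the example.

```python
from dataclasses import dataclass

# Hypothetical cutoff; the article does not specify any threshold.
REVIEW_THRESHOLD = 0.90


@dataclass
class FieldGuess:
    name: str          # field label, e.g. "birth_date" or "surname"
    value: str         # text the recognizer proposed for the field
    confidence: float  # assumed score between 0.0 and 1.0


def split_for_review(guesses, threshold=REVIEW_THRESHOLD):
    """Separate guesses the computer can keep from those a human should check."""
    accepted, needs_review = [], []
    for guess in guesses:
        if guess.confidence >= threshold:
            accepted.append(guess)
        else:
            needs_review.append(guess)
    return accepted, needs_review


# Invented sample record: dates and numbers score high, names and places lower.
record = [
    FieldGuess("birth_date", "12 Mar 1874", 0.97),
    FieldGuess("age", "36", 0.95),
    FieldGuess("surname", "Hendricksen", 0.62),
    FieldGuess("birthplace", "Aalborg", 0.71),
]

accepted, needs_review = split_for_review(record)
print("Kept as-is:", [g.name for g in accepted])
print("Sent to an indexer:", [g.name for g in needs_review])
```

Raising the threshold sends more fields to human indexers; lowering it, as accuracy improves, would mean less human review.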

The genealogical community won’t be the only group to benefit from ICR. “Any organization that has handwritten documents and forms will find this functionality of tremendous value,” Hunter says. Think banks, attorneys, doctors and schools. But the impact on genealogical research alone is conceivably Holy Grail (or at least “Holy Cow!”) scale: We’ll get access to even more digitized records, even faster, even easier.

5 Questions with Matt Garner of Mocavo

Mocavo.com Chief Scientist Matt Garner talks about his high-tech lab and his search for genealogy’s next big thing.

1. You’re the chief scientist at Mocavo now. What’s your lab like?
My “laboratory” is pretty amazing: a supercomputer containing over 2,000 high-end CPUs. At the helm, my desk rivals NASA’s mission control. My walls are covered with additional screens displaying up-to-the-minute data, surrounded by oversized whiteboards containing copious amounts of detailed scribbling from our most recent brainstorm.

2. How did you land in the genealogy industry?
I remember spending full days alone in the Family History Library in Salt Lake City when I was only 9 years old. Every time I have left the family history industry, my heart finds its way back. I’m just as passionate about a document that contains hundreds of names as I am about, say, a handwritten letter that may only relate to a single individual. I know that to someone, somewhere, that document has great value.

3. What historical writing style just about drives you—and the computer—crazy?
Interestingly, it’s modern handwriting that is disastrous. The advent of the typewriter (and subsequently the computer) has lowered the standard of handwriting beyond recognition and utility. Centuries-old handwriting, with a bit of practice, is still largely legible by both man and machine.

4. What do you do when you’re not at your computer?
I pretty much spend all my spare time entertaining my twin 3-year-old daughters, which is undoubtedly the highlight of my day. Other than that, you might run into me at the local home improvement store. I’m always in the middle of two or three DIY projects around the house.

5. You’ve flip-flopped between leading companies and providing brainpower behind the scenes. What role suits you best?
Much to my wife’s chagrin, I think I really am an entrepreneur at heart. I prefer small, nimble teams and am always on the lookout for the next big thing in the industry. 

We had a hard time limiting ourselves to just five questions with Garner! Read more of this interview on the Genealogy Insider blog.
 
From the May/June 2014 Family Tree Magazine 
