ACM: Natural Language Processing, Computer Vision and ML/AI Platform
Internal Revenue Service
The IRS is streamlining how it manages data and interacts with taxpayers through a computer vision, artificial intelligence and natural language processing platform that standardizes scanned document images and extracts relevant data.
Built with open-source technologies, the Appeals Case Memorandum (ACM) project locates key areas in documents (for now, tables), then normalizes their dimensions, removes white space, corrects rotation, identifies revisions and isolates relevant text fields for optical character recognition. The extracted data can help document the agency’s findings on a particular appeals case, among other uses.
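The article does not describe how ACM implements this preprocessing. As a rough illustration only, the sketch below shows two of the named steps, removing white space and normalizing dimensions, on a page assumed to be already binarized into a grid of 0/1 pixels. The function names (`crop_whitespace`, `resize_nearest`, `normalize_page`) are hypothetical, nearest-neighbor resizing is a stand-in chosen for simplicity, and rotation correction and revision detection are not shown.

```python
from typing import List

# Assumption: a scanned page has already been binarized into a grid of
# pixels where 1 = ink and 0 = background.
Image = List[List[int]]

def crop_whitespace(img: Image) -> Image:
    """Hypothetical helper: trim blank rows and columns from every edge."""
    ink_rows = [r for r, row in enumerate(img) if any(row)]
    if not ink_rows:
        raise ValueError("page contains no ink")
    ink_cols = [c for c in range(len(img[0])) if any(row[c] for row in img)]
    top, bottom = ink_rows[0], ink_rows[-1]
    left, right = ink_cols[0], ink_cols[-1]
    return [row[left:right + 1] for row in img[top:bottom + 1]]

def resize_nearest(img: Image, height: int, width: int) -> Image:
    """Hypothetical helper: rescale to height x width by nearest-neighbor sampling."""
    src_h, src_w = len(img), len(img[0])
    return [
        [img[r * src_h // height][c * src_w // width] for c in range(width)]
        for r in range(height)
    ]

def normalize_page(img: Image, height: int, width: int) -> Image:
    """Hypothetical preprocessing step: crop margins, then fix dimensions."""
    return resize_nearest(crop_whitespace(img), height, width)
```

Cropping before resizing keeps page margins from consuming the fixed output size, so the same content lands at a comparable scale regardless of how much blank border a scan happened to have.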
Officials in “the division within IRS that deals with our large corporate taxpayers...were interested in understanding how some of their cases resolve in the appeals process, and for them to manually go through some of those ACM documents was just too much of a manual task,” said Ron Hodge II, a senior manager in the IRS’ Research, Applied Analytics and Statistics Division. “What this would allow folks to do is have a more comprehensive understanding of what happened to their case from end to end versus losing visibility once it went into the appeal function.”
For context, the IRS estimates that it received more than 120 million pages of correspondence from 2010 to 2015, which required about 31,000 full-time employees to process and about 8 billion hours of taxpayer time.
“Out of thousands of cases and tens of thousands of pages within those cases, we’ve been able to extract seven years of these tables and actually put [the information] in a centralized location” so that users can efficiently interact with the data, Hodge said.
He likens the process to the way a camera zeroes in on a face when taking a photo. “What finds that person’s face is actually an object-detection algorithm,” Hodge said. “For our particular use case, we were interested in [finding] a table... embedded within text.” Although a table isn’t a specific object, it has a specific structure of rows and columns.
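Hodge's comparison is to a learned object detector, and the article does not say which model ACM uses. A much simpler, classical way to exploit the same row-and-column structure is a line-scan heuristic: treat any row or column containing a long unbroken run of ink as a table ruling line, then take the gaps between consecutive lines as cells. The Python sketch below implements that heuristic, not the IRS's method. It assumes a binarized page (1 = ink), a fully ruled grid and a hypothetical `min_len` threshold, and all helper names are illustrative.

```python
from typing import List, Tuple

Image = List[List[int]]          # assumption: binarized page, 1 = ink, 0 = background
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive

def longest_run(values: List[int]) -> int:
    """Length of the longest unbroken run of ink pixels."""
    best = current = 0
    for v in values:
        current = current + 1 if v else 0
        best = max(best, current)
    return best

def group_adjacent(indices: List[int]) -> List[Tuple[int, int]]:
    """Merge consecutive indices so a thick ruling line counts once."""
    bands: List[Tuple[int, int]] = []
    for i in indices:
        if bands and i == bands[-1][1] + 1:
            bands[-1] = (bands[-1][0], i)
        else:
            bands.append((i, i))
    return bands

def find_cells(img: Image, min_len: int) -> List[Box]:
    """Hypothetical helper: locate cells between long horizontal/vertical strokes.

    A row (or column) counts as a ruling line when it contains an unbroken
    run of at least min_len ink pixels; ordinary text lines rarely do.
    """
    height, width = len(img), len(img[0])
    h_lines = group_adjacent(
        [r for r in range(height) if longest_run(img[r]) >= min_len])
    v_lines = group_adjacent(
        [c for c in range(width)
         if longest_run([img[r][c] for r in range(height)]) >= min_len])
    cells: List[Box] = []
    # Each cell lies strictly between two consecutive line bands in each direction.
    for (_, top_edge), (bottom_edge, _) in zip(h_lines, h_lines[1:]):
        for (_, left_edge), (right_edge, _) in zip(v_lines, v_lines[1:]):
            top, bottom = top_edge + 1, bottom_edge - 1
            left, right = left_edge + 1, right_edge - 1
            if top <= bottom and left <= right:
                cells.append((top, left, bottom, right))
    return cells

def crop(img: Image, box: Box) -> Image:
    """Cut one cell out of the page, e.g. to hand it to an OCR engine."""
    top, left, bottom, right = box
    return [row[left:right + 1] for row in img[top:bottom + 1]]
```

On a small page with a line of text above a 2 × 2 ruled grid, `find_cells(page, min_len=5)` returns four cell boxes. The text line is ignored because its longest ink run is shorter than `min_len`. Each box can then be passed to `crop` to isolate a field for OCR. Real scans usually need tolerance for broken or slightly skewed lines, which this sketch does not provide.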
Most recently, the ACM technology has helped the IRS comply with the Coronavirus Aid, Relief and Economic Security Act. The law introduced new machine-readable tax forms related to business credits, and the agency needed a way to efficiently extract data from them.
“We don’t have to rebuild it from scratch because we have enough of the capability built out,” Hodge said. “Now it’s just adding some additional tweaks at the margin to help users in different use cases.”
Three business units use the technology now, but it’s applicable agencywide, he added.
