At Upgrade 2024, NTT Corporation showed the latest update to tsuzumi, a proprietary lightweight LLM. tsuzumi is now capable of understanding visual elements within documents, such as graphs, charts and layouts. The technology is currently undergoing tests and is expected to be publicly available later in 2024.

Kyosuke Nishida, senior distinguished researcher, NTT (Image Credit: LinkedIn)

Kyosuke Nishida, a senior distinguished researcher at NTT, said, “LLMs have become capable of handling high-level natural language processing tasks with high accuracy, and multimodal models, including those that integrate vision and language, are beginning to emerge.

“However, there remain significant challenges in comprehending documents or computer screens that contain both text and visual information, such as charts and tables.”

Why is this important?

While LLMs have become adept at consuming and interpreting large amounts of text, much data is stored graphically. It may be something as simple as an organisational chart, where the image makes the structure easy to understand. Another example is a visual guide to a task, where the illustration makes it easier for a reader to understand what is required.

Many LLM solutions have, so far, been less than effective at extracting that data. To solve this challenge, NTT worked with Professor Jun Suzuki of Tohoku University’s Center for Data-driven Science and Artificial Intelligence. Along with NTT researchers, Suzuki developed a technique that allows tsuzumi to look at a page as if it were looking through a human eye.

It is not just images that tsuzumi can read. NTT has incorporated handwriting recognition technology that allows it to read written notes. There are many areas where handwriting is still common, such as healthcare, law enforcement and education. In all of these areas, there is a need to consume documents as part of the knowledge that an LLM possesses.

When tested against 12 different visual document understanding tasks, including answering questions, extracting information and classifying documents based on human-written instructions, NTT’s model outperformed the open-sourced multimodal LLM LLaVA as well as OpenAI’s GPT-3.5 and GPT-4.

When approaching the problem, the researchers considered four primary use cases. These included:

  • Customer experience solutions including call center automation;
  • Employee experience solutions for tasks involving manual searching and reporting, including electronic medical recordkeeping in the healthcare industry;
  • Transforming the value chain for industries including life sciences and manufacturing;
  • Software engineering for systems and IT departments, including development and coding assistance and automation.

Making tsuzumi a better subject matter expert

NTT positions tsuzumi as an energy-efficient and low-cost LLM that is available in two versions. The first is an ultra-lightweight version with 600 million parameters. The second, a lightweight version, has 7 billion parameters. For models of this size, the ability to consume all types of content is essential.

The models’ size makes them more cost-effective for businesses, as they require fewer resources to run. This is a key factor for NTT. While other vendors are focused on large LLMs, NTT sees tsuzumi as a subject matter expert. That means each LLM is built to address a specific area, so being small and fast is important. The small size also makes it faster to train, fine-tune and expand the LLM as more data becomes available.
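To put the resource argument in rough numbers, the sketch below estimates the memory needed just to hold each model's weights. The parameter counts are the ones quoted above. The numeric precisions (16-bit, 8-bit and 4-bit) are assumptions for illustration, not details NTT has published, and the function name `weight_memory_gb` is hypothetical.

```python
# Rough weight-memory estimate: parameters x bytes per parameter.
# Parameter counts come from the article; the precisions are assumptions.
PARAMS = {
    "tsuzumi ultra-lightweight": 600_000_000,
    "tsuzumi lightweight": 7_000_000_000,
}
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: int, bytes_per_param: float) -> float:
    """Return the memory needed to hold the weights, in gigabytes (10^9 bytes)."""
    return num_params * bytes_per_param / 1e9


if __name__ == "__main__":
    for name, n in PARAMS.items():
        for precision, size in BYTES_PER_PARAM.items():
            print(f"{name:28s} {precision}: {weight_memory_gb(n, size):5.1f} GB")
```

Under the 16-bit assumption, this works out to about 1.2 GB for the 600 million parameter version and about 14 GB for the 7 billion parameter version. The figures cover weights only. Memory used while the model is running, such as activations and caches, is not included.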

NTT also sees organisations deploying multiple instances of tsuzumi. Each instance would be a different subject matter expert and, when they are brought together, NTT talks about a constellation of LLMs.

Targeting the subject matter expert role is not without its challenges. Take the example of building an LLM as a product expert. Only so much is held in product manuals, which can contain a lot of graphics. Other data will be held inside FAQs and help desk systems. However, some of the more valuable data is held in the heads of experts and practitioners of the technology.

Extracting that knowledge has been an issue for decades. The problem has always been how to engage individuals and get that data from their heads into a machine-readable form. The ability to read any form of documentation would appear to give tsuzumi an opportunity. By consuming everything from text to handwriting to graphs and diagrams, it can draw on the widest data set yet.

Because the LLM is small and heavily focused, it lends itself to easier fine-tuning by the human expert. It also reduces the risk of hallucinations or other problematic artefacts being introduced into the model.

Enterprise Times: What does this mean?

NTT has been keeping much of its LLM work under wraps. It wasn’t until November 2023 that it released the first details of tsuzumi. Since then, the company has been looking at the best way to deploy the technology.

It has now settled on the highly focused idea of a subject matter expert. Importantly, it has also built the communication details that allow the various instances of tsuzumi inside an organisation to work together. It calls this collection of LLMs a constellation, and it opens the door to further specialising LLMs, creating an AI-driven knowledge network.

It will be interesting to see how customers react to this. At present, few are building their own highly focused LLMs, looking at the knowledge contained in their organisation. Now, they have a technology that enables that and ensures that they don’t lose knowledge as people leave.

There is also the question of how vendors react to this. There are vendors out there building LLMs around their software products. Most of those are just faster help files, and the data is drawn from manuals and help desk experiences. What they lack is practical knowledge and experience.

Will NTT start contacting them to use tsuzumi and build LLMs for its customers’ constellations? Additionally, when will we see NTT product teams use the technology to do exactly that and provide tsuzumi models to customers?

