Companies seek improvement: processes, costs, sales, customer experience. This requires knowledge, and above all, data to extract the desired knowledge. Data capture is the process of collecting data to convert it into information for future computer analysis.
Given the sheer volume of information gathered, this can only be managed feasibly and rationally through automated capture. For most companies, this is an excellent opportunity to improve, because it lets each business prioritise the factors that matter most to it, which surface in its communication with customers. Automated capture facilitates complex searches, supports strategic decision-making, streamlines processes and minimises errors.
But how can data-capture solutions add value to a business?
Data capture adds differential value to any company that handles large amounts of information. For instance, an insurance company that routinely handles medical certificates may use cognitive data-capture to select, almost instantly, the individuals who might qualify for certain guarantees or discounts.
Typically, these certificates are issued by different medical companies, each using its own template. Intelligent data-capture does not rely on templates or on how the information is structured or worded. After all, one template might describe a person’s physical traits in the first paragraph, while another does so in paragraph three. Furthermore, one template could use the word “ectomorph” to refer to a slender build, whereas a different template might use “asthenic” for the same build.
The best data-capture solutions can abstract from these templates, understand the content and search for the information the company needs. They could even detect the certificate’s issue and expiry dates, so an expired certificate would be flagged instantly and you’d be alerted to the problem.
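As a rough illustration of template-independent capture, the following minimal Python sketch pulls the same facts out of two differently worded certificates. Everything in it is an assumption made for illustration: the synonym table, the canonical label MORPH_01, the ISO date format, and the rule that the earliest date is the issue date and the latest is the expiry date. It is not how Dolffia or any particular product works.

```python
import re
from datetime import date

# Hypothetical synonym table: wording from different templates mapped to
# one canonical concept label. A real system would learn or curate this.
SYNONYMS = {"ectomorph": "MORPH_01", "asthenic": "MORPH_01"}

# Assumption: dates are written as YYYY-MM-DD somewhere in the text.
DATE_RE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def extract(text: str, today: date) -> dict:
    """Find the body-type concept and validity dates wherever they appear."""
    lowered = text.lower()
    body_type = next(
        (label for word, label in SYNONYMS.items() if word in lowered), None
    )
    dates = sorted(date(int(y), int(m), int(d)) for y, m, d in DATE_RE.findall(text))
    issued = dates[0] if dates else None
    # Assumption: with two or more dates, the latest one is the expiry date.
    expires = dates[-1] if len(dates) > 1 else None
    return {
        "body_type": body_type,
        "issued": issued,
        "expires": expires,
        "expired": expires is not None and expires < today,
    }


# Two hypothetical certificates with different wording and ordering.
cert_a = "Issued 2023-01-10. The patient is an ectomorph. Valid until 2024-01-10."
cert_b = "Valid until 2025-03-01. Build: asthenic. Issued 2024-03-01."
for cert in (cert_a, cert_b):
    print(extract(cert, today=date(2024, 6, 1)))
```

Both certificates resolve to the same concept label even though the wording and the position of the information differ, and only the first one is reported as expired on the reference date used here.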
What are the advanced data-capture methods?
Multiple data-capture applications exist, each suited to a different type of information structure. NTT DATA’s Dolffia solution offers the following modules for different types of information capture, making it one of the most complete on the market.
1. Machine learning classification
Machine learning classification is the process of categorising a set of data into classes, and it can be applied to any type of captured data. A model is trained on labelled examples and then predicts the class of each new item. Classes are often referred to as targets, labels or categories.
Let’s look at an example. A large national power company processed more than 300,000 documents of 49 types per year to sign up new customers. A team of people manually reviewed these documents and verified the information to approve or reject each application. The Dolffia solution allowed them to increase automatic documentation processing by 90%, reducing online customer waiting time and minimising claims by increasing accuracy to 95%.
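To make the idea of predicting a document's class concrete, here is a minimal sketch of a multinomial naive Bayes classifier written in plain Python. The training texts and the three labels are invented for illustration, and naive Bayes is simply one well-known classification technique; nothing here is meant to describe the model used in the project above.

```python
import math
from collections import Counter, defaultdict


def train(samples):
    """samples: list of (text, label). Returns word counts per class and class counts."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    for text, label in samples:
        class_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, class_counts


def classify(text, word_counts, class_counts):
    """Return the most probable label (naive Bayes with add-one smoothing)."""
    vocab = {word for counts in word_counts.values() for word in counts}
    total_docs = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label, n_docs in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)  # log prior of the class
        for word in text.lower().split():
            # Smoothed log likelihood of each word given the class.
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical labelled examples; a real system would use far more data.
TRAINING = [
    ("invoice total amount due payment", "invoice"),
    ("invoice number amount tax", "invoice"),
    ("identity card name date of birth", "id_document"),
    ("passport name nationality date of birth", "id_document"),
    ("bank account iban holder direct debit", "bank_mandate"),
]
model = train(TRAINING)
print(classify("amount due on this invoice", *model))
print(classify("name and date of birth", *model))
```

The labels play the role of the classes, targets or categories mentioned above: the model learns which words are typical of each document type and assigns new text to the best-matching type.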
2. Natural language processing (NLP)
Natural language processing (NLP) is, despite its apparent newness, a discipline with over 50 years of development. Mathematical modelling of different languages creates patterns of communication between machines and people.
This modelling process involves computational linguists “preparing” the linguistic model and computer engineers implementing it in efficient, functional code, generating context and intentionality for further study. The result is the ability to analyse any data, including opinions, relevant topics and errors, to create search patterns and offer predictive solutions. This is done through natural language understanding (NLU) followed by natural language generation (NLG), which translates that data into linguistic knowledge.
Natural language understanding (NLU) is the part of information processing that allows a machine to read a text and interpret its meaning, context and intention. Once the meaning is understood, NLG gives the machine the ability to create autonomous messages from the parsed data. Natural language generation (NLG) is AI’s big challenge: the idea is for computers to capture and understand information that is not in natural language (e.g., an Excel spreadsheet) and produce complex natural language output that reads as if a human had written it.
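The simplest form of natural language generation is template-based: a structured record, such as one spreadsheet row, is turned into a sentence by filling in a pattern. The sketch below shows that idea only. The field names and the sample row are hypothetical, and this is far from the human-like generation described above.

```python
def describe_sales(row: dict) -> str:
    """Turn one structured record into an English sentence (template-based NLG)."""
    change = row["sales"] - row["previous_sales"]
    if change > 0:
        trend = f"rose by {change} to"
    elif change < 0:
        trend = f"fell by {-change} to"
    else:
        trend = "were unchanged at"
    return f"In {row['month']}, sales in {row['region']} {trend} {row['sales']} units."


# Hypothetical spreadsheet row represented as a dictionary.
row = {"month": "March", "region": "Spain", "sales": 120, "previous_sales": 100}
print(describe_sales(row))  # In March, sales in Spain rose by 20 to 120 units.
```

Even this toy example shows the two steps involved: interpreting the numbers (did sales rise or fall?) and then choosing words to express that interpretation.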
3. Optical character recognition (OCR)
Optical character recognition is a process that converts an image of typed or printed text into a machine-readable text format. Specifically, it converts the image into a document whose content is text data.
OCR is highly recommended for departments in different vertical sectors that move large amounts of documentation whose management and storage are often time-consuming, such as printed forms, invoices, scanned legal documents or printed contracts.
Pattern recognition covers fields such as handwriting recognition, which is particularly complex because it involves processing non-typed (non-standard) characters. The same letter, for example, written by two different people, despite having the same value or meaning, can be visually very different.
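A toy example can show why visual variation matters. The sketch below recognises a character by comparing its pixels with a few reference shapes and picking the closest one. The 3x5 bitmaps and the three-letter alphabet are made up for illustration; production OCR and handwriting recognition rely on far more sophisticated models than this nearest-template comparison.

```python
# Hypothetical reference glyphs: "#" is an inked pixel, "." is blank.
TEMPLATES = {
    "I": ["###", ".#.", ".#.", ".#.", "###"],
    "L": ["#..", "#..", "#..", "#..", "###"],
    "T": ["###", ".#.", ".#.", ".#.", ".#."],
}


def distance(a, b):
    """Count the pixels that differ between two equally sized bitmaps."""
    return sum(
        pixel_a != pixel_b
        for row_a, row_b in zip(a, b)
        for pixel_a, pixel_b in zip(row_a, row_b)
    )


def recognise(glyph):
    """Return the template character with the fewest differing pixels."""
    return min(TEMPLATES, key=lambda char: distance(glyph, TEMPLATES[char]))


# Two imperfect inputs, as if written or scanned slightly differently.
sloppy_l = ["#..", "#..", "#..", "#..", "##."]
sloppy_t = ["###", ".#.", ".#.", "##.", ".#."]
print(recognise(sloppy_l), recognise(sloppy_t))  # L T
```

Neither input matches its template exactly, yet each is still assigned to the nearest letter, which is the basic intuition behind recognising the same character written in visually different ways.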
4. Ontologies
Information systems must deal with an increasing amount of data that is heterogeneous and unstructured or incomplete. To align and complete the data, systems can rely on taxonomies and prior knowledge provided through an ontology. An ontology is an ecosystem of concepts, including hierarchies, classes and ranks, that enables the different elements to be interlinked automatically. A familiar example is an online “near me” search, where the query has to be linked to related concepts such as the user’s location.
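The hierarchy at the heart of an ontology can be sketched as a simple mapping from each concept to its broader concept. The concepts below are hypothetical and chosen to fit the document-processing setting; real ontologies also carry properties and other relationship types, usually in dedicated formats rather than a Python dictionary.

```python
# Hypothetical is-a hierarchy: narrower concept -> broader concept.
IS_A = {
    "medical certificate": "certificate",
    "birth certificate": "certificate",
    "certificate": "document",
    "invoice": "document",
    "electricity invoice": "invoice",
}


def ancestors(concept):
    """Walk up the hierarchy and return every broader concept, nearest first."""
    chain = []
    while concept in IS_A:
        concept = IS_A[concept]
        chain.append(concept)
    return chain


def is_a(concept, category):
    """True if the concept is the category itself or falls under it."""
    return concept == category or category in ancestors(concept)


def narrower(category):
    """All concepts under a category, e.g. to expand a search query."""
    return sorted(c for c in IS_A if c != category and is_a(c, category))


print(ancestors("medical certificate"))  # ['certificate', 'document']
print(narrower("certificate"))  # ['birth certificate', 'medical certificate']
```

With such links in place, a search for "certificate" can automatically include medical and birth certificates, which is one way prior knowledge helps to align incomplete or heterogeneous data.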
5. Microservices architecture
Microservices architecture is an approach that enables flexible development of software applications and, as its name suggests, structures them as small, autonomous and complementary services. Each service performs specific functions and operates independently while communicating with the other services.
Because the application consists of smaller, independent parts that communicate through simple interfaces, each part can be analysed separately for troubleshooting purposes.
For example, payment and order processing can be separated as independent service units. Thanks to this separation, payments will still be accepted even if there are problems with billing.
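The payment and billing example can be sketched in a few lines. In the code below, two plain Python classes stand in for services that would normally run as separate processes and communicate over a network; the class names, method names and the "pending" status are all hypothetical.

```python
class ServiceUnavailable(Exception):
    """Raised when a service cannot handle a request."""


class PaymentService:
    def charge(self, order_id: str, amount: float) -> dict:
        return {"order_id": order_id, "amount": amount, "status": "paid"}


class BillingService:
    def __init__(self, healthy: bool = True):
        self.healthy = healthy

    def issue_invoice(self, order_id: str) -> dict:
        if not self.healthy:
            raise ServiceUnavailable("billing is down")
        return {"order_id": order_id, "status": "invoiced"}


def checkout(order_id, amount, payments, billing):
    """Take the payment first; a billing failure is recorded, not propagated."""
    result = {"payment": payments.charge(order_id, amount)}
    try:
        result["invoice"] = billing.issue_invoice(order_id)
    except ServiceUnavailable:
        # The invoice can be retried later without blocking the payment.
        result["invoice"] = {"order_id": order_id, "status": "pending"}
    return result


# Billing is simulated as unavailable, yet the payment still goes through.
print(checkout("A1", 10.0, PaymentService(), BillingService(healthy=False)))
```

Because each service exposes a small interface, a failure can be traced to one part and contained there, rather than bringing down the whole checkout.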
6. Concept disambiguation
Concept disambiguation is one of the main challenges of natural language processing (NLP) systems. It ranges from lexical and semantic ambiguity to more complex structures, such as metaphors.
From a conceptual standpoint, disambiguation is the process of determining the most probable meaning of a specific phrase.
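One classic way to approximate this is a simplified version of the Lesk algorithm, which picks the sense whose dictionary definition shares the most words with the surrounding sentence. The sketch below uses a hypothetical two-sense inventory for the word "bank" and a small stopword list; word overlap is only a heuristic stand-in for probability, and modern systems generally use statistical or neural models instead.

```python
# Hypothetical sense inventory: each sense of a word with a short definition.
SENSES = {
    "bank": {
        "financial institution": "institution that accepts deposits and lends money to customers",
        "river edge": "sloping land beside a river or stream of water",
    }
}
STOPWORDS = {"a", "an", "and", "the", "of", "to", "or", "that", "in", "on", "at", "by"}


def tokens(text):
    """Lowercase words with surrounding punctuation and stopwords removed."""
    return {word.strip(".,;:!?").lower() for word in text.split()} - STOPWORDS


def disambiguate(word, sentence):
    """Return the sense whose definition overlaps most with the sentence."""
    context = tokens(sentence)
    senses = SENSES[word]
    return max(senses, key=lambda sense: len(context & tokens(senses[sense])))


print(disambiguate("bank", "She deposits money at the bank"))
print(disambiguate("bank", "They fished from the bank of the river"))
```

The first sentence shares "deposits" and "money" with the financial definition, while the second shares "river" with the other one, so each occurrence of "bank" is resolved to a different sense.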
These are some of the functions of data-capture applications. In reality, not every sector or company needs all of them, only those specific to its sector or business. However, the speed of progress in data analysis offers a window into the future: what is not needed today may be needed tomorrow. All data collection systems need prior implementation work to define the parameters used to analyse the data.
Dolffia: the all-in-one cognitive data-capture solution
Dolffia is NTT DATA’s AI-based platform for processing unstructured information. It is able to understand the information, learn from it and, with these two basic steps, help the person or company in question make decisions.
Dolffia offers different data-capture services adapted to each customer or use case. It’s not a closed solution. It’s a platform with language capabilities that can automatically perform any language-based task, thus solving data capture for any company in any sector.
Dolffia offers some of the most versatile software in the data capture and processing market as part of NTT DATA’s Syntphony, a technological ecosystem of ICT solutions that reduce costs and improve the time-to-market of products and services in a wide range of sectors, including healthcare, banking, automotive, and the public sector, among others.