Multitasking AI enables fastest cancer data retrieval yet
A new multi-task convolutional neural network can extract data from cancer pathology reports several times faster than the single-task deep-learning models or manual data extraction currently in use.
Researchers at Oak Ridge National Laboratory (TN, USA) have developed a multi-task learning convolutional neural network to extract data from cancer pathology reports – an artificial intelligence tool capable of extracting multiple characteristics from the complex and extensive information included in pathology reports.
The project could benefit research that uses digital cancer registries to find trends in cancer diagnoses and patients’ responses to treatment. Such research helps policy and funding bodies by informing advice on the best ways to treat cancer, showing whether strategic changes have succeeded and highlighting areas in need of improvement.
Data from cancer pathology reports are not only complex but also vary greatly in notation and language, meaning highly trained coders must manually interpret data from the large volume of reports. Single-task convolutional neural networks have demonstrated success in learning and reporting information, but each one can process only a single characteristic. While one network may be able to identify information about the location of a cancer, a separate network would be needed to extract information on behavior or histology, resulting in a time-consuming process.
The researchers developed a multi-task convolutional neural network that, like the existing single-task tools, uses word embedding to process words as vectors; the vectors sit closer to each other the more closely the words are related in natural language. By filtering the input text through multiple parameters, the network sorts the information and establishes connections, which are strengthened and developed as more data are processed.
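The idea that related words end up closer together as vectors can be illustrated with a minimal sketch. The three-dimensional vectors below are invented for illustration and are not taken from the study; real embeddings are learned from text and typically have hundreds of dimensions. Cosine similarity is used here as one common way to measure closeness, which is an assumption rather than a detail reported by the researchers.

```python
import math

# Toy 3-dimensional embeddings, invented for illustration only.
# Real embeddings are learned from text and are much larger.
EMBEDDINGS = {
    "carcinoma": [0.9, 0.8, 0.1],
    "adenocarcinoma": [0.85, 0.9, 0.15],
    "left": [0.1, 0.2, 0.95],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Two histology terms should score higher than a histology term
# paired with a laterality term.
related = cosine_similarity(EMBEDDINGS["carcinoma"], EMBEDDINGS["adenocarcinoma"])
unrelated = cosine_similarity(EMBEDDINGS["carcinoma"], EMBEDDINGS["left"])

print(f"carcinoma vs adenocarcinoma: {related:.3f}")
print(f"carcinoma vs left:           {unrelated:.3f}")
```

With these made-up vectors, the two histology terms score much closer to 1 than the histology and laterality pair does, which is the kind of relationship an embedding is meant to capture.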
The network was tested against single-task networks on extracting the same five data types: primary site, laterality, behavior, histological type and histological grade. The multi-task network processed all five in roughly the time a single-task network needs for one, meaning it operated about five times faster than running the five single-task tools.
The study’s lead author, Mohammed Alawad (Oak Ridge National Laboratory), pointed out: “It’s not so much that it’s five times as fast. It’s that it’s n-times as fast. If we had n different tasks, then it would take one-nth of the time per task.”
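The arithmetic behind the "n-times as fast" remark can be sketched as follows. The function names are hypothetical, and the sketch assumes, as the article reports, that one pass of the multi-task network costs about the same as one pass of a single-task network, whatever the number of tasks.

```python
def single_task_total(n_tasks, time_per_pass):
    """Time per report when n separate single-task models each read it once."""
    return n_tasks * time_per_pass


def multi_task_total(time_per_pass):
    """Time per report when one multi-task model reads it once.

    Assumption: the multi-task pass costs about the same as ONE
    single-task pass, independent of the number of tasks.
    """
    return time_per_pass


def speedup(n_tasks, time_per_pass=1.0):
    """Ratio of single-task time to multi-task time for the same report."""
    return single_task_total(n_tasks, time_per_pass) / multi_task_total(time_per_pass)


for n in (1, 5, 10):
    print(f"{n} tasks -> {speedup(n):.0f}x faster")
```

Under that assumption the speedup equals the number of tasks, so the five data types in the study correspond to a roughly fivefold gain.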
The authors noted some limitations of their work. The multi-task convolutional network can learn only from connections within a span of five words, so linguistic relationships across longer distances cannot be used. Also, the network does not establish correlations between tasks, so it will not learn associations across tasks, such as histological types that do not occur at certain cancer sites. This could lead to predictions of ‘unallowable’ combinations.
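Both limitations described above can be made concrete with a small sketch. It assumes a filter that sees five consecutive words at a time, and it uses a validity table that is simplified and invented for illustration; neither the example sentence nor the table comes from the study, and all function names are hypothetical.

```python
def windows(tokens, width=5):
    """Return every run of `width` consecutive tokens, as a filter would see them."""
    return [tokens[i:i + width] for i in range(len(tokens) - width + 1)]


def share_a_window(tokens, first, second, width=5):
    """True if both words appear together in at least one window."""
    return any(first in w and second in w for w in windows(tokens, width))


# Made-up sentence standing in for a line of a pathology report.
report = "tumor in the upper outer quadrant of the left breast is invasive".split()

print(share_a_window(report, "left", "breast"))   # adjacent words
print(share_a_window(report, "tumor", "breast"))  # nine words apart

# Hypothetical validity table: which histology labels are accepted for
# each primary site. Simplified and invented, not taken from the study.
ALLOWED_HISTOLOGY = {
    "breast": {"ductal carcinoma", "lobular carcinoma"},
    "skin": {"melanoma", "basal cell carcinoma"},
}


def is_allowable(site, histology):
    """Check a (site, histology) pair against the validity table."""
    return histology in ALLOWED_HISTOLOGY.get(site, set())


# If each task picks its label independently, nothing prevents a pair
# like this one from being predicted together.
prediction = {"site": "breast", "histology": "melanoma"}
print(is_allowable(prediction["site"], prediction["histology"]))
```

Words that never fall inside the same five-word window cannot be linked by such a filter, and a check like `is_allowable` is one way an 'unallowable' combination could be flagged after prediction. The study itself does not describe such a check.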
Overall, the research offers a promising advancement in artificial intelligence tools to modernize cancer surveillance and provide real-time cancer reporting.
Alawad M, Gao S, Qiu JX et al. Automatic extraction of cancer registry reportable information from free-text pathology reports using multitask convolutional neural networks. J. Am. Med. Inform. Assoc. 27(1), 89–98 (2020).