Machine Learning Algorithm for Healthcare
A team of data scientists developed a Machine Learning-based system to enable Natural Language Processing in Healthcare. The system processes patient notes and predicts which category the diagnosis falls into.
Our Client is a software development company creating analytics solutions for businesses based on cutting-edge technologies. The company’s cloud-based software solutions are primarily developed using open source technologies that are platform-agnostic. Based in California, the company was founded by a group of technology veterans who have been working on developing forward-thinking software technologies in Silicon Valley for over a decade.
The Client turned to Waverley with a request to develop a Proof of Concept solution to process medical data. Based on Machine Learning and Natural Language Processing algorithms, the model would process patient notes and identify symptoms to predict which group of diseases (Psychologic, Neurologic, Musculoskeletal, etc.) the diagnosis might belong to. Per the client’s request, the team had to develop the solution and provide detailed documentation, so that the client’s in-house team could reproduce it by following the instructions and obtain the same results.
Waverley engaged a team of data scientists for the project. The team proceeded with the research, exploring various ways to reach the goal until they found the optimal solution. Before the data is fed into the models, it is preprocessed:
- the input data is filtered
- a term frequency score is calculated for every word with the help of the Google Books Ngram Dataset, to single out meaningful words
- embeddings of the text are generated using the Universal Sentence Encoder
- the system converts the data into indices, and back again, with the help of a dedicated vocabulary
As a result, the system recognizes the input data, determines which subtree it belongs to, and passes it to the relevant model, which then predicts the most likely category. The average accuracy obtained was 95%.
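The filtering, frequency scoring and vocabulary steps listed above can be illustrated with a minimal sketch. It is not the project's actual code: the frequency table, the `max_freq` threshold and all function names are assumptions made for the example, and the embedding step is left out because it depends on a pretrained encoder.

```python
import re

# Hypothetical general-language frequencies (occurrences per million words).
# In the project such scores came from the Google Books Ngram Dataset;
# the numbers here are invented for illustration only.
CORPUS_FREQ = {
    "the": 50000.0, "and": 25000.0, "patient": 120.0, "reports": 150.0,
    "severe": 40.0, "migraine": 1.5, "nausea": 2.0,
}

PAD, UNK = "<pad>", "<unk>"


def tokenize(note):
    """Filtering step: lowercase the note and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", note.lower())


def meaningful_words(tokens, max_freq=100.0):
    """Keep tokens that are rare in general text (assumed threshold).

    Tokens missing from the table are treated as rare and kept.
    """
    return [t for t in tokens if CORPUS_FREQ.get(t, 0.0) <= max_freq]


def build_vocab(token_lists):
    """Assign a stable integer index to every distinct token."""
    vocab = {PAD: 0, UNK: 1}
    for tokens in token_lists:
        for t in tokens:
            vocab.setdefault(t, len(vocab))
    return vocab


def encode(tokens, vocab):
    """Map tokens to indices; unseen tokens map to the UNK index."""
    return [vocab.get(t, vocab[UNK]) for t in tokens]


def decode(indices, vocab):
    """Map indices back to tokens using the inverted vocabulary."""
    inverse = {i: t for t, i in vocab.items()}
    return [inverse[i] for i in indices]


note = "The patient reports severe migraine and nausea."
tokens = meaningful_words(tokenize(note))
vocab = build_vocab([tokens])
ids = encode(tokens, vocab)
print(tokens, ids, decode(ids, vocab))
```

Reserving indices for padding and unknown tokens is a common convention rather than something the project description specifies.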
Machine Learning Models
Because of the large volume of data (around 870,000 entries), our ML engineers developed 29 models to make predictions. One additional model, called the Classifier, determines which subtree the input text belongs to, and therefore which of the 29 models should handle it. This split optimized the prediction process, since it was impossible to fit all the data entries into a single model.
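The two-stage design described above, a Classifier that picks a subtree followed by a model dedicated to that subtree, could look roughly like the sketch below. The `Router` class, the keyword-based stand-in classifier and the placeholder category strings are all hypothetical; the real system used trained models at both stages.

```python
from typing import Callable, Dict, Tuple

Predictor = Callable[[str], str]


class Router:
    """Sends each note to the model responsible for its subtree."""

    def __init__(self, classifier: Predictor, models: Dict[str, Predictor]):
        self.classifier = classifier  # stage 1: note -> subtree name
        self.models = models          # stage 2: subtree name -> model

    def predict(self, note: str) -> Tuple[str, str]:
        subtree = self.classifier(note)
        model = self.models.get(subtree)
        if model is None:
            raise KeyError(f"no model registered for subtree {subtree!r}")
        return subtree, model(note)


def toy_classifier(note: str) -> str:
    """Stand-in for the trained Classifier: a trivial keyword rule."""
    return "Neurologic" if "migraine" in note.lower() else "Psychologic"


# Stand-ins for the per-subtree models; the outputs are placeholders.
router = Router(
    classifier=toy_classifier,
    models={
        "Neurologic": lambda note: "neurologic-category-placeholder",
        "Psychologic": lambda note: "psychologic-category-placeholder",
    },
)

print(router.predict("Patient reports recurring migraine"))
```

Keeping the routing table as a plain dictionary means each subtree model can be loaded, retrained or replaced without touching the others.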
To increase processing speed, the team tried running the models on multiple GPUs simultaneously and serving predictions through TensorFlow Serving. This helped to streamline predictions while maintaining the same level of accuracy.
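For readers unfamiliar with TensorFlow Serving, the sketch below shows how a client might build a prediction request for its REST API. The host, the model name `neurologic` and the input shape are assumptions for illustration; the actual payload depends on the signature of the exported model, and the project's real deployment details are not described here. Port 8501 is the REST port exposed by the official TensorFlow Serving Docker image.

```python
import json
from urllib import request


def build_predict_request(host, model_name, instances, port=8501):
    """Build (but do not send) a TensorFlow Serving REST predict request."""
    url = f"http://{host}:{port}/v1/models/{model_name}:predict"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical model name and already-encoded input indices.
req = build_predict_request("localhost", "neurologic", [[2, 3, 4]])
print(req.full_url)

# With a server running, the request could be sent like this:
# with request.urlopen(req) as resp:
#     predictions = json.load(resp)["predictions"]
```

Serving each subtree model under its own name lets a router choose the endpoint by subtree and keeps the client code identical for every model.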
The team in Ukraine was in constant communication with the client in the US, with weekly calls to discuss development issues and next steps. Engineers in Ukraine also advised the client’s in-house developers on issues they encountered while reproducing the system in parallel. Since the project required detailed documentation, Waverley provided a Technical Writer who worked on demand.
The team developed a working model and optimized its processing speed, and the model is now being tested on real patient data. The next steps are to create a more flexible system, fully automate data labeling and ensure the highest prediction accuracy. After that, the team will develop a user interface and introduce the solution for use in real hospitals.