High Accuracy Tagging
Enabled automatic analysis and tagging of blood samples with up to 98% confidence, reducing manual effort and errors.
Azati developed a machine-learning-powered semantic search engine for a bioinformatics company to improve the accuracy and speed of searches across large, complex scientific datasets.
~27 milliseconds: average time to process a search query and return results
~3 minutes: time required to retrain neural networks on new datasets
blood samples effectively analyzed and tagged
The goal was to develop an intelligent semantic search engine that addresses the inefficiency and inaccuracy of the client’s existing system. The engine had to eliminate manual tag selection; handle inconsistent descriptions, synonyms, and variations in blood sample data; cut search times from minutes to milliseconds; and scale to new datasets while consistently returning relevant results.
Blood sample descriptions and manually assigned tags were inconsistent, leading to inaccurate search results. Azati addressed this by cleansing and standardizing the data and by training a custom Word2Vec model to capture synonyms and relationships between terms, so the search engine could interpret and match queries correctly despite the inconsistencies.
The team faced challenges due to multiple naming conventions and variations in disease names, which hindered precise tagging and search accuracy. Azati solved this by analyzing hundreds of thousands of life sciences documents to build a comprehensive thesaurus and train the Word2Vec model to detect and map synonyms, enabling accurate semantic matching.
The project involved processing a vast number of entries without any pre-labeled sample data for algorithm training. Azati overcame this by leveraging open-source life sciences documents to create a training dataset, developing intelligent matching and query analysis modules, and implementing RESTful microservices with Redis caching for efficient, scalable search performance.
Developed a pluggable module for automatic tagging of blood samples. The module analyzes sample descriptions and assigns tags with a high confidence score (around 98%), enabling accurate semantic searches even on inconsistent data.
Built a module that converts unstructured user queries into structured entities. It extracts sample types, diseases, geography, and other relevant attributes, ensuring that searches match the dataset accurately and completely.
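As a rough illustration of turning a free-text query into structured entities, the sketch below uses simple dictionary (gazetteer) lookup. This is an assumption made for illustration, not Azati's actual method: the `GAZETTEER` contents, the entity names, and the `parse_query` function are all hypothetical.

```python
import re

# Hypothetical gazetteers: a real system would load these from the
# thesaurus and dataset vocabulary rather than hard-code them.
GAZETTEER = {
    "sample_type": {"plasma", "serum", "whole blood"},
    "disease": {"lupus", "type 2 diabetes", "rheumatoid arthritis"},
    "geography": {"germany", "usa", "japan"},
}


def parse_query(query: str) -> dict[str, list[str]]:
    """Map a free-text query to {entity_type: [matched terms]}."""
    # Lowercase and drop punctuation so phrases are separated by single spaces.
    text = " ".join(re.findall(r"[a-z0-9]+", query.lower()))
    entities: dict[str, list[str]] = {}
    for entity_type, terms in GAZETTEER.items():
        # Sort for deterministic output; match whole words or phrases only.
        hits = [t for t in sorted(terms) if re.search(rf"\b{re.escape(t)}\b", text)]
        if hits:
            entities[entity_type] = hits
    return entities


print(parse_query("Serum samples, type 2 diabetes, donors from Germany"))
# {'sample_type': ['serum'], 'disease': ['type 2 diabetes'], 'geography': ['germany']}
```

A production module would likely combine such lookups with synonym expansion so that terms missing from the dictionaries can still be resolved.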
Trained a custom Word2Vec model on life sciences documents to identify synonyms and semantic relationships between terms. This allows the system to match different expressions of the same concept, such as alternative disease names or lab test variations.
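Once terms are represented as vectors, synonym matching can be reduced to a nearest-neighbour search by cosine similarity. The sketch below shows only that lookup step, not the training. The three-dimensional vectors are made up for illustration and stand in for trained Word2Vec output; the term list and function names are likewise hypothetical.

```python
import math

# Toy vectors invented for this example; trained Word2Vec vectors
# typically have 100 or more dimensions.
VECTORS = {
    "heart attack": [0.90, 0.10, 0.00],
    "myocardial infarction": [0.85, 0.15, 0.05],
    "lupus": [0.10, 0.90, 0.10],
    "plasma": [0.00, 0.10, 0.95],
}


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def nearest_term(term: str) -> tuple[str, float]:
    """Return the other vocabulary term with the highest cosine similarity."""
    query = VECTORS[term]
    candidates = (
        (other, cosine(query, vec)) for other, vec in VECTORS.items() if other != term
    )
    return max(candidates, key=lambda pair: pair[1])


synonym, score = nearest_term("heart attack")
print(synonym)  # myocardial infarction
```

In practice a similarity threshold would decide whether the nearest term is close enough to count as a synonym; the right value depends on the trained model and is not given in the source.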
Implemented caching for preprocessed samples using Redis, enabling in-memory lookups. Combined with optimized search algorithms, this reduced search query times from several minutes to under 30 milliseconds.
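The caching idea can be sketched as a cache-aside lookup: check the cache first and run the expensive preprocessing only on a miss. To keep the example runnable without a server, an in-memory class stands in for the Redis client. The key format, the `preprocess` step, and all function names are assumptions for illustration, not details from the project.

```python
import json


class InMemoryCache:
    """Stand-in for a Redis client so the sketch runs without a server."""

    def __init__(self) -> None:
        self._store: dict[str, str] = {}

    def get(self, key: str):
        return self._store.get(key)

    def set(self, key: str, value: str) -> None:
        self._store[key] = value


calls = {"preprocess": 0}  # counts how often the expensive step runs


def preprocess(description: str) -> dict:
    """Hypothetical expensive step: normalise and tokenise a description."""
    calls["preprocess"] += 1
    return {"tokens": description.lower().split()}


def get_preprocessed(cache, sample_id: str, description: str) -> dict:
    """Cache-aside lookup: return the cached result or compute and store it."""
    key = f"sample:{sample_id}"  # hypothetical key scheme
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)
    result = preprocess(description)
    cache.set(key, json.dumps(result))
    return result


cache = InMemoryCache()
first = get_preprocessed(cache, "42", "Whole Blood EDTA")
second = get_preprocessed(cache, "42", "Whole Blood EDTA")
print(first == second, calls["preprocess"])  # True 1
```

The redis-py client exposes `get` and `set` methods of a compatible shape, so a `redis.Redis` instance could plausibly be passed in place of `InMemoryCache`; `json.loads` accepts the bytes that client returns by default.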
All modules were implemented as RESTful microservices deployed in the cloud, allowing the system to scale horizontally and handle growing datasets without downtime or performance degradation.
This module tags blood samples automatically by analyzing descriptions and related documents, ensuring high-confidence matches even with inconsistent or incomplete data.
Processes unstructured user search queries, extracts relevant entities, and converts them into structured data for accurate semantic matching against the dataset.
Modules are deployed as independent microservices, allowing scalability, easy maintenance, and efficient integration with cloud infrastructure.
Caching and in-memory data storage dramatically reduce query response times and improve system throughput when handling large-scale datasets.
Search queries return results in ~27 milliseconds, improving employee productivity and satisfaction.
New datasets can be incorporated in ~3 minutes, allowing the system to adapt quickly to expanding scientific data.
Semantic matching of queries to datasets significantly reduced irrelevant results and enhanced data accessibility for researchers.