Alzheimer’s Diagnosis through Audio Analysis
Under the supervision of Professor Xiang Fan, I wrote a paper on this work as first author.
We built an end-to-end, speech-based binary classification model (AD vs. non-AD) and, to our knowledge, achieved cross-language AD diagnosis for the first time: the model was trained on a public English database and achieved good results on a local Chinese database. We were also among the first to use large language models and translation techniques to strengthen the model and address the cross-language problem. The model consists of four components: ASR (Whisper), translation (GLM-4), embedding (Embedding-3), and a classifier (common models such as XGBoost, as well as our custom neural network). We ran extensive experiments throughout to improve performance, with excellent results.
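The four-stage structure described above can be illustrated with a minimal sketch. This is not the project's actual code: every name here (SpeechPipeline, the toy_* functions, the file names) is hypothetical, the toy stages stand in for Whisper, GLM-4, and Embedding-3, and a simple nearest-centroid rule stands in for XGBoost or the custom neural network. The toy features carry no clinical meaning; the point is only how the stages chain together.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

Vector = List[float]


@dataclass
class SpeechPipeline:
    """Chains the four stages; each stage is a pluggable callable."""
    transcribe: Callable[[str], str]    # audio path -> transcript (ASR stage)
    translate: Callable[[str], str]     # transcript -> English text
    embed: Callable[[str], Vector]      # text -> embedding vector
    classify: Callable[[Vector], int]   # vector -> 1 (AD) or 0 (non-AD)

    def predict(self, audio_path: str) -> int:
        text = self.transcribe(audio_path)
        english = self.translate(text)
        return self.classify(self.embed(english))


def fit_centroids(vectors: Sequence[Vector], labels: Sequence[int]) -> Dict[int, Vector]:
    """Compute one mean vector per class (placeholder for a real classifier)."""
    centroids = {}
    for label in set(labels):
        members = [v for v, y in zip(vectors, labels) if y == label]
        centroids[label] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def nearest_centroid(centroids: Dict[int, Vector]) -> Callable[[Vector], int]:
    """Return a classifier that picks the class with the closest centroid."""
    def classify(vector: Vector) -> int:
        def sq_dist(center: Vector) -> float:
            return sum((a - b) ** 2 for a, b in zip(vector, center))
        return min(centroids, key=lambda label: sq_dist(centroids[label]))
    return classify


# Toy stand-ins for the real stages (hypothetical data, no clinical meaning).
FAKE_TRANSCRIPTS = {
    "a.wav": "the boy takes the cookie",
    "b.wav": "the the uh the thing",
}


def toy_transcribe(path: str) -> str:
    return FAKE_TRANSCRIPTS[path]


def toy_translate(text: str) -> str:
    return text.lower()  # a real translator would map Chinese to English here


def toy_embed(text: str) -> Vector:
    words = text.split()
    # Two toy features: word count and ratio of unique words.
    return [float(len(words)), len(set(words)) / len(words)]


train_vectors = [toy_embed(t) for t in FAKE_TRANSCRIPTS.values()]
train_labels = [0, 1]  # made-up labels for the two toy transcripts
pipeline = SpeechPipeline(
    transcribe=toy_transcribe,
    translate=toy_translate,
    embed=toy_embed,
    classify=nearest_centroid(fit_centroids(train_vectors, train_labels)),
)
```

Because each stage is passed in as a callable, a real ASR model, translator, embedding service, or classifier can replace its toy counterpart without changing predict. In the cross-language setting, only the translate stage needs to differ between the English training data and the Chinese test data.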
I led the entire project, including literature review, direction selection, model construction, experimental design and execution, and paper writing.
What I gained
During this project, I significantly improved my literature search and reading skills, learned to organize my code better, and studied academic writing systematically, which greatly improved my writing.
The project also showed me the importance of interdisciplinary collaboration. As an expert in clinical care, Professor Xiang Fan could point out both the pain points of cutting-edge models and the real clinical needs, which greatly helped us build the model and choose research priorities. In turn, my model can help clinicians pre-screen potential patients, giving them a powerful analytical tool. I am eager to continue with interdisciplinary communication and research, offering solutions that are grounded in practical needs and that integrate cutting-edge algorithms.
What's next
In the future, I will explore semantic feature extraction in search of clinically interpretable models. I also plan to design classifiers that make better use of the embeddings, and to combine semantic and temporal features to enhance the capabilities of large language models.