MedQA: Clinical AI Fine-Tuned on AMD ROCm Without CUDA
A team of developers has built MedQA, a clinical question-answering model fine-tuned entirely on AMD hardware using ROCm, with no need for NVIDIA CUDA. Based on Alibaba's Qwen3-1.7B, the model uses LoRA (Low-Rank Adaptation) to train only 2.2 million of its roughly 1.7 billion parameters on an AMD Instinct MI300X GPU with 192 GB of HBM3 memory. Training on 2,000 samples from the MedMCQA dataset took approximately five minutes. The project demonstrates that the HuggingFace ecosystem (Transformers, PEFT, TRL, Accelerate) runs on ROCm with no code changes, only three environment variables. The model outputs both the correct answer letter and a clinical explanation.
Challenges included a bitsandbytes compatibility issue and NaN losses with bfloat16; the latter was resolved by switching to fp16. The fine-tuned adapter is publicly available on the HuggingFace Hub. Next steps include scaling to the full MedMCQA corpus, adding confidence scoring, RAG integration, and a proper evaluation. The project was built for the AMD Developer Hackathon on lablab.ai by Harikrishna Sivanand Iyer and Srijan Sivaram A.
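The training loop itself is ordinary HuggingFace code; ROCm builds of PyTorch expose the GPU through the same device API, which is why no code changes are needed. The sketch below is a minimal illustration of that setup, not the team's exact script: the MedMCQA Hub dataset id, LoRA hyperparameters, prompt format, and batch size are all assumptions, while the fp16-over-bfloat16 choice mirrors the NaN-loss fix described above.

```python
# Minimal LoRA fine-tuning sketch with Transformers + PEFT + TRL.
# Hyperparameters, prompt format, and the dataset Hub id are illustrative assumptions.
from datasets import load_dataset
from transformers import AutoModelForCausalLM
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-1.7B")

# LoRA trains a few million adapter parameters while the base model stays frozen.
peft_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

def to_text(example):
    # Flatten a MedMCQA row into a single training string (field names per the public dataset).
    options = (f"A. {example['opa']}\nB. {example['opb']}\n"
               f"C. {example['opc']}\nD. {example['opd']}")
    answer = "ABCD"[example["cop"]]
    explanation = example["exp"] or ""
    return {"text": f"Question: {example['question']}\n{options}\nAnswer: {answer}. {explanation}"}

# A 2,000-sample slice of MedMCQA, matching the run described above.
dataset = load_dataset("openlifescienceai/medmcqa", split="train[:2000]").map(to_text)

training_args = SFTConfig(
    output_dir="medqa-lora",
    per_device_train_batch_size=4,
    num_train_epochs=1,
    fp16=True,   # bfloat16 produced NaN losses in the team's run, so fp16 is used
    bf16=False,
)

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    peft_config=peft_config,
)
trainer.train()
trainer.save_model("medqa-lora")
```

On a ROCm build of PyTorch the GPU still appears as a `cuda` device, so the same script runs unmodified on the MI300X.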
Key facts
- MedQA is a LoRA fine-tuned clinical QA model built on AMD ROCm without CUDA.
- Base model is Qwen3-1.7B from Alibaba.
- Trained on AMD Instinct MI300X with 192 GB HBM3 memory.
- Used 2,000 training samples from MedMCQA dataset.
- Training took approximately 5 minutes.
- Only 2.2 million parameters trained via LoRA.
- HuggingFace ecosystem runs on ROCm after setting three environment variables; no code changes needed.
- Fine-tuned adapter available on HuggingFace Hub (see the loading sketch after this list).
- Built for AMD Developer Hackathon on lablab.ai.
- Developers: Harikrishna Sivanand Iyer and Srijan Sivaram A.
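Because only the LoRA adapter is published, inference attaches it to the base Qwen3-1.7B model via PEFT. The sketch below shows one way to do that; the adapter repo id is a placeholder, since the exact Hub path is not given here, and the prompt format is an assumption.

```python
# Hedged inference sketch: load the base model, attach the published LoRA adapter.
# The adapter repo id and prompt format are placeholders/assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "Qwen/Qwen3-1.7B"
adapter_id = "your-username/medqa-lora"  # placeholder Hub repo id

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16, device_map="auto")
model = PeftModel.from_pretrained(base, adapter_id)

prompt = (
    "Question: Which vitamin deficiency causes scurvy?\n"
    "A. Vitamin A\nB. Vitamin B12\nC. Vitamin C\nD. Vitamin D\n"
    "Answer:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

As described above, the model is expected to return both the answer letter and a short clinical explanation.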
Entities
Institutions
- AMD
- HuggingFace
- Alibaba
- lablab.ai
- HuggingFace Hub
- HuggingFace Spaces
- GitHub