[VLM101] Creating a Multimodal Chatbot with Fine-tuning (feat. MCP)
dreamingbumblebee
This is an introductory course on the concepts and applications of Vision-Language Models (VLMs). You will run the LLaVA model in an Ollama-based environment and practice integrating it with MCP (Model Context Protocol). The course covers the principles of multimodal models, quantization, model serving, and integrated demo development, offering a balanced mix of theory and hands-on practice. A minimal sketch of the Ollama step is shown below the course info.
Beginner
Vision Transformer, Transformer, Llama
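As a taste of the hands-on portion, here is a minimal sketch of querying a locally running LLaVA model through Ollama's REST API. It assumes Ollama is installed and serving on its default port (11434) and that the model has already been pulled (e.g. with `ollama pull llava`); the image path `sample.jpg` is a placeholder. The endpoint and field names follow Ollama's documented `/api/generate` interface.

```python
# Minimal sketch: ask a local LLaVA model (served by Ollama) about an image.
# Assumes Ollama is running on the default port and `ollama pull llava` was run.
import base64
import requests

def describe_image(image_path: str, prompt: str = "Describe this image.") -> str:
    # Ollama's /api/generate endpoint accepts images as base64-encoded strings.
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")

    response = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "llava",
            "prompt": prompt,
            "images": [image_b64],
            "stream": False,  # return a single JSON object instead of a stream
        },
        timeout=120,
    )
    response.raise_for_status()
    return response.json()["response"]

if __name__ == "__main__":
    print(describe_image("sample.jpg"))  # placeholder image path
```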