Bagel is an open-source multimodal AI model that integrates several data modalities in a single system. It aims to simplify the development and deployment of AI applications that must process mixed inputs such as text, audio, and images. A related trend is the growing use of promotional videos alongside academic publications, which can improve engagement and make complex topics easier to understand. The hardware requirements for running Bagel efficiently remain an open question, however, and users are asking for specific details so they can assess compatibility and feasibility.