An Introduction to Vision Language Models

AI applications nowadays generate not only text but also images, audio, and video. Vision language models rely on a similar transformer-based approach to the one used for text generation.…