What is OpenAI’s GPT-4 Vision and how can it help you interpret images and charts?
- Since its launch, OpenAI’s ChatGPT has evolved by leaps and bounds - generating text is no longer its only function.
- It can also create images from natural language prompts, thanks to the integration of DALL-E.
KEY HIGHLIGHTS
- What is GPT-4 with Vision?
- It's a fancy AI tool that can analyze images given to it
- It's like a super-powered version of GPT-4, because it can understand both text and pictures.
- There are other AI models like this, including CogVLM and LLaVA.
What can it do?
- This AI can see the world through your eyes! It can analyze photos, screenshots, documents, and even understand charts and graphs.
- It can even read words written in images, whether handwritten or typed.
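To make the image-analysis capability above concrete, here is a minimal sketch of the kind of request payload OpenAI's Chat Completions API accepts for GPT-4 with Vision: a user message whose content mixes a text part and an image part. The model name and image URL below are placeholders, and a real call would also need an API key and the `openai` SDK (or an HTTP client); this sketch only builds and prints the payload.

```python
import json

# A hypothetical chart URL, used purely for illustration.
IMAGE_URL = "https://example.com/chart.png"

# Build the request body: one user message combining a text prompt
# with an image, which is how GPT-4 with Vision receives pictures.
payload = {
    "model": "gpt-4-vision-preview",  # assumed model name; check current docs
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What does this chart show?"},
                {"type": "image_url", "image_url": {"url": IMAGE_URL}},
            ],
        }
    ],
    "max_tokens": 300,
}

print(json.dumps(payload, indent=2))
```

In practice this dictionary would be passed to the chat completions endpoint, and the model's reply would describe or interpret the image.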
How can it help?
- GPT-4 with Vision can analyze historical documents in seconds, saving experts tons of time.
- It can also help developers write code based on an image, or create social media content with the help of an image-generating AI (DALL-E 3).
- This AI has its limitations: it can make mistakes, so it is advisable to always check its work.
- It also declines to identify specific people in images, and it is not suited to high-precision tasks such as medical analysis.
Prelims takeaway
- AI
- DALL-E3