Developed jointly by researchers at Apple and Columbia University, Ferret was first released on GitHub in October 2023, along with the accompanying research paper. It didn’t take long for Apple to lift the curtain on this open-source generative model.
While the world has its hands full regulating generative models, Apple has taken a big step and now has an open-source multimodal generative AI of its own. The ethical risks of generative AI cannot be ignored, but Ferret combines NLP with computer vision and offers a novel way to engage with visual content.
With this move, Apple is making another attempt to stay a step ahead of the generative AI competition.
Apple’s investments and internal conversational AI efforts
John Giannandrea leads Apple’s pursuit of advances in artificial intelligence. The AI chief oversees Apple’s work on large language models and reports directly to the CEO, Tim Cook.
Cook assembled a conversational AI team four years ago, and ever since then, the company’s AI work and the promise of future Apple AI products have accelerated.
Apple has an internal chatbot that many engineers call AppleGPT, although the company does not use that name publicly. Access to the application is limited to Apple employees, and whether it feeds into product features depends on how suitable it proves to be.
Keeping up with conversational AI research demands heavy investment and a lot of computational resources, especially for large language models. Reports show that Apple is expected to spend over $4 billion on AI servers this year.
How does Ferret work?
With the launch of Ferret, Apple has acted on its promise of bridging vision and language AI. Instead of analyzing only the whole image, Ferret can identify objects and concepts within the specific regions of an image that a user points to.
To illustrate, suppose a user circles a tiger’s body in an image and asks what colour the stripes on the yellow fur are. Ferret will examine the circled region and answer, ‘The stripes are black.’
A capability like this could put Apple well ahead in its AI rivalry with Google and Microsoft. Ferret is stronger at region-based conversation than previous multimodal AI systems.
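The tiger illustration can be pictured in a few lines of Python. This is only a toy sketch of the idea of pairing a question with a user-drawn region, not Ferret's code: the `RegionQuery` class, the `pixels_in_region` helper, and the grid of colour labels standing in for an image are all hypothetical names made up for this example.

```python
from dataclasses import dataclass


@dataclass
class RegionQuery:
    """A question tied to a circular region of an image (hypothetical structure)."""
    question: str
    center_x: int
    center_y: int
    radius: int


def pixels_in_region(image, query):
    """Return the pixel labels that fall inside the circled region.

    `image` is a list of rows; each row is a list of pixel labels.
    """
    selected = []
    for y, row in enumerate(image):
        for x, pixel in enumerate(row):
            # Keep a pixel only if it lies within the circle drawn by the user.
            dx = x - query.center_x
            dy = y - query.center_y
            if dx * dx + dy * dy <= query.radius ** 2:
                selected.append(pixel)
    return selected


# Toy 5x5 "image": grass around the edges, tiger fur and stripes in the middle.
image = [
    ["grass", "grass", "grass", "grass", "grass"],
    ["grass", "yellow", "black", "yellow", "grass"],
    ["grass", "black", "yellow", "black", "grass"],
    ["grass", "yellow", "black", "yellow", "grass"],
    ["grass", "grass", "grass", "grass", "grass"],
]

query = RegionQuery("What colour are the stripes?", center_x=2, center_y=2, radius=1)
region = pixels_in_region(image, query)
print(sorted(set(region)))  # only colours from inside the circle
```

The point of the sketch is that the model answers from the circled region rather than from the whole picture, so the surrounding grass never enters the answer.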
Ferret uses a dual-encoder architecture: one encoder handles the visual input, and the other handles the text. A dynamic fusion mechanism then combines the data from the two streams, which keeps the two modalities in balance during training.
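The article does not spell out how the fusion works, so the following is only one common way such a mechanism could look, not Apple's implementation. It assumes a single learned gate that decides how much weight the visual features get versus the textual ones. The function name `gated_fusion`, the weights, and the toy vectors are all assumptions made for illustration.

```python
import math


def sigmoid(x):
    """Squash a real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(visual, textual, gate_weights, gate_bias=0.0):
    """Blend two equal-length feature vectors using one gate value.

    The gate is computed from both streams, so the mix can change
    from input to input (the "dynamic" part of this sketch).
    """
    if len(visual) != len(textual):
        raise ValueError("visual and textual features must have the same length")
    combined = visual + textual  # list concatenation of the two streams
    if len(gate_weights) != len(combined):
        raise ValueError("gate_weights must have one weight per combined feature")
    score = sum(w * f for w, f in zip(gate_weights, combined)) + gate_bias
    gate = sigmoid(score)
    # gate close to 1 favours the visual stream, close to 0 the textual one.
    fused = [gate * v + (1.0 - gate) * t for v, t in zip(visual, textual)]
    return fused, gate


# Toy features standing in for the outputs of the two encoders.
visual_features = [1.0, 0.0]
textual_features = [0.0, 1.0]

# With all-zero gate weights the gate is exactly 0.5: an even blend.
fused, gate = gated_fusion(visual_features, textual_features, [0.0, 0.0, 0.0, 0.0])
print(gate, fused)
```

In a trained system the gate weights would be learned along with the encoders, so neither stream drowns out the other.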
What advantages does Ferret bring to the table?
If you don’t already know, you may be surprised to learn that Apple has released Ferret under a non-commercial open-source license. It promises a great deal of innovation for future Apple AI products. Here are some of the benefits that come with Ferret:
- Because the code is publicly accessible, applications and novel extensions of Ferret can go in directions that even Apple has not envisioned.
- Researchers from all around the world can take the initiative to build on Ferret, and Apple can reap the benefits of that collective progress.
- Among its other advantages, Ferret promotes transparency.
Potential Future Challenges
Scalability is one of the biggest concerns right now. While Ferret’s advantages are significant, infrastructure limitations mean that Apple cannot yet compete with GPT-4. Further strategic decisions and the right collaborations might address these concerns.
None of this takes anything away from Ferret, which promises new developments not only for Apple users but also for developers. Despite the ethical risks of generative AI, Ferret points to a positive business impact from AI innovation.
While it is still in an early research phase, Ferret provides a foundation for highly capable multimodal systems. The sky’s the limit, indeed!