Grapevine

New AI model breaks barriers in cross-modality machine vision learning

A research team led by Prof. Wang Hongqiang of the Hefei Institutes of Physical Science, Chinese Academy of Sciences, has recently proposed a cross-modality machine vision AI model that mines information across a wide range of domains.

The model overcomes the limitations that traditional single-domain models face when handling cross-modality information, and it achieves new breakthroughs in cross-modality image retrieval.

Cross-modality machine vision is a major challenge in AI because it requires finding consistency and complementarity between different types of data. Traditional methods focus on images and features, but they are limited by problems such as information granularity and insufficient data.

The researchers found that, compared with these traditional approaches, detailed associations are more effective at maintaining consistency across modalities. The work is posted on the arXiv preprint server.

In the study, the team introduced a wide-ranging information mining network (WRIM-Net). The model establishes global region interactions to extract detailed associations across several domains, including the spatial, channel, and scale domains, with an emphasis on mining modality-invariant information over a broad range.
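The article does not describe WRIM-Net's internals, so the following is only a generic illustration of what "global interactions" across the spatial, channel, and scale domains of a feature map can mean. It is a minimal sketch under stated assumptions, not the authors' method: it uses standard non-local (self-attention) style mixing over spatial positions, the same idea over channels, and simple multi-scale average pooling. All function names (`spatial_interaction`, `channel_interaction`, `scale_interaction`, `wide_range_block`) are hypothetical, and a single feature map of shape (C, H, W) is assumed.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def spatial_interaction(feat):
    """Every spatial position attends to every other position (non-local style).

    Hypothetical illustration; feat has shape (C, H, W)."""
    c, h, w = feat.shape
    x = feat.reshape(c, h * w)                          # (C, N), N = H*W regions
    affinity = softmax(x.T @ x / np.sqrt(c), axis=-1)   # (N, N), each row sums to 1
    out = x @ affinity.T                                # region i = weighted sum of all regions
    return feat + out.reshape(c, h, w)                  # residual connection


def channel_interaction(feat):
    """Every channel attends to every other channel (same idea, channel domain)."""
    c, h, w = feat.shape
    x = feat.reshape(c, h * w)                              # (C, N)
    affinity = softmax(x @ x.T / np.sqrt(h * w), axis=-1)  # (C, C)
    out = affinity @ x                                      # channel k = weighted sum of all channels
    return feat + out.reshape(c, h, w)


def scale_interaction(feat, scales=(1, 2, 4)):
    """Average the map over several pooling scales (assumes H, W divisible by each scale)."""
    c, h, w = feat.shape
    outs = []
    for s in scales:
        # Average-pool with an s x s window, then upsample back by repetition.
        pooled = feat.reshape(c, h // s, s, w // s, s).mean(axis=(2, 4))
        outs.append(pooled.repeat(s, axis=1).repeat(s, axis=2))
    return np.mean(outs, axis=0)


def wide_range_block(feat):
    # Chain the three interactions; the output keeps the input shape.
    return scale_interaction(channel_interaction(spatial_interaction(feat)))


rng = np.random.default_rng(0)
features = rng.standard_normal((8, 4, 4))  # toy (C, H, W) feature map
print(wide_range_block(features).shape)    # (8, 4, 4)
```

Because every step mixes information from the whole map rather than a local window, each output region can draw on distant regions, channels, and coarser scales. Whether WRIM-Net uses attention, pooling, or something else entirely for these interactions is not stated in the article.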