Intelligent Critique

AFFILIATIONS: AI ROBOLAB, University of Luxembourg
AUTHORS: Ayoub Nainia, Robert Frankle

Introducing the VIsual-VOcabulary pre-training (VIVO)-based automatic art critique system

An attempt to write art criticism with artificial intelligence. Our solution leverages recent vision-language pre-training (VLP) approaches, including object-semantics aligned pre-training (OSCAR) and visual-vocabulary pre-training (VIVO), by combining them in a single automatic art critique system. We followed two different methodologies to build the current Intelligent Critique system.

Building an art captioning model trained on the ArtEmis dataset.

Output: the colors used to depict clouds in the trees is very bright and pleasing to the lines.
Dataset: We trained our art captioning model on contemporary art data from the ArtEmis dataset.

Photo Feature Extractor: a VGG16 model.
Sequence Processor: a word embedding layer for handling the text input, followed by a Long Short-Term Memory (LSTM) RNN layer.
Decoder: the outputs of the feature extractor and the sequence processor are merged and processed by a Dense layer to make the final prediction.
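
The three components above can be sketched as a single forward pass. The following is a toy illustration in plain Python, not the authors' implementation. It assumes the merge is an element-wise addition of the image vector and the sequence vector (the page only says they are "merged"), it uses tiny hand-picked dimensions and weights, and it stands in for the VGG16 and LSTM outputs with precomputed vectors. All function and variable names are hypothetical.

```python
import math

def dense(vec, weights, bias):
    """Fully connected layer: one output value per weight row."""
    return [sum(w * x for w, x in zip(row, vec)) + b
            for row, b in zip(weights, bias)]

def softmax(logits):
    """Turn raw scores into a probability distribution."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_next_word(image_vec, sequence_vec, weights, bias, vocab):
    """Merge both inputs, apply a Dense layer, return the likeliest word."""
    # Assumed merge strategy: element-wise addition of the two vectors.
    merged = [a + b for a, b in zip(image_vec, sequence_vec)]
    probs = softmax(dense(merged, weights, bias))
    best = max(range(len(vocab)), key=lambda i: probs[i])
    return vocab[best], probs

# Toy stand-ins: in a real system image_vec would come from VGG16 and
# sequence_vec from the embedding + LSTM applied to the caption so far.
vocab = ["colors", "bright", "lines"]
image_vec = [0.2, 0.9]
sequence_vec = [0.1, 0.4]
weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
bias = [0.0, 0.0, 0.0]

word, probs = predict_next_word(image_vec, sequence_vec, weights, bias, vocab)
```

A full captioner would call a step like this repeatedly, appending each predicted word to the caption until an end-of-sequence token is produced.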
Main challenge:
It is challenging to generate art captions for novel objects which are unseen in our caption-labeled training data.

Leveraging recent vision-language pre-training (VLP) approaches and combining them to build an art critique system.

Output: The artwork image shows a car parked in a field. The painting’s color scheme is not black and white, while the dominant background and foreground color is Grey.
Proposed VIVO: VIVO pre-training uses paired image-tag data to learn a rich visual vocabulary in which image region features and tags of semantically similar objects are mapped into vectors that are close to each other.
The paired image-caption data only cover a limited number of objects (in blue).
During inference, the model can generalize to describe novel objects (in yellow) that are learnt during VIVO pre-training.
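
The idea of a shared space in which region features and tags lie close together can be illustrated with a nearest-neighbour lookup. This is a minimal sketch under stated assumptions, not VIVO's training procedure: the embeddings below are made-up three-dimensional vectors, cosine similarity is an assumed closeness measure, and `nearest_tag` is a hypothetical helper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def nearest_tag(region_vec, tag_vecs):
    """Return the tag whose embedding is closest to the region feature."""
    return max(tag_vecs, key=lambda tag: cosine(region_vec, tag_vecs[tag]))

# Made-up tag embeddings. "easel" plays the role of a novel object that
# appears in image-tag data but never in the caption-labeled data.
tag_vecs = {
    "car": [0.9, 0.1, 0.0],
    "field": [0.1, 0.9, 0.1],
    "easel": [0.0, 0.2, 0.9],
}

# Made-up feature vector for one detected image region.
region_vec = [0.1, 0.3, 0.8]

best_tag = nearest_tag(region_vec, tag_vecs)
```

Because the region vector lies closest to the "easel" embedding, the lookup can name an object for which no caption was ever seen, which is the generalization behaviour described above.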
Main challenge: Incorporating interpretation and judgement of the work of art with the description.

Xiaowei Hu, Xi Yin, Kevin Lin, Lijuan Wang, Lei Zhang, Jianfeng Gao, Zicheng Liu.
VIVO: Visual vocabulary pre-training for novel object captioning. arXiv, 2021.