Recent discussion centers on claims that large language models (LLMs) can process multimodal information without task-specific training. Several commenters question the claim's validity, suggesting it may be exaggerated or clickbait. Others note that the models may be relying on techniques such as reinforcement learning or pretrained multimodal embeddings, producing results that mimic perception without conventional end-to-end training for the task at hand. There is also debate about the implications of these findings, including comparisons to simple feedback devices like thermostats, which respond to their environment without any programming at all. Overall, while intriguing, the claims about LLMs' perceptual capabilities warrant careful scrutiny and clarification.
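
To make the "multimodal embedding" point concrete, here is a minimal sketch of how a pretrained joint image-text embedding (OpenAI's CLIP, loaded via Hugging Face `transformers`) can classify an image against arbitrary labels with no task-specific training. This illustrates the general technique the commenters mention, not the method behind the original claim; the model checkpoint, image path, and candidate labels are illustrative choices.

```python
# Zero-shot classification with a pretrained multimodal embedding (CLIP).
# No gradient updates occur anywhere below -- the "perception-like" result
# comes entirely from the pretrained joint embedding space.
from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("photo.jpg")  # hypothetical input image
labels = ["a photo of a cat", "a photo of a dog", "a photo of a thermostat"]

inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image holds similarity scores between the image embedding and
# each text embedding; softmax turns them into a distribution over the labels.
probs = outputs.logits_per_image.softmax(dim=-1)
for label, p in zip(labels, probs[0].tolist()):
    print(f"{label}: {p:.3f}")
```

In a sketch like this, nothing resembling traditional supervised training happens at classification time, which is plausibly what the commenters mean when they say such systems mimic perception without being trained for the specific task.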