The post discusses how effective language models (LMs) are when the tokens they emit do not follow conventional reasoning patterns. Commenters point out that LMs produce probability distributions over tokens from their internal latent representations, so the model's "reasoning" is not bound to the literal token sequence it writes out. Models trained on noisy data can therefore arrive at correct answers even when the intermediate reasoning they emit is flawed. Others note the need for better strategies for adaptive computation per token, and the distinction between applying LMs to algorithmic tasks versus general language understanding. Overall, the conversation reflects growing interest in how LMs compute internally and process language beyond straightforward next-token generation.
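
As a rough illustration of the "distribution over tokens" point, the sketch below shows a causal LM projecting its latent state at the last position into logits over the vocabulary, which softmax turns into a probability distribution over possible next tokens. The Hugging Face `transformers` library, the GPT-2 checkpoint, the prompt, and the top-k value are all arbitrary choices for the example, not anything specified in the discussion.

```python
# Minimal sketch: a causal LM maps its hidden (latent) state for the final
# position to logits over the vocabulary; softmax converts those logits into
# a probability distribution over the next token.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")   # small checkpoint, chosen for illustration
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "2 + 2 ="
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits                 # shape: (1, seq_len, vocab_size)

next_token_logits = logits[0, -1]                   # latent state projected to vocabulary logits
probs = torch.softmax(next_token_logits, dim=-1)

# The model's "answer" is a distribution, not a single committed token:
# print the top candidates and their probabilities.
top = torch.topk(probs, k=5)
for p, idx in zip(top.values, top.indices):
    print(f"{tokenizer.decode(idx):>8s}  {p.item():.3f}")
```

The point this makes concrete is that sampling or greedy decoding collapses a rich distribution into one token per step, so the emitted token sequence is only a lossy projection of whatever computation the model performed in its latent space.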