OmniParser V2
Turn any LLM into a Computer Use Agent


Description
OmniParser ‘tokenizes’ UI screenshots, converting raw pixels into structured elements that LLMs can interpret. This lets an LLM perform retrieval-based next-action prediction: given the set of parsed interactable elements, it picks the element to act on.
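
To make the idea concrete, here is a minimal sketch of how parsed elements might be handed to an LLM and how its choice could be mapped back to a screen position. It is an illustration only and does not show OmniParser's actual API or output schema: the `UIElement` record, its fields, and the helpers `format_for_prompt` and `click_point` are hypothetical names, and the bounding boxes are assumed to be normalized to the range 0 to 1.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class UIElement:
    """Hypothetical record for one parsed screen element (not OmniParser's real schema)."""
    element_id: int
    kind: str                                # e.g. "icon" or "text"
    description: str                         # short caption or recognized text
    bbox: Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), assumed normalized to [0, 1]
    interactable: bool


def format_for_prompt(elements: List[UIElement]) -> str:
    """List the interactable elements as numbered lines an LLM can choose from."""
    return "\n".join(
        f"[{e.element_id}] {e.kind}: {e.description}"
        for e in elements
        if e.interactable
    )


def click_point(elements: List[UIElement], chosen_id: int,
                width: int, height: int) -> Tuple[int, int]:
    """Map the ID chosen by the LLM to the pixel center of that element's box."""
    for e in elements:
        if e.element_id == chosen_id:
            x_min, y_min, x_max, y_max = e.bbox
            return (round((x_min + x_max) / 2 * width),
                    round((y_min + y_max) / 2 * height))
    raise KeyError(f"no element with id {chosen_id}")


# Made-up example data standing in for a parsed screenshot.
elements = [
    UIElement(0, "text", "Welcome back", (0.0, 0.0, 1.0, 0.1), False),
    UIElement(1, "icon", "Settings gear", (0.1, 0.2, 0.3, 0.4), True),
    UIElement(2, "text", "Sign in", (0.5, 0.5, 0.7, 0.6), True),
]

prompt_block = format_for_prompt(elements)
target = click_point(elements, chosen_id=1, width=1000, height=500)
```

In this sketch the LLM never sees pixels. It sees a short numbered list and answers with an ID, which is what makes next-action prediction a retrieval problem over the parsed elements.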