Inferring the Phylogeny of Large Language Models

(arxiv.org)

35 points | by weinzierl 3 hours ago

3 comments

  • PunchTornado 1 hour ago
    An intuitive and expected result (except perhaps the prediction of performance). I'm glad somebody did the hard work of proving it.

    Though, if this is so clearly seen, how come AI detectors perform so badly?

    • Calavar 5 minutes ago
      This experiment involves each LLM responding to 128 or 256 prompts. AI detection is generally focused on determining the writer of a single document, not comparing two analogous sets of 128 documents and determining whether the same person/tool wrote both. Totally different problem.
    • haltingproblem 57 minutes ago
      It might be because detecting whether output is AI-generated, and mapping output that is already known to come from an LLM to a specific LLM or class of LLMs, are different problems.