• RunawayFixer@lemmy.world · 54 points · 2 days ago

    A large language model shouldn’t even attempt to do math, imo. They made an expensive hammer that is semi-good at one thing (parroting humans), and now they’re treating every query like it’s a nail.

    Why isn’t OpenAI working more modularly, whereby the LLM will call up specialized algorithms once it has identified the nature of the question? Or is it already modular, and they just suck at anything that cannot be calibrated purely with brute-force computing?

    • kadu@scribe.disroot.org · 10 points (1 down) · 1 day ago

      Why isn’t OpenAI working more modularly, whereby the LLM will call up specialized algorithms once it has identified the nature of the question?

      Precisely because this is an LLM. It doesn’t know the difference between writing out a maths problem, a cake recipe, or a haiku. It transforms everything into the same domain and does fancy statistics to come up with a reply. It wouldn’t know that it needs to invoke the “Calculator” feature unless you hard-code that in, which is what ChatGPT and Gemini do, but it’s also easy to break.

        • zalgotext@sh.itjust.works · 8 points · 1 day ago

          Sort of. There’s a relatively new class of “tool-aware” LLMs, which you can instruct to use tools like a calculator or some other external program. As far as I know, though, the LLM has to be told to go out and use that external thing; it can’t make that decision itself.
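
          Roughly, the plumbing looks something like this (a toy sketch; the tool name and the JSON shape are invented for illustration, not any vendor’s actual API):

              import json

              # Toy registry of tools the model is allowed to request.
              TOOLS = {
                  "calculator": lambda expr: eval(expr, {"__builtins__": {}}),  # toy maths only
              }

              def handle_model_output(output: str) -> str:
                  """If the model emitted a tool call, run it; otherwise pass the text through."""
                  try:
                      call = json.loads(output)  # e.g. {"tool": "calculator", "input": "1+2+3"}
                  except json.JSONDecodeError:
                      return output              # plain prose, no tool requested
                  if not isinstance(call, dict):
                      return output
                  tool = TOOLS.get(call.get("tool"))
                  if tool is None:
                      return output
                  return str(tool(call["input"]))  # in a real loop this result is fed back to the model

              print(handle_model_output('{"tool": "calculator", "input": "1+2+3"}'))  # prints 6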

        • kadu@scribe.disroot.org · 3 points · 1 day ago

          Can the model itself be trained to recognize mathematical input and invoke an external app, parse the result and feed that back into the reply? No.

          Can you create a multi-layered system that uses some trickery to achieve this effect most of the time? Yes, and that’s what OpenAI and Google are already doing: they recognize certain features of the user’s input and change the system prompt to force the model to output Python code or Markdown notation, which your browser then renders using a different tool.
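
          A heavily simplified sketch of that routing (the trigger heuristic and prompt wording here are made up to illustrate the idea, not what OpenAI or Google actually ship):

              import re

              # Crude trigger for "this looks like a calculation question".
              MATH_HINT = re.compile(r"\d+\s*[-+*/]\s*\d+|\b(sum|total|average)\b", re.IGNORECASE)

              def build_system_prompt(user_input: str) -> str:
                  if MATH_HINT.search(user_input):
                      # Force the model to emit Python for an external runtime to execute,
                      # instead of letting it guess at the digits itself.
                      return ("You are a data assistant. Do not compute results yourself. "
                              "Reply only with a Python snippet that computes the answer.")
                  return "You are a helpful assistant."

              print(build_system_prompt("What is the sum of these cells?"))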

    • Jtotheb@lemmy.world · 20 points · 2 days ago

      Yep. Instead of focusing on humans communicating more effectively with computers, which are good at answering questions that have correct, knowable answers, we’ve invented a type of computer that can be wrong because maybe people will like the vibes more? (And we can sell vibes)

    • Blackmist@feddit.uk · 5 points · 1 day ago

      In fairness, the example I’ve seen MS give was taking a bunch of reviews and determining from the text whether each review was positive or negative.

      It was never meant to mangle numbers, but we all know it’s going to be used for that anyway, because people still want to believe in a future where robots help them, rather than just take their jobs and advertise to them.

      • RunawayFixer@lemmy.world · 4 points · 1 day ago

        I would rather not have it attempt something that it can’t do; no result is better than a wrong result, imo. Here it correctly identifies that it’s a calculation question, but instead of suggesting a formula, it tries to hallucinate a numerical answer itself. The creators of the model seem to have the mindset that the model must try to answer no matter what, instead of training it not to answer questions that it can’t answer correctly.

        • Blackmist@feddit.uk · 3 points · 1 day ago

          As far as I can tell, the Copilot command has to be given a range of data to work with, so here it’s pulling a number out of thin air. It would be nice if the output from this was just “please tell this command which data to use”, but as always it doesn’t know how to say “I don’t know”…

          Mostly because it never knew anything to start with.

    • Evotech@lemmy.world · 4 points · 1 day ago (edited)

      They have (well, Anthropic has).

      It’s called MCP (the Model Context Protocol).

      Nowadays you basically just give the AI access to a calculator… or whatever other tools it needs, including other models to help it answer something.
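
      A minimal calculator tool with the Python MCP SDK looks roughly like this (a sketch along the lines of the FastMCP quickstart; the server name and the tool are just examples):

          from mcp.server.fastmcp import FastMCP

          mcp = FastMCP("calculator")

          @mcp.tool()
          def add(numbers: list[float]) -> float:
              """Sum a list of numbers exactly, instead of letting the model guess."""
              return sum(numbers)

          if __name__ == "__main__":
              mcp.run()  # a connected model can now call `add` whenever it decides it needs maths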

      • zod000@lemmy.dbzer0.com · 3 points · 1 day ago

        You have to admire the gall of naming your system after the evil AI villain of the original Tron movie.

      • RunawayFixer@lemmy.world · 1 point · 1 day ago

        I hadn’t heard of that protocol before, thanks. It holds a lot of promise for the future, but getting the input right for the tools is probably quite the challenge. And it also scares me because I now expect more companies to release irresponsible integrations.

    • UnderpantsWeevil@lemmy.world · 3 points · 1 day ago

      Why isn’t OpenAI working more modularly, whereby the LLM will call up specialized algorithms once it has identified the nature of the question?

      Because we vibes-coded the OpenAI model with OpenAI and it didn’t think this was the optimal way to design itself.

    • qaz@lemmy.world · 5 points · 2 days ago

      OpenAI already makes it write Python functions to do the calculations.
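
      For a sum like the one in the post, the generated snippet is on the order of (illustrative values standing in for whatever cells were selected):

          values = [1, 2, 3]   # read from the selected range
          print(sum(values))   # 6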

      • Echo Dot@feddit.uk · 16 points · 2 days ago

        So it’s going to write Python functions to calculate the answer, where all the variables are stored in an Excel spreadsheet, a program that can already do the calculations? And how many forests did we burn down for that wonderful piece of MacGyvered software, I wonder.

        The AI bubble cannot burst soon enough.

            • UnderpantsWeevil@lemmy.world · 1 point · 1 day ago

              I’m wondering if this is a sleight-of-hand trick by the poster, then.

              Suppose they set the “1” field to Text and left the 2 and 3 as numeric, then ran Copilot on that. In that case, it’s more an indictment of Excel than of Copilot, strictly speaking. The screen doesn’t make clear which cells are numbers and which are text.

              • absentbird@lemmy.world · 1 point · 1 day ago

                I don’t think there’s an explanation that doesn’t make this Copilot’s fault. Honestly JavaScript shouldn’t allow math between numbers and strings in the first place. “1” + 1 is not a number, and there’s already a type for that: NaN.

                Regardless, the sum should be 5 if the first cell is text, so it’s incorrect either way.

                • UnderpantsWeevil@lemmy.world · 1 point · 1 day ago

                  Honestly JavaScript shouldn’t allow math between numbers and strings in the first place.

                  You can explicitly convert values (or lean on TypeScript) to avoid this, if you really don’t want to let concatenation be the default. So, again, this feels like an oversight in the integration rather than in the strict formal logic.

      • jacksilver@lemmy.world · 3 points · 1 day ago

        This is the right answer.

        LLMs have already become this weird mesh of different services tied together to look more impressive. OpenAI’s models can’t do math, so they farm it out to Python for accuracy.

    • WalrusDragonOnABike [they/them]@reddthat.com · 1 point · 1 day ago

      Why isn’t OpenAI working more modularly, whereby the LLM will call up specialized algorithms once it has identified the nature of the question?

      Wasn’t doing this part of the reason OpenSeek was able to compete with much smaller data sets and lower hardware requirements?

      • RunawayFixer@lemmy.world · 2 points · 1 day ago

        I assume you mean DeepSeek? And it doesn’t look like it; according to what I could find, their biggest innovation was “reinforcement learning to teach a base language model how to reason without any human supervision”: https://huggingface.co/blog/open-r1

        Some others have replied that ChatGPT and Copilot are already modular: they use Python for arithmetic questions. But that apparently isn’t enough to be useful.

        • absentbird@lemmy.world · 2 points · 1 day ago

          I feel like the best modular AI right now is from Gemini. It can take a scan of a document and turn it into a CSV, which I was surprised by.

          I figure it must have multiple steps: OCR, text interpretation, recognizing a table, then piping the text to some sort of CSV tool.
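
          Presumably something like this under the hood (every function here is a hypothetical stub, just to show the chaining):

              def run_ocr(image_bytes: bytes) -> str:
                  """Stub standing in for a real OCR model."""
                  return "item, qty\nwidgets, 3\nbolts, 12"

              def detect_table(text: str) -> list[list[str]]:
                  """Stub: split OCR output into rows and cells."""
                  return [line.split(",") for line in text.splitlines()]

              def scan_to_csv(image_bytes: bytes) -> str:
                  """Chain the steps: OCR -> table detection -> CSV text for a downstream tool."""
                  rows = detect_table(run_ocr(image_bytes))
                  return "\n".join(",".join(cell.strip() for cell in row) for row in rows)

              print(scan_to_csv(b"...scanned document..."))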

    • utopiah@lemmy.world · 3 points (2 down) · 1 day ago (edited)

      the LLM will call up specialized algorithms once it has identified the nature of the question?

      Because there is no “it” that “calls” or “identifies” anything, much less the “nature of the question”.

      This requires intelligence, not probability on the most likely next token.

      You can do maths with words and you can write poems with numbers; either requires actually understanding, and that’s the linchpin, rather than just parsing and providing an answer based on what has been written so far.

      Sure, the model might string together tokens that sound very “appropriate” to the question, in the sense that they fit within the right vocabulary, and if the occurrence in its dataset was frequent enough, the answer might even be correct, but that’s still not understanding even a single word (or token) of either the question or the corpus.

      • FishFace@lemmy.world · 1 point · 1 day ago

        An LLM can, by picking up on some features of the input, predict a categorisation of it and feed it to different processors. This already works; it doesn’t require anything beyond the capabilities LLMs actually have, and it isn’t perfect. It’s a very good question why that hasn’t happened here: an LLM can very reliably give you the answer to “how do you sum some data in Python”, so it only needs to be able to do that in Excel and put the result into the cell.
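
        A bare-bones version of that routing idea (the classifier and the maths handler are invented for illustration):

            def classify(question: str) -> str:
                """Stand-in for the LLM's categorisation of the input."""
                return "math" if any(ch.isdigit() for ch in question) else "chat"

            def route(question: str) -> str:
                if classify(question) == "math":
                    # Hand the numbers to a deterministic processor instead of predicting digits.
                    return str(sum(int(tok) for tok in question.split() if tok.isdigit()))
                return "(answer drafted by the LLM)"

            print(route("sum 1 2 3"))  # prints 6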

        There are still plenty of pitfalls. This should not be one of them, so that’s interesting.