Fix beam search for Llama models by adding reorder_cache method #2753

amrothemich · 2025-06-17T13:41:55Z

When using beam search with Unsloth-optimized Llama models, users encounter: NotImplementedError: Make sure that a reorder_cache function is correctly implemented in transformers.models.llama.modeling_llama

This occurs because Unsloth patches LlamaForCausalLM but doesn't preserve the reorder_cache static method required for beam search operations.

The fix adds the missing reorder_cache method after Unsloth's patching, ensuring compatibility with transformers' beam search functionality. This allows users to use generation methods like model.generate(num_beams=N) without errors.

When using beam search with Unsloth-optimized Llama models, users encounter: NotImplementedError: Make sure that a `reorder_cache` function is correctly implemented in transformers.models.llama.modeling_llama This occurs because Unsloth patches LlamaForCausalLM but doesn't preserve the reorder_cache static method required for beam search operations. The fix adds the missing reorder_cache method after Unsloth's patching, ensuring compatibility with transformers' beam search functionality. This allows users to use generation methods like model.generate(num_beams=N) without errors. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>

The license field was using a simple string format which causes installation failures with modern setuptools. PEP 621 requires the license to be specified as either {file: "path"} or {text: "value"}. This change updates the license field to use the correct object format, allowing successful package installation from source. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>

danielhanchen · 2025-06-17T14:24:39Z

Thanks! I think pyproject.toml doesn't need changes hopefully right?
Is it possible to move the static method outside of the function?

amrothemich · 2025-06-17T14:33:37Z

Oops I wasn't trying to merge it in yet, it's not ready. I do think pyproject.toml needs changes (unrelated to the inference issue). Whenever I try to pip install, I get:

× Getting requirements to build wheel did not run successfully.
  │ exit code: 1
  ╰─> [95 lines of output]
      configuration error: `project.license` must be valid exactly by one definition (2 matches found):
      
          - keys:
              'file': {type: string}
            required: ['file']
          - keys:
              'text': {type: string}
            required: ['text']
      
      DESCRIPTION:
          `Project license <[https://peps.python.org/pep-0621/#license>`_](https://peps.python.org/pep-0621/#license%3E%60_).
      
      GIVEN VALUE:
          "Apache-2.0"
      
      OFFENDING RULE: 'oneOf'
      
      DEFINITION:
          {
              "oneOf": [
                  {
                      "properties": {
                          "file": {
                              "type": "string",
                              "$$description": [
                                  "Relative path to the file (UTF-8) which contains the license for the",
                                  "project."
                              ]
                          }
                      },
                      "required": [
                          "file"
                      ]
                  },
                  {
                      "properties": {
                          "text": {
                              "type": "string",
                              "$$description": [
                                  "The license of the project whose meaning is that of the",
                                  "`License field from the core metadata",
                                  "<[https://packaging.python.org/specifications/core-metadata/#license>`_](https://packaging.python.org/specifications/core-metadata/#license%3E%60_)."
                              ]
                          }
                      },
                      "required": [
                          "text"
                      ]
                  }
              ]
          }
      Traceback (most recent call last):
        File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-91456821-3098-4104-a8da-189c5a8c8178/lib/python3.12/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
          main()
        File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-91456821-3098-4104-a8da-189c5a8c8178/lib/python3.12/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
          json_out['return_val'] = hook(**hook_input['kwargs'])
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-91456821-3098-4104-a8da-189c5a8c8178/lib/python3.12/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 118, in get_requires_for_build_wheel
          return hook(config_settings)
                 ^^^^^^^^^^^^^^^^^^^^^
        File "/usr/local/lib/python3.12/dist-packages/setuptools/build_meta.py", line 332, in get_requires_for_build_wheel
          return self._get_build_requires(config_settings, requirements=[])
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/usr/local/lib/python3.12/dist-packages/setuptools/build_meta.py", line 302, in _get_build_requires
          self.run_setup()
        File "/usr/local/lib/python3.12/dist-packages/setuptools/build_meta.py", line 318, in run_setup
          exec(code, locals())
        File "<string>", line 1, in <module>
        File "/usr/local/lib/python3.12/dist-packages/setuptools/__init__.py", line 117, in setup
          return distutils.core.setup(**attrs)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/usr/local/lib/python3.12/dist-packages/setuptools/_distutils/core.py", line 158, in setup
          dist.parse_config_files()
        File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-91456821-3098-4104-a8da-189c5a8c8178/lib/python3.12/site-packages/_virtualenv.py", line 22, in parse_config_files
          result = old_parse_config_files(self, *args, **kwargs)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/usr/local/lib/python3.12/dist-packages/setuptools/dist.py", line 608, in parse_config_files
          pyprojecttoml.apply_configuration(self, filename, ignore_option_errors)
        File "/usr/local/lib/python3.12/dist-packages/setuptools/config/pyprojecttoml.py", line 71, in apply_configuration
          config = read_configuration(filepath, True, ignore_option_errors, dist)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/usr/local/lib/python3.12/dist-packages/setuptools/config/pyprojecttoml.py", line 136, in read_configuration
          validate(subset, filepath)
        File "/usr/local/lib/python3.12/dist-packages/setuptools/config/pyprojecttoml.py", line 60, in validate
          raise ValueError(f"{error}\n{summary}") from None
      ValueError: invalid pyproject.toml config: `project.license`.
      configuration error: `project.license` must be valid exactly by one definition (2 matches found):
      
          - keys:
              'file': {type: string}
            required: ['file']
          - keys:
              'text': {type: string}
            required: ['text']
      
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.

And the change here fixes it.

The transformers library expects the method to be named _reorder_cache (with underscore) for beam search functionality. The previous commit incorrectly used reorder_cache without the underscore, causing the NotImplementedError to persist. This corrects the method name to match transformers' expectations, properly enabling beam search for Unsloth-optimized Llama models. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>

Moved the _llama_reorder_cache function outside of the pre_patch method to make it a clean module-level function. This improves code organization and follows Python best practices by avoiding nested function definitions where unnecessary. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>

amrothemich · 2025-06-17T14:40:46Z

I'd also like to come up with a more general solution so we don't have to do this for every model, since I know Gemma has had issues too. Will update and ping you when it's ready :)

…ForCausalLM The beam search error was occurring because PEFT models wrap the base model and need their own _reorder_cache method. This commit adds the same _reorder_cache implementation to PeftModelForCausalLM class. 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com>

Added _reorder_cache method to: 1. Model instances in from_pretrained 2. PEFT model instances in patch_peft_model 3. Base model of PEFT models (model.base_model) This ensures beam search works regardless of how the model is accessed during generation. 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com>

- Created general_reorder_cache function in _utils.py that works for all transformer models - Created patch_model_for_beam_search function to patch any model instance - Updated llama.py to use the general solution instead of model-specific implementation - This fix now automatically works for all models (Gemma, Mistral, Qwen, etc.) that inherit from FastLlamaModel - Models that don't inherit can easily use patch_model_for_beam_search to add beam search support This is a more maintainable solution that fixes beam search for all Unsloth models. 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com>

- Added explanation of why we need to provide our own _reorder_cache implementation - Added None check in general_reorder_cache for safety - Documented that newer models rely on Cache classes but beam search might still use legacy format - Clarified that our implementation is based on working models from transformers library 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com>

The transformers library expects _reorder_cache to be a staticmethod on the model class, not a regular method. This matches the pattern used by models like GPT2 and OPT. 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com>

The key issue was that we were only patching our local import of LlamaForCausalLM, but the error was looking for _reorder_cache in transformers.models.llama.modeling_llama. Now we patch both the local import AND the transformers module's version. 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com>

…ints - Added detailed debugging history to CLAUDE.md including all attempted fixes - Documented the root cause and current state of the beam search issue - Removed debug print statements from llama.py - Ready to transfer to Windows machine for further debugging The beam search fix attempts to patch _reorder_cache in multiple places: 1. Local LlamaForCausalLM class 2. PeftModelForCausalLM class 3. transformers.models.llama.modeling_llama.LlamaForCausalLM 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com>

danielhanchen · 2025-06-17T23:41:47Z

@amrothemich Actually can you move the pyrpoject.toml PR to a new PR thanks :)

This notebook provides a minimal test case to debug the beam search issue on a GPU-enabled environment. It: - Clones our fix-reorder-cache branch - Installs unsloth from source - Tests beam search with a small model - Includes debugging code to check module paths To use: Upload to Google Colab and run with GPU runtime. 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com>

…ning it

danielhanchen · 2025-06-23T12:03:49Z

@amrothemich I think it's best to open a new PR - I think you included Claude Code :)

amrothemich and others added 2 commits June 17, 2025 09:35

amrothemich and others added 2 commits June 17, 2025 10:35

amrothemich and others added 7 commits June 17, 2025 11:00

amrothemich and others added 3 commits June 18, 2025 23:06

removed check whether reorder cache is already defined before reassig…

fb36e23

…ning it

Update pyproject.toml

3a72bf3

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Fix beam search for Llama models by adding reorder_cache method #2753

Fix beam search for Llama models by adding reorder_cache method #2753

Uh oh!

amrothemich commented Jun 17, 2025

Uh oh!

danielhanchen commented Jun 17, 2025

Uh oh!

amrothemich commented Jun 17, 2025

Uh oh!

amrothemich commented Jun 17, 2025

Uh oh!

danielhanchen commented Jun 17, 2025

Uh oh!

danielhanchen commented Jun 23, 2025

Uh oh!

Uh oh!

Uh oh!

Fix beam search for Llama models by adding reorder_cache method #2753

Are you sure you want to change the base?

Fix beam search for Llama models by adding reorder_cache method #2753

Uh oh!

Conversation

amrothemich commented Jun 17, 2025

Uh oh!

danielhanchen commented Jun 17, 2025

Uh oh!

amrothemich commented Jun 17, 2025

Uh oh!

amrothemich commented Jun 17, 2025

Uh oh!

danielhanchen commented Jun 17, 2025

Uh oh!

danielhanchen commented Jun 23, 2025

Uh oh!

Uh oh!