Upgrading from v2 to v5 Guide #771

@Balearica

Overview

According to npm statistics (and GitHub Issues), many users are still using Tesseract.js v2. Version 2 was released in 2019 and contains many bugs, memory leaks, and performance issues that have been fixed in subsequent versions (in some cases v2 is 20x slower than the current version), so updating is strongly recommended. Additionally, v2 is no longer supported, so updating is required to receive support in GitHub Issues.

The changes made in each release are fully documented, but to make upgrading as easy as possible, the guide below describes all changes that v2 users may need to make to use the latest version. It covers upgrading from v2 to v5. If (for whatever reason) you wish to update from v2 to v4, see the comment below.

Changes Impacting Most Users

  1. createWorker is now async
    • In most code this means worker = Tesseract.createWorker() should be replaced with worker = await Tesseract.createWorker()
  2. The arguments to createWorker have changed: the first two arguments are now language and oem
    1. E.g. createWorker('eng', 1, { logger: m => console.log(m) })
  3. worker.load, worker.loadLanguage, and worker.initialize are no longer needed
    • Simply delete these functions from existing code
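
As a minimal sketch of the three changes above, the function below shows what a migrated v2 call site might look like. The function name recognizeText and its parameters are hypothetical; the Tesseract module is passed in as an argument so the sketch does not depend on how the library is loaded. worker.recognize and worker.terminate are assumed to be used the same way as in v2.

```javascript
// v2 (for comparison, no longer valid in v5):
//   const worker = Tesseract.createWorker({ logger });
//   await worker.load();
//   await worker.loadLanguage('eng');
//   await worker.initialize('eng');

// v5 sketch: createWorker is awaited, language and oem come first,
// and the load/loadLanguage/initialize calls are simply gone.
// `recognizeText` is a hypothetical helper name, not part of Tesseract.js.
async function recognizeText(Tesseract, image, logger = () => {}) {
  const worker = await Tesseract.createWorker('eng', 1, { logger });
  try {
    // Assumes the result shape { data: { text } }, as in earlier versions.
    const { data } = await worker.recognize(image);
    return data.text;
  } finally {
    // Always release the worker, even if recognition throws.
    await worker.terminate();
  }
}
```

In Node.js this could be called as recognizeText(require('tesseract.js'), 'image.png'), assuming the package is installed and image.png exists.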

Changes Impacting Fewer Users

  1. Electron users
    • Use the browser version of Tesseract.js
      • In v2, many users used the Node.js version
  2. Users of the getPDF function
  3. Users who set cacheMethod: 'none' or cacheMethod: 'refresh' as a workaround for a caching bug
    • This workaround can be removed; the underlying bug has been fixed (see this comment)
  4. Users who set the optional corePath argument
    • corePath must be pointed to a directory containing all 4 of the following files from Tesseract.js-core v5:
      1. tesseract-core.wasm.js
      2. tesseract-core-simd.wasm.js
      3. tesseract-core-lstm.wasm.js
      4. tesseract-core-simd-lstm.wasm.js
  5. Node.js <14 users
    • Node.js v14 is now the earliest version supported
  6. Users of the worker.detect function
    1. This function is now disabled by default
    2. To enable, set arguments legacyCore: true and legacyLang: true in createWorker options
      1. E.g. Tesseract.createWorker("eng", 1, {legacyCore: true, legacyLang: true})
  7. Users who implemented progress bars using log messages
    1. The language used in logs was standardized, so any scripts that parse logs may need to be updated
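
Items 6 and 7 above can be sketched together as follows. This is an illustration under stated assumptions, not the library's recommended pattern: detectWithLegacy and makeProgressLogger are hypothetical helper names, and log messages are assumed to be objects carrying status and progress fields. The status string to match is passed in by the caller rather than hardcoded, since the wording of log messages is exactly what changed.

```javascript
// Hypothetical helper: forwards numeric progress for one status value.
// Assumption: each log message looks like { status: string, progress: number }.
function makeProgressLogger(status, onProgress) {
  return (m) => {
    if (m && m.status === status && typeof m.progress === 'number') {
      onProgress(m.progress);
    }
  };
}

// Hypothetical helper: worker.detect is disabled by default in v5, so the
// worker is created with legacyCore and legacyLang enabled.
async function detectWithLegacy(Tesseract, image, logger = () => {}) {
  const worker = await Tesseract.createWorker('eng', 1, {
    legacyCore: true,
    legacyLang: true,
    logger,
  });
  try {
    const { data } = await worker.detect(image);
    return data; // returned as-is; no particular fields are assumed here
  } finally {
    await worker.terminate();
  }
}
```

A progress bar would then pass makeProgressLogger(someStatus, updateBar) as the logger argument, where someStatus is whatever status string the installed version actually emits (checking the logged messages once is the safest way to find it).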
