Overview
According to npm statistics (and GitHub Issues), many users are still using Tesseract.js v2. Version 2 was released in 2019 and contains many bugs, memory leaks, and performance issues that have been fixed in subsequent versions (in some cases v2 is 20x slower than the current version), so updating is strongly recommended. Additionally, v2 is no longer supported, so updating is required to receive support in GitHub Issues.
While the changes made in each release are fully documented, to make upgrading as easy as possible, below is a guide describing all changes that v2 users may need to make to use the latest version. This guide describes the process of upgrading from v2 to v5. If (for whatever reason) you wish to update from v2 to v4, see the comment below.
Changes Impacting Most Users
- `createWorker` is now async
  - In most code this means `worker = Tesseract.createWorker()` should be replaced with `worker = await Tesseract.createWorker()`
- The arguments to `createWorker` have changed: the first two arguments are now the language and `oem`
  - E.g. `createWorker('eng', 1, { logger: m => console.log(m) })`
- `worker.load`, `worker.loadLanguage`, and `worker.initialize` are no longer needed
  - Simply delete these function calls from existing code
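Taken together, the three changes above could look like the following in v5. This is a minimal sketch rather than the project's official example: `recognizeText` is a hypothetical helper, and the loaded Tesseract.js module is passed in as a parameter so the sketch does not assume how it is loaded (`require`, `import`, or a script tag).

```javascript
// Hypothetical helper illustrating the v5 calling pattern.
// `Tesseract` is the loaded tesseract.js v5 module; `image` is any input
// that worker.recognize accepts (for example a file path or URL).
async function recognizeText(Tesseract, image) {
  // v2 pattern being replaced:
  //   const worker = Tesseract.createWorker({ logger });
  //   await worker.load();
  //   await worker.loadLanguage('eng');
  //   await worker.initialize('eng');
  // v5: createWorker is async and takes the language and oem first.
  const worker = await Tesseract.createWorker('eng', 1, {
    logger: (m) => console.log(m),
  });
  try {
    const { data: { text } } = await worker.recognize(image);
    return text;
  } finally {
    // Release the worker even if recognition throws.
    await worker.terminate();
  }
}
```

In Node.js this might be called as `recognizeText(require('tesseract.js'), 'image.png').then(console.log)`, where `image.png` is a placeholder path.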
Changes Impacting Fewer Users
- Electron users
  - Use the browser version of Tesseract.js
  - In v2, many users used the Node.js version
- Users of the `getPDF` function
  - This function has been replaced by the `pdf` recognize option (#488, "GetPDF() with Scheduler returns the same PDF file")
  - See the browser and node examples for usage
- Users who set `cacheMethod: 'none'` or `cacheMethod: 'refresh'` as a workaround for the caching bug
  - This workaround can be removed; the underlying bug has been fixed (see this comment)
- Users who set the optional `corePath` argument
  - `corePath` must point to a directory containing all 4 of the following files from Tesseract.js-core v5:
    - `tesseract-core.wasm.js`
    - `tesseract-core-simd.wasm.js`
    - `tesseract-core-lstm.wasm.js`
    - `tesseract-core-simd-lstm.wasm.js`
- Node.js <14 users
  - Node.js v14 is now the earliest supported version
- Users of the `worker.detect` function
  - This function is now disabled by default
  - To enable it, set `legacyCore: true` and `legacyLang: true` in the `createWorker` options
    - E.g. `Tesseract.createWorker("eng", 1, {legacyCore: true, legacyLang: true})`
- Users who implemented progress bars using log messages
  - The language used in logs was standardized, so any scripts that parse logs may need to be updated
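For the `getPDF`, `worker.detect`, and progress bar items above, the v5 equivalents might be sketched as follows. All three helper names are hypothetical, and the Tesseract.js module is again passed in as a parameter. The `pdfTitle` option and the third `recognize` argument follow the pattern used in the project's examples. The `'recognizing text'` status string and the 0 to 1 `progress` field are assumptions about the logger output, so check them against the messages your version actually emits.

```javascript
// Hypothetical helpers; `Tesseract` is the loaded tesseract.js v5 module.

// getPDF replacement: request PDF output via the third `recognize` argument.
async function recognizeToPdf(Tesseract, image, title) {
  const worker = await Tesseract.createWorker('eng');
  try {
    const { data: { text, pdf } } = await worker.recognize(
      image,
      { pdfTitle: title },
      { pdf: true }
    );
    return { text, pdf };
  } finally {
    await worker.terminate();
  }
}

// worker.detect is disabled by default, so opt in to the legacy core and
// legacy language data when creating the worker.
async function detectOrientation(Tesseract, image) {
  const worker = await Tesseract.createWorker('eng', 1, {
    legacyCore: true,
    legacyLang: true,
  });
  try {
    const { data } = await worker.detect(image);
    return data;
  } finally {
    await worker.terminate();
  }
}

// Progress callback driven by the numeric `progress` field instead of by
// parsing message text. The status string here is an assumption.
function makeProgressLogger(onProgress) {
  return (m) => {
    if (m.status === 'recognizing text' && typeof m.progress === 'number') {
      onProgress(Math.round(m.progress * 100));
    }
  };
}
```

A logger built with `makeProgressLogger` can be passed as the `logger` option to `createWorker`. Keying on a single status and the numeric field keeps a progress bar less sensitive to future wording changes in the log messages.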