
Conversation

Member @lagru commented Jul 24, 2025

Description

Implementation of the new strategy for value-range scaling in our library, which we discussed extensively in the recent sprint and meeting. We will add a more detailed description of this strategy later, but in short:

  • Introduce a new `prescale` parameter for functions that actually need or benefit from prescaling the input image.
  • Add a dedicated private (for now) function for that. I plan to use this elsewhere in the future.

Checklist

Release note

For maintainers and optionally contributors, please refer to the instructions on how to document this PR for the release notes.

...

@lagru lagru added ⏩ type: Enhancement Improve existing features 📜 type: API Involves API change(s) 🥾 Path to skimage2 A step towards the new "API 2.0" labels Jul 24, 2025
lagru added 2 commits August 15, 2025 16:16
This makes heavy use of the scaling implementation that was proposed in
[1]. Furthermore, `ski._shared.utils` seems like a more fitting place.

[1] https://www.github.com/scikit-image/scikit-image/pull/7697
@lagru lagru marked this pull request as ready for review August 28, 2025 16:54
@lagru lagru removed the ⏩ type: Enhancement Improve existing features label Aug 28, 2025
@lagru lagru changed the title Add prescale parameter to "blob functions" Add prescale function and similar parameter to "blob functions" Aug 28, 2025
@lagru lagru changed the title Add prescale function and similar parameter to "blob functions" Add prescale parameter to "blob functions" Aug 28, 2025
@lagru lagru requested review from stefanv and matthew-brett August 28, 2025 17:03
Contributor @matthew-brett left a comment

A few comments in passing.

I'm wondering about the divide-by-two heuristic. Will that always work?

with ``(img - lower) / (higher - lower)``
to 0 and 1 respectively.

``False``
Contributor

I have a vague feeling we agreed this should be a string like 'none', but I can't remember the reasoning.

Member Author

Yes, I don't have a strong opinion here on whether `prescale=False` or `prescale='none'` is better. I'd say it's best not to mix argument types. But we do allow `prescale=tuple(...)`, so that consistency is out of the window anyway? And `prescale='none'` does feel like a crutch for `None` or `False`.

Member Author

Also see @stefanv's comment suggesting None.

Contributor

@jni - I have some vague memory that you had a preference for the string 'none' here. Is that right?


``False``
Don't prescale the value range of `image` at all.

Contributor

Did we discuss an int-type scaling option independent of legacy? I mean scaling int images only from dtype min -> 0, dtype max -> 1?

Member Author

Could you expand on that? Do you mean an option that only scales integer images to [0, 1] and ignores floats? Would that differentiate signed and unsigned ints? Do you think it would be more useful than confusing?

Member

Isn't that what legacy does?

Contributor

No, 'legacy' scales uints to [0, 1], and signed ints to [-1, 1]. Somewhat oddly.
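For concreteness, here is a minimal NumPy sketch of dtype-based scaling as described in this thread (uints to [0, 1], signed ints to [-1, 1], floats untouched). The function name and the exact divisor for signed ints are illustrative assumptions, not the actual skimage implementation:

```python
import numpy as np

def legacy_style_scale(image):
    """Sketch of 'legacy'-style dtype scaling (assumed semantics).

    uint dtypes map [0, dtype max] -> [0, 1]; signed int dtypes map
    [dtype min, dtype max] -> roughly [-1, 1]; floats pass through.
    """
    if np.issubdtype(image.dtype, np.unsignedinteger):
        return image / np.iinfo(image.dtype).max
    if np.issubdtype(image.dtype, np.signedinteger):
        # Divide by the magnitude of the most negative value, so the
        # result lies in [-1, 1) (dtype max lands just below 1).
        return image / -float(np.iinfo(image.dtype).min)
    return image.astype(np.float64, copy=False)

legacy_style_scale(np.array([0, 255], dtype=np.uint8))    # [0.0, 1.0]
legacy_style_scale(np.array([-128, 127], dtype=np.int8))  # [-1.0, 0.9921875]
```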

@@ -1096,3 +1097,121 @@ def as_binary_ndarray(array, *, variable_name):
f"safely cast to boolean array."
)
return np.asarray(array, dtype=bool)


def _prescale_value_range(image, *, mode, stacklevel=3):
Member

Remind me, but can stacklevel not be auto-detected? It's a bit of a pity to have that complexity in this function, which will be commonly used.

Member Author

Like I said in the meeting, there's the skip_file_prefixes argument. But that's only available since Python 3.12, and according to SPEC0 we are close to dropping 3.11.

So it depends on whether we want the very next release to support 3.11 or not. Otherwise we'd need to hack something together, using either a decorator at the public API boundary or some frame inspection of our own (like in assert_stacklevel). Thoughts?

Member

This is fine for now; we can revisit in the future.

respectively. This is a shorthand for
``prescale=(image.min(), image.max())``.

``'legacy'``
Member

Could consider calling this dtype instead of legacy. It doesn't matter much, but I think it's more descriptive, and it actually makes sense not to touch float images when normalizing by dtype (since you only know the ranges for integer dtypes).

Member Author

Not opposed. That would also underline that we intend to keep this scaling option around. It's also less judgy. 😄 I'm on board. I assume it would also keep its specific behavior for unsigned and signed ints?

Member

Hrm, no, I suppose that's the difference between this mode and legacy that @matthew-brett may have referred to: there's no scaling to [-1, 1], only [0, 1].

Contributor

We can have multiple names for the same thing. I have a vague desire for 'legacy' because I suspect the most common use-case will be "give me what I got before", so, without 'legacy' they will have to do more reading.

How about integer_dt_range or something?

floating dtype, it is left alone. See :ref:`.img_as_float` for
more details.

``(lower, higher)``
Member

Suggested change
``(lower, higher)``
``(min, max)``

Member Author

I'm not sure I agree. The two passed values don't need to correspond to a minimum and maximum. They are just two values, the first lower than the other, that will correspond to 0 and 1 after scaling.

I remember us introducing that option precisely for batch cases, where that assumption might not hold? Thinking about it, it's an assumption I'd like to avoid baking into our documentation.

Using lower and higher also avoids slight confusion with Python's min() and max() functions. But that's a minor thing.

Member

Ah, so you think we purposefully should allow scaling that takes the image outside of [0, 1]?

Contributor

Is there a reason to enforce 0, 1 as output? I would have thought it cleaner to allow the user to do what they will with this; they can clip themselves if they want.


``(lower, higher)``
Normalize `image` such that ``lower`` and ``higher`` are scaled
with ``(img - lower) / (higher - lower)``
Member

Should be ``(clip(img, min, max) - min) / (max - min)``.

Member Author

Is that a typo? Do you actually mean to apply clip as the last operation? E.g.,

    scaled = (img - lower) / (higher - lower)
    scaled = clip(scaled, min, max)

I was operating under the impression that not clipping during scaling is a feature. That would allow something like a Fahrenheit-to-Celsius conversion:

    fahrenheit = [-10, 1, 60, 100]
    _prescale_value_range(fahrenheit, mode=(32, 32 + 9/5))
    # array([-23.33333333, -17.22222222,  15.55555556,  37.77777778])

Member

I was concerned about the case where the user intends to do minmax scaling, but provides incorrect min and max values. However, since we already provide minmax, I think it's OK to leave this as-is, and perhaps just note in the docstring that we do no clipping internally to guarantee a [0, 1] range.
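To make the distinction concrete, here is a toy model of the two modes being discussed (illustrative only, not the actual `_prescale_value_range` implementation): 'minmax' uses the image's own extrema, so the output is guaranteed to land in [0, 1], while an explicit (lower, higher) tuple applies the same affine map without clipping, so values outside that interval map outside [0, 1].

```python
import numpy as np

def prescale_sketch(image, mode):
    """Toy model of the discussed scaling modes (assumed semantics)."""
    img = np.asarray(image, dtype=float)
    if mode == "minmax":
        # The image's own extrema define the map, so output is in [0, 1].
        lower, higher = img.min(), img.max()
    else:
        # Explicit bounds; no clipping, so output may leave [0, 1].
        lower, higher = mode
    return (img - lower) / (higher - lower)

prescale_sketch([10, 20, 30], mode="minmax")  # [0.0, 0.5, 1.0]
prescale_sketch([10, 20, 30], mode=(20, 25))  # [-2.0, 0.0, 2.0]
```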

mode : {'minmax', 'legacy', False} or tuple[float, float], optional
Controls the rescaling behavior for `image`.

``'minmax'``
Member

minmax vs min-max, preference anyone?

Member Author

+0.5 for "minmax".

Contributor

I vaguely prefer 'minmax' too.


try:
# Scale first, to avoid over-/underflowing float range
out /= peak_to_peak
Member

Is this the most stable order? Small float - small float can be problematic.

Also, do we check that max value is >= max of image? Otherwise output will be >1. Same for lower bound.
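The "scale first, to avoid over-/underflowing float range" comment can be illustrated with a narrow float dtype. This is a constructed float16 demonstration, not a case from the PR itself; skimage typically works in float32/float64, where the effect is far less likely to trigger:

```python
import numpy as np

img = np.array([-60000.0, 60000.0], dtype=np.float16)
lower = float(img.min())
peak_to_peak = float(img.max()) - lower  # 120000.0, exceeds float16's max (~65504)

with np.errstate(over="ignore"):
    # Subtract first: the intermediate (img - lower) overflows float16 to inf.
    naive = (img - np.float16(lower)) / peak_to_peak

# Divide first: all intermediates stay small and the result is exact.
out = img / peak_to_peak
out -= lower / peak_to_peak  # out is now [0.0, 1.0]
```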

template="Parameter `{deprecated_name}` is deprecated since version "
"{deprecated_version} and will be removed in {changed_version} (or "
"later). To avoid this warning, please use the parameters `threshold` "
"together with the desired `prescale` mode instead. "
Member

Maybe recommend prescale='minmax'?

@@ -276,6 +296,29 @@ def blob_dog(
`exclude_border`-pixels of the border of the image.
If zero or False, peaks are identified regardless of their
distance from the border.
prescale : {'minmax', 'legacy', False} or tuple[float, float], optional
Member

Maybe instead of copying this docstring everywhere, refer to a central docstring? I.e., we could expose the utility function in the public namespace? Or maybe this becomes templated, and is filled out on import?

Member Author @lagru commented Sep 4, 2025

My personal pet peeve is docstrings that are only assembled at runtime. E.g., I find it really annoying when digging through SciPy's source code.

I'm also -1 on hiding the information from users behind a link, for the same reason.

If you are worried about this getting out of sync, I'd rather have a decorator enforcing the consistency at runtime.

Member

Yes, but this is going to appear all over the place, so if you replicate it, you are going to be doing a lot of searching and replacing as we figure out the message. Thoughts @matthew-brett?

Contributor

I agree that SciPy's automated docstrings are pretty annoying (and I think I wrote some of them). I quite like the idea of enforcing the match between docstrings with a decorator. We could also add a source processor to do it, to be run if the decorator check fails.
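One possible shape for that decorator, purely as a sketch: keep a single canonical description of `prescale` and fail fast at import time if a decorated function's docstring has drifted. The names `PRESCALE_DOC`, `check_prescale_doc`, and the stub function are all hypothetical:

```python
# One canonical description of the shared parameter.
PRESCALE_DOC = """\
prescale : {'minmax', 'legacy', False} or tuple[float, float], optional
    Controls the rescaling behavior for `image`."""

def check_prescale_doc(func):
    """Raise at import time if `func`'s docstring drifted from PRESCALE_DOC."""
    # Compare line by line, ignoring indentation, so nesting inside a
    # numpydoc Parameters section doesn't matter.
    normalized = "\n".join(l.strip() for l in (func.__doc__ or "").splitlines())
    expected = "\n".join(l.strip() for l in PRESCALE_DOC.splitlines())
    if expected not in normalized:
        raise RuntimeError(f"{func.__name__}: `prescale` docs out of sync")
    return func

@check_prescale_doc
def blob_dog_stub(image, prescale="minmax"):
    """Stub standing in for a real blob function.

    Parameters
    ----------
    prescale : {'minmax', 'legacy', False} or tuple[float, float], optional
        Controls the rescaling behavior for `image`.
    """
```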

Co-authored-by: Matthew Brett <matthew.brett@gmail.com>
3 participants