Add `prescale` parameter to "blob functions" #7858
Conversation
This makes heavy use of the scaling implementation that was proposed in [1]. Furthermore, `ski._shared.utils` seems like a more fitting place. [1] https://www.github.com/scikit-image/scikit-image/pull/7697
A few comments in passing.
I'm wondering about the divide-by-two heuristic. Will that always work?
with ``(img - lower) / (higher - lower)``
to 0 and 1 respectively.

``False``
I have a vague feeling we agreed this should be a string like `'none'`, but I can't remember the reasoning.
Yes, I don't have a strong opinion here on whether `prescale=False` or `prescale='none'` is better. I'd say best not to mix argument types. But we do allow `prescale=tuple(...)`, so that consistency is out of the window anyway? And `prescale='none'` does feel like a crutch for `None` or `False`.
@jni - I have some vague memory that you had a preference for the string `'none'` here. Is that right?
``False``
    Don't prescale the value range of `image` at all.
Did we discuss an int-type scaling option independent of `legacy`? I mean scaling int images only from dtype min -> 0, dtype max -> 1?
Could you expand on that? Do you mean an option that only scales integer images to [0, 1] and ignores floats? Would that differentiate signed and unsigned ints? Do you think it would be more useful than confusing?
Isn't that what legacy does?
No, `'legacy'` scales uints to [0, 1], and ints to [-1, 1]. Somewhat oddly.
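A minimal sketch of that behavior (the helper name is hypothetical; the mapping follows the description above, with one plausible convention for the signed case):

```python
import numpy as np

def legacy_prescale(image):
    # Hypothetical sketch of the 'legacy' scaling described above:
    # unsigned ints map to [0, 1], signed ints map to [-1, 1],
    # floats are left untouched.
    if np.issubdtype(image.dtype, np.unsignedinteger):
        return image / np.iinfo(image.dtype).max
    if np.issubdtype(image.dtype, np.signedinteger):
        # One plausible convention: divide by |dtype min| so min -> -1.
        return image / abs(np.iinfo(image.dtype).min)
    return image
```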
@@ -1096,3 +1097,121 @@ def as_binary_ndarray(array, *, variable_name):
        f"safely cast to boolean array."
    )
    return np.asarray(array, dtype=bool)


def _prescale_value_range(image, *, mode, stacklevel=3):
Remind me, but can stacklevel not be auto-detected? It's a bit of a pity to have that complexity in this function, which will be commonly used.
Like I said in the meeting, there's the `skip_file_prefixes` argument. But that's only available since Python 3.12, and according to SPEC 0 we are close to dropping 3.11. So it depends on whether we want the very next release to support 3.11 or not. Otherwise we'd need to hack something together using either a decorator at the public API boundary or do some frame inspection ourselves (like in `assert_stacklevel`). Thoughts?
This is fine for now; we can revisit in the future.
respectively. This is a shorthand for
``prescale=(image.min(), image.max())``.

``'legacy'``
Could consider calling this `dtype` instead of `legacy`. It doesn't matter much, but I think it's more descriptive, and it actually makes sense not to touch float images when normalizing by `dtype` (since you only know ranges for integer dtypes).
Not opposed. That would also underline that we intend to keep this scaling option around. It's also less judgy. 😄 I'm on board. I assume it would also keep its specific behavior for unsigned and signed ints?
Hrm, no, I suppose that's the difference between this mode and `legacy` that @matthew-brett may have referred to: there's no scaling to [-1, 1], only [0, 1].
We can have multiple names for the same thing. I have a vague desire for `'legacy'` because I suspect the most common use-case will be "give me what I got before", so, without `'legacy'` they will have to do more reading.
How about `integer_dt_range` or something?
floating dtype, it is left alone. See :ref:`.img_as_float` for
more details.

``(lower, higher)``
Suggested change:
``(lower, higher)``
``(min, max)``
I'm not sure I agree. The two passed values don't need to correspond to a minimum and maximum. They are just two values, the first lower than the other, that will correspond to 0 and 1 after scaling.
I remember us introducing that option precisely for batch cases, where that might not be the case? Thinking about it, it's an assumption I'd like to avoid using in our documentation.
Using `lower` and `higher` also avoids slight confusion with Python's `min()` and `max()` functions. But that's a minor thing.
Ah, so you think we purposefully should allow scaling that takes the image outside of [0, 1]?
Is there a reason to enforce 0, 1 as output? I would have thought it cleaner to allow the user to do what they will with this; they can clip themselves if they want.
``(lower, higher)``
Normalize `image` such that ``lower`` and ``higher`` are scaled
with ``(img - lower) / (higher - lower)``
Should be ``(clip(img, min, max) - min) / (max - min)``.
Is that a typo? Do you actually mean to apply clip as the last operation? E.g.,

```python
scaled = (img - lower) / (higher - lower)
scaled = clip(scaled, min, max)
```

I was operating under the impression that not clipping during scaling is a feature. That would allow something like a Fahrenheit to Celsius conversion:

```python
fahrenheit = [-10, 1, 60, 100]
_prescale_value_range(fahrenheit, mode=(32, (9/5 + 32)))
# Out: array([-23.33333333, -17.22222222, 15.55555556, 37.77777778])
```
I was concerned about the case where the user intends to do `minmax` scaling, but provides incorrect `min` and `max` values. However, since we already provide `minmax`, I think it's OK to let this remain as-is, and perhaps just note in the docstring that we do no clipping internally to ensure the range is [0, 1].
mode : {'minmax', 'legacy', False} or tuple[float, float], optional
    Controls the rescaling behavior for `image`.

``'minmax'``
`minmax` vs `min-max`, preference anyone?
+0.5 for "minmax".
I vaguely prefer `'minmax'` too.
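For reference, the `'minmax'` shorthand quoted in the docstring hunk above reduces to something like this (a sketch with a hypothetical helper name):

```python
import numpy as np

def minmax_prescale(image):
    # 'minmax' mode as described in the docstring: shorthand for
    # prescale=(image.min(), image.max()), mapping the image's own
    # extrema to 0 and 1.
    lower, higher = float(image.min()), float(image.max())
    return (image - lower) / (higher - lower)
```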
try:
    # Scale first, to avoid over-/underflowing float range
    out /= peak_to_peak
Is this the most stable order? Small float - small float can be problematic.
Also, do we check that max value is >= max of image? Otherwise output will be >1. Same for lower bound.
template="Parameter `{deprecated_name}` is deprecated since version "
"{deprecated_version} and will be removed in {changed_version} (or "
"later). To avoid this warning, please use the parameters `threshold` "
"together with the desired `prescale` mode instead. "
Maybe recommend `prescale='minmax'`?
@@ -276,6 +296,29 @@ def blob_dog(
    `exclude_border`-pixels of the border of the image.
    If zero or False, peaks are identified regardless of their
    distance from the border.
prescale : {'minmax', 'legacy', False} or tuple[float, float], optional
Maybe instead of copying this docstring everywhere, refer to a central docstring? I.e., we could expose the utility function in the public namespace? Or maybe this becomes templated, and is filled out on import?
My personal pet peeve is docstrings that are only useful at runtime. E.g., I find it really annoying when digging through SciPy's source code.
I'm also -1 on hiding the information from users behind a link, for the same reason.
If you are worried about this getting out of sync, I'd rather have a decorator enforcing the consistency at runtime.
Yes, but this is going to appear all over the place, so if you replicate it, you are going to be doing a lot of search and replaces as we figure out the message. Thoughts @matthew-brett?
I agree that SciPy's automated docstrings are pretty annoying - and I think I wrote some of them. I quite like the idea of enforcing the match between docstrings with a decorator. We could add a source processor to do it as well, to be run if the decorator check fails.
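The decorator-based consistency check discussed here could be as simple as the following sketch (all names hypothetical, including the shared snippet; the real check would live in `ski._shared.utils` or similar):

```python
def enforce_shared_doc(snippet):
    # Hypothetical decorator: raise at import time if a function's
    # docstring no longer contains the shared parameter description
    # verbatim, so replicated docstrings cannot silently drift.
    def decorator(func):
        if snippet not in (func.__doc__ or ""):
            raise RuntimeError(
                f"{func.__name__!r} docstring is missing the shared "
                "`prescale` description"
            )
        return func
    return decorator

# Illustrative shared snippet, not necessarily the PR's final wording.
PRESCALE_DOC = "prescale : {'minmax', 'legacy', False} or tuple[float, float]"

@enforce_shared_doc(PRESCALE_DOC)
def blob_dog(image, prescale='legacy'):
    """Detect blobs.

    prescale : {'minmax', 'legacy', False} or tuple[float, float]
        Controls the rescaling behavior for `image`.
    """
```

A source processor could reuse the same `snippet in __doc__` check over the package's modules when the decorator flags a mismatch.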
Co-authored-by: Matthew Brett <matthew.brett@gmail.com>
Description
Implementation of the new strategy around value-range scaling in our library that we extensively discussed in the recent sprint and meeting. We will add more detailed descriptions of this strategy, but in short:

- `prescale` parameter for functions that actually need / benefit from prescaling the input image.

Checklist
- `./doc/examples` for new features

Release note
For maintainers and optionally contributors, please refer to the instructions on how to document this PR for the release notes.