Hey! We're using `metaflow.S3` in a standalone context, as the backend to a PyTorch dataset running within a k8s cluster. From the docs:
> Note that in most cases when the data fits in memory, no local disk IO is needed as operations are cached by the operating system, which makes operations fast as long as there is enough memory available.
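
Concretely, our access pattern looks roughly like the sketch below (the `s3root`, key list, and per-item `torch.load` are hypothetical simplifications of our real dataset):

```python
import io

import torch
from torch.utils.data import Dataset
from metaflow import S3


class S3ShardDataset(Dataset):
    """One serialized tensor per S3 key; s3root and keys are placeholders."""

    def __init__(self, s3root, keys):
        self.s3root = s3root
        self.keys = keys

    def __len__(self):
        return len(self.keys)

    def __getitem__(self, idx):
        # Each fetch gets its own client context so temp files are cleaned
        # up on exit; obj.blob still materializes the object in memory.
        with S3(s3root=self.s3root) as s3:
            obj = s3.get(self.keys[idx])
            return torch.load(io.BytesIO(obj.blob))
```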
The memory usage on the associated pod appears to grow without bound until it hits our cluster's alarm threshold, at which point the pod is OOMKilled. Is there a way to manually set a memory limit on the S3 client and have it spill over to disk past a certain threshold? Or, even better, could the cache be tiered such that data is shuffled between disk and memory on a least-recently-used basis?
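
For what it's worth, the closest thing to "spilling to disk" we can see in the current API is reading from `obj.path` (the downloaded temp file) instead of `obj.blob`; a minimal sketch, with a hypothetical bucket and key:

```python
import hashlib

from metaflow import S3

digest = hashlib.sha256()
with S3(s3root="s3://my-bucket/data") as s3:  # hypothetical bucket
    obj = s3.get("shard-0001.bin")  # hypothetical key
    # obj.path points at the downloaded temp file; streaming it in chunks
    # keeps the Python heap bounded. The OS page cache may still grow, but
    # that memory is reclaimable, unlike bytes pinned by obj.blob.
    with open(obj.path, "rb") as f:
        while chunk := f.read(1 << 20):  # 1 MiB at a time
            digest.update(chunk)
print(digest.hexdigest())
```

That still doesn't give us an explicit memory ceiling or LRU eviction, though.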
I noticed the existence of the `@resources` decorator, but it's not clear to me where it would be placed when the S3 client is used outside of a Metaflow environment (in this case it's a Ray job).
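
For reference, my understanding is that `@resources` only attaches to a step inside a `FlowSpec`, roughly like the sketch below (the flow and values are hypothetical), which is why I don't see where it would go in a bare Ray job:

```python
from metaflow import FlowSpec, resources, step


class TrainFlow(FlowSpec):
    """Hypothetical flow; @resources sizes the container for one step."""

    @resources(memory=32000)  # request ~32 GB (value in MB) for this step
    @step
    def start(self):
        self.next(self.end)

    @step
    def end(self):
        pass


if __name__ == "__main__":
    TrainFlow()
```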