Description
Describe the bug
While Model / FrameworkModel's prepare_container_def() supports (here) manually configuring script-mode environment variables for an existing model.tar.gz package, HuggingFaceModel's override implementation does not (here). User-configured env={"SAGEMAKER_PROGRAM": ..., "SAGEMAKER_SUBMIT_DIRECTORY": ..., ...} values are ignored, regardless of whether re-packing of new entry-point code is requested.
This is important for importing large (multi-GB) pre-trained models to SageMaker inference, because it forces us to use the SDK class's re-packing functionality to add inference code... which is significantly slower in some cases: it can add tens of minutes of extra delay.
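The difference in behavior can be illustrated with a simplified, hypothetical sketch. This is not the SDK's actual code, and the function names are invented for illustration; it only models the contrast between merging the user's env dict and recomputing the script-mode variables over it:

```python
def base_prepare_container_def(image_uri, model_data_url, user_env):
    """Base Model/FrameworkModel-style behavior: user-provided env passes through."""
    environment = dict(user_env)  # user's SAGEMAKER_* settings survive
    return {"Image": image_uri, "ModelDataUrl": model_data_url,
            "Environment": environment}


def hf_prepare_container_def(image_uri, model_data_url, user_env,
                             entry_point=None, source_dir=None):
    """HuggingFaceModel-style behavior: script-mode vars are recomputed from
    entry_point/source_dir, clobbering whatever the user put in env."""
    environment = dict(user_env)
    environment["SAGEMAKER_PROGRAM"] = entry_point or ""          # overwrites user value
    environment["SAGEMAKER_SUBMIT_DIRECTORY"] = source_dir or ""  # overwrites user value
    return {"Image": image_uri, "ModelDataUrl": model_data_url,
            "Environment": environment}
```

With no entry_point/source_dir given, the second variant emits empty strings for both variables even when the caller supplied them, which mirrors the failure described below.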
To reproduce
- Prepare a model.tar.gz in S3, already containing a code/inference.py alongside (whatever) model artifacts. For a simple reproduction, you could use no model artifacts at all and add a trivial custom model loader to inference.py, something like `def model_fn(model_dir): return lambda x: x`.
  In my current use case, my model artifacts are about 5GB and constructing/uploading this archive takes ~10min, regardless of whether the small script code is included.
- Create and deploy a Hugging Face Model from the archive on S3 via SageMaker Python SDK, indicating what code directory and entry point should be used:
```python
import sagemaker
from sagemaker.huggingface import HuggingFaceModel

model = HuggingFaceModel(
    model_data="s3://.../model.tar.gz",  # (Contains code/inference.py)
    role=sagemaker.get_execution_role(),
    py_version="py38",
    pytorch_version="1.10",
    transformers_version="4.17",
    env={
        "SAGEMAKER_CONTAINER_LOG_LEVEL": "20",
        "SAGEMAKER_PROGRAM": "inference.py",
        "SAGEMAKER_REGION": "ap-southeast-1",
        "SAGEMAKER_SUBMIT_DIRECTORY": "/opt/ml/model/code",
    },
)
predictor = model.deploy(instance_type="ml.g4dn.xlarge", initial_instance_count=1)
```
Observed behavior
The endpoint will fail to find the inference.py entry point (and therefore will not use the custom model_fn(), and will fail to load the model).
This is because HuggingFaceModel overrides the SAGEMAKER_PROGRAM and SAGEMAKER_SUBMIT_DIRECTORY environment variables to empty values, even though no entry_point or source_dir was provided.
Expected behavior
The HuggingFaceModel should correctly propagate the user-specified environment variables, to support using a pre-prepared model.tar.gz without re-packing. In this case, the container would find the pre-loaded inference.py entry point and correctly use the overridden model_fn.
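For clarity, the pre-prepared archive assumed in this scenario has inference code under a `code/` folder at the archive root, which the container extracts to /opt/ml/model (matching the SAGEMAKER_SUBMIT_DIRECTORY=/opt/ml/model/code setting above). Artifact file names here are illustrative:

```
model.tar.gz
├── config.json          # (whatever model artifacts)
├── pytorch_model.bin
└── code/
    └── inference.py     # entry point, provides model_fn()
```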
Screenshots or logs
N/A
System information
A description of your system. Please provide:
- SageMaker Python SDK version: 2.92.1
- Framework name (eg. PyTorch) or algorithm (eg. KMeans): HuggingFace
- Framework version: 4.17
- Python version: py38
- CPU or GPU: GPU
- Custom Docker image (Y/N): N
Additional context
I am able to deploy a working endpoint by having my code folder and inference.py locally and adding these options to the model: HuggingFaceModel(source_dir="code", entry_point="inference.py", ...).
The problem is this more than doubles the time and resources taken to prepare the package:
- ~10min to produce an initial "model-raw.tar.gz" and upload it to S3
- ~10min for the SageMaker SDK to download that archive, extract and re-pack it to add the code folder, and re-upload it to a new location
Since the use case here is just to prepare the model from local artifacts+code, it would also be OK if model_data were able to accept a local, uncompressed folder: the ~10min tarball creation would then still only need to be done once. From my tests, though, this doesn't seem to be possible?
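For reference, the one-off local packaging step described above can be scripted. This is a generic sketch, not SDK functionality; the helper name and paths are mine:

```python
import tarfile
from pathlib import Path


def build_model_archive(staging_dir, out_path="model.tar.gz"):
    """Package everything under staging_dir (model artifacts plus
    code/inference.py) into a SageMaker-style model.tar.gz, with member
    paths relative to the archive root."""
    staging = Path(staging_dir)
    with tarfile.open(out_path, "w:gz") as tar:
        for f in sorted(staging.rglob("*")):
            if f.is_file():
                tar.add(f, arcname=str(f.relative_to(staging)))
    return out_path
```

The resulting archive can then be uploaded to S3 once and referenced via model_data, provided the env-propagation issue above is fixed.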