Description
Describe the bug
While Model / FrameworkModel's prepare_container_def() supports (here) manually configuring script-mode environment variables for an existing model.tar.gz package, HuggingFaceModel's override implementation does not (here). User-configured env={"SAGEMAKER_PROGRAM": ..., "SAGEMAKER_SUBMIT_DIRECTORY": ..., ...} values are ignored, regardless of whether re-packing of new entry-point code is requested.
This is important for importing large (multi-GB) pre-trained models to SageMaker inference, because it forces us to use the SDK class's re-packing functionality to add inference code... which is significantly slower in some cases: it can add tens of minutes of extra delay.
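The difference in behavior can be illustrated with a simplified, hypothetical sketch. This is not the SDK's actual code, and the function names are invented for illustration; it only models the contrast between merging the user's env dict and recomputing the script-mode variables over it:

```python
def base_prepare_container_def(image_uri, model_data_url, user_env):
    """Base Model/FrameworkModel-style behavior: user-provided env passes through."""
    environment = dict(user_env)  # user's SAGEMAKER_* settings survive
    return {"Image": image_uri, "ModelDataUrl": model_data_url,
            "Environment": environment}


def hf_prepare_container_def(image_uri, model_data_url, user_env,
                             entry_point=None, source_dir=None):
    """HuggingFaceModel-style behavior: script-mode vars are recomputed from
    entry_point/source_dir, clobbering whatever the user put in env."""
    environment = dict(user_env)
    environment["SAGEMAKER_PROGRAM"] = entry_point or ""          # overwrites user value
    environment["SAGEMAKER_SUBMIT_DIRECTORY"] = source_dir or ""  # overwrites user value
    return {"Image": image_uri, "ModelDataUrl": model_data_url,
            "Environment": environment}
```

With no entry_point/source_dir given, the second variant emits empty strings for both variables even when the caller supplied them, which mirrors the failure described below.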
To reproduce
- Prepare a model.tar.gz in S3, already containing a code/inference.py alongside (whatever) model artifacts. For a simple reproduction, you could use no model artifacts at all and add a trivial custom model loader to inference.py, something like `def model_fn(model_dir): return lambda x: x`.
  In my current use case, my model artifacts are about 5GB and constructing/uploading this archive takes ~10min, regardless of whether the small script code is included.
- Create and deploy a Hugging Face Model from the archive on S3 via SageMaker Python SDK, indicating what code directory and entry point should be used:
```python
import sagemaker
from sagemaker.huggingface import HuggingFaceModel

model = HuggingFaceModel(
    model_data="s3://.../model.tar.gz",  # (Contains code/inference.py)
    role=sagemaker.get_execution_role(),
    py_version="py38",
    pytorch_version="1.10",
    transformers_version="4.17",
    env={
        "SAGEMAKER_CONTAINER_LOG_LEVEL": "20",
        "SAGEMAKER_PROGRAM": "inference.py",
        "SAGEMAKER_REGION": "ap-southeast-1",
        "SAGEMAKER_SUBMIT_DIRECTORY": "/opt/ml/model/code",
    },
)
predictor = model.deploy(instance_type="ml.g4dn.xlarge", initial_instance_count=1)
```
Observed behavior
The endpoint will fail to find the inference.py entry point (and therefore will not use the custom model_fn(), and will fail to load the model).
This is because HuggingFaceModel overrides the SAGEMAKER_PROGRAM and SAGEMAKER_SUBMIT_DIRECTORY environment variables to empty values, even though no entry_point or source_dir was provided.
Expected behavior
The HuggingFaceModel should correctly propagate the user-specified environment variables, to support using a pre-prepared model.tar.gz without re-packing. In this case, the container would find the pre-loaded inference.py entry point and correctly use the overridden model_fn.
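For clarity, the pre-prepared archive assumed in this scenario has inference code under a `code/` folder at the archive root, which the container extracts to /opt/ml/model (matching the SAGEMAKER_SUBMIT_DIRECTORY=/opt/ml/model/code setting above). Artifact file names here are illustrative:

```
model.tar.gz
├── config.json          # (whatever model artifacts)
├── pytorch_model.bin
└── code/
    └── inference.py     # entry point, provides model_fn()
```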
Screenshots or logs
N/A
System information
A description of your system. Please provide:
- SageMaker Python SDK version: 2.92.1
- Framework name (eg. PyTorch) or algorithm (eg. KMeans): HuggingFace
- Framework version: 4.17
- Python version: py38
- CPU or GPU: GPU
- Custom Docker image (Y/N): N
Additional context
I am able to deploy a working endpoint by having my code folder and inference.py locally and adding these options to the model: HuggingFaceModel(source_dir="code", entry_point="inference.py", ...).
The problem is this more than doubles the time and resources taken to prepare the package:
- ~10min to produce an initial "model-raw.tar.gz" and upload it to S3
- ~10min for the SageMaker SDK to download that archive, extract and re-pack it to add the code folder, and re-upload it to a new location
Since the use case here is just to prepare the model from local artifacts+code, it would also be OK if model_data were able to accept a local, uncompressed folder: the ~10min tarball creation would then still only need to be done once. From my tests, though, this doesn't seem to be possible?
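For reference, the one-off local packaging step described above can be scripted. This is a generic sketch, not SDK functionality; the helper name and paths are mine:

```python
import tarfile
from pathlib import Path


def build_model_archive(staging_dir, out_path="model.tar.gz"):
    """Package everything under staging_dir (model artifacts plus
    code/inference.py) into a SageMaker-style model.tar.gz, with member
    paths relative to the archive root."""
    staging = Path(staging_dir)
    with tarfile.open(out_path, "w:gz") as tar:
        for f in sorted(staging.rglob("*")):
            if f.is_file():
                tar.add(f, arcname=str(f.relative_to(staging)))
    return out_path
```

The resulting archive can then be uploaded to S3 once and referenced via model_data, provided the env-propagation issue above is fixed.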