Troubleshoot unsupported program type errors
This page describes how to resolve a known issue in Cloud Data Fusion versions
6.8.0 and 6.8.1 where a data pipeline fails with an unsupported program type
error. The issue is resolved in version 6.8.2.
To reduce pipeline start time, Cloud Data Fusion 6.8.0 and 6.8.1 instances
cache the artifacts that are required to start a pipeline on a
Dataproc cluster. The artifacts are cached in a Cloud Storage bucket.
One of these cached artifacts is application.jar. Depending on the order in
which you run your pipelines, some pipelines might fail with the following
error:
Unsupported program type: Spark
For example, after you create a new 6.8.1 instance (or upgrade to 6.8.1), the
first time that you run a pipeline that only contains actions, it succeeds.
However, subsequent pipeline runs that include sources or sinks might fail with
this error.
Recommendation
To resolve this issue, do either of the following:
- Recommended: Upgrade the instance to Cloud Data Fusion version 6.8.2 or later.
- Disable Cloud Storage caching with a preference or runtime argument.
  Disabling caching makes a pipeline take slightly longer to start, because
  fewer artifacts are cached.
You can disable Cloud Storage caching at any of the following levels:
- For all pipelines in an instance: in the Cloud Data Fusion Studio, click
  System Admin > System Preferences and set
  system.profile.properties.gcsCacheEnabled to false (see the REST sketch
  after this list for an equivalent API call). This change affects the start
  time of every pipeline in the instance.
- For a given namespace: click System Admin > Namespaces, select the
  namespace, then click Preferences > Edit and set
  system.profile.properties.gcsCacheEnabled to false. This change affects the
  start time of every pipeline in the namespace.
- For a Dataproc profile: set gcsCacheEnabled to false in the
  Dataproc profile that the failing pipelines use. This change
  affects the start time of every pipeline that uses the profile.
- For only the failing pipelines: in the Cloud Data Fusion Studio pipeline
  list, select a failing pipeline, click Expand next to Run, and set the
  runtime argument system.profile.properties.gcsCacheEnabled to false.
  Repeat for any other failing pipelines.
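For the instance-level option, the same preference can also be set through the
CDAP Set preferences REST API. The following is a minimal sketch, not a
verified recipe: the instance API endpoint value, the access-token handling,
and the /v3/preferences path follow common CDAP and Cloud Data Fusion
conventions, but confirm them against the Set preferences reference for your
version before use.

```python
# Minimal sketch: disable Cloud Storage caching for all pipelines in an
# instance by setting an instance-level preference through the CDAP
# Preferences REST API. The endpoint value and token handling below are
# assumptions; adjust them for your environment.
import subprocess

import requests

# API endpoint of the Cloud Data Fusion instance (for example, the
# apiEndpoint field returned when you describe the instance with gcloud).
CDAP_ENDPOINT = "https://example-instance-dot-usw1.datafusion.googleusercontent.com/api"

# Obtain an OAuth 2.0 access token for the caller (assumes gcloud is installed
# and the caller has permission to call the instance).
token = subprocess.check_output(
    ["gcloud", "auth", "print-access-token"], text=True
).strip()

# PUT the preferences as a JSON map. Keys prefixed with
# system.profile.properties are passed through to the compute profile.
# For a single namespace, the analogous path would be
# /v3/namespaces/<namespace>/preferences (also an assumption to verify).
resp = requests.put(
    f"{CDAP_ENDPOINT}/v3/preferences",
    headers={"Authorization": f"Bearer {token}"},
    json={"system.profile.properties.gcsCacheEnabled": "false"},
)
resp.raise_for_status()
print("Instance preferences updated:", resp.status_code)
```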
Disable caching for a single run through the REST API
You can also disable Cloud Storage caching when you start a pipeline through
the REST API by specifying runtime arguments as a JSON map in the request
body. For more information, see Start a program in the CDAP reference
documentation. This approach affects the start time of only the pipeline runs
that you start this way.
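The following sketch shows one way such a request might look for a batch
pipeline. The endpoint value, namespace, pipeline name, and the
workflows/DataPipelineWorkflow program path are assumptions based on common
CDAP conventions; check them against the Start a program reference for your
version.

```python
# Minimal sketch: start a batch pipeline with Cloud Storage caching disabled
# for that run by passing runtime arguments as a JSON map in the request body.
# The endpoint, namespace, pipeline name, and program path are assumptions;
# adjust them for your environment.
import subprocess

import requests

CDAP_ENDPOINT = "https://example-instance-dot-usw1.datafusion.googleusercontent.com/api"
NAMESPACE = "default"             # namespace that contains the failing pipeline
PIPELINE = "my-failing-pipeline"  # hypothetical pipeline (application) name

token = subprocess.check_output(
    ["gcloud", "auth", "print-access-token"], text=True
).strip()

# Batch pipelines are typically exposed as the DataPipelineWorkflow program.
# The runtime arguments in the body apply only to the run started here.
resp = requests.post(
    f"{CDAP_ENDPOINT}/v3/namespaces/{NAMESPACE}/apps/{PIPELINE}"
    "/workflows/DataPipelineWorkflow/start",
    headers={"Authorization": f"Bearer {token}"},
    json={"system.profile.properties.gcsCacheEnabled": "false"},
)
resp.raise_for_status()
print("Pipeline start requested:", resp.status_code)
```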
[[["Easy to understand","easyToUnderstand","thumb-up"],["Solved my problem","solvedMyProblem","thumb-up"],["Other","otherUp","thumb-up"]],[["Hard to understand","hardToUnderstand","thumb-down"],["Incorrect information or sample code","incorrectInformationOrSampleCode","thumb-down"],["Missing the information/samples I need","missingTheInformationSamplesINeed","thumb-down"],["Other","otherDown","thumb-down"]],["Last updated 2025-08-29 UTC."],[[["\u003cp\u003eCloud Data Fusion versions 6.8.0 and 6.8.1 have a known issue where pipelines may fail with an "Unsupported program type: Spark" error due to artifact caching.\u003c/p\u003e\n"],["\u003cp\u003eThe recommended solution is to upgrade the Cloud Data Fusion instance to version 6.8.2 or later to resolve this pipeline error.\u003c/p\u003e\n"],["\u003cp\u003eAs an alternative solution, you can disable Cloud Storage caching, which can be done for all pipelines in an instance, for a specific namespace, for a Dataproc profile, or for individual failing pipelines.\u003c/p\u003e\n"],["\u003cp\u003eDisabling Cloud Storage caching will result in slightly longer pipeline start times, as fewer artifacts are cached.\u003c/p\u003e\n"],["\u003cp\u003eCaching can be disabled via the system admin console for all options, or via the REST API.\u003c/p\u003e\n"]]],[],null,["# Troubleshoot unsupported program type errors\n\nThis page describes how to resolve a known issue in Cloud Data Fusion 6.8.0 and 6.8.1 where a data pipeline fails with an unsupported program type error in Cloud Data Fusion. This issue is resolved in version 6.8.2.\n\n\u003cbr /\u003e\n\nTo reduce the start time for pipelines, Cloud Data Fusion version 6.8.0 and\n6.8.1 instances cache the artifacts that are required to start a pipeline in a\nDataproc cluster inside a Cloud Storage bucket.\nOne of these cached artifacts is `application.jar`. Depending on the order in\nwhich you run your pipelines, some pipelines might fail with the following\nerror: \n\n Unsupported program type: Spark\n\nFor example, after you create a new 6.8.1 instance (or upgrade to 6.8.1), the\nfirst time that you run a pipeline that only contains actions, it succeeds.\nHowever, the next pipeline runs, which include sources or sinks, might fail with\nthis error.\n\nRecommendation\n--------------\n\nTo resolve this issue, do either of the following:\n\n- Recommended: [Upgrade the instance](/data-fusion/docs/how-to/upgrading#upgrade-instances) to Cloud Data Fusion version 6.8.2 or later.\n- Disable Cloud Storage caching by a [preference or runtime argument](https://cdap.atlassian.net/wiki/spaces/DOCS/pages/480314690/Preferences+and+Runtime+Arguments).\n\n| **Note:** Disabling Cloud Storage caching results in a pipeline taking slightly longer to start running, as fewer artifacts are cached.\n\nYou can disable caching for any of the following:\n\n- For all pipelines in an instance.\n- For a given namespace.\n- For the specific Dataproc profiles that contain the failing pipelines.\n- For only the failing pipelines.\n\n### Disable Cloud Storage caching for all pipelines in an instance\n\nTo disable Cloud Storage caching for all pipelines in an instance,\nfollow these steps: \n\n### Console\n\n1. Go to your instance:\n 1. In the Google Cloud console, go to the Cloud Data Fusion page.\n\n 2. To open the instance in the Cloud Data Fusion Studio,\n click **Instances** , and then click **View instance**.\n\n [Go to Instances](https://console.cloud.google.com/data-fusion/locations/-/instances)\n2. 
Click **System Admin** \\\u003e **System Preferences** and set the value for\n `system.profile.properties.gcsCacheEnabled` to `false.`\n\n### REST API\n\n\nTo set `system.profile.properties.gcsCacheEnabled` to `false`, see\n[Set preferences](https://cdap.atlassian.net/wiki/spaces/DOCS/pages/477561058/Preferences+Microservices#Set-Preferences).\n| **Note:** This change impacts start time for all pipelines in the instance.\n\n### Disable Cloud Storage caching for a given namespace\n\nTo disable Cloud Storage caching for a given namespace, follow these\nsteps: \n\n### Console\n\n1. Go to your instance:\n 1. In the Google Cloud console, go to the Cloud Data Fusion page.\n\n 2. To open the instance in the Cloud Data Fusion Studio,\n click **Instances** , and then click **View instance**.\n\n [Go to Instances](https://console.cloud.google.com/data-fusion/locations/-/instances)\n2. Click **System Admin** \\\u003e **Namespaces** and select your namespace.\n3. Click **Preferences** \\\u003e **Edit** and set the value for\n `system.profile.properties.gcsCacheEnabled` to `false`.\n\n### REST API\n\n\nTo set this through the REST API, see\n[Set preferences](https://cdap.atlassian.net/wiki/spaces/DOCS/pages/477561058/Preferences+Microservices#Set-Preferences).\n| **Note:** This change impacts start time for all pipelines in the namespace.\n\n### Disable Cloud Storage caching for a Dataproc profile\n\nTo disable Cloud Storage caching for the specific\nDataproc profiles that contain the failing pipelines, follow\nthese steps: \n\n### Console\n\n- Set `gcsCacheEnabled` to `false` in the Dataproc profile.\n| **Note:** This change impacts the start time for all pipelines using this Dataproc profile.\n\n### Disable Cloud Storage caching for only the failing pipelines\n\nTo disable Cloud Storage caching for only the failing pipelines, follow\nthese steps: \n\n### Console\n\n1. Go to your instance:\n 1. In the Google Cloud console, go to the Cloud Data Fusion page.\n\n 2. To open the instance in the Cloud Data Fusion Studio,\n click **Instances** , and then click **View instance**.\n\n [Go to Instances](https://console.cloud.google.com/data-fusion/locations/-/instances)\n2. Click **List** and select the failing pipeline.\n3. Click expand_more **Expand** next to **Run** and set the runtime argument `system.profile.properties.gcsCacheEnabled` to `false`.\n4. Repeat for any other failing pipelines.\n\n### REST API\n\n\nCloud Storage caching can be disabled when starting a pipeline through REST\nAPI and also by optionally specifying runtime arguments as a JSON map in the\nrequest body. For more information, see\n[Start a program](https://cdap.atlassian.net/wiki/spaces/DOCS/pages/477560983/Lifecycle+Microservices#Start-a-Program).\n| **Note:** This change impacts the start time of only specific pipelines."]]