Machine Learning Cheat Sheet Github




Concepts#

AzureML provides two basic assets for working with data:

  • Datastore
  • Dataset

Datastore#

Provides an interface for numerous Azure Machine Learning storage accounts.

Each Azure ML workspace comes with a default datastore:

ws = Workspace.from_config()
datastore = ws.get_default_datastore()

which can also be accessed directly from the Azure Portal (under the same resource group as your Azure ML Workspace).

Datastores are attached to workspaces and are used to store connection information to Azure storage services so you can refer to them by name and don't need to remember the connection information and secret used to connect to the storage services.

Use this class to perform management operations, including register, list, get, and remove datastores.
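A minimal sketch of these management operations (the datastore name is a placeholder; a live workspace is required):

```python
from azureml.core import Datastore, Workspace

ws = Workspace.from_config()

# list: all datastores registered in the workspace
for name, datastore in ws.datastores.items():
    print(name, datastore.datastore_type)

# get: retrieve a datastore by name
datastore = Datastore.get(ws, '<datastore-name>')

# remove: unregister a datastore (this does not delete the underlying storage)
# datastore.unregister()
```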

Dataset#

A dataset is a reference to data - either in a datastore or behind a public URL.

Datasets provide enhanced capabilities including data lineage (with the notion of versioned datasets).
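For example, a dataset can point at files behind a public URL, and registering it gives you the lineage and versioning mentioned above (a sketch; the URL and dataset name are placeholders):

```python
from azureml.core import Dataset, Workspace

ws = Workspace.from_config()

# reference data behind a public URL
dataset = Dataset.File.from_files(path='https://<host>/<path>/data.csv')

# register it to enable lineage and versioning
dataset = dataset.register(workspace=ws, name='<dataset-name>', create_new_version=True)
```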

Get Datastore#

Default datastore#

Each workspace comes with a default datastore.
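Access it with `get_default_datastore` (requires a live workspace):

```python
from azureml.core import Workspace

ws = Workspace.from_config()
datastore = ws.get_default_datastore()
```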

Register datastore#

Connect to, or create, a datastore backed by one of the supported Azure storage services. See the Datastore section above for more details.
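For example, to register a datastore backed by an Azure Blob container (a sketch; all names and credentials are placeholders):

```python
from azureml.core import Datastore, Workspace

ws = Workspace.from_config()

datastore = Datastore.register_azure_blob_container(
    workspace=ws,
    datastore_name='<datastore-name>',
    container_name='<container-name>',
    account_name='<storage-account-name>',
    account_key='<storage-account-key>',  # or pass sas_token=... instead
)
```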

Read from Datastore#

Reference data in a Datastore in your code, for example to use in a remote setting.

DataReference#

First, connect to your basic assets: Workspace, ComputeTarget and Datastore.

ws: Workspace = Workspace.from_config()
ds: Datastore = ws.get_default_datastore()
compute_target: ComputeTarget = ws.compute_targets['<compute-target-name>']

Create a DataReference, either as mount:

data_ref = ds.path('<path/on/datastore>').as_mount()

or as download:

data_ref = ds.path('<path/on/datastore>').as_download()

To mount a datastore, the workspace needs to have read and write access to the underlying storage. For a read-only datastore, as_download is the only option.

Consume DataReference in ScriptRunConfig#

Add this DataReference to a ScriptRunConfig as follows.

config = ScriptRunConfig(
    source_directory='.',
    script='train.py',
    arguments=[str(data_ref)],  # returns environment variable $AZUREML_DATAREFERENCE_example_data
    compute_target=compute_target,
)
config.run_config.data_references[data_ref.data_reference_name] = data_ref.to_config()

The command-line argument str(data_ref) returns the environment variable $AZUREML_DATAREFERENCE_example_data. Finally, data_ref.to_config() instructs the run to mount the data to the compute target and to assign the above environment variable appropriately.

Without specifying argument#

Specify a path_on_compute to reference your data without the need for command-line arguments.

data_ref = ds.path('<path/on/datastore>').as_mount()
data_ref.path_on_compute = '<path/on/compute>'

config = ScriptRunConfig(
    source_directory='.',
    script='train.py',
    compute_target=compute_target,
)
config.run_config.data_references[data_ref.data_reference_name] = data_ref.to_config()

Create Dataset#

From local data#

Upload to datastore#

To upload a local directory ./data/:

datastore.upload(src_dir='./data', target_path='<path/on/datastore>', overwrite=True)

This will upload the entire directory ./data from local to the default datastore associated with your workspace ws.

Create dataset from files in datastore#

To create a dataset from a directory on a datastore at <path/on/datastore>:

dataset = Dataset.File.from_files(path=(datastore,'<path/on/datastore>'))

Use Dataset#

ScriptRunConfig#

To reference data from a dataset in a ScriptRunConfig you can either mount or download the dataset using:

  • dataset.as_mount(path_on_compute) : mount dataset to a remote run
  • dataset.as_download(path_on_compute) : download the dataset to a remote run

Path on compute: Both as_mount and as_download accept an (optional) parameter path_on_compute. This defines the path on the compute target where the data is made available.

  • If None, the data will be downloaded into a temporary directory.
  • If path_on_compute starts with a / it will be treated as an absolute path. (If you have specified an absolute path, please make sure that the job has permission to write to that directory.)
  • Otherwise it will be treated as relative to the working directory.
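These resolution rules can be sketched as a small pure-Python helper (illustrative only; this function and its temporary-directory argument are not part of the SDK):

```python
import posixpath

def resolve_path_on_compute(path_on_compute, working_dir, tmp_dir):
    """Illustrates how path_on_compute is interpreted on the compute target."""
    if path_on_compute is None:
        # data goes to a temporary directory
        return tmp_dir
    if path_on_compute.startswith('/'):
        # absolute path: used as-is (the job must have write permission there)
        return path_on_compute
    # otherwise: relative to the working directory
    return posixpath.join(working_dir, path_on_compute)
```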

Reference this data in a remote run, for example in mount-mode:

arguments = [dataset.as_mount()]
config = ScriptRunConfig(source_directory='.', script='train.py', arguments=arguments)

and consumed in train.py:

import os
import sys

data_dir = sys.argv[1]

print('===== DATA =====')
print('LIST FILES IN DATA DIR...')
print(os.listdir(data_dir))


For more details: ScriptRunConfig