Provision the Microsoft Data Science Virtual Machine

The Microsoft Data Science Virtual Machine is a Windows Azure virtual machine (VM) image pre-installed and configured with several popular tools that are commonly used for data analytics and machine learning. The tools included are:

Microsoft R Server Developer Edition
Anaconda Python distribution
Jupyter notebook (with R, Python kernels)
Visual Studio Community Edition
Power BI Desktop
SQL Server 2016 Developer Edition
Machine learning and data analytics tools
- Computational Network Toolkit (CNTK): A deep learning software toolkit from Microsoft Research.
- Vowpal Wabbit: A fast machine learning system supporting techniques such as online, hashing, allreduce, reductions, learning2search, active, and interactive learning.
- XGBoost: A library providing a fast and accurate boosted tree implementation.
- Rattle (the R Analytical Tool To Learn Easily): A tool that makes it easy to get started with data analytics and machine learning in R, offering GUI-based data exploration and modeling with automatic R code generation.
- mxnet: A deep learning framework designed for both efficiency and flexibility.
- Weka: Visual data mining and machine learning software in Java.
- Apache Drill: A schema-free SQL query engine for Hadoop, NoSQL, and cloud storage. It supports ODBC and JDBC interfaces, so NoSQL data and files can be queried from standard BI tools such as Power BI, Excel, and Tableau.
Libraries in R and Python for use in Azure Machine Learning and other Azure services
Git, including Git Bash, for working with source code repositories such as GitHub and Visual Studio Team Services
Windows ports of several popular Linux command-line utilities (including awk, sed, perl, grep, find, wget, and curl), accessible from the command prompt
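After you sign in to the VM, one quick way to confirm that Git and the command-line utilities listed above are reachable from the command prompt is to look them up on PATH. The following is a minimal Python sketch, assuming it is run with the Anaconda Python on the VM; the `check_tools` helper is hypothetical and not part of the VM image. It only inspects PATH and does not run the tools.

```python
import shutil

# Tools the article lists as available on the VM: Git plus the Windows ports
# of common Linux command-line utilities.
TOOLS = ["git", "awk", "sed", "perl", "grep", "find", "wget", "curl"]


def check_tools(names):
    """Return a dict mapping each tool name to its full path on PATH, or None.

    Hypothetical helper for illustration: shutil.which searches PATH the same
    way the command prompt does, so None means the tool would not be found.
    """
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in check_tools(TOOLS).items():
        status = path if path else "NOT FOUND"
        print(f"{name:6} {status}")
```

Because names are resolved in PATH order, `find` may resolve to the built-in Windows `find.exe` rather than the Linux-style port, so check the printed path rather than just whether a match exists.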

Doing data science involves iterating on a sequence of tasks:

Finding, loading, and pre-processing data
Building and testing models
Deploying the models for consumption in intelligent applications

Data scientists use a variety of tools to complete these tasks. Finding the appropriate versions of the software and then downloading and installing them can be quite time-consuming. The Microsoft Data Science Virtual Machine can ease this burden by providing a ready-to-use image, with several popular tools pre-installed and configured, that can be provisioned on Azure.

The Microsoft Data Science Virtual Machine jump-starts your analytics project. It enables you to work in various languages, including R, Python, SQL, and C#. Visual Studio provides an easy-to-use IDE for developing and testing your code. The Azure SDK included in the VM lets you build applications that use various services on Microsoft’s cloud platform.

There are no software charges for this data science VM image. You pay only the Azure usage fees, which depend on the size of the virtual machine you provision. More details on the compute fees can be found in the Pricing details section on the Data Science Virtual Machine page.
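Because the image itself carries no software charge, a back-of-the-envelope compute estimate is simply the hourly rate for the chosen VM size multiplied by the hours the VM runs. The sketch below assumes a flat hourly rate; `estimate_compute_cost` is a hypothetical helper and the 0.50 rate is a made-up placeholder, not a published price, so substitute the rate from the Pricing details section for your size and region.

```python
def estimate_compute_cost(hourly_rate, hours_per_day, days):
    """Rough compute cost for a VM that runs hours_per_day for a number of days.

    Assumes a flat hourly rate (a placeholder you look up yourself) and
    counts only the hours the VM is running.
    """
    if hourly_rate < 0 or hours_per_day < 0 or days < 0:
        raise ValueError("inputs must be non-negative")
    if hours_per_day > 24:
        raise ValueError("hours_per_day cannot exceed 24")
    return hourly_rate * hours_per_day * days


# Hypothetical rate of 0.50 (in your billing currency) per hour,
# used 8 hours a day for 20 working days.
print(estimate_compute_cost(0.50, 8, 20))  # → 80.0
```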

Other Versions of the Data Science Virtual Machine

A CentOS image is also available, with many of the same tools as the Windows image. An Ubuntu image is available as well, with many similar tools plus deep learning frameworks.

Prerequisites

Before you can create a Microsoft Data Science Virtual Machine, you must have the following:

An Azure subscription: To obtain one, see Get Azure free trial.
An Azure storage account: To create one, see Create an Azure storage account. Alternatively, the storage account can be created as part of the process of creating the VM if you do not want to use an existing account.

Create your Microsoft Data Science Virtual Machine

Here are the steps to create an instance of the Microsoft Data Science Virtual Machine:

Navigate to the virtual machine listing on the Azure portal.
Select the Create button at the bottom to open the wizard.
The wizard used to create the Microsoft Data Science Virtual Machine requires inputs for each of the five steps enumerated on the right of this figure. Here are the inputs needed to configure each of these steps:
1. Basics
  1. Name: The name of the data science server you are creating.
  2. User Name: The admin account login ID.
  3. Password: Admin account password.
  4. Subscription: If you have more than one subscription, select the one on which the machine is to be created and billed.
  5. Resource Group: You can create a new one or use an existing group.
  6. Location: Select the data center that is most appropriate. Usually it is the data center that has most of your data or is closest to your physical location for fastest network access.
2. Size: Select one of the server types that meets your functional requirements and cost constraints. You can get more choices of VM sizes by selecting “View All”.
3. Settings:
  1. Disk Type: Choose “Premium” if you prefer a solid-state drive (SSD); otherwise, choose “Standard”.
  2. Storage Account: You can create a new Azure storage account in your subscription or use an existing one in the same Location that was chosen on the Basics step of the wizard.
  3. Other parameters: Usually the default values are fine. If you want to use non-default values, hover over the informational link next to a field for help on that field.
4. Summary: Verify that all information you entered is correct.
5. Buy: Click Buy to start the provisioning. A link is provided to the terms of the transaction. The VM does not have any additional charges beyond the compute for the server size you chose in the Size step.
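If you provision several VMs, it can help to write the wizard inputs down as a checklist before you start. The following Python sketch records the inputs from the steps above in a hypothetical `DsvmWizardInputs` record and checks only the constraints the steps mention: required Basics fields, a “Premium” or “Standard” disk type, and a storage account in the same location as the VM. The class, the `validate` function, and the example values are illustrative assumptions; Azure applies its own, stricter validation (naming rules, password complexity, and so on) in the portal.

```python
from dataclasses import dataclass

# Disk types offered in the Settings step.
DISK_TYPES = {"Premium", "Standard"}


@dataclass
class DsvmWizardInputs:
    """Hypothetical record of the inputs collected by the portal wizard."""
    name: str                      # Basics: name of the data science server
    user_name: str                 # Basics: admin account login ID
    password: str                  # Basics: admin account password
    subscription: str              # Basics: subscription that is billed
    resource_group: str            # Basics: new or existing resource group
    location: str                  # Basics: data center for the VM
    size: str                      # Size: server type chosen in the Size step
    disk_type: str                 # Settings: "Premium" (SSD) or "Standard"
    storage_account_location: str  # Settings: location of the storage account


def validate(inputs):
    """Return a list of problems; an empty list means these checks passed."""
    problems = []
    for field in ("name", "user_name", "password", "subscription",
                  "resource_group", "location", "size"):
        if not getattr(inputs, field).strip():
            problems.append(f"{field} is required")
    if inputs.disk_type not in DISK_TYPES:
        problems.append(f"disk_type must be one of {sorted(DISK_TYPES)}")
    if inputs.storage_account_location != inputs.location:
        problems.append("storage account must be in the same location as the VM")
    return problems


# Example with placeholder values.
request = DsvmWizardInputs(
    name="my-dsvm", user_name="dsadmin", password="<your password>",
    subscription="My Subscription", resource_group="dsvm-rg",
    location="West US", size="<size from the Size step>",
    disk_type="Premium", storage_account_location="West US",
)
print(validate(request))  # → []
```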
