Understanding Azure HPC | InfoWorld

Way back when, so the story goes, somebody said we'd only need five computers for the whole world. It's fairly easy to argue that Azure, Amazon Web Services, Google Cloud Platform, and the like are all implementations of a massively scalable compute cluster, with each server and each data center another component that adds up to build a huge, planetary-scale computer. In fact, many of the technologies that power our clouds were originally developed to build and run supercomputers using off-the-shelf commodity hardware.

Why not take advantage of the cloud to build, deploy, and run HPC (high-performance computing) systems that exist only for as long as we need them to solve problems? You can think of clouds in much the same way the filmmakers at Weta Digital thought of their render farms, server rooms of hardware built out in order to deliver the CGI effects for films like King Kong and The Hobbit. The equipment doubled as a temporary supercomputer for the New Zealand government while waiting to be used for filmmaking.

The first big case studies of the public clouds focused on this capability, using them for burst capacity that in the past might have gone to on-premises HPC hardware. They showed a considerable cost saving, with no need to invest in data center space, storage, and power.

Introducing Azure HPC

HPC capabilities remain an important feature for Azure and other clouds. They no longer rely on commodity hardware, instead offering HPC-focused compute instances and working with HPC vendors to deliver their tools as a service, treating HPC as a dynamic service that can be launched quickly and easily and scaled along with your requirements.

Azure's HPC tools are perhaps best thought of as a set of architectural principles, focused on delivering what Microsoft describes as "big compute." You're taking advantage of the scale of Azure to perform large-scale mathematical tasks. Some of these tasks might be big data tasks, while others might be more focused on compute, using a limited number of inputs to perform a simulation, for instance. These tasks include creating time-based simulations using computational fluid dynamics, running through multiple Monte Carlo statistical analyses, or putting together and running a render farm for a CGI movie.

Azure's HPC features are intended to make HPC available to a wider class of users who may not need a supercomputer but do need a higher level of compute than an engineering workstation or even a small cluster of servers can provide. You won't get a turnkey HPC system; you'll still need to build out either a Windows or Linux cluster infrastructure using HPC-focused virtual machines and an appropriate storage platform, as well as interconnects using Azure's high-throughput RDMA networking features.

Building an HPC architecture in the cloud

Technologies such as ARM templates and Bicep are key to building out and maintaining your HPC environment. Unlike Azure's platform services, here you're responsible for most of your own maintenance, so having an infrastructure-as-code foundation for your deployments should make it easier to treat your HPC infrastructure as something that can be built up and torn down as necessary, with identical infrastructures each time you deploy your HPC service.
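As a sketch of that infrastructure-as-code approach, a minimal Bicep template might declare a single HBv3 compute node. The resource names, parameters, and Ubuntu image chosen here are illustrative assumptions, not values from the article; a real cluster would loop over many such nodes and add RDMA-capable networking:

```bicep
// Sketch: one HBv3 node for an HPC cluster. Names and parameters are
// placeholders; deploy repeatedly (or with a loop) for identical clusters.
param location string = resourceGroup().location
param subnetId string
param adminUsername string
@secure()
param adminPassword string

resource hpcNic 'Microsoft.Network/networkInterfaces@2022-07-01' = {
  name: 'hpc-node-nic'
  location: location
  properties: {
    ipConfigurations: [
      {
        name: 'ipconfig1'
        properties: {
          subnet: { id: subnetId }
          privateIPAllocationMethod: 'Dynamic'
        }
      }
    ]
  }
}

resource hpcNode 'Microsoft.Compute/virtualMachines@2022-08-01' = {
  name: 'hpc-node-0'
  location: location
  properties: {
    // HBv3: up to 120 AMD cores and 448GB of RAM per VM
    hardwareProfile: { vmSize: 'Standard_HB120rs_v3' }
    osProfile: {
      computerName: 'hpc-node-0'
      adminUsername: adminUsername
      adminPassword: adminPassword
    }
    storageProfile: {
      imageReference: {
        publisher: 'Canonical'
        offer: '0001-com-ubuntu-server-focal'
        sku: '20_04-lts-gen2'
        version: 'latest'
      }
      osDisk: { createOption: 'FromImage' }
    }
    networkProfile: {
      networkInterfaces: [ { id: hpcNic.id } ]
    }
  }
}
```

Because the template is declarative, tearing the cluster down and redeploying it later yields the same infrastructure each time, which is exactly the build-up-and-tear-down pattern the paragraph above describes.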

Microsoft provides several different VM types for HPC workloads. Most applications will use the H-series VMs, which are optimized for the CPU-intensive operations typical of computationally demanding workloads focused on simulation and modeling. They're hefty VMs, with the HBv3 series giving you as many as 120 AMD cores and 448GB of RAM; a single server costs $9.12 an hour for Windows or $3.60 an hour for Ubuntu. An Nvidia InfiniBand network helps build out a low-latency cluster for scaling. Other options offer older hardware at lower cost, while smaller HC and H-series VMs use Intel processors as an alternative to AMD. If you need to add GPU compute to a cluster, some N-series VMs offer InfiniBand connections to help build out a hybrid CPU and GPU cluster.

It’s important to note that not all H-series VMs are available in all Azure regions, so you may need to choose a region away from your location to find the right balance of hardware for your project. Be prepared to budget several thousand dollars a month for large projects, especially when you add storage and networking. On top of VMs and storage, you’re likely to need a high-bandwidth link to Azure for data and results.

Once you’ve chosen your VMs, you need to pick an OS, a scheduler, and a workload manager. There are many different options in the Azure Marketplace, or if you prefer, you can deploy a familiar open source solution. This approach makes it relatively simple to bring existing HPC workloads to Azure or build on existing skill sets and toolchains. You even have the option of working with cutting-edge Azure services like its growing FPGA support. There’s also a partnership with Cray that delivers a managed supercomputer you can spin up as needed, and well-known HPC applications are available from the Azure Marketplace, simplifying installation. Be prepared to bring your own licenses where necessary.

Managing HPC with Azure CycleCloud

You don't have to build an entire architecture from scratch; Azure CycleCloud is a service that helps manage both storage and schedulers, giving you an environment to manage your HPC tools. It's perhaps best compared to tools like ARM templates, as it's a way to build infrastructure templates that work at a higher level than VMs, treating your infrastructure as a set of compute nodes and then deploying VMs as necessary, using your choice of scheduler and providing automated scaling.
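To give a feel for that node-level view, a CycleCloud cluster template is an INI-style file describing a cluster as nodes and autoscaling node arrays rather than individual VMs. The template below is a rough, hypothetical illustration; the cluster name, region, subnet, and limits are invented, and the exact keys should be checked against the CycleCloud documentation:

```ini
# Hypothetical CycleCloud cluster template: a head node plus an
# autoscaling array of HBv3 compute nodes. Values are illustrative only.
[cluster hpc-demo]

    [[node defaults]]
    Credentials = azure
    Region = eastus
    SubnetId = hpc-rg/hpc-vnet/compute

    [[node scheduler]]
    MachineType = Standard_D8s_v3

    [[nodearray execute]]
    MachineType = Standard_HB120rs_v3
    MaxCoreCount = 480
```

CycleCloud expands the `execute` node array into as many VMs as the scheduler's queue demands, up to the core limit, and shrinks it again when jobs drain.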

Everything is managed through a single pane of glass, with its own portal to help control your compute and storage resources, integrated with Azure's monitoring tools. There's even an API you can use to write your own extensions for additional automation. CycleCloud isn't part of the Azure portal; it installs as a VM with its own web-based UI.

Big compute with Azure Batch

Although most of the Azure HPC tools are infrastructure as a service, there is a platform option in the shape of Azure Batch. This is designed for intrinsically parallel workloads, like Monte Carlo simulations, where each part of a parallel application is independent of every other part (though they may share data sources). It’s a model suitable for rendering frames of a CGI movie or for life sciences work, for example analyzing DNA sequences. You provide software to run your task, built to the Batch APIs. Batch allows you to use spot instances of VMs where you’re cost sensitive but not time dependent, running your jobs when capacity is available.
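The intrinsically parallel model Batch targets is easy to see in miniature. The sketch below estimates pi with a Monte Carlo simulation: each task is fully independent of the others, so tasks could just as well run on separate Batch compute nodes. Here Python's `multiprocessing` pool stands in for Batch's scheduler; the function names and task counts are illustrative, not part of the Batch API:

```python
# Sketch of an intrinsically parallel Monte Carlo job of the kind Azure
# Batch runs: independent tasks fan out, then their results are reduced.
import random
from multiprocessing import Pool

def sample_task(args):
    """One independent task: count random points inside the unit circle."""
    seed, n = args
    rng = random.Random(seed)  # per-task seed, so tasks don't share state
    return sum(1 for _ in range(n) if rng.random() ** 2 + rng.random() ** 2 <= 1.0)

def estimate_pi(tasks=8, samples_per_task=100_000):
    """Fan tasks out across workers, then combine hits into one estimate."""
    with Pool() as pool:
        hits = pool.map(sample_task, [(seed, samples_per_task) for seed in range(tasks)])
    return 4.0 * sum(hits) / (tasks * samples_per_task)

if __name__ == "__main__":
    print(estimate_pi())  # prints an estimate close to pi
```

On Batch, each `sample_task` call would be a task in a job, with the reduction handled by a post-processing step once all tasks complete.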

Not every HPC job can be run in Azure Batch, but for the ones that can, you get interesting scalability options that help keep costs to a minimum. A monitor service helps manage Batch jobs, which may run several thousand instances at the same time. It’s a good idea to prepare data in advance and use separate pre- and post-processing applications to handle input and output data.

Using Azure as a DIY supercomputer makes sense. H-series VMs are powerful servers that provide plenty of compute capability. With support for familiar tools, you can migrate on-premises workloads to Azure HPC or build new applications without having to learn a whole new set of tools. The only real question is an economic one: Does the cost of on-demand high-performance computing justify switching away from your own data center?

Copyright © 2022 IDG Communications, Inc.
