Microsoft Build 2024: A Look at the Latest AI Innovations from the Tech Giant

Arva Rangwala

Microsoft held its annual Build developer conference this week in Seattle, unveiling a slew of new artificial intelligence (AI) tools, features and services. The company is clearly doubling down on integrating AI capabilities into virtually every product and service it offers, both for consumers and businesses.

From intelligent virtual assistants to AI-powered coding tools, generative AI models to on-device AI chips, Microsoft laid out its ambitious “AI everywhere” vision at Build 2024. Here’s a breakdown of the major AI announcements and what they could mean for you.

AI Agents as Virtual Employees

One of the headline AI announcements concerned Microsoft’s Copilot AI agents, which the company says businesses will soon be able to use like virtual employees. These AI assistants will be able to monitor emails, carry out automated tasks, help with employee onboarding, and even do data entry, all without being explicitly prompted.

How does it work? The AI agents use large language models trained on vast datasets to understand natural language inputs and respond accordingly. They can be customized for different roles and workflows within an organization.

For example, an HR Copilot agent could automatically respond to common employee questions, schedule interviews and onboard new hires. A sales Copilot could draft email pitches, update customer databases, and set follow-up reminders.
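
To make the idea of role-specific agents that act without an explicit prompt a little more concrete, here is a minimal Python sketch. It is not Microsoft’s implementation and does not use any Copilot Studio API; the `Event` type, the handler functions and the `ROUTES` table are all hypothetical names invented for illustration. It only shows the general pattern of routing incoming business events to the agent responsible for them.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Event:
    """A hypothetical business event that could trigger an agent."""
    kind: str      # e.g. "new_hire" or "lead_created"
    payload: str   # e.g. an employee or customer name


# Hypothetical role-specific handlers. A real agent would call a
# language model here; these stubs just return a description.
def hr_agent(event: Event) -> str:
    return f"HR agent: scheduled onboarding for {event.payload}"


def sales_agent(event: Event) -> str:
    return f"Sales agent: drafted follow-up for {event.payload}"


# Assumed mapping from event kind to the agent that handles it.
ROUTES: Dict[str, Callable[[Event], str]] = {
    "new_hire": hr_agent,
    "lead_created": sales_agent,
}


def dispatch(events: List[Event]) -> List[str]:
    """Route each event to its registered agent; skip unknown kinds."""
    results = []
    for event in events:
        handler = ROUTES.get(event.kind)
        if handler is not None:
            results.append(handler(event))
    return results
```

In this sketch, calling `dispatch([Event("new_hire", "Dana")])` would hand the event to the HR handler, while events with no registered agent are simply ignored.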

Microsoft says the virtual AI employees won’t be taking over human jobs entirely, but rather handling the “boring” administrative tasks to boost productivity. The Copilot agent abilities will roll out to Copilot Studio later this year.

Multimodal AI Model for Mobile Devices

Another key AI announcement was Phi-3-vision, a new “multimodal” AI model that can understand both text and visual inputs like images. What makes Phi-3-vision unique is that it is a “small language model” designed to run efficiently on mobile devices and Internet of Things (IoT) gadgets with limited computing power.

Despite being a pared-down model, Phi-3-vision can perform tasks like transcribing text from images, analyzing charts and tables, and providing visual explanations. Microsoft touts its capabilities in areas like education (e.g. tutoring for math word problems) and creative workflows.

The “multimodal” aspect means the AI can make connections between different data types like text, images, and numerical values. For example, it could look at a graph, understand the context from the caption, and summarize the key insights in clear language.

Multimodal AI models are an emerging field, with tech giants racing to develop versions that combine different AI skills like vision, language, reasoning and generation into a single framework. The goal is to create AI assistants that can more closely mimic human intelligence across multiple domains.

Coding Assistance with AI Co-Pilots

For developers, Microsoft announced new AI integrations for its popular GitHub code repository platform. The GitHub Copilot feature now extends to Azure cloud services, allowing coders to provision Azure resources using simple voice commands or text instructions.

There are also new extensions to customize the GitHub Copilot AI agent’s behavior using third-party tools like Docker, enabling capabilities like automated code testing.

These coding co-pilot AI tools use large language models trained on billions of lines of code to generate suggestions for completing tasks, explaining errors, writing documentation and more. The AI assistants learn from each developer’s code patterns over time to provide personalized recommendations.

Microsoft says AI coding co-pilots save programmers significant time and effort. Instead of constantly searching for solutions online, the AI brings relevant guidance right into the integrated development environment (IDE).

Real-Time Video Translation With AI

For consumers, Microsoft highlighted how its Edge web browser will soon gain real-time video translation powered by AI language models.

With this feature, Edge will be able to provide live dubbed translations while you watch videos on platforms like YouTube, effectively removing the language barrier. It will launch supporting translation between English and several major languages like Spanish, German, Hindi, Italian and Russian.

The video translation utilizes automatic speech recognition to transcribe the spoken words, which are then translated to the target language by a large language model. The translated text is finally rendered as synthesized speech in the new language, playing over the original audio.
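
The three stages described above (speech recognition, translation, speech synthesis) can be sketched as a simple chained pipeline. The Python below is only an illustration under stated assumptions: the stage functions are stand-ins written for this example, not Edge’s or any real library’s API, and a real system would plug in actual speech and language models.

```python
from typing import Callable


def make_dubbing_pipeline(
    transcribe: Callable[[bytes], str],
    translate: Callable[[str, str], str],
    synthesize: Callable[[str, str], bytes],
) -> Callable[[bytes, str], bytes]:
    """Chain the three stages into one function for a chunk of audio."""
    def dub(audio_chunk: bytes, target_lang: str) -> bytes:
        text = transcribe(audio_chunk)              # speech -> source text
        translated = translate(text, target_lang)   # source text -> target text
        return synthesize(translated, target_lang)  # target text -> speech
    return dub


# Stand-in stages so the sketch runs; all three are hypothetical.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


PHRASES = {("hello", "es"): "hola"}


def fake_translate(text: str, lang: str) -> str:
    # Falls back to the original text when no translation is known.
    return PHRASES.get((text, lang), text)


def fake_synthesize(text: str, lang: str) -> bytes:
    return f"[{lang}] {text}".encode("utf-8")


dub = make_dubbing_pipeline(fake_transcribe, fake_translate, fake_synthesize)
```

Keeping each stage behind a plain function signature means any one of them could be swapped for a better model without touching the other two.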

While live dubbing AI isn’t new, bringing it into mainstream web browsers could make multilingual videos vastly more accessible for casual users. Microsoft plans to expand language support over time.

Custom Copilot Agents for Enterprises

Building on its enterprise AI vision, Microsoft announced capabilities for businesses to create custom Copilot agents tailored to their specific needs.

For example, a customer support Copilot agent could be trained on a company’s product documentation and knowledge base articles, while a sales operations Copilot could learn from CRM data, contract templates and so on.

These custom agents will use the same large language models as their generalized counterparts, but will be further fine-tuned on an organization’s proprietary data. The idea is to create AI assistants that deeply understand the business context.

Microsoft says these bespoke Copilot agents will help streamline operations while ensuring proper security, compliance and data privacy controls are in place. The custom agent building features will hit the Copilot Studio platform later in 2024.

New AI Hardware: Copilot+ PCs and Devkits

On the hardware front, Microsoft introduced Copilot+ PCs powered by the latest AI accelerator chips from Intel, Qualcomm and AMD. These Windows laptops and desktops integrate dedicated neural processing units (NPUs) to run AI models directly on the device.

Apart from being faster for AI workloads, Copilot+ PCs unlock new experiences like an AI-powered “photographic memory” feature called Recall. It allows you to retrieve information on everything you’ve seen or done on the device using a searchable, screenshot-based visual history.
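
As a rough illustration of what a searchable, screenshot-based history involves, the Python sketch below builds a small keyword index over snapshots. It is not how Recall is implemented: the `Snapshot` and `SnapshotIndex` names are hypothetical, and the sketch assumes the on-screen text has already been extracted from each screenshot.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class Snapshot:
    timestamp: str
    app: str
    text: str  # assumed to be already extracted from the screenshot


class SnapshotIndex:
    """A tiny inverted index: word -> positions of snapshots containing it."""

    def __init__(self) -> None:
        self._snapshots: List[Snapshot] = []
        self._index: Dict[str, Set[int]] = defaultdict(set)

    def add(self, snap: Snapshot) -> None:
        pos = len(self._snapshots)
        self._snapshots.append(snap)
        for word in snap.text.lower().split():
            self._index[word].add(pos)

    def search(self, query: str) -> List[Snapshot]:
        """Return snapshots containing every query word, oldest first."""
        words = query.lower().split()
        if not words:
            return []
        hits = set.intersection(*(self._index.get(w, set()) for w in words))
        return [self._snapshots[i] for i in sorted(hits)]
```

A query such as `index.search("hiking boots")` would return only the snapshots whose text contains both words, in the order they were captured.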

The Copilot+ PCs will start at $999 and go on sale from Microsoft and its OEM partners in June.

Microsoft also showcased the Snapdragon Dev Kit, a compact $899 PC aimed at developers building native Arm64 apps for Windows 11 on the power-efficient Snapdragon X Elite chip.

Both products highlight Microsoft’s strategy of pursuing AI acceleration on the silicon level, optimizing chips for faster AI processing on everything from mobile devices to high-end desktops.

OpenAI Partnership Takes Center Stage

With offerings heavily based on OpenAI’s language models like GPT-4, Microsoft reinforced its “most strategic” partnership with the AI research firm.

This was accentuated by a surprise on-stage appearance from OpenAI CEO Sam Altman at the Build keynote. While no groundbreaking revelations were announced, Altman emphasized the company’s goal of delivering ever more capable AI models with improved speed, cost, safety and multimodal capabilities.

OpenAI has faced some controversy recently around an AI voice mimicking the actress Scarlett Johansson without her consent. However, the topic did not come up during Altman’s appearance.

AI Startups Pitch Their Innovations

Apart from the AI product launches, Microsoft brought several AI startups to the Build stage to showcase their innovations. These included:

  • A demonstration of QWAO.io’s physics engine that can translate natural language into 3D animated scenes
  • DoOne’s AI chatbot designed to automate sales activities like lead qualification and follow-ups
  • Simulated game environments from Panacea for testing AI systems’ understanding and reasoning
  • Valen Analytics’ AI assistant for medical coding using natural language processing

The emerging AI ecosystem is rapidly expanding, with Microsoft keen to highlight its role in enabling these startups to build on Azure through programs like the AI Incubation Initiative.

Responsible AI, but Privacy Concerns Remain

While the AI announcements were undoubtedly compelling, the challenges around responsible AI development and data privacy concerns remain.

Microsoft reiterated its focus on ethical AI principles through initiatives like Azure AI Content Safety tools for detecting offensive content and expanding AI governance capabilities.

However, features like Windows Recall raised eyebrows over privacy and the idea of an omniscient AI indexing all user activity. Microsoft says all processing is on-device and users can control the data, but such capabilities will inevitably fuel debates around surveillance capitalism and consent.

Looking Ahead: AI Arms Race Intensifies

Microsoft Build 2024 made it clear that companies like Microsoft, Google, and OpenAI are locked in an AI arms race as they vie to lead this transformative technology.

The battle lines seem drawn around developing ever more sophisticated AI models, from large ones that can power intelligent cloud services down to smaller, mobile-friendly versions for on-device experiences.

Whether it’s intelligent coding assistants, multimodal AI, or virtual AI employees, Microsoft is betting big that embedding AI smarts into its products and platforms will be the key differentiator.
