I recently ran some experiments with llama.cpp on serverless platforms and hit an interesting issue with hardware incompatibility. The Docker images I built worked fine locally but failed in the cloud, throwing Illegal Instruction errors. Turns out, while my local machine supports AVX512, the cloud hardware doesn’t.
This got me thinking—what kind of hardware is really powering these "serverless" environments? While serverless abstracts away infrastructure, you’re still relying on physical servers. And when you're running resource-heavy applications, like machine learning models, those details matter.
I wanted to find out exactly what hardware was backing popular serverless platforms, so I set up an experiment across AWS Lambda, Google Cloud Run, and Azure Container Apps using Nitric, which lets you deploy the same code across multiple serverless environments.
The Experiment
The goal was simple: figure out the CPU capabilities of these platforms. I wrote a small app to inspect /proc/cpuinfo, which shows detailed CPU info, and deployed it to each cloud provider.
Disclaimer: I'm aware that just reading /proc/cpuinfo is a highly naive approach to retrieving CPU information at runtime on these platforms. In a datacentre setting, it's entirely possible that the reported CPU information differs from the actual hardware an instance is running on, due to the types of virtualization used to keep instruction sets homogeneous across mixed hardware clusters.
That said, I'm more curious about the capabilities flagged on each of these platforms, so if they're being limited to a lowest-common-denominator instruction set, that's still relevant information and I'll be taking the results at face value.
All deployments were in Sydney, Australia 🦘 as that's where I intended to run my workloads. Your results may differ across clouds/regions. Try it for yourself and let me know what you find.
Here’s the code snippet I used (Node.js with Nitric):
import { api } from '@nitric/sdk'
import { exec } from 'node:child_process'

const helloApi = api('main')

helloApi.get('/cpus', async (ctx) => {
  ctx.res.body = await new Promise((resolve) => {
    exec('cat /proc/cpuinfo', (error, stdout, stderr) => {
      if (error) {
        ctx.res.status = 400
        resolve(stderr)
      } else {
        resolve(stdout)
      }
    })
  })
})
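The raw dump is easy enough to eyeball, but if you'd rather check for specific instruction sets programmatically, the flags line can be parsed into a set. Here's a rough sketch of my own (the /flags route and parseCpuFlags helper aren't part of the deployed app) that reads /proc/cpuinfo directly instead of shelling out:

import { api } from '@nitric/sdk'
import { readFile } from 'node:fs/promises'

// Extends the same 'main' API declared above.
const flagsApi = api('main')

// Pull the space-separated "flags" line out of /proc/cpuinfo into a Set.
const parseCpuFlags = (cpuinfo: string): Set<string> => {
  const flagsLine = cpuinfo.split('\n').find((line) => line.startsWith('flags'))
  return new Set(flagsLine?.split(':')[1].trim().split(/\s+/) ?? [])
}

flagsApi.get('/flags', async (ctx) => {
  const flags = parseCpuFlags(await readFile('/proc/cpuinfo', 'utf-8'))

  // Only report the extensions this post cares about.
  ctx.res.body = JSON.stringify({
    avx: flags.has('avx'),
    avx2: flags.has('avx2'),
    avx512f: flags.has('avx512f'),
  })
})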
AWS Lambda
Deploying this to AWS Lambda in ap-southeast-2, the CPU info revealed something interesting: AWS is running Intel Xeon Haswell processors, likely in the Intel Xeon E5-2600 v3 series, an architecture from 2013. Lambda was released in 2014, so it's highly possible that AWS hasn’t upgraded Lambda’s hardware across the board.
Here’s a snippet of what /proc/cpuinfo showed:
vendor_id : GenuineIntel
cpu family : 6
model : 63
model name : Intel(R) Xeon(R) Processor @ 2.50GHz
cpu MHz : 2499.988
flags : avx2 fma sse4_1 sse4_2 ...
I was unable to run any tooling that could determine the MT/s of the memory, but based on the CPU we can assume it's likely clocked at 1600/1866 MT/s (DDR4). Given Haswell is nearly 11 years old and AWS Lambda is 10 (released in Nov. 2014), we might assume the platform started with this hardware. However, announcements such as AVX2 support being added just 4 years ago (Nov. 2020) suggest AWS Lambda may have transitioned to these Haswell CPUs from older models.
Whatever the case might be, the net result is that these CPUs support AVX2, but anything beyond that (like AVX512) is a no-go.
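One practical takeaway from this: if you ship a binary compiled for a particular instruction set (like the llama.cpp image from the intro), checking the reported flags at startup turns a cryptic Illegal Instruction crash into a readable error. Below is a minimal sketch; the required flag list and the binary path are hypothetical and would need to match your own build and image layout.

import { readFileSync } from 'node:fs'
import { spawn } from 'node:child_process'

// Instruction sets the binary was compiled to use -- adjust to your build.
const requiredFlags = ['avx2', 'fma']

// Read the host's reported CPU flags once at startup.
const flagsLine = readFileSync('/proc/cpuinfo', 'utf-8')
  .split('\n')
  .find((line) => line.startsWith('flags'))
const cpuFlags = new Set(flagsLine?.split(':')[1].trim().split(/\s+/) ?? [])

const missing = requiredFlags.filter((flag) => !cpuFlags.has(flag))
if (missing.length > 0) {
  // Fail with a clear message rather than crashing mid-inference.
  throw new Error(`Host CPU is missing required instruction sets: ${missing.join(', ')}`)
}

// Hypothetical path to a llama.cpp server binary baked into the image.
spawn('/usr/local/bin/llama-server', { stdio: 'inherit' })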
Google Cloud Run
Google Cloud Run, on the other hand, produced mixed results. The first deployment reported Skylake-SP Xeons (2017), while a subsequent deployment returned model 79, which corresponds to the older Broadwell architecture. This variation isn't shocking given how virtualization works: resources are shared, and you won't always land on the exact same hardware.
Here are snippets of the values I retrieved from /proc/cpuinfo:
vendor_id : GenuineIntel
cpu family : 6
model : 85
model name : unknown
cpu MHz : 2560.807
flags : avx avx2 avx512f avx512dq sse4_1 sse4_2 ...
vendor_id : GenuineIntel
cpu family : 6
model : 79
model name : unknown
cpu MHz : 2199.999
flags : avx avx2 sse4_1 sse4_2 ...
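For reference, those family/model pairs translate to specific microarchitectures. The lookup below is my own addition, covering only the Intel family 6 models observed in this experiment:

// Intel family 6 model numbers seen in this experiment, mapped to
// microarchitectures. A sketch, not an exhaustive table.
const intelFamily6Models: Record<number, string> = {
  63: 'Haswell (Xeon E5 v3)',
  79: 'Broadwell (Xeon E5 v4)',
  85: 'Skylake-SP / Cascade Lake',
}

// Pull the numeric "model" field (not "model name") out of /proc/cpuinfo.
const identifyMicroarchitecture = (cpuinfo: string): string => {
  const modelLine = cpuinfo.split('\n').find((line) => /^model\s+:/.test(line))
  const model = modelLine ? parseInt(modelLine.split(':')[1], 10) : NaN
  return intelFamily6Models[model] ?? `unknown (model ${model})`
}

// e.g. the first Cloud Run result above (model 85) resolves to 'Skylake-SP / Cascade Lake'.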
Google Cloud Run was released to Alpha in 2018, a year after the Beta release of Google Cloud Functions (2017) and 4 years after AWS Lambda. So it makes sense that the CPUs are slightly more modern.
Skylake-SP offers AVX512 support, which would make Google Cloud Run more capable for workloads requiring advanced instruction sets. However, the inconsistency of the results makes it hard to rely on these capabilities being present.
Azure Container Apps
Somewhat unsurprisingly, Azure Container Apps provided the most modern hardware: AMD EPYC 7763 processors from 2021. These CPUs are significantly newer than what AWS and Google are offering on their respective platforms.
Again, here is a snippet of what was available from /proc/cpuinfo:
vendor_id : AuthenticAMD
cpu family : 25
model : 1
model name : AMD EPYC 7763 64-Core Processor
cpu MHz : 2445.426
flags : avx avx2 sse4_1 sse4_2 ...
It's cool to see an AMD EPYC on Azure Container Apps, and surprisingly there's no obfuscation of the full model name. Azure seems to provide the most modern CPUs among comparable serverless platforms. I might follow up on this in future to check out Azure Functions as well, although those can also be hosted on the Container Apps platform.
It's not surprising the hardware for Container Apps is more modern considering the platform is the newest of the 3 tested.
Results for Container Apps were consistent for the region I was deploying to, so it looks like the hardware is homogeneous, and I expect these CPU capabilities can be relied upon. However, given these are Zen 3 EPYC CPUs (AMD didn't add AVX-512 until Zen 4), the AVX512 instruction set was understandably absent.
Conclusion
This experiment revealed some interesting differences in the underlying hardware of serverless platforms. AWS Lambda runs on older Intel Haswell CPUs, Google Cloud Run offers slightly newer Skylake-SP processors with some CPU mixing, and Azure Container Apps are powered by more recent AMD EPYC processors.
For those looking to run workloads like LLMs in serverless environments with on-device models such as those from the llama 3.2 family, understanding these hardware differences is crucial. If you'd like to try this out yourself, you can view the application I used here.
Give it a try, you might get different results across your deployments! If you do, let me know @timjholm.