SiriusXM EVP and CPO Joe Inzerillo Talks Satellite and Streaming Delivery and Personalization
SiriusXM EVP and chief product and technology officer Joe Inzerillo is probably the first streaming pro I’ve interviewed who also has building satellites in his job description. Inzerillo spent 16 years as a streaming CTO with MLB Advanced Media, BAMTECH Media, and Disney Streaming before crossing over to satellite radio with SiriusXM in 2022.
If you’re not a SiriusXM listener, you might know the company through its other audio service, Pandora. Either way, its media business gets 80% of its revenue from subscriptions on top of an extremely low churn rate. ChatGPT tells me this is a more profitable business than Spotify, but we’re here to talk tech.
Nadine Krefetz: What is your role at SiriusXM?
Joe Inzerillo: As chief product and technology officer, I’m responsible for product design, data science, data engineering, and all of the engineering from building satellites to modules that go in the car through all of the streaming products. Our consumer-facing apps are under my purview as well as broadcast and enterprise IT.
Krefetz: You’re essentially running both a streaming business and a linear broadcast business. Do any parts converge?
Inzerillo: We’re working on trying to make more of it converge. What is linear right now, in an hour could be an archive that’s available on demand. We want to get those to blur a lot more and allow people to navigate in between the two without worrying about if a show is live to you, plausibly live, really live, or a piece of on-demand content. The exceptions are obviously news and sports. A pure news show or a pure sporting event has less value in archive, very analogously to the way it works in the video world.
Krefetz: Are there any big problems you are solving on the data science/data engineering side that you can speak about?
Inzerillo: The ultimate, hardest challenge is the intersection between content understanding and people’s preferences. A video streaming content catalog is relatively small compared to the size of your user base. Maybe you have 25,000–30,000 items of content and have 200 million people. The level of mapping is still challenging, but pretty straightforward, because there’s just not that many things you can get people into.
When you look at us, we’re more on the order of millions of pieces of content intersected by tens of millions of folks who want to listen to it. We have to understand much, much more about the content to park people in the right asset.
What we’ve done with the Pandora Music Genome is a great example of what we’ve been able to do from a music-understanding standpoint: figuring out what track goes with what track algorithmically, like we do in Pandora. We [also do this] in some aspects of the algorithmic artist stations in SiriusXM. But we’re also working really hard on things like understanding talk, sports, and podcast content.
Certain games might not be interesting, except that they have ramifications for a playoff berth for a game [in the future]. Being able to extract that information and recommend it to the right folks is actually much more complicated than just profiling songs, which is already very complicated.
Krefetz: You were in the streaming video side of things for a long time, with Disney Streaming, BAMTECH Media, and MLB. I’m curious about what you brought from the video world into the role you have now.
Inzerillo: A lot of the protocols—HLS is one of those things. [At SiriusXM,] I’ve tried to bring us back more toward, “Let’s draft off the work that other people have done because it’s less expensive and more well-understood as to how to achieve reliability.” This means using standard CDNs and using a bunch of things that are non-proprietary and just participate in that ecosystem writ large.
Krefetz: Are you architecting streaming and satellite the same way, or is it totally different?
Inzerillo: It’s its own thing. For satellite, [the architectures are] all essentially proprietary at this point in time; they’re not interoperable in any way. [The two merged companies Sirius Satellite Radio and XM Satellite Radio] had different technologies.
We have cars out there that we support which [use one of] two different kinds of satellites, and we’ve been converging them over time.
SiriusXM is in a little bit of a time capsule in audio formats and things like that. It’s not AAC, but it’s AAC-ish. The underlying audio format is similar to AAC compression, but the modulation schemes and all that sort of stuff are highly proprietary and baked into the hardware.
As far as streaming goes, we use HLS. [Internet] audio has been small enough that people just do weird things like background downloading, [etc.,] and then have the player orchestrate the ad breaks. In a car where you’re streaming over a cellular network, some of those concerns come back. From a carrying capacity [standpoint], the cellular network can easily do the bitrate of audio. But when a cell network gets loaded, it looks like a really slow internet, so you have to start playing some of the same tricks that you did in video to abate the fact that you’re going to have congestion on the network.
We’re very much patterning after the video SSAI standards and more modern HLS standards in general to just try to bring higher quality and higher reliability to some of the stuff that we had done in the past. Pandora is mostly MP3 files.
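One of the video "tricks" for a congested network is choosing a rendition that fits the measured throughput. The sketch below illustrates that idea only; the bitrate ladder, the safety factor, and the function name are assumptions for illustration, not details of SiriusXM's player.

```python
from typing import Sequence

# Hypothetical audio renditions in kbit/s; not SiriusXM's actual ladder.
AUDIO_LADDER_KBPS = (32, 64, 128, 256)


def pick_rendition(measured_kbps: float,
                   ladder: Sequence[int] = AUDIO_LADDER_KBPS,
                   safety_factor: float = 0.7) -> int:
    """Return the highest bitrate that fits within a fraction of measured throughput.

    Falls back to the lowest rendition when nothing fits, so playback
    degrades instead of stopping.
    """
    budget = measured_kbps * safety_factor
    candidates = [rate for rate in sorted(ladder) if rate <= budget]
    return candidates[-1] if candidates else min(ladder)
```

On a loaded cell network the measured throughput drops, and the player steps down the ladder rather than stalling.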
Krefetz: I understand you’re reaching 50% of the population in your target markets and 84% of the total audio spend. Is there a short list in order of importance you can give me of the most important parts of your tech stack you need to focus on to keep the business thriving?
Inzerillo: The services layer and the content distribution network are two parts that have to be both relatively bulletproof. When you look at those stats, they’re a combination of SiriusXM and Pandora and also what we do with AdsWizz [SiriusXM’s supply-side ads business, which also does targeted SSAI where possible].
All three of those have surface area in the services world that has to be very robust and resilient to failures with cloud providers and other technology providers, in addition to the things we might do to ourselves errantly, as bugs make it into the field for everybody. I think the services stack is where it starts.
Obviously, content distribution is important too. If the services are working perfectly and you can log in and browse but you can’t actually play anything, that’s a pretty unsatisfying experience. We spend a lot of time making sure that we have multiple routes to get through various different CDNs and origins and networks, etc., to make sure that people can have a good listening experience.
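The idea of having multiple routes can be sketched as an ordered failover: try each CDN or origin in preference order and play from the first one that passes a health check. The hostnames and the health-check callback below are hypothetical placeholders, not SiriusXM's infrastructure.

```python
from typing import Callable, Iterable, Optional

# Hypothetical base URLs, in preference order (assumption for illustration).
CDN_BASES = (
    "https://cdn-a.example.com",
    "https://cdn-b.example.com",
    "https://origin.example.com",
)


def first_healthy_url(path: str,
                      is_healthy: Callable[[str], bool],
                      bases: Iterable[str] = CDN_BASES) -> Optional[str]:
    """Return the playback URL on the first base that passes the health check."""
    for base in bases:
        if is_healthy(base):
            return f"{base}/{path.lstrip('/')}"
    return None  # every route is down; the caller decides how to surface that
```

Passing the health check in as a callback keeps the routing logic separate from however health is actually measured (probes, error rates, or a control-plane signal).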
The apps themselves have to work. But those are one of those things that are load-independent. Once people download it onto their devices, whether 100,000 or a million people are using it doesn’t really change the load on our system other than the services layer and content distribution. But the app itself is isolated to the person’s device.
Krefetz: Do you work in the cloud?
Inzerillo: Absolutely. Some things were done in the cloud when I got here, but trying to get out of some of the legacy systems is a big part of what we’ve been doing. We are a more proprietary industry, and there’s a lot of the “If it’s not broke, don’t fix it” mentality. That has served us very well, but as we start to extend the surface area of the products that we’re providing to customers, you have to go back and modernize some of that infrastructure. That’s going to lead you into moving a lot of those things that were typically done on-site, into the cloud.
Krefetz: When you talk about “expanding the surface area,” do you mean getting SiriusXM outside of the car and into different environments?
Inzerillo: The devices are proliferating, and even the car itself is under quite a bit of change. When you look at most automotive manufacturers, OEMs have said that probably in the U.S., 70% are going to wind up using the Android operating system. It’s not exactly the Android that’s on your phone.
One of the things that’s a challenge is the fact that cars [with legacy hardware] still exist on the road. So, we have to continue to support them. We don’t want to abandon subscribers, but every time you turn around, there’s another car manufacturer, another EV, and they generally have very different tech stacks.
Krefetz: Can you talk about personalization and product strategy for different audiences?
Inzerillo: Pandora has been personalized since Day 1. The notion of algorithmic music is inherently personalized. The Sirius side was always curated and never personalized by the nature of the satellite. It was a one-to-many kind of broadcast model that was much closer to analog radio.
There are people that go to SiriusXM who would probably not be happy in Pandora and vice versa, which is one of the reasons that there is this sort of a federation of brands for us.
Music is a deeply personal thing. No two people are exactly the same. We have this thing in Pandora called Thumbprint Radio, which is radio that’s designed based upon your listening experiences. If you’re a Pandora person, it’s this sort of intangible feeling at some level that Pandora really understands you and gets your musical taste.
People on SiriusXM have a lot of affinity to stations or hosts. [They] are a little bit more used to this radio model where they’re selecting within this group.
Since we’ve started to deploy what we call 360L or internet-connected cars that have both the satellite and the internet connection and direct-to-consumer devices, we’re now sitting on this mountain of signals coming in about what people are actually doing and what they’re listening to. This gives us the opportunity to create deep personalization for what people are talking about.
Krefetz: Are you using a multi-CDN or multi-cloud strategy?
Inzerillo: When I got here, we were single-CDN, but now we’re definitely multi-CDN. CDNs are a good example of “Everybody has a bad day at some point,” and so you really want to be diverse in that. There’s a technical tax if you are building stuff multi-cloud. I’m not a believer in write-once, deploy-in-many clouds. You spend a lot of time either porting or abstracting, and that takes away from value you’re delivering a customer. We use Amazon and Google, but we use them for different purposes. I would say that the gravity is pulling us closer to Amazon.
I’ve been able to attain high levels of reliability by having a single cloud provider. If you’re going to be in a single cloud, be in multiple instances in different regions of it.
Krefetz: Why did you decide to build a new platform?
Inzerillo: One of the things that we wanted to do is write a contemporary back end. It’s really a re-architecture. SiriusXM was a linear radio provider, so all of our back end is oriented around making linear radio, and things like on demand came later. It’s very much the same sort of trajectory that the cable world took where cable companies were linear broadcasters of television, and then technologies like TiVo came out, and everybody was like, “Whoa, wait, what’s this? I can actually record shows! I can have subscriptions!”
Most people at some point along that continuum rewrite their back end to understand that some shows are on demand, some shows are live, and some shows are archives of shows that were live and understand all of those content modalities as first-class citizens in the ecosystem. That was a big part of what we wanted to do; a lot of the visual differences in the apps read through in that.
In the past, we would have a tile for channels mostly, but now you can get podcasts, and you can get entity pages that describe bands that are available on channels or in on-demand content. That whole structure didn’t exist, so it had to be built from scratch.
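Treating live, on-demand, and archived content as first-class citizens could look something like the data model below. This is a minimal sketch: the type names, fields, and the rule that a live episode becomes an archive once its air time ends are assumptions for illustration, not SiriusXM's schema.

```python
from dataclasses import dataclass, replace
from datetime import datetime
from enum import Enum
from typing import Optional


class Modality(Enum):
    LIVE = "live"
    ON_DEMAND = "on_demand"
    ARCHIVE = "archive"  # recording of something that aired live


@dataclass(frozen=True)
class Episode:
    title: str
    modality: Modality
    ends_at: Optional[datetime] = None  # only meaningful for live items


def current_view(ep: Episode, now: datetime) -> Episode:
    """Treat a live episode as an archive once its air time has passed."""
    if ep.modality is Modality.LIVE and ep.ends_at is not None and now >= ep.ends_at:
        return replace(ep, modality=Modality.ARCHIVE)
    return ep
```

With the modality carried on the content item itself, the same record can back a channel tile, a podcast listing, or an entity page without separate back-end systems for each.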
Krefetz: Since 80% of your revenue comes from subscriptions, is there something you have to do in your CMS to support that?
Inzerillo: One of the things we think about is a new content pipeline and the fact that it may or may not always be subscription. [Previously,] subscription or the preview channel was the way our back-end systems thought about things. Occasionally, we could do free listening weekends where we just made everything part of the preview channels, but it was never this notion of a continuum of where things can go.
For content sampling, as an example, I can drive you from a piece of performance media on Facebook to a page where you can sample that show before you get to the pay product. Knowing at the beginning that you’re setting out to design something that has the flexibility to handle paid use cases as well as free use cases and all the stuff that’s in between—ad-supported, etc.—is super helpful.
Ultimately, we wanted to have this platform power Pandora, which has a very robust, very large free service. It’s not like you want the programmers spending their days trying to figure out what’s free or paid. You really want a high degree of automation and a high degree of determinism.
But you also want to have a lot of flexibility. So, if something important happens and you want to put stuff in front of a paywall or you want to lean into late paywall, you can do all those types of things.
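A deterministic access rule with room for editorial flexibility might be sketched as follows. The tier names and the override field are hypothetical; the point is only that the default decision is automated while an override can move a piece of content in front of the paywall.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Tier(IntEnum):
    FREE = 0
    AD_SUPPORTED = 1
    SUBSCRIBER = 2


@dataclass(frozen=True)
class ContentItem:
    title: str
    required_tier: Tier
    override_tier: Optional[Tier] = None  # editorial override, e.g. for a major event


def can_play(item: ContentItem, listener_tier: Tier) -> bool:
    """Apply the override if one is set; otherwise use the item's default tier."""
    # Compare against None explicitly: Tier.FREE equals 0 and would be falsy.
    needed = item.override_tier if item.override_tier is not None else item.required_tier
    return listener_tier >= needed
```

Setting `override_tier=Tier.FREE` on a normally paid show opens it to everyone without changing its default tier, so removing the override restores the paywall.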
Krefetz: What’s most important to younger audiences?
Inzerillo: They expect everything to be in the cloud, and they don’t really think about it. To be quite honest, most of us don’t either. I’ve made the point to a lot of friends that your phone is no longer the hardware that’s sitting in your pocket. It’s really the image of that hardware that sits in Apple’s cloud or Google’s cloud because you go in and get a new one and you just make it the same thing that you had before, but new.
If you take everything else that’s not sports or news, there’s no question that a younger audience doesn’t have this kind of hierarchical notion of a channel, potentially even a provider. They really think about it as an on demand. What’s the show, the song, the personality? A radio dial means nothing to them.
There’s this real expectation of gratification that the thing you’re clicking does the thing you expect it to. If you have a title with an artist like Taylor Swift, it better play Taylor Swift when you hit the button. This is not true of a linear channel, where I might hear Taylor Swift or I might hear somebody else.
The device and the hardware are very transient. The data structure is the thing that’s actually being preserved. It’s a very mobile-centric culture. CarPlay and Android Auto, for example, are very popular for them because they bring their mobile environment into the thing that they’re driving, this transient piece of hardware called a car. [The] mentality is that the car isn’t the thing that has the service; it’s the phone that has this service. It is a very different mental model.
Krefetz: Can you elaborate on “mobile-centric” and “on demand”?
Inzerillo: Those are really the two axes. We have to serve both, and that’s a lot of what we’ve been working on over the last couple of years here as we try to be really good at serving the younger demographic. Clearly, we are great at reaching the older-demographic folks that have the mental model for us and love our service. That mental model is a big thing, and it’s generational.
Krefetz: For ad targeting, is it personalized or a cohort?
Inzerillo: I would definitely call it cohorted. It works in very similar ways in that there are ad providers that are on the supply side and the demand side very much like video. The exchange for those is generally some metadata that goes back and forth about audience, and that metadata generally conforms to some sort of specification. We participate in the IAB to help define what those metadata standards look like.
If you have a podcast (let’s say SmartLess, which we own), Apple’s got SmartLess on it, Spotify, our competitor, has SmartLess on it, etc. The ads that are getting baked in are under our control, but they’re not DAI [dynamic ad insertion] in podcasts because those playback platforms don’t support it. Podcasts are definitely unique [in that you’re] figuring out what the segmentation of that show looks like and then really selling audience that way, so you’re not getting down to anywhere near the granularity you get down to with a real DAI kind of thing we would do on Pandora.
Krefetz: In streaming, churn is rampant, but your monthly churn rate at SiriusXM is very low, at 1.6%. When developing technology for the newer generations of listeners, is there anything else you want to mention that supports this low churn rate?
Inzerillo: When services have a low churn rate, it’s generally because their customers are taking value from the service. The more that people are engaging with our product, the more hours they listen, the more times a week they come back. These are engagement metrics we obsess about. All of the things that we try to do from a feature-set standpoint—recommendations, push notifications—can help drive engagement, which then drives value, which then leads to a lower churn rate. It goes without saying that what we do on the technology side is probably dwarfed by what happens on the content side. So, we have to have compelling content that people want to come back for.
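To put the 1.6% monthly churn figure in context, a back-of-the-envelope calculation helps. It assumes churn is constant and independent from month to month, which is a simplification and not a claim about how SiriusXM models retention.

```python
def annual_retention(monthly_churn: float) -> float:
    """Fraction of subscribers still present after 12 months of constant churn."""
    return (1.0 - monthly_churn) ** 12


def expected_lifetime_months(monthly_churn: float) -> float:
    """Mean subscription length under constant monthly churn (geometric model)."""
    return 1.0 / monthly_churn
```

Under those assumptions, `annual_retention(0.016)` comes out to roughly 82%, and `expected_lifetime_months(0.016)` is about 62.5 months, or a little over five years.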
Krefetz: How many markets are you in?
Inzerillo: SiriusXM is in the U.S. and Canada. Pandora is U.S.-only. We also own AdsWizz, which is our own ad server, but also has clients, and that’s essentially global, [albeit] North America- and Europe-heavy. SoundCloud, for example, is using the AdsWizz ad server. AdsWizz is both a SaaS technology provider, and half the business is powering our own ad-supported products.
Krefetz: How are streaming video and audio similar?
Inzerillo: Streaming music and streaming video have a lot of commonalities. There are also a lot of differences. When you talk about music, everybody’s got all the same songs, as opposed to the video world, where having a show on more than one service is an anomaly.
I think that perhaps what we do more robustly than video services is programming a full ecosystem, because there’s so much more audio content out there than there is video (principally because it’s less expensive to produce). Right now, in the SiriusXM app, we have a curated set, but it’s 10,000 different podcasts out of probably 150,000 podcasts that are available. You don’t have 150,000 original production shows on the streaming services put together in a year, let alone monthly or weekly. Part of our world is putting more emphasis on personalization, just because if you’re wrong, you might be wildly off. What I’ve learned from being here is that understanding what’s evergreen and what you can recommend to people is super important.
There aren’t as many video services that think about the deep catalog and how you can bring things back.
When I was at Disney, we had 10,000–15,000 pieces of content in circulation at any given point in time. Here, it’s like 250,000 pieces of content. It’s a numbers game of how many subscribers do you have, how big is the content set? With audio services, because we haven’t been able to uniquely differentiate on content as much (again, because everybody’s got the same music, and everybody’s got the same podcasts for the most part), we’ve had to get much craftier about figuring out how to surround you with stuff that’s deeply personalized so that you feel like this is the service for you. You can get that same content in other competitive services pretty easily.
Krefetz: What’s SiriusXM’s best feature?
Inzerillo: In the audio world, we have license to be a little bit more creative and start to mash things up a little bit. Because you listen to heavy metal and hair metal, I can just make up a new genre of stuff and give it a punchy title—especially with gen AI—and people would be like, “OK, I get that.” It’s not just prioritizing between a bunch of sets or reordering the stuff that’s in the sets. Pandora is a great example. Thumbprint Radio is all yours. There’s no thumbprint video [equivalent], and I think that’s because the working set is so small relative to audio.
Krefetz: Is gen AI going to help?
Inzerillo: Probably the most profound immediate effect is that we’re doing some amazing stuff in customer service. That makes sense because customer service is a conversation, and so these GPTs are fantastic at it. I don’t think that the customer base is expecting every app to be able to have that kind of conversation [about content] yet.
I think that understanding of content allows us to extract things that we then can use for personalization to summarize what a podcast is about. The personalization itself tends to not be these large models because there’s so much stuff running just to paint a screen and have 40 or 50 personalized elements on the screen.
The latency of the large models is too high to try even to use them, even if they work exceptionally well. More specialized machine learning models that we build ourselves do a much better job of personalization than that.