Discussing Bloomberg’s Cloud Native Journey with Andrey Rybka
This conversation covers:
How Bloomberg is demystifying bond trading and pricing, and bringing transparency to financial markets through its various digital offerings.
Andrey’s role leading the compute architecture team in Bloomberg’s CTO office, where he oversees the research and implementation of new compute-related technologies to support the company’s business and engineering objectives.
Why factors like speed and reliability are integral to Bloomberg’s operations, how they shape Andrey’s approach to technology, and why Bloomberg uses cloud-native technology.
How Andrey and his team use containers to scale and ensure reliability.
Why portability is important to Bloomberg’s applications.
Bloomberg’s journey to cloud-native.
Some of the open-source services that Andrey and his team are using at Bloomberg.
Unexpected challenges that Andrey has encountered at Bloomberg.
The primary business value Bloomberg has experienced from its cloud-native transition.
Transcript
Emily: Hi everyone. I’m Emily Omier, your host, and my day job is helping companies position themselves in the cloud-native ecosystem so that their product’s value is obvious to end-users. I started this podcast because organizations embark on the cloud-native journey for business reasons, but in general, the industry doesn’t talk about them. Instead, we talk a lot about technical reasons. I’m hoping that with this podcast, we focus more on the business goals and business motivations that lead organizations to adopt cloud-native and Kubernetes. I hope you’ll join me.

Emily: Welcome to The Business of Cloud Native, I’m your host, Emily Omier. Today I’m chatting with Andrey Rybka from Bloomberg. Thank you so much for joining us, Andrey.

Andrey: Thank you for the invitation.

Emily: Of course. So, first of all, can you tell us a little bit about yourself and about Bloomberg?

Andrey: Sure. So, I lead the compute architecture team, as the name suggests, in the CTO office. Our mission is to help with the research and implementation of new compute-related technologies to support our business and engineering objectives. More specifically, we work on ways to provision, manage, and elastically scale compute infrastructure faster, as well as support rapid application development and delivery. We also work on developing and articulating the company’s strategic direction for compute, which includes compute, storage, middleware, and application technologies, and we act as product owners for the specific offerings that we have in-house.

As far as Bloomberg, the company was founded in 1981 and has a very large presence: about 325,000 Bloomberg subscribers in about 170 countries, about 20,000 employees, and more news reporters than The New York Times, Washington Post, and Chicago Tribune combined. We have 6,000-plus software engineers, so a pretty large team of very talented people, plus quite a lot of data scientists and some specialized technologists. Some other impressive points: we run one of the largest private networks in the world, and we move about 120 billion pieces of data from financial markets each day, with a peak of more than 10 million messages a second. We publish about 2 million news stories every day, and we consume news content from about 125,000 sources. The platform also supports about 1 million messages and chats handled every day. So, it’s a very large, high-performance deployment.

Emily: Can you tell me a little bit more about the types of applications that Bloomberg is working on, or that Bloomberg offers? Maybe not everybody is familiar with why people subscribe to Bloomberg and what the main value is. I’m also curious how the different applications fit into that.

Andrey: The core product is the Bloomberg Terminal, which is a Software-as-a-Service offering that delivers a diverse array of news and analytics to facilitate financial decision-making. Bloomberg has done a lot of things that make financial markets quite a bit more transparent; the original platform helped demystify a lot of bond trading and pricing. So, the Bloomberg Terminal is the core product, but there are a lot of products focused on trading solutions, there is enterprise data distribution for market data and such, and there are a lot of verticals such as Bloomberg Media: that’s bloomberg.com, TV, radio, and the news articles that are consumer-facing.
There is also Bloomberg Law, which is an offering for attorneys, and there are other verticals like New Energy Finance, which covers green energy and provides information that helps a great deal with addressing climate change. And then there’s Bloomberg Government, which is focused specifically on research around government-specific data feeds. So in general, you’ve got finance, government, law, and new energy as the key solutions.

Emily: And how important is speed?

Andrey: It is extremely important because, first of all, obviously, for traders, although we’re not in the high-frequency game, we definitely want to deliver the news as fast as possible. We want to deliver actionable financial information as fast as possible, so it is definitely a major factor, but not the only factor, because there are other considerations like reliability and quality of service as well.

Emily: And how does this translate to your approach to new technology in general? And why did you think cloud-native might be a good technology to look into and to adopt?

Andrey: Let me define cloud-native a little, because I think there are different definitions; many people think of containers immediately. But I think we need to think beyond just containers, to container orchestration and scaling elastically, up and down, and those kinds of primitives. When we originally started on our cloud-native journey, we had this problem of treating our machines as pets, if you know the paradigm of pets versus cattle: a pet is something that you care for, there’s literally a name for it, and you take it to the vet if it gets sick. When you think of a herd of cattle, there are many of them, you can replace them, and you have quite a lot of scalability with the herd versus pets. So, we started moving in that direction because we wanted to have more uniform, more homogeneous infrastructure. And we started with VMs; we didn’t necessarily jump to containers. Then we started thinking, “Are VMs the right abstraction?” For some workloads they are, but in some cases we started thinking, “Well, maybe we need something more lightweight.” That’s how we started looking at containers, because you can provision them faster, they start up faster, and developers seemed to be gravitating towards containers quite a bit because it’s very easy to bootstrap your local dev environment with them. And when you ship a container to a higher environment, it actually works. It used to be a problem that you developed on your local machine, you shipped your code to production or a higher environment, and it didn’t work because some dependency was missed. That’s the problem containers came about to help with.

Emily: And then how does that fit in with your core business needs?

Andrey: One of the big things is, obviously, we need to ship products faster, and that’s probably common to a lot of businesses, but we also want to ensure that we have the highest availability possible. That’s where containers help us scale out our workloads and ensure that something like Kubernetes resurrects workloads when they die. And we also wanted to maximize our machine utilization. We have very large data centers and edge deployments, which I guess could be referred to as a private cloud, so we want to maximize utilization in our data centers.
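To make the self-healing and scale-out Andrey describes concrete, here is a minimal sketch of a Kubernetes Deployment created through the official kubernetes Python client: Kubernetes keeps the requested number of replicas running and restarts containers that fail their liveness probe. The service name, image, and namespace are hypothetical, not anything Bloomberg runs.

```python
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() when running inside a cluster

container = client.V1Container(
    name="pricing-api",  # hypothetical service name
    image="registry.example.com/pricing-api:1.0",
    ports=[client.V1ContainerPort(container_port=8080)],
    liveness_probe=client.V1Probe(  # failing probes cause the kubelet to restart the container
        http_get=client.V1HTTPGetAction(path="/healthz", port=8080),
        initial_delay_seconds=5,
        period_seconds=10,
    ),
)

deployment = client.V1Deployment(
    metadata=client.V1ObjectMeta(name="pricing-api"),
    spec=client.V1DeploymentSpec(
        replicas=3,  # Kubernetes recreates pods to keep three running (the "resurrection" Andrey mentions)
        selector=client.V1LabelSelector(match_labels={"app": "pricing-api"}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "pricing-api"}),
            spec=client.V1PodSpec(containers=[container]),
        ),
    ),
)

client.AppsV1Api().create_namespaced_deployment(namespace="default", body=deployment)
```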
So, that’s where virtualization and containers help quite a bit. But also, we wanted to make sure our workloads are portable across all the environments, from the private cloud to the public cloud or the edge. And that’s where containerized technologies help quite a bit, because not only can you have, let’s say, Kubernetes clusters on-prem and at the edge, but all three major cloud providers now support a managed Kubernetes offering. In this case, you have basically highly portable deployments across all the clouds, private and public.

Emily: And why was that important?

Andrey: Basically, we wanted to have a more or less generic way to deploy an application, right? And if you think of containers, Docker is pretty standard these days. Developers were challenged with different package formats: if you build an application in Ruby on Rails, or Java, or Python, there are native packages you can use to package and distribute your application, but they’re not as uniformly supported outside of Bloomberg or even across various deployment platforms. Containers do get you that abstraction layer that helps you build once and deploy to many different targets in a very uniform way. So, whether we deploy on-premises, to the edge, or to the public cloud, we can effectively use the same packaging mechanism. And not only for deployment, which is one problem, but also for post-deployment, if we need to self-heal the workload. All those primitives are built into the Kubernetes fabric.

Emily: But why is being portable important? What does it give you? What advantage? I mean, I understand that’s one of the advantages of containers. But why specifically for Bloomberg, why do you care? Are you moving applications around between public cloud providers, and...

Andrey: We’re definitely adopting public cloud quite a bit, but what I was trying to hint at is that we have to support private cloud deployments as our primary delivery mechanism. Then there are the edge deployments, when we actually deploy something closer to the customers; to your point about being faster, to deliver things faster to our customers, we have to deliver things to the edge, which is what I’m describing as something that is close to the customer. And then as far as the public clouds, we started moving a lot of workloads to the public cloud, and that definitely required some rethinking of how we want to adopt it. But whether it’s private or public, our main goal here is to make it easier for developers to package and deploy things, and effectively deploy things faster, but also do it in a more reliable way, right? It used to be that we could deploy things to a particular set of machines, and we could do it relatively reliably, but there was no auto-healing necessarily in place. So, resilience and reliability weren’t quite as good as what we get with Kubernetes. And then there’s what I mentioned before, machine utilization, or actually the ability to elastically scale workloads, both vertically and horizontally. Vertically, we generally knew how to do that, although with containers and VMs you can do it to a much higher degree, but horizontally, this was pretty challenging to do before Kubernetes came about.
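As a companion to that point about horizontal scaling, here is a sketch of how elastic scale-out is typically expressed on Kubernetes: a HorizontalPodAutoscaler that grows the hypothetical pricing-api Deployment from the earlier sketch between 3 and 20 replicas based on CPU load. It again assumes the official kubernetes Python client; the names and thresholds are illustrative only.

```python
from kubernetes import client, config

config.load_kube_config()

hpa = client.V1HorizontalPodAutoscaler(
    metadata=client.V1ObjectMeta(name="pricing-api"),
    spec=client.V1HorizontalPodAutoscalerSpec(
        scale_target_ref=client.V1CrossVersionObjectReference(
            api_version="apps/v1", kind="Deployment", name="pricing-api"
        ),
        min_replicas=3,
        max_replicas=20,
        target_cpu_utilization_percentage=70,  # add replicas when average CPU exceeds 70%
    ),
)

client.AutoscalingV1Api().create_namespaced_horizontal_pod_autoscaler(
    namespace="default", body=hpa
)
```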
You had to bootstrap your bunch of VMs in different availability zones, figure out how you were going to deploy to them, and it just wasn’t quite there as far as automation and ease of use.

Emily: Let’s change gears a little bit to talk about your journey to cloud-native. You mentioned that you started with VMs, and then you moved to containers. What time frame are we talking about? In addition to containers, what technologies do you use?

Andrey: So, yeah, I guess we started about eight years ago or so, with OpenStack as a primary virtualization platform, and if you look at github.com/bloomberg, you will see that we actually open-sourced our OpenStack distribution, so anyone can look and see if they can potentially benefit from that. OpenStack provided the VMs, basic storage, and some basic Infrastructure-as-a-Service concepts. But then we also started getting into object storage, so there was a lot of investment made into S3-compatible storage, similar to AWS’s S3 object storage; it’s based on the Ceph open-source framework. Those were our foundational blocks. Very shortly thereafter, we started looking at Kubernetes to build a general-purpose Platform as a Service, because developers generally don’t really want to manage virtual machines. They want to just write applications and deploy them somewhere, right, but they don’t really care that much about whether it uses Red Hat or Ubuntu, and they don’t really care to configure proxies or anything like that. So, we started rolling out a general-purpose Platform as a Service based on Kubernetes; we already started adopting it with the initial alpha release of Kubernetes. Thereafter, we also started looking into how we could leverage Kubernetes for a data science platform. Now we have a world-class data science platform that allows data scientists to train and run inference on various large compute clusters with GPUs. Then we quickly realized that if we’re building this on-prem, we need to have constructs similar to what you normally find in the public cloud providers. So, just as AWS has Identity and Access Management, we started introducing that on-prem as well. But more importantly, we needed something that would be a discovery layer as a service: if I’m looking for a service, I need to go somewhere to look it up. DNS was not necessarily the right construct, although it’s certainly very important. So, we started leveraging Consul as a primary discovery-as-a-service layer, and that has paid quite a lot of dividends and helped us quite a bit. We also looked into Databases as a Service, because everything that I described so far was really good for stateless workloads. To a great degree with Kubernetes, I think you can get really good at running stateless workloads, but for something stateful, you needed something that would not necessarily run on Kubernetes. That’s where we started looking at offering more Databases as a Service, which Bloomberg had been doing quite a bit before that. We open-sourced our core relational database called Comdb2. But we also wanted to offer that for MySQL, Postgres, and some other database flavors.
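To illustrate the discovery-as-a-service pattern Andrey describes, here is a minimal sketch that asks a Consul agent for the healthy instances of a service through Consul’s standard HTTP API. It assumes the requests library and a Consul agent on localhost; the service name is hypothetical.

```python
import requests

CONSUL = "http://localhost:8500"

def healthy_instances(service: str) -> list[str]:
    # /v1/health/service/<name>?passing=true returns only instances whose health checks pass
    resp = requests.get(
        f"{CONSUL}/v1/health/service/{service}", params={"passing": "true"}
    )
    resp.raise_for_status()
    return [
        # Service.Address may be empty, in which case the node's address applies
        f"{e['Service']['Address'] or e['Node']['Address']}:{e['Service']['Port']}"
        for e in resp.json()
    ]

print(healthy_instances("pricing-api"))
```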
So, I think we have a pretty decent offering right now, with a variety of Databases as a Service, and I would argue that you can provision some of those databases faster than you can do it on AWS.

Emily: It sounds like, and correct me if I’m wrong, but it sounds like you’ve ended up building a lot of things in-house. I mean, you used Kubernetes, but you’ve also done a lot of in-house custom work.

Andrey: Right. So, custom, but with the principle of using open-source. Everything that I described actually has an open-source framework behind it. And this open-source-first principle is something that is now becoming more normal. Before we build something in-house, we look at open-source frameworks, and we look at which open-source community we can leverage that has a lot of contributors, but also whether we can contribute back, right? So, we contribute back to a lot of open-source projects, for example, Solr. We offer search as a service based on the Solr open-source framework. We also have Redis caching as a service, queuing as a service based on RabbitMQ, and Kafka as a service, so distributed event streaming as a service. Quite a few open-source frameworks. We’ve always asked, “Can we start with something that’s open-source, participate in the community, and contribute back?”

Emily: Tell me a little bit about what has gone really well. And also what has been unexpectedly challenging, or even expectedly; it’s always more interesting to hear what was surprising.

Andrey: I think, generally, open-source first as a strategy has worked out pretty well. I’ve listed only some of the services that we have in-house, but we certainly have quite a bit more. The benefits I don’t even know how to quantify, but it certainly enabled us to go fast and deliver business value as soon as possible versus waiting for years before we build our own alternative technology. I think developer happiness also improved quite a bit, because we started investing heavily in our developer experience as a major effort, and this everything-as-a-service approach makes it extremely easy for developers to deliver new products. So, all of the investment we’ve made so far has paid huge dividends. Challenges: as with anything, when you start with open-source projects, you certainly have bugs and things like that. In those cases, we preferred to partner with a company that effectively has inside knowledge of the open-source project, so that at least for a couple of years we have somebody who can guide us, and potentially, by investing actual money into the project, we get it to the point where it’s mature enough and meets a certain quality criteria. Some of the projects we invested heavily in many people probably don’t know about, like the Chromium Project. Many people use Chrome, but Bloomberg has been sponsoring Chromium and WebKit open-source development quite a bit, as well as JavaScript’s V8 engine, and even newer technologies like WebAssembly, which we’re heavily invested in sponsoring. But again, one thing that is very clear is that we’re not just going to be consumers of open-source; we’re going to be contributors back, either with our developers helping on the projects, or by investing to help the actual open-source projects we’re leveraging to be successful, and not just by saying Bloomberg consumes it, but actually investing back. So, that was a big lesson learned.
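For a sense of what the Redis caching-as-a-service offering mentioned above looks like from the consuming side, here is a tiny sketch using the redis-py client against a hypothetical managed Redis endpoint; the hostname and keys are illustrative only, not Bloomberg’s actual offering.

```python
import redis

# Connect to a (hypothetical) Redis endpoint provided by an internal caching service
cache = redis.Redis(host="cache.example.internal", port=6379, decode_responses=True)

cache.setex("quote:ACME", 30, "123.45")  # cache a value with a 30-second TTL
print(cache.get("quote:ACME"))           # -> "123.45" until the TTL expires
```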
Currently, I think we have a good system in place where we always adopt open-source projects in a very conscious and serious way, with investment going back into the open-source community.

Emily: You mentioned being able to deliver business value sooner. What do you think are the primary two or three business values that you get from this cloud-native transition?

Andrey: The ability to go faster; that’s one thing that’s very clear. The ability to elastically scale workloads, and the ability to achieve uniformity of deployments across various environments: private, edge, public. So we are now able to deliver products to our customers as they transition to the public cloud, for example, much faster, because we have a lot of standardization and a lot of technologies that helped us with adoption. Kubernetes is one of them, but not only Kubernetes; we also use Terraform extensively, and some other multi-cloud frameworks.

And then also delivering things more reliably. That’s one of the things that is not always recognized, but I think reliability is a huge differentiator, and some of it has to do with how we deliver things to the customer with resiliency and redundancy. We run a very large private content delivery network as a service, and it’s also based on open-source technologies. Reliability is one of the main things that I would say we get from a lot of these technologies, because if we did it on our own, yes, it would generally be Bloomberg working on the problem and solving it, but this way you actually get a worldwide pool of experts from different companies contributing back to these technologies. I see this as a huge benefit, because it’s not just Bloomberg working on solving some distributed-systems framework; it’s actually people worldwide working on it.

Emily: And would you say there’s anything in moving to cloud-native that you would do differently?

Andrey: I think what I see as the big challenge, especially with Kubernetes, is the adoption of stateful workloads, because I still think it’s not quite there yet. Generally, the way we’re thinking right now is that we leverage Kubernetes for our stateless workloads, but some stateful workloads require cloud-native storage primitives to be there, and this is where I think it’s still not quite mature. You can certainly leverage various vendors for that, but I really would like to see better support for stateful workloads in the open-source world, and we’re definitely still looking for a project to partner with to deliver better stateful workloads on Kubernetes. To a varying degree, the public cloud providers, the hyperscalers, are getting pretty good at this, but that is still private to them. So, whether it’s Google, Amazon, or Azure, they deliver statefulness with varying degrees of reliability. But I would like this to be something that you can leverage on private clouds or anywhere else, and having it well supported through an open-source community would be, I think, hugely beneficial to quite a few people.
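For context on the storage primitives that do exist today, here is a sketch of a Kubernetes StatefulSet whose volumeClaimTemplates give each replica its own persistent volume, again via the official kubernetes Python client. The image, sizes, and names are hypothetical and not a description of Bloomberg’s database offering.

```python
from kubernetes import client, config

config.load_kube_config()

labels = {"app": "postgres"}

statefulset = client.V1StatefulSet(
    metadata=client.V1ObjectMeta(name="postgres"),
    spec=client.V1StatefulSetSpec(
        service_name="postgres",  # headless Service that gives each pod a stable DNS name
        replicas=3,
        selector=client.V1LabelSelector(match_labels=labels),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels=labels),
            spec=client.V1PodSpec(
                containers=[
                    client.V1Container(
                        name="postgres",
                        image="postgres:15",
                        volume_mounts=[
                            client.V1VolumeMount(
                                name="data", mount_path="/var/lib/postgresql/data"
                            )
                        ],
                    )
                ]
            ),
        ),
        volume_claim_templates=[  # one PersistentVolumeClaim is created per replica
            client.V1PersistentVolumeClaim(
                metadata=client.V1ObjectMeta(name="data"),
                spec=client.V1PersistentVolumeClaimSpec(
                    access_modes=["ReadWriteOnce"],
                    resources=client.V1ResourceRequirements(
                        requests={"storage": "10Gi"}
                    ),
                ),
            )
        ],
    ),
)

client.AppsV1Api().create_namespaced_stateful_set(namespace="default", body=statefulset)
```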
So, Kubernetes, I think, is the right compute fabric for the future, but it still doesn’t support some of the workload types that I would like to be there.

Emily: Are there any other continuing challenges that you’re working on, or problems that you haven’t quite solved yet, either ones that might be specific to you internally, or ones that you feel the community hasn’t quite figured out yet?

Andrey: This whole idea of multi-cloud deployments. We leverage quite a few technologies, from Terraform to Vault to Consul to some other frameworks that help with some of it, but day-two alerting, monitoring, and troubleshooting with multi-cloud deployments is still not quite there. Yes, you can solve it for one particular cloud provider, but as soon as you go to two, I think there are quite a few challenges left unaddressed, starting with a single pane of glass: the view of all of your workloads, right? That’s definitely something that I would like to address: reliability and alerting across all the cloud providers, security across all the cloud providers. So, that’s one of the challenges that I’m still working on, or actually, quite a few people are working on at Bloomberg. As I said, we have 6,000-plus talented engineers who are working on this.

Emily: Excellent. Anything else that you’d like to add?

Andrey: You know, I’m very excited about the future. I think this is almost like a compute renaissance, and it’s really exciting to see all of these things that are happening.

Emily: Fabulous. Just a couple more questions. First of all, is there a tool that you feel like you couldn’t do your job without?

Andrey: Right. Yes, the vi editor, or [laugh] [00:27:42 unintelligible]? No. I think, obviously, Docker has done quite a bit for containerization. I know we’re looking at alternatives to Docker at this point, but I do give Docker quite a bit of credit, because right now we bootstrap the local development environment with Docker, and we ship it as a deployment mechanism all over the place. So, I would say Docker and Kubernetes are the two primary ones, but I don’t necessarily want to pick favorites. [laugh]. I really like a lot of the HashiCorp tools, you know, Terraform, Consul, Vault: fantastic tools and a really good community. I really like Jenkins; we run Jenkins as a service, and it’s really good. Kafka has been extremely reliable, and scalability-wise, Kafka is just amazing. And Redis is really one of my favorite caching tools.

There’s probably a lot more to mention. I’ve mentioned [00:28:43 unintelligible] databases; Postgres is one of my favorite databases, in so many varieties and for different types of workloads. We also gain quite a lot from Hadoop and HBase. But one of my favorite NoSQL databases is Cassandra: it’s extremely reliable, and its replication across, I guess, low-bandwidth environments has been really awesome. So, I guess I’m not answering with just one tool but many, but I really like all of those tools.

Emily: Excellent. Okay, well, just the last question is, where can listeners connect with you or follow you?

Andrey: I am on Twitter as @andrey_rybka. I’m happy to get any direct messages. We are hiring; we’re always hiring, and there are a lot of great opportunities. As I said, we’re an open-source-first company these days, and we definitely have a lot of exciting new projects. I probably haven’t even mentioned 90 percent of the other exciting projects that we have.
We also have github.com/bloomberg, so you’re welcome to browse and look at some of the cool open-source projects that we have as well.

Emily: Excellent. Cool. Well, thank you so much for joining me.

Andrey: Thank you very much.

Emily: Thanks for listening. I hope you’ve learned just a little bit more about The Business of Cloud Native. If you’d like to connect with me or learn more about my positioning services, look me up on LinkedIn: I’m Emily Omier, that’s O-M-I-E-R, or visit my website, which is emilyomier.com. Thank you, and until next time.

Announcer: This has been a HumblePod production. Stay humble.