The Kubernetes Learning Curve with Edgaras Apsega

Some of the highlights of the show include
  • Why Adform decided to move to a cloud native architecture and Kubernetes specifically
  • Who was the driving force behind the move to Kubernetes
  • Whether the switch was purely an engineering decision or involved people outside of engineering
  • Positive and less positive surprises that come with switching to cloud native
  • Organizational and technical problems Edgaras has faced
  • What’s next for Adform on their cloud journey
Transcript

Announcer: Welcome to The Business of Cloud Native Podcast, where we explore how end users talk and think about the transition to Kubernetes and cloud-native architectures.
Emily: Welcome to The Business of Cloud Native. I’m Emily Omier, your host. And I’m here today with Edgaras Apsega, lead IT systems engineer at Adform. Edgaras, what I’d like to do is just start out with you introducing yourself.
Edgaras: I’m Edgaras. I work at Adform. For anyone that doesn't know, Adform is one of the leading advertising technology companies in the world, and provides the software used by buyers and sellers to automate digital advertising. And probably one of the most interesting parts of our solution stack is the demand-side platform that does real-time bidding. What that means is that when a page is loading for some internet user, behind the curtain there's actually a bidding process that takes place for the placeholders to show ads. So, basically, we're doing low-latency stuff. And at Adform, I'm a lead systems engineer for the cloud services team. Our team consists of eight people, and we provide private cloud storage, load balancing, CDN, service discovery, and Kubernetes platforms for our developers that are in [00:01:36 unintelligible] production services. So, to better understand the scale that our team is working on: we are not using public cloud, we have our own private cloud that has six regions, more than 1500 physical servers, and more than 4000 [00:01:55 unintelligible]. And for Kubernetes, we have seven clusters, more than 50 physical machines, and around 300 constantly running [00:02:05 pods]. So, we can say that we prefer bigger clusters with bigger resource-sharing pools. And you asked how I spend my daily work, right?
Emily: Yeah. So, when you get into the office—or, right now, you're not going into the office—get to your desk or your [laughs] home office, what are the first couple things that you do, or…
Edgaras: Yeah, so, when I arrive at work, or, in these times, just get out of the shower and go straight to the work desk, [laughs] actually, I'm most productive in the mornings and evenings. So, in the mornings, when I go to my work desk, I try to do as much as I can of my sprint-planned tasks, and then I scroll through Slack, emails, and the tickets assigned to me, because we have a development team in another region. So, right away in the mornings, we have some kind of support tasks that we need to do.
Emily: Let's go ahead and talk about what this is all about, the business of cloud native, and tell me a little bit about why Adform decided to move to a cloud native architecture. Why did you decide to use Kubernetes, for example?
Edgaras: I'd say, actually, there were two parts. At first, we moved from traditional and, let's say, old-fashioned monitoring solutions to Prometheus, and its integration with service discovery saved lots of the operational time we spent constantly managing and configuring monitoring and alerting for our quite often changing infrastructure. And the second part is the adoption of Kubernetes and all of the parts that come with it, like continuous integration and delivery. So, why did we move to this kind of architecture? It was because the biggest pain point for developers was actually maintaining their virtual machines. And rolling out new software releases in an old-fashioned way just took lots of time for new software releases to reach production. So, we were looking at the new solutions that were available in the market, and Kubernetes was actually one of them. So, after a successful proof of concept, we selected it as our main application scheduler and orchestration tool.
Emily: What would you say was, like, the business value that you were hoping to get out of Kubernetes—out of the ability to release software faster, for example?
Edgaras: Yeah. So, actually, we wanted to take the operational time away from our developers so that they could spend more time coding without taking care of all of the surrounding infrastructure parts, like the application operating system management, [00:04:58 unintelligible] monitoring, alerting, logging, and so on. So, basically, what I'm saying is that the business value was for the developers to be able to ship features faster, and to have a more stable platform that scales application [00:05:15 unintelligible] as well. In addition to that, we have a big research department, and the research department always wanted us to have a dynamic environment where they could just launch applications around some research models, and then shut them down. So, I believe that was the business value.
Emily: Who in the organization do you think was motivating, or driving the move to Kubernetes?
Edgaras: I'd say, actually, it was more the operations engineers, because the developers ended up taking care of their environment's virtual machines. They don't know much about it, but they still had to look after it, and were constantly asking us for help. And we wanted to have this operational stuff only in our hands, and for the developers to run only the code. So, I believe, yeah.
Emily: To what extent was the move to Kubernetes, or to cloud native in general, just purely an engineering decision? Or did it involve other people outside of engineering?
Edgaras: Well, it wasn't only an engineering decision, because we had to take it to the upper levels, to show this new cloud native, modern way of developing and running applications. The upper management level had to invest time for us to move to a microservices-oriented architecture and so on. So, basically, we had to show that with a little bit of time investment we could gain lots of benefits, like faster code deploys. So, we took the operational work from developers, and when they release their applications, they have full-stack monitoring and logging, and they don't need to do any of the operational tasks.
Emily: How difficult was it to have this conversation? Do you feel like the upper management, did they understand the value?
Edgaras: Yeah, it was kind of hard, because nobody wants to invest time in rewriting code. And, as we are a software company, we always need to write new features. But once we showed a good example, that by investing not so much time we get those kinds of benefits, then it was quite easy to change the mindset of upper management.
Emily: And, how important do you think this was for Adform?
Edgaras: I think it was very important, because of what we see now: until Kubernetes, we had only dozens of deployments per day. Now, with Kubernetes, we have more than 500 deployments per day, which is a big number for us, and it means we are making releases much faster.
Emily: Tell me a little bit about any surprises that you had as you were moving to Kubernetes, as you were moving to microservices. Surprises, and I'm interested in hearing both about surprises that were positive and surprises that were less positive.
Edgaras: Probably the biggest surprise for us was just how amazing the community is. When we faced any kind of issue, most of the time there was simply a GitHub issue that described fixes or workarounds. You can always get an answer to your questions in Slack. I remember when we had an issue with Kubernetes and persistent storage, and in the Kubernetes Slack channel, one engineer from a company that provides storage solutions just gave me lots of information and several ways of tackling the problem we were facing, and that really stood out to me. And, actually, we just recently started a cloud native [00:09:28 unintelligible] meetup group, where we gather lots of folks for knowledge-sharing presentations and discussions afterwards, and it feels like the community is really strong and is eager to share their knowledge freely. So, that really amazes me about this journey.
Emily: What about some less positive surprises?
Edgaras: Yeah. So, moving to Kubernetes from the virtual machines world, the first thing was to change the developers' mindsets about resource utilization, I'd say. Because coming to the Kubernetes world, developers need to set container resource requests and limits, and often they're setting amounts similar to what they had on virtual machines, alongside other services like monitoring, log shippers, and so on. And we see on Kubernetes that for some applications the resource usage is very low, but the CPU requests are quite high, so we're still monitoring resource utilization and communicating with teams to lower them. One good example would be that while overall CPU usage in the whole environment is around 30 percent, we're constantly hitting Kubernetes nodes whose CPU is fully requested, and other teams are facing deployment issues because the nodes are full. And I should probably share one interesting example: when we migrated a service from virtual machines to Kubernetes, that service was using nine virtual machines with 16 CPUs each. After they migrated to Kubernetes, with all of the built-in monitoring tools and so on, they noticed that for the current workloads, they only needed six CPUs. So, instead of nine virtual machines with 16 CPUs each, they only needed six CPUs, and they returned a lot of resources to the shared pool.
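The gap Edgaras describes, nodes that are "full" by CPU requests while real usage stays low, can be sketched with a toy model of how the Kubernetes scheduler fits pods by requests rather than by actual consumption. The numbers below are illustrative, not Adform's:

```python
# Toy model: Kubernetes schedules pods by *requested* CPU, not actual usage.
# A node is "full" once the sum of requests reaches its allocatable CPU,
# even if the pods barely use what they asked for.

def can_schedule(node_allocatable_cpu, scheduled_requests, new_request):
    """Mimic the scheduler's fit check: capacity is judged by requests only."""
    return sum(scheduled_requests) + new_request <= node_allocatable_cpu

node_cpu = 16.0                       # allocatable cores on one node
requests = [4.0, 4.0, 4.0, 4.0]       # four pods, each requesting 4 cores
actual_usage = [1.2, 1.2, 1.2, 1.2]   # ...but each actually using ~1.2 cores

utilization = sum(actual_usage) / node_cpu
print(f"Real CPU utilization: {utilization:.0%}")   # ~30%, like Edgaras mentions
print("Can schedule one more 1-core pod?",
      can_schedule(node_cpu, requests, 1.0))        # False: requests are full
```

This is why lowering over-sized requests, as his team asks developers to do, frees node capacity without changing real load at all.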
Emily: Wow.
Edgaras: Yeah, that's amazing. And another big pain point is always security. So, we're struggling a lot with the security part at the moment. And, as you may know, security is often focused on IP-address-based identity, and in Kubernetes, those IPs are always changing, and you can't rely on the fact that a specific IP address is tied to a particular service. So, yeah, the whole cloud native mindset needs to change, not only for the developers and operations engineers, but for the security engineers as well.
Emily: Where would you say you are in the cloud native transition? Are you there, have you done everything that you can, or are you somewhere on the journey?
Edgaras: I'd say we are more than halfway through, because we simply [00:12:31 unintelligible] have some legacy applications that need to be rewritten so they can run as containerized workloads. And for our critical and user-facing applications, we still have lots of discussions with our security team about how the infrastructure and all of those access control things should look. So, yeah, at first, high-load service owners were looking at Kubernetes from a distance, and after a few successful migrations, more and more high-load services are scheduled for migration. But in terms of the legacy applications, the business still doesn't invest money, because they're not critical applications. So, I think they're going to stay in that phase for a while.
Emily: Do you think that that's okay? Would you rather invest the money in—are there any disadvantages to keeping these legacy applications around?
Edgaras: I think at one point or another, they'll be completely rewritten or terminated for good. So, actually, it depends. I think if it's not business critical, then it's probably okay. But if it is business critical, then I'd say migrate it to Kubernetes to have the self-healing infrastructure that scales just beautifully.
Emily: When you think about some of your pain points, do you think of them as technical issues? Do you think of them as, sort of, organizational issues? What are some examples of both organizational and technical problems that you've had?
Edgaras: Yeah. So, regarding how the developers are setting the resources, I think that's kind of an organizational issue. We did some Kubernetes trainings internally, and developers have always asked us to [00:14:25 unintelligible] those trainings one more time, because they're interactive. And it's a [00:14:29 hike] now. But there are always new developers coming, and you still have to share your knowledge about how the resources should be implemented: how they should set the requests, how they should set their limits, and so on. Regarding the security around Kubernetes, I think that field is quite new, and I remember at the last KubeCon in Barcelona, there was lots of buzz about Kubernetes security, which just shows that this is a kind of new field, and everybody needs information about it.
Emily: I think that you're right. It seems like both of those things are really almost skill-gap issues. Do you think there are any real technical problems? That is, things where the technology isn't quite there, rather than a problem with the way your team members are thinking about something, or with skills they don't have.
Edgaras: Yeah, so, actually, about Kubernetes: as I mentioned, we're running Kubernetes on bare metal. And the technical thing with Kubernetes is that it's actually a first-class citizen in public clouds, but when you're trying to run it on bare metal, there are some issues: you cannot expose services with, let's say, type LoadBalancer, and you cannot easily have a service mesh that talks not only within the Kubernetes cluster but also outside the Kubernetes cluster, with your virtual machines, because you need to have a BGP mesh, and that depends on your current network equipment. And there are a lot of technical issues, actually, around running Kubernetes on bare metal.
Emily: I think that that's really interesting. What are you doing to make it easier to run Kubernetes on bare metal? Or are you? Is that something that you're investing time and money into making easier?
Edgaras: Yeah. So, for running Kubernetes on bare metal, actually, we're not using any of the automation that's provided publicly. We took parts of Kubernetes and automated those parts ourselves. And we have three data centers close to each other, connected via fiber, and we have one logical Kubernetes cluster across those three data centers. And for services to be exposed as, let's say, type LoadBalancer, we do have some workarounds where we put custom load balancers in front of the Kubernetes cluster.
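The workaround Edgaras describes, external load balancers in front of a bare-metal cluster, can be sketched as a balancer that round-robins traffic across node-level entry points (for instance, a NodePort open on every node) and skips nodes that fail health checks. All node names and the port number here are hypothetical, not Adform's actual setup:

```python
import itertools

# Hypothetical bare-metal pattern: a service is reachable on the same
# NodePort on every node, and an external load balancer round-robins
# across the nodes it currently considers healthy.

class ExternalLoadBalancer:
    def __init__(self, nodes, node_port):
        self.nodes = nodes
        self.node_port = node_port
        self.healthy = set(nodes)      # start with all nodes passing checks
        self._rr = itertools.cycle(nodes)

    def mark_down(self, node):
        """Health check failed: stop sending traffic to this node."""
        self.healthy.discard(node)

    def pick_backend(self):
        """Return the next healthy (node, port) pair, round-robin."""
        for _ in range(len(self.nodes)):
            node = next(self._rr)
            if node in self.healthy:
                return (node, self.node_port)
        raise RuntimeError("no healthy backends")

lb = ExternalLoadBalancer(["node-a", "node-b", "node-c"], 30080)
lb.mark_down("node-b")      # failed its health check
print(lb.pick_backend())    # ('node-a', 30080)
print(lb.pick_backend())    # skips node-b: ('node-c', 30080)
```

In practice this role is often played by a hardware or software balancer (or, in BGP-capable networks, by something like MetalLB), but the skip-unhealthy, rotate-across-nodes logic is the core of it.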
Emily: Is that something that you would hope that the community would do more of, or do you feel like you've got a pretty good handle on it at the moment?
Edgaras: Probably, I would like to see this addressed by the community more, because everything that's being built for Kubernetes seems to be built for the public clouds, but not for bare metal.
Emily: This actually, sort of, leads me to a future-oriented question. Where do you see, sort of, your next steps on the cloud journey as being?
Edgaras: Yeah, so, service mesh. [laughs]. Everyone's talking about service mesh. We actually have plans to look at it and to do a proof of concept but, as I mentioned before, there are some technical issues if you want to make services talk between Kubernetes and virtual machine services. So, it looks like a journey.
Emily: What do you hope to get out of completing the journey?
Edgaras: So, a service mesh, I believe, would provide circuit breaking, and would take service discovery to another level. And I believe that when we end this journey, the scalability of our platform should improve as much as platform stability, and for the developers, it would remove the operational tasks completely.
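The circuit breaking Edgaras mentions can be sketched as a small state machine: after a threshold of consecutive failures, the breaker "opens" and callers fail fast instead of hammering a struggling upstream service. This is a generic sketch of the pattern, not any specific service-mesh implementation:

```python
class CircuitBreaker:
    """Opens after `max_failures` consecutive failures; callers then fail fast."""

    def __init__(self, max_failures=3):
        self.max_failures = max_failures
        self.failures = 0
        self.open = False

    def call(self, func):
        if self.open:
            # Fail fast: don't even attempt the upstream call.
            raise RuntimeError("circuit open: failing fast")
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.open = True
            raise
        self.failures = 0  # a success resets the consecutive-failure count
        return result

breaker = CircuitBreaker(max_failures=2)

def flaky():
    raise ConnectionError("upstream timed out")

for _ in range(2):          # two consecutive failures trip the breaker
    try:
        breaker.call(flaky)
    except ConnectionError:
        pass

print(breaker.open)          # True: further calls now fail fast
```

Real meshes add half-open probing and timeouts on top of this, but the benefit is the same: a failing service degrades gracefully instead of cascading.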
Emily: And, what do you think that that would mean in terms of the business?
Edgaras: You know, the business is always looking for two things: to have a stable platform for our customers, and to run the infrastructure at the lowest possible cost. So, I think that Kubernetes, with container orchestration and auto-scaling, solves the first problem, while the shared-resources nature of Kubernetes helps teams achieve lower infrastructure costs.
Emily: And are you pretty happy with where you are now, with, sort of, the results that you've gotten at this stage? Would you do it over again?
Edgaras: Definitely. As I mentioned before, before Kubernetes we had only tens or twenties of deployments per day; now we have 500 deployments per day. And the developers are even happier with the extra features they're getting: feature-branch deployments, blue/green deployments, and so on. So, for us operations engineers, there's less work to maintain everything, because everything comes standardized. And for the developers, there's less operational work; they just develop a new service or feature and push it.
Emily: Anything else that you want to add? And then I have a couple, sort of, closing questions to ask as well, but before then, is there anything else that you want to add?
Edgaras: What I'd like to add is that with Kubernetes, probably the biggest issue is security, because Kubernetes is a kind of new thing, and it seems like the security tooling around Kubernetes is one step behind. So, what I'd like to see is more solutions from the security perspective around Kubernetes.
Emily: So, just sort of in closing, I have a couple of fun questions. The first one is, what do you think, for you personally, and possibly organizationally, what's your can't live without engineering tool?
Edgaras: Prometheus, because if you don't have monitoring, then it's like flying a plane without any dashboards, so you'll crash soon.
Emily: Excellent. And then how can people connect with you?
Edgaras: LinkedIn is always open.
Emily: Are you on Twitter?
Edgaras: Yes, I am.
Emily: Fabulous. I think we can go ahead and wrap it up there, and thank you so much for chatting.
Edgaras: Cool. Thanks for having me.
Announcer: Thank you for listening to The Business of Cloud Native podcast. Keep up with the latest on the podcast at thebusinessofcloudnative.com and subscribe on iTunes, Spotify, Google Podcasts, or wherever fine podcasts are distributed. We'll see you next time.
This has been a HumblePod production. Stay humble.