Exploring Single Music’s Cloud Native Journey with Kevin Crawley

Kevin Crawley works as a developer advocate at Traefik Labs, the company that introduced the Traefik cloud-native load balancer. He also has an interesting side hustle, moonlighting as co-founder and co-owner of Single Music — a company that creates direct-to-fan music tools for Shopify. In this episode of The Business of Cloud Native, host Emily Omier talks with Kevin about Single Music’s technical journey, and how the company is using Kubernetes to expand and help hardworking musicians remain profitable. The conversation covers:

  • Why Kevin helped launch Single Music, where he currently handles SRE and architecture duties.

  • Single Music’s technical evolution from Docker Swarm to Kubernetes, and the key reasons that drove Kevin and his team to make the leap.

  • What’s changed at Single Music since migrating to Kubernetes, and how Kubernetes is opening new doors for the company — increasing stability, and making life easier for developers.

  • How Kubernetes allows Single Music to grow and pivot when needed, and introduce new features and products without spending a large amount of time on backend configurations.

  • How the COVID-19 pandemic has impacted music sales.

  • The new Traefik plugin system, launched with Pilot, which empowers users to create their own middleware.

  • Kevin’s current project, which is a series of how-to manuals and guides for users of Kubernetes.

  • Some common misconceptions about Kubernetes.

Transcript

Emily: Hi everyone. I’m Emily Omier, your host, and my day job is helping companies position themselves in the cloud-native ecosystem so that their product’s value is obvious to end-users. I started this podcast because organizations embark on the cloud-native journey for business reasons, but in general, the industry doesn’t talk about them. Instead, we talk a lot about technical reasons. I’m hoping that with this podcast, we focus more on the business goals and business motivations that lead organizations to adopt cloud-native and Kubernetes. I hope you’ll join me.

Emily: Welcome to The Business of Cloud Native. I'm Emily Omier, your host, and today I am chatting with Kevin Crawley. And Kevin actually has two jobs that we're going to talk about. Kevin, can you sort of introduce yourself and what your two roles are?

Kevin: First, thank you for inviting me onto the show, Emily. I appreciate the opportunity to talk a little bit about both my roles because I certainly enjoy doing both jobs. I don't necessarily enjoy the amount of work it gives me, but it also allows me to explore the technical aspects of cloud-native, as well as the business and marketing aspects of it. So, as you mentioned, my name is Kevin Crawley. I work at a company called Containous. They are the company who created Traefik, the cloud-native load balancer. We've also created a couple other projects, and I'll talk a little bit about those later. For Containous, I'm a developer advocate. I work both with the marketing team and the engineering team. But I also moonlight as a co-founder and co-owner of Single Music. And there, I fulfill mostly SRE-type duties and also architect duties, where a lot of times people will ask me for feedback, and I'll happily share my opinion. And Single Music is actually based out of Nashville, Tennessee, where I live, and I started that with a couple friends here.

Emily: Tell me actually a little bit more about why you started Single Music. And what do you do exactly?

Kevin: Yeah, absolutely. So, the company really started out of an idea: we saw an opportunity for labels and artists—and these are musicians, if you didn't pick up on the name Single Music—who sell their merchandise through a platform called Shopify to have advanced tools for selling music alongside that merchandise. And at the time, which was in 2016, there weren't really any tools to allow independent artists and smaller labels to upload their music to the web and sell it in a way that could be reported to the Billboard charts, as well as for them to keep their profits. At the time, there was really only Apple Music, or iTunes. And iTunes keeps a significant portion of an artist's revenue, and they don't release those funds right away; it takes months for artists to get that money. And we saw an opportunity to make that turnaround time immediate so that the artists would get that revenue almost instantaneously. And also we saw an opportunity to be more affordable as well. So, initially, we offered that Shopify integration—Shopify calls those applications—and that would allow those store owners to distribute that music digitally and have those sales reported to Nielsen SoundScan, which drives the Billboard Top 100. We've expanded quite considerably since the launch. We now report on sales for physical merchandise as well—things like cassette tapes and vinyl, so records. And you'd be surprised at how many people actually still buy cassette tapes.
I don't know what they're doing with them, but they still do. And we're also moving into the live-streaming business now, with all the COVID stuff going on, and there have been some pretty cool events that we've been a part of since we started doing that, and bands have gotten really elaborate with their live production setups and live streaming. To answer the second part of your question—what I do for them—as I mentioned, I mostly serve as an advisor, which is pretty cool because the CTO and the developers on staff—I think there are four or five developers now working on the team—manage most of the day-to-day operations of the platform, and we have, like, over 150 Kubernetes pods running on an EKS cluster that has roughly, I'd say, 80 cores and 76 gigabytes of RAM. That's around, I'd say, 90 or 100 different services running at any given time, and that's across two or three environments, just depending on what we're doing at the time.

Emily: Can you tell me a little bit about the sort of technical evolution at Single? Did you start in 2016 on Kubernetes? That's, I suppose, not impossible.

Kevin: It's not impossible, and it's something we had considered at the time. But really, in 2016, I don't think there was even a managed offering of Kubernetes outside of Google at that time, I believe, and it was still pretty early on in development. If you wanted to run Kubernetes, you were probably going to operate it on-premise, and that just seemed like way too high of a technical burden. At the time, it was just myself and the CTO, the lead developer on the project, and also the marketing or business person who was also part of the company. And at that time, it was just deemed—Kubernetes was definitely going to solve the problems that we were anticipating having, which were scaling and building that microservice application environment, but at the time, it was impractical for me to manage Kubernetes on top of managing all the stuff that Taylor, the CTO, had to build to actually make this product a reality. So, initially, we launched on Docker Swarm in my garage, on a Dell R815, which was, I think, 64 cores and 256 gigs of RAM, which was, like, overkill, but it also, I think, cost me, like, $600. I bought it off of Craigslist from somebody here in the area. But it served really well as a server for us to grow into, and it was, for the most part—other than electricity and the internet connection into my house—free. And that was really appealing to us because we really didn't have any money. This was truly a grassroots effort: we believed in the business and we thought we could quickly ramp up to move into the Cloud. So, that's exactly what happened. Like, we started making money—also, this was never my full-time job. I started traveling a lot for my other developer relations role. I worked at Instana before Containous. Eventually, the whole GarageOps thing just wasn't stable enough for the business anymore. I remember one time, I think I was in Scotland or somewhere, and it was, like, two o'clock in the morning at home here in Nashville, and the power went out. And I have a battery backup, but the power went out long enough that the server shut down, and then it wouldn't start back up. And I literally had to call my wife at two o'clock in the morning and walk her through getting that server back up and running.
And at that point in time, we had revenue, we had money coming in, and I told Taylor and Tommy, “Hey, we're moving this to AWS when I get back.” So, at that point, we moved into AWS. We just kind of transplanted the virtual machines that were running Docker Swarm into AWS. And that worked for a while, but earlier this year it became really apparent that we needed to switch the platform to something that was going to serve us over the next five years.

Emily: First of all, is ‘GarageOps’ a technical term?

Kevin: I mean, I just made it up.

Emily: I love it.

Kevin: I mean, it was just one of those things where we thought it was a really good idea at the time, and it worked pretty well because, in reality, everything that we did up until that point was all webhook-based; it was really technically simple. But anything that required a lot of bandwidth, like the music itself, went directly into AWS, into S3 buckets, and it was served from there as well. So, there wasn't really any huge bandwidth constraint that we had to think about in our application itself. It was just a matter of really lightweight JSON REST API calls that you could serve from a residential internet connection, if you understand how to set all that stuff up. And at the time, I mean, we were using Traefik, which was version 1.0 at the time, and it worked really well for getting all this set up and getting it all working, and we leveraged that heavily. And at that time, in 2016, there wasn't any competitor to Traefik. You would use HAProxy or you would use NGINX, and both of those required a lot of hand-holding and a lot of configuration, and it was all manual, and it was a nightmare. And one of the cool things about Docker Swarm and Traefik is that once I had all the tooling set up, it all sort of just ran itself. And the developers—I don't know, around 2017 or ’18, we had hired another developer on the staff—realistically, if they wanted to define a new service, they didn't have to talk to me at all. All they did was create a new repo in GitHub, change some configuration files in the tooling we had built—or that I had built—and then they would push their code to GitLab, and all the automation would just take over and deploy their new service, and it would become exposed on the internet if it was that type of service—an API. And it would all get routed automatically. And it was really, really nice for me because I was really just there in case the power went out in my garage, essentially.

Emily: You said that up until earlier this year, this was more or less working, and then earlier this year, you really decided it wasn't working anymore. What exactly wasn't working?

Kevin: There were a few different things that led us to switching, and the main one was that it seemed like every six to twelve months, the database backend on the Swarm cluster would fall over. For whatever reason, it would just—services would stop deploying, the whole cluster would seemingly lock up. It would still work, but you just couldn't deploy or change anything, and there was really no way to fix it because of how complicated—how complex, I want to say—the actual database and the data stored in it are, because it's mostly just stateful records of all the changes that you've made to the cluster up until that point. And there was no real easy way to fix that other than just completely tearing everything down and building it up from scratch.
And with all the security certificates and the configuration that was required for that to work, it would literally take me anywhere between five and ten hours to tear everything apart, tear everything down, set up the worker nodes again, and get everything reestablished so that we could deploy services again and the system was accepting webhooks from Shopify, and that was just way too long. Earlier this year—I want to say in January—we crossed over 1,400 merchants in Shopify sending us thousands of orders every day, and it just wasn't acceptable for us to have that length of downtime. 15, 20, 35 minutes, that's fine, but several hours just wasn't going to work.

Our reputation up until that point had been fairly solid. That issue or incident hadn't happened in the past eight months, but we were noticing some performance issues in the cluster, and in some cases we were having to redeploy services two or three times for those services to apply, and that was sort of a leading indicator that something was going to go wrong pretty soon. And it was just a situation where it was like, “Well, if we're going to have to go offline anyways, let's just do the migration.” And it just so happened that in April, I was laid off from my job at Instana, and I was fortunate enough to be able to find a new job in, like, a week, but I knew that I wanted to complete this migration, so I went ahead and decided to put off starting the new job for a month. And that gave me the means, the opportunity, and the motive to actually complete this migration. There were some other factors that played into this as well, and that included the fact that in order to get Swarm stood up in 2016, I had to build a lot of bespoke tooling for the developers and for our CI/CD system to manage these services in the staging and production environments—handling things like promotion, and also handling things like understanding what versions of the services are running in the cluster at any given time—and these are all tools that are widely available today in Kubernetes. Things like K9s, Lens, Helm, Kustomize, and Skaffold—these are all tools that I essentially had to build myself in 2016 just to support a microservice environment, and it didn't make sense for us to continue maintaining that tooling and having to deal with its limitations, because I didn't have time to keep that tooling fresh and keep it up-to-date and competitive with what's in the landscape today, which are the tools that I just described. So, it just made so much sense to get rid of all that stuff and replace it with the tools that are available today from the community, which have infinitely more resources poured into them than I was ever able to provide, or will ever be able to provide, as a single person working on a project. The one factor that was sort of lingering in the background was the fact that we have recently started doing album releases, and artists are coming to us who will sell hundreds of thousands of albums within a very short period of time—within several hours—and we were reaching the constraints of some of our database and backend systems to where we needed to scale those horizontally.
We had, kind of, reached the vertical limits of some of them, and we knew that Kubernetes was going to give us these capabilities through the modern operator pattern, and through the stateful tooling that has matured in Kubernetes—tooling that wasn't even there in 2016 and wasn't something that we could consider, but we can now because the ecosystem has matured so much.

Emily: So, yeah, it sounds like basically you were running up against some technical problems that were on the verge of becoming major business problems: the risk of downtime, and the performance issues. And then it also sounds like some of the technical architecture was limiting the types of products, the types of services that you could have. Does that sound about right?

Kevin: Yeah, that's a pretty good summary of it. I think that one of the other things that we had to consider, too, was that the Single ecosystem—the Single Music line of products—has become so wide and so vast—I think we're coming up on five or six different product lines now—and developers need an 8-core laptop with 32 gigs of RAM just to stand up our stack, because we're starting to use things like Kafka and Postgres to do analytics on all this stuff, and we're probably going to get to the point within the next 18 months where we can't even stand up the full Single Music stack on a local machine. We're going to have to leverage Kubernetes in the Cloud for developers to even build additional products into the platform. And that's just not possible with Swarm, but it is with Kubernetes.

Emily: Tell me a little bit about what has changed since making the migration to Kubernetes. And I'm actually also curious—the timeframe when this happened is really interesting, and you talked a little bit about offering these streaming services for musicians. I mean, it's an interesting time to be in the music industry. Interesting, probably, in both the exciting sense and the negative sense. But how have things changed? And how has Kubernetes made things possible that maybe wouldn't have been possible otherwise?

Kevin: I think right now, we're still on the precipice, or on the leading edge, of really realizing the capabilities that Kubernetes has unlocked for the business. I think right now, I mean, the main benefit of it has been just an overwhelming sense of comfort and ease that has been instilled into the business side of the company—our executive side, if you will. Of course, the sales and marketing people don't really know that much about the technical challenges that the engineering side has, and what kind of risk we were at when we were using Swarm at the time, but the owner did. There are three co-owners of the company: myself, Taylor, and Tommy. And Taylor, of course, is the CTO, and he is very well aware of the risk because he is deeply invested in the platform and understands how everything works. Now, Tommy, on the other hand, he just cares: “Is it up? Are customers getting their orders? Are they getting their music delivered?” And so, right now, there's just a lot more confidence in the platform behaving and operating like it should.
And that's a big relief for the engineers working on the project, because they don't have to worry about whether or not the latest version of the service they deployed has actually been deployed, or whether the next time they deploy they're going to bring down the entire infrastructure because the Swarm database corrupts, or because the Swarm network doesn't communicate correctly—like missed routes. We had issues where staging versions of our application would answer East-West traffic—request traffic that is supposed to go between the services running in the cluster—so staging instances would answer requests coming from production instances when they weren't supposed to. And it's really hard to troubleshoot those problems, and it's really hard to resolve them. So right now, it's just a matter of stability.

The other thing Kubernetes is enabling us to do is handle the often difficult task of managing database migrations, as well as topic migrations, and, really, one-off-type jobs that would happen every once in a while, depending on new products being introduced or new functionality being added to existing products. And these would require things like migrations in the data schema. This used to have to be baked into the application itself, and it was sometimes kind of tricky to manage when you start talking about applications that have multiple replicas, but with Kubernetes, you can do things like tasks and jobs—things that are more suited towards these one-off-type activities—so that you don't have to worry about a bunch of services running into each other and stepping on each other's feet anymore. So, this, again, just gives a lot of comfort and peace of mind to developers who have to work on this stuff. And it also gives me peace of mind because I know, ultimately, that this stuff is just going to work as long as they follow the best practices of deploying Kubernetes manifests and Kubernetes objects, and so I don't have to worry about them breaking things, per se, in a way in which they aren't able to troubleshoot, diagnose, and ultimately fix themselves. So, it just creates less maintenance overhead for me, because, as I mentioned at the beginning of the call, I don't get paid by Single Music—unless, of course, they go public or they sell. I'm not actually a full-time employee. I'm paid by Containous; that's my full-time job. So anything that allows me to have that security and have less maintenance work on my weekends is hugely beneficial to my well-being and my peace of mind as well.

Now, the other part of the question you had is in terms of how we are transitioning, and how we are handling the ever-changing landscape of the business. I think one of the things that Kubernetes lets us do really well is pivot and introduce these new ideas, these new concepts, and these new services to the world. We get to release new features and products all the time because we're not spending a ton of time having to figure out, “Well, how do I spin up a new VM, and how do I configure the load balancer to work, and how do I configure a new schema in the database?” The stuff is all there for us to use already, and that's the beauty of the whole cloud-native ecosystem: all these problems have been solved and packaged in a nice little bundle for us to just scoop up, and that enables our business to innovate and move fast. I mean, we try not to break things, but we do.
But for the most part, we are just empowered to deliver value to our customers. For instance, the whole live-streaming thing: we launched that over the course of, maybe, a week. It took us a week to build that product and build that capability, and of course, we've had to invest more time into it as time has gone on, because not only do our customers see value in it, we see value in it, and we see value in investing additional engineering and business and marketing hours into selling that product. And so again, it's just a matter of what Kubernetes and the cloud-native ecosystem in general have enabled—and this includes Swarm to some extent, because we could not have gotten to where we did without Swarm in the beginning, and I want to give it its proper dues because, for the most part, it worked really well and it served our needs—but it got to the point where we kind of outgrew it, and we wanted to offload the managing of our orchestrator to somebody else. We didn't want to have to manage it anymore. And Kubernetes gave us that.

Emily: It sounds like, particularly when we're talking about the live-streaming product, that you were able to build something really quickly that not only helped Single’s business but then obviously also helped a lot of musicians, I'm assuming, at least. So, this was a way to not just help your own business, but also help your customers successfully pivot in a time of fairly large upheaval for their industry.

Kevin: Right. And I think one of the cool things that we experienced through the pandemic is that we saw a fairly sharp rise in music sales in general, and I think it kind of speaks to human nature. And what I mean by that is that music is something that comforts people and gives people hope, and also it's an outlet. It's a way for people to—I don't want to say ‘disconnect,’ because that's not really what I mean—but it gives them a means to experience something outside of themselves. And so it wasn't really that big of a surprise for us to see our numbers increase. The only thing that kind of did surprise us—I mean, it's not a surprise now in retrospect—but one of the things that we observed as well was that as soon as all the George Floyd protests started happening across the United States, the numbers conversely dropped, and at that point, we realized that there was something more important going on in the world. And we expected that, and we were… it was just an interesting observation for us. And right now, I mean, we're still seeing growth, we're still seeing more artists and more bands coming online, trying to find new ways to innovate and to try to sell their music and their artwork, and we love being a part of that, so we're super stoked about it.

Emily: That actually might be a good spot for us to wrap up, but I always like to give guests the opportunity to just say anything that they feel like has gone unsaid.

Kevin: Well, I mean, one of the things I do want to talk about a little bit is some of the stuff that we're doing at Containous as well. As a developer advocate, I think one of the things that I really enjoy in that aspect is that it gives me an opportunity to work closely with engineers in a way in which—a lot of times, they don't have an opportunity to experience the marketing and the business side of the product, and the fact that I can interact with my community, and I can work with our open-source contributors and help the engineers realize the value of that, is incredible.
A few things that I've done at Containous since I joined: we are working really hard at improving our documentation and improving the way in which developers and engineers consume the Traefik product. We're also working on a service mesh, which is a really cool way for services to talk to each other. But one of the things that we've recently launched, too, that I want to touch on is our plugin system, which is a highly requested feature in Traefik. And we launched it with Pilot, which is a new product that allows users of Traefik to install plugins that manipulate the request before it gets sent to the service. And that means our end-users are now empowered to create their own middleware, in essence. They're able to create their own plugins. And this allows them really unlimited flexibility in how they use the Traefik load balancer and proxy. The other thing that we're working on, too, is improving support for Kubernetes. One of the surprises that I had when migrating from Traefik version 1 to Traefik 2, when we did the Single migration to Kubernetes, was that once I figured out the version 2 configuration, it was really easy to make that migration, but it was difficult at first to translate the version 1 configuration schema into version 2. So, what we're working on—what I'm working on right now with our technical writer—is a series of how-tos and guides for users of Kubernetes to be empowered in the same way that we are at Single Music to quickly and easily manage and deploy their microservices across their cluster. With that, though, I mean, I do want to talk about one more thing: maybe some misconceptions about cloud-native and Kubernetes.

Emily: Oh, yes, go ahead.

Kevin: Yeah, I mean, I think one of the things that I hear a lot is that Kubernetes is really hard; it's complex. And at first, it can seem that way; I don't want to dispute that, and I don't want to dismiss or minimize people's experience. But once those basic concepts are out of the way, I think Kubernetes is probably one of the easiest platforms I've ever used in terms of managing the deployment and the lifecycle of applications and web services. And I think probably the biggest challenge for organizations and for engineers who are trying to adopt Kubernetes is that, in some ways, perhaps they're trying to make Kubernetes work for applications and services that weren't designed from the ground up to work in a cloud-native ecosystem. And that was one of the advantages we had in 2016: even though we were using Docker Swarm, we still followed something called the ‘Twelve-Factor App’ principles. And those principles really laid out a course for smooth, uninterrupted, turbulence-free flying. And it's been really an amazing journey because of how simple and easy that transition from Docker Swarm into Kubernetes was, but if we had built things the old way—using maybe Packer and AMIs, not really following the microservice route, and hard-coding a bunch of database URLs and keys and all kinds of things throughout our application—it would have been a nightmare.
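
[For illustration, here is a minimal, hypothetical sketch of the Twelve-Factor configuration style Kevin describes: deploy-specific settings such as database URLs and keys are read from environment variables rather than hard-coded, so the same container image runs unchanged under Docker Swarm, Kubernetes, or on a developer's laptop. The variable names and defaults below are assumptions made for this example, not details from Single Music's codebase.]

```python
import os

# Twelve-factor style configuration: every deploy-specific value comes from the
# environment instead of being hard-coded into the application. Names here are
# illustrative only.
def build_config() -> dict:
    return {
        "database_url": os.environ.get(
            "DATABASE_URL", "postgresql://localhost:5432/app"
        ),
        "kafka_brokers": os.environ.get("KAFKA_BROKERS", "localhost:9092").split(","),
        "music_bucket": os.environ.get("MUSIC_BUCKET", "example-bucket"),
    }

if __name__ == "__main__":
    # Under Kubernetes these values would typically be injected from a ConfigMap
    # or Secret; under Docker Swarm, from service-level environment settings.
    print(build_config())
```
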
So, I want to say to anybody who is looking at adopting Kubernetes: if it looks extremely daunting and technically challenging, it may be worth stepping back and looking at what you're trying to do with Kubernetes and what you're trying to put into it, and whether there needs to be some reconciliation of what you're trying to do with it, before you actually go forth and use something like Kubernetes, or containers, or this whole ecosystem, for that matter.

Emily: Let me go ahead and ask you my last question, the one I ask everybody, which is: do you have a software engineering tool that you cannot live without, that you cannot do your job without? If so, what is it?

Kevin: Yeah, I mean, Google's probably… [laughs] seriously, it's one of my most widely used tools as a developer, or as a software engineer. But it really depends on the context of what I'm working in. If I'm working on Single Music, I would have to say the most widely used tool that I use for that is Datadog, because we have all of our telemetry going there. And Datadog gives me a very fast and rapid understanding of the entire environment, because we have metrics, we have traces, and we have logs all being shipped there. And that helps us really deep-dive and understand when there's any type of performance regression or incident happening in our cluster in real time.

As far as what my critical tooling at Containous is—because I work in marketing and because I work more in an educational-type atmosphere there—one of the tools that I have started to lean on heavily is something most people probably haven't heard of, and this is for managing the open-source community. It's something called Bitergia. It's an analytics platform, but it helps me understand the health of the open-source community, and it helps me inform the engineering team of the activity around multiple projects: who's contributing, how long it's taking for issues and pull requests to be closed and merged, and what our ratio of pull requests and issues being closed for certain reasons is. And these are all interesting business-y analytics that are important for our entire engineering organization to understand, because we are an open-source company, and we rely heavily on our community for understanding the health of our business.

Emily: And speaking of, how can listeners connect with you?

Kevin: There's a couple different ways. One is through just plain old email. And that is kevin.crawley@containous—that’s C-O-N-T-A-I-N-O—dot U-S. And also through Twitter as well. My handle is @notsureifkevin. It’s kind of like the Futurama, “Not sure if serious.” I mean, those are the two ways.

Emily: All right. Well, thank you so much. This was very, very interesting.

Kevin: Well, it was my pleasure. Thank you for taking the time to chat with me, and I look forward to listening to the podcast.

Emily: Thanks for listening. I hope you’ve learned just a little bit more about The Business of Cloud Native. If you’d like to connect with me or learn more about my positioning services, look me up on LinkedIn: I’m Emily Omier, that’s O-M-I-E-R, or visit my website, which is emilyomier.com. Thank you, and until next time.

Announcer: This has been a HumblePod production. Stay humble.