A couple of weeks ago, I had the opportunity to speak about Azure Cosmos DB at our local Microsoft Developers community. From the questions I got during and after my talk, I sensed that many developers already using Cosmos DB - or considering using it - were seeking stories from the field, told by other developers who are also actively using it and could share their unbiased experiences: What use-cases are they using Cosmos DB for? How well does it work for them? What pain points did they encounter?
As I was trying to find a way to fill this gap, I noticed this tweet in my timeline:
I realized that it would be great to interview developers willing to talk about their Cosmos DB experiences and share their stories here. So I reached out to Lance, asked him if he was interested and he accepted right away! Here's a transcript of the conversation we had together over Skype.
Let's start with Lance introducing himself:
Lance continued by presenting his company and the context in which they're using Cosmos DB:
"The name of the company is Mesh Technologies and our primary product is a social event planning app. So basically we're helping people to plan events, things like bachelor parties, vacations, business trips or even something as simple as a concert with friends. Our app lets our users share their flight or hotel information and chat all about it in one singular place. Normally what people do is they would start a conversation in emails, then they would go to text messages and it's just hard to gather all that information together. That's what Mesh is hoping to solve."
Interestingly, it turns out that there was already a first version of Mesh being built before Lance joined the team and he's currently working on a complete rewrite to modernize the platform:
Talking about his experience with Cosmos DB, Lance explained that he didn't have prior exposure to any non-relational database and wasn't particularly keen to try it at first, but got pleasantly surprised by its comprehensive management capabilities:
As this is the first time that Lance is working with a NoSQL data store, I was very interested to hear about his transition from relational to non-relational models. I know that the paradigm shift required by such a transition can be a major pain point for developers, but Lance embraced this shift successfully:
"I don't think the pain point was ever Cosmos DB or NoSQL; it was in my mind, it was the way I thought about it. When learning how to build data models, you're usually told that you have a users table, and that table has rows with foreign keys, or you have tables that are dedicated to those relationships between users and other items for example. So you don't want data replication, you don't want the same data to exist in multiple tables, and if that happens it means you have a bad structure, or a bad data model design. That was really the biggest pain point, having to realize that there's another side to that argument which is, we can serve your data at lightning-fast speeds, we can give you large datasets if you need them, and all you have to do is to think about the data differently! So the pain point for me really was this idea that my world was blown apart, because all of a sudden I was told that the sky's not blue anymore, the sky can be green or red or purple or whatever color I want it to be! So that was fun, it was actually a great learning experience."
Here, Lance is referring to the way you should adapt your data model to fit the constraints of a non-relational database, like using denormalization (which I covered in a previous article). For an extensive overview of data modeling best practices with Cosmos DB, be sure to watch this Ignite 2018 session with Deborah Chen.
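To make that denormalization idea concrete, here is a minimal sketch. The entity shapes and field names below are hypothetical, not taken from Mesh's actual model: instead of joining a users table against a separate events table, a user's events can be embedded directly in the user document, trading some data duplication for the ability to serve everything in a single read.

```python
# Hypothetical illustration of denormalizing a relational model into a
# single document, as you might store it in Cosmos DB.

# Relational-style: two "tables" linked by a foreign key.
users_table = [{"id": "u1", "name": "Lance"}]
events_table = [
    {"id": "e1", "user_id": "u1", "title": "Bachelor party"},
    {"id": "e2", "user_id": "u1", "title": "Concert"},
]

def to_document(user, events):
    """Embed a user's events inside the user document (denormalization)."""
    return {
        "id": user["id"],
        "name": user["name"],
        # Duplication is acceptable here: one read returns everything.
        "events": [
            {"id": e["id"], "title": e["title"]}
            for e in events
            if e["user_id"] == user["id"]
        ],
    }

doc = to_document(users_table[0], events_table)
```

Reading this document back requires no join at all, which is exactly the trade-off the non-relational model asks you to make.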
"One of the early learning points was that in Cosmos DB, you're creating collections and you're automatically thinking about them as tables, but you should not. I originally thought OK, it's not a table, it's not a table... but what did I do? I made them all like tables! So I would just create a users collection, and then an images collection, and soon after I was being told that I was going over my BizSpark credits and I was like, why? So I searched online and it was actually easy to fix: you use partition keys and you keep it all within a single collection or maybe just a few collections. That was an interesting learning point."
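The "one collection, many entity types" pattern Lance describes can be sketched as follows. The entity shapes, the `type` discriminator field, and the choice of `userId` as partition key are all illustrative assumptions on my part, not Mesh's actual schema:

```python
# Sketch of storing several entity types in a single Cosmos DB collection,
# distinguished by a "type" field and grouped by a shared partition key.

items = [
    # A user document and its images share the same partition key value
    # ("userId"), so they land in the same logical partition and can be
    # queried together efficiently.
    {"id": "u1",   "type": "user",  "userId": "u1", "name": "Lance"},
    {"id": "img1", "type": "image", "userId": "u1", "url": "https://example.org/1.jpg"},
    {"id": "img2", "type": "image", "userId": "u1", "url": "https://example.org/2.jpg"},
]

def query_by_user(items, user_id, entity_type):
    """Emulates a single-partition query: filter on partition key + type."""
    return [i for i in items
            if i["userId"] == user_id and i["type"] == entity_type]

images = query_by_user(items, "u1", "image")
```

Because every query filters on the partition key first, the database never has to fan out across partitions, which is what keeps the RU cost (and therefore the bill) down compared to one collection per entity type.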
Another topic I wanted to explore with Lance was scale and performance. Cosmos DB is built on a "provisioned performance" model where you have to define the throughput you expect your collections to meet (expressed in Request Units per second). Some developers find this model to be too rigid but Lance had a different opinion on that:
"Like any start-up, we obviously have ambitions to scale to millions and we'll need some elasticity in terms of scale. Talking about provisioned throughput, I'm OK with this because if you think about the alternatives, it's a dedicated server, and that itself has limitations. It's just a different version of the exact same thing we're given in Cosmos DB because a dedicated server has cores, it has limited processing power. But in order to provision new servers, then replicate all your data and propagate access to those servers at scale, we're talking about days and it's just a lot to do. So actually I'm OK with this because with Cosmos DB you can easily scale it up, and as long as I'm diligent in designing my data models in a way that scales, it will allow Cosmos DB to scale for me; I can up my throughput by issuing an API call that will upgrade that throughput within minutes. Give us another year and let's have this conversation again, I'm sure I'll tell you that I'm completely happy with this!"
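The programmatic scale-up Lance mentions boils down to replacing the RU/s value provisioned on a collection. As a rough sketch of the surrounding logic (the helper name and scaling factor are mine; I'm assuming the documented constraints that manually provisioned throughput starts at 400 RU/s and moves in increments of 100):

```python
def next_throughput(current_ru, factor=2.0, minimum=400, step=100, ceiling=100_000):
    """Compute the next manual-throughput value to request, rounded to a
    valid increment. Applying the change is then a single SDK call (e.g.
    replace_throughput on a container in the azure-cosmos Python SDK)."""
    target = max(minimum, int(current_ru * factor))
    target = min(target, ceiling)
    # Round up to the nearest valid step of 100 RU/s.
    return ((target + step - 1) // step) * step
```

The rounding step matters because the service rejects throughput values that are not valid increments, so any scaling code needs to normalize its target before issuing the call.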
On that topic of scale, Lance mentioned that he appreciates the convenience of scaling his collections' throughput programmatically, but he would also welcome new ways to perform that automatically:
"I'm hoping that things like auto-scaling will be coming within the next year, and when I say auto-scaling, I'm not saying something that's 100% dynamic and would ensure that no request gets throttled, but something in the middle would be great. Right now, there are a lot of people who have to reinvent the wheel every time and put together code to say, on Monday at this time, do this. So if the team could just give us a tool to allow us to do that and create simple algorithms that hit those APIs for us, it would still be our responsibility to design the correct auto-scaling rules but it would make things much easier."
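The "on Monday at this time, do this" kind of rule Lance describes is easy to picture as a tiny schedule lookup. This is purely my own illustration of the idea, with made-up RU values; a real implementation would run it on a timer (an Azure Function, say) and apply the returned value via the throughput API:

```python
from datetime import datetime

# Hypothetical schedule: (weekday, start_hour, end_hour) -> RU/s.
# Monday is weekday 0; the values are invented for illustration.
RULES = [
    {"weekday": 0, "start": 8, "end": 18, "ru": 4000},   # Monday business hours
    {"weekday": 4, "start": 17, "end": 23, "ru": 6000},  # Friday evening spike
]
DEFAULT_RU = 1000

def throughput_for(now: datetime) -> int:
    """Pick the RU/s to provision at a given moment, per the schedule."""
    for rule in RULES:
        if now.weekday() == rule["weekday"] and rule["start"] <= now.hour < rule["end"]:
            return rule["ru"]
    return DEFAULT_RU
```

As Lance says, designing the right rules would stay the developer's responsibility; the missing piece is only the plumbing that evaluates them and calls the API on a schedule.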
Overall, it was great to hear that Lance is having a very positive development experience with Microsoft Azure in general, and Cosmos DB in particular:
"With Azure Cosmos DB and Azure Functions, Mesh is really embracing a serverless approach and its scalability is gonna serve us really well."
I would like to thank Lance for his kind and enthusiastic participation in this little interview. Make sure to check out Mesh the next time you're planning an event with your family, friends or colleagues!
Are you a developer currently working with Azure Cosmos DB and would like to share your experience here? Feel free to contact me at firstname.lastname@example.org!