How to use Cloud without losing sleep

Hi webdev,

About a month ago, I published a post about our incident while experimenting with GCP and Firebase. There were a lot of great comments, and after the post a few folks reached out to me for recommendations for their projects.

In the new year I took some time to write down the best practices I follow before setting up any new project on GCP, which help me sleep well without having an on-call team.

My blog post is here: https://ift.tt/3oJcZq3. I’ve copied all the best practices (except images, screenshots and some quotes) to this post.

I hope that small teams and individual developers find it useful.

Note:

This is not a sponsored post. I don’t recommend one cloud platform over another, or even using a cloud platform if you can do the entire hosting on your own servers. Cloud platforms do give small teams and individual developers an advantage in moving fast and scaling well.

Examples are for GCP, but the practices hopefully apply to all providers.

This is not an exhaustive list. I’m still learning and growing. Below is what I learnt from several mistakes, figuring out solutions, and going through a lot of literature online from great, kind minds.

Let’s get into it!

1. Use multiple forms of payments (FOP), preferably with spend caps

Have spending limits/caps on the forms of payments (FOPs) you use.

Most cloud services bill monthly unless a threshold spend is reached. If the threshold is reached, the cloud platform charges the given account right away. If for some reason one of your services faults, the cloud service would directly charge the form of payment (like a credit card) on file and, on non-payment, stop the service.

Ideally the spending limit should be somewhere between 120% and 150% of the cost you expect to incur given the usage of your platform.

From what I’ve read, it’s relatively easier to get a bill waived than to get a refund.
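
As a rough illustration of the 120-150% guideline, here is a minimal sketch of the arithmetic. The function name and the example figure of 200 per month are made up for illustration; the cap itself is something you would set with your bank or card issuer, not through any cloud API.

```python
def spend_cap_range(expected_monthly_cost: float,
                    low: float = 1.20,
                    high: float = 1.50) -> tuple[float, float]:
    """Return a (min, max) spending cap for a form of payment.

    The 1.20 and 1.50 multipliers mirror the 120-150% guideline;
    everything else here is an illustrative assumption.
    """
    if expected_monthly_cost < 0:
        raise ValueError("expected_monthly_cost must be non-negative")
    return (round(expected_monthly_cost * low, 2),
            round(expected_monthly_cost * high, 2))


# Hypothetical example: a project expected to cost 200 per month.
low_cap, high_cap = spend_cap_range(200.0)
print(f"Set the card limit between {low_cap} and {high_cap}")
```

With an expected bill of 200, this suggests a cap somewhere between 240 and 300.
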
While the suggestions in this post should make sure such situations don’t arise, it’s better to have a spending cap set up anyway.

Use different FOPs for development and production.

After we messed up in one of our projects, our credit card was declined, and this had several unexpected ripple effects.

First, GCP suspended all our billing accounts tied to the same credit card on suspicion of fraud.

Second, our bank started blocking all future transaction requests from GCP, suspecting fraud, and the loop began.

Sorting this out took several days, and it halted our development cycle. Thankfully we didn’t have a production service back then.

In an ideal setup, I recommend using one FOP for the production account and another one for dev/test accounts. Never mix these two.

Note:

If you’re a solo developer, I highly recommend setting up an LLC or some umbrella that gives your personal assets protection. It doesn’t matter where you live; in recent years this process has become quite standard everywhere in the world, thanks to the entrepreneurial boom.

Some people have suggested setting up a shell company for using cloud services. I personally don’t recommend spending time doing this. If you’re building something meaningful, spend all your time making it better in a legit way.

2. Setup Service Quotas

In GCP, users can define “Quotas” for most services. AWS has a similar feature called “Service Quotas”, and I’m sure other providers have this as well.

Depending on the type of quota, some can be set for usage per day, per minute, or even per user per minute. I don’t really trust the per-user quota, because who defines “a user”? The other quotas are, for the most part, fairly reliable.

For any account with billing, when I enable a new service, I first check if there’s a quota and, if there is one, I set it really low while getting familiar with the service.
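
One way to see why a low quota is reassuring is to work out the worst-case daily bill it allows. The sketch below is only arithmetic: the price of 1.50 per 1,000 calls is a made-up placeholder, not a real price for any GCP service, and the function name is hypothetical.

```python
def max_daily_cost(quota_per_day: int, price_per_1000_calls: float) -> float:
    """Worst-case daily spend if every call allowed by the quota is used."""
    if quota_per_day < 0 or price_per_1000_calls < 0:
        raise ValueError("quota and price must be non-negative")
    return quota_per_day * price_per_1000_calls / 1000


# Placeholder price: 1.50 per 1,000 calls (an assumption for illustration).
PRICE = 1.50

print(max_daily_cost(100, PRICE))        # low quota while learning: 0.15
print(max_daily_cost(1_000_000, PRICE))  # a very large quota: 1500.0
```

The same runaway bug costs cents under the low quota and four figures per day under the large one, which is the whole argument for lowering the quota first.
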
A lot of the services are charged per use, and setting up a quota also helps in validating the costs that will be incurred on use.

As an example, if I’m testing the GCP Adult Image Classifier service, as soon as I enable the service I would first set the quota to 100 per day and then try the Codelab provided.

If the cloud provider you use has neither auto billing shutoff nor budgets, that should be a big red flag against using the service.

Some Caveats

Default values are counterintuitive!

Most services have a preset default quota of unlimited or some absurd value like 1,000,000. Why? God knows. Maybe GCP engineers never paid attention to it, or they assume each of their users, a considerable share of whom are students, builds apps with million-dollar budgets. Setting quotas is also an arduous multi-click process, probably because it is optimized for large organizations. Don’t fret over the extra clicks; it’s worth it.

Not all services have quota limits

For example, Firestore read/write ops. From an engineering standpoint this is understandable: if the service has to check a quota on every operation, how can it be real time? Don’t assume that every service has a quota.

Not all quotas work as advertised

Test each of the quotas before fully relying on them. I found several that don’t work, and my inquiries led to bugs being filed with Google. The bugs are not fixed yet, so if something were to go wrong with those services, the bug reports would act as a fallback, because it’s an issue on GCP’s side.

3. Cloud Monitoring

While billing is delayed by about a day, most metrics provided by cloud platforms are delayed by only a few minutes.
For GCP this is called Cloud Monitoring, for AWS it’s CloudWatch, and for Microsoft Azure it’s Azure Monitor.

These monitoring services are either free for the project services (standard metrics) or available at very, very cheap prices.

Setting up Monitoring wasn’t really in my face when I started using cloud services, nor is it generally advertised, but it’s a great feature that everyone should use.

What it allows:

- Creating beautiful custom dashboards with usage graphs for the services you care about
- Creating alerts that fire if usage goes beyond a user-defined limit. The alerts can be SMS, email and app notifications.

How to use Monitoring service

Set up Alerts

You can also set up alerts that’ll fire emails, text messages and mobile app notifications, all for free. While they may not be of much use while you’re sleeping, they do provide a lot of support while you’re awake.

Case study from our development cycle: During development of our first product, Announce, one of the engineers on my team, developing locally (live build on localhost), accidentally created an infinite loop that led to non-stop Firebase read ops for a few minutes. We had a free, limited project for development, so nothing could go wrong, but the monitoring alerted me and I reached out to the team to see what was up.

The only downside was that we had very little quota remaining for the day, and the team waited until the next day to resume development.

Look for spikes

Most of the graphs may look daunting, and it’s probably not worth understanding each and every metric and figure. To simplify your job, just look for anomalies.

After a few steady days, your job should be to watch out for spikes in usage compared to usage over a longer period of time. Set the time range to 6 weeks and see if there is a spike. If there is a spike, it should be explainable: say, too much local testing, a surge in usage, expected triggers, etc.

Looking for spikes only takes the 2-3 minutes the dashboard needs to load, and your job is done.
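
If you prefer a number over eyeballing a graph, the same check can be sketched in a few lines. This assumes you have exported one usage figure per day (for example, read ops) into a list yourself; the function name and the factor of 3 are my own assumptions, not anything a monitoring product defines.

```python
from statistics import median


def is_spike(daily_usage: list[float], factor: float = 3.0) -> bool:
    """Flag the most recent day if it exceeds `factor` times the median
    of all earlier days (for example, the previous 6 weeks)."""
    if len(daily_usage) < 2:
        raise ValueError("need at least one day of history plus the latest day")
    *history, latest = daily_usage
    baseline = median(history)
    return latest > factor * baseline


# Hypothetical data: 41 steady days followed by the latest day.
steady = [100.0] * 41
print(is_spike(steady + [105.0]))  # normal day
print(is_spike(steady + [900.0]))  # runaway loop or traffic surge
```

The median is used instead of the mean so that one earlier bad day does not raise the baseline and hide the next one.
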
Checking for spikes should be part of your daily working routine if you have any service deployed on cloud.

Anticipate and Reduce Costs

As the graphs are only a few minutes late, after doing something you can wait for a few minutes to see what resources you consumed. You can also predict the cost of something before it shows up in billing.

Another big advantage is the aid in reducing costs! For example, looking at Storage, I understood that GCP was continuously storing build artifacts from each build, which took the storage of a development project that shouldn’t be using any more than 100MB up to 25GB. There are ways to fix this, but as a developer, one must know that this problem exists.

Another example is memory or CPU consumption of the deployed services. Some graphs can easily tell if you have over- or under-allocated resources to the services.

Write better Code

These graphs easily show low-hanging fruit for optimizing code. Maybe your cloud function is going into background processes, or perhaps it’s timing out due to some other service being called serially (instead of asynchronously).

4. Use Free Projects

Cloud platforms today provide a lot of free services per project, and there are a lot of projects that a user can create. There’s a reason why the systems are set up in such a way. The primary reason, as far as I can think of, is to give enough room for testing and learning about the service.

Firebase and GCP allow more than 10 projects, which can be extended further. If there’s some issue with your account, you can always file for an extension.

That’s great… but how to use these free projects?

Well, for one, set up multiple environments. The software development cycle is common knowledge, so I’ll be brief about it. In any project you should ideally have dev, test, staging (alpha), preview (beta) and production environments. No matter how small the project is, you should still have at least dev and prod environments that are completely decoupled from each other.
Read that again: completely decoupled from each other.

You can use a free Firebase project for development, and a paid one for production. If you have backend services that need cloud resources, you can grant the paid project’s service accounts permission to access your free project’s resources.

This topic could be an entire post, as creating an architecture that supports multiple environments (dev, test, prod) for multiple platforms (Web, Android, iOS, API) can be a little complex.

Granting permissions of one project to another

An important concept to understand is the types of accounts and access. There are two types of access in services: 1. access for a human, and 2. access for a service account.

Ideally almost all access should go only to service accounts and none to humans (except maybe admins).

In GCP it’s possible to give permissions to access resources of one project from another project, which is a great feature. I’m very sure every other platform has the same feature, and I highly recommend using it.

5. Spend good amount of time understanding and predicting costs

Before going through any Codelabs for a service, first spend some time on the pricing of that service/feature. This is tied to designing the architecture of a project, which can be a long post of its own.

Most cloud services now have a great cost calculator (for example, the GCP Calculator). This is a tool for developers to decide whether and how to use that service. Spend some time on these calculators, and test the costs in extreme cases.

Over the course of time, I have started testing services for a day or two in a safe environment (development account). I wait for the billing to process and, once I understand it correctly, move on to integrating the service into the product.

A better approach would be to predict the costs first and then test the service. The actual cost of the service and your prediction should match. Never assume the pricing works as stated, because there are a lot of factors to consider.

6. CICD = Operational Efficiency

Any project that has billing enabled should only be accessed by machines (service accounts) and not humans. This means all code should be deployed through CICD pipelines triggered after a code merge, which should ideally only happen after code review.

This falls under operational efficiency, and no matter how small your project is, it’s a good idea to set it up like this because there’s no reason not to.

Less human intervention is better, both for security and efficiency. If you’re on GitHub, GitHub Actions is one of the most valuable resources available, and it is mostly free. Until now, there hasn’t been a month that I paid for Actions, and I manage a lot of CICD pipelines.

GCP Cloud Build gives 120 minutes per day per billing account free, which to be honest is a lot. Unless you’re deploying ML models, or have a lot of projects plus a large team, I highly doubt that this free tier would be exceeded.

Firebase has a great CLI, but I don’t recommend using it for deploying to production environments. For one, it’s harder to version the code and you might end up deploying buggy code. Second, any change in the deployment has to go through your machine.

CICD forces you to be good at code management and versioning, and that’s a good thing.

What if I have to deploy just one function?

I still recommend setting up CICD because it’s reusable. Once you’ve figured out how to set it up for one function, all your future deployments will require zero work.

7. Protect the keys (and tokens)!

Most starter Codelabs suggest downloading “service-key.json” and setting the environment locally. This is a great suggestion for someone testing a new service on free projects. However, for any project that you plan on adding billing to, don’t download a key.
It’s just not needed, and there’s a lot of risk that goes with it.

For example, you may accidentally add it to a git commit, forget to delete it while sharing code, or simply leave it somewhere on your system.

I do download service keys, but only when absolutely required, and only for development accounts that are on the free plan, for example in order to use the Firebase emulator.

When keys are needed in CICD, encrypt them right away and delete the local copies before committing any code.

You can easily use encryption on cloud without spending any time learning encryption itself. For example, KMS and Secret Manager on GCP are as simple as it can get.

8. Multi Cloud

If you have resources and are not bootstrapped with an extremely small team, you probably are using more than just one cloud. DO, GCP, Azure and AWS all have some great pluses and minuses. If you have dedicated DevOps and SREs, I highly recommend making use of multi-cloud.

However, if you’re a solo indie developer or a small startup, I recommend using one cloud unless you really need just one feature from another cloud service. This is because each cloud service has its own dictionary and learning curve.

It doesn’t matter which service, which language, or which platform you choose; my recommendation is to spend enough time learning and experimenting with it that you become the most knowledgeable about it. Of course, once you’re confident, you can expand to other platforms as well.

Each cloud provider has a similar set of tools, but using them, understanding their pricing/dictionary, and navigating their console takes a lot of time to learn and adapt to. My recommendation is to master the one you choose and optimize your usage based on the positives and negatives of that platform.

9. Read the best practices from the provider

Think of cloud providers as coffee machines.
The fundamental principles of making coffee remain the same, but the engineers who design the machines chose to prioritize a certain set of features, which leads to certain design trade-offs. The machine must be used in a certain way to work best, and certain things should always be avoided.

Cloud platforms are very similar to fancy, expensive coffee machines in this way. They come with a guidebook that’s continuously evolving and often includes a set of dos and don’ts.

I highly recommend going through the dos and don’ts for each service you use, not only for best performance but also to avoid unnecessary trouble.

A good example of this is the Google Cloud Run development tips. This article clearly states that developers should avoid background activities. If you as a developer didn’t go to this page on your own, and wrote a program that has some potential background activities but works perfectly on your local machine, you’d never realize why your code performs so badly on cloud.

Point being, always proactively look for best practices for any service you use. SO and Medium have some great resources for most services.

10. Billing Budget Alerts / Notifications

Every single guide, article and post recommends setting up billing alerts and auto shut-off cloud functions. This is my least favorite feature, and the last on my list, simply because it’s not proactive but post facto.

I do recommend setting a billing/budget alert because it’s there, but it isn’t a feature I suggest anyone rely upon.

Note: If your cloud service allows for near-realtime auto shutoff after hitting a certain cap, then that’s really good, and I recommend using that feature and sticking to that platform.

via /r/webdev https://ift.tt/3i9ysWL
