Metabase Up and Running

Creating the Metabase application

With our sizing decisions made, and our IAM user and VPC created, we can now configure and deploy our Metabase instance inside our new VPC. We'll get started by leaving AWS momentarily to visit the Metabase website:

  1. Scroll down to the bottom of the page, where it reads Read about how to run Metabase on AWS, and click the AWS link.
  2. From there, click the Launch Metabase on Elastic Beanstalk link. You'll be redirected to the AWS console and prompted to create the web app.

Creating a web app in Elastic Beanstalk

Before we continue on the Create a web app page, let's get clarity on some confusing AWS terminology.

In Elastic Beanstalk, an application is defined as a "logical collection of components, including environments and configurations" and can be thought of as a folder. The actual running instance of Metabase, what we generally think of as "our application," is called an Environment in Elastic Beanstalk parlance. An Elastic Beanstalk application can contain multiple environments, meaning we could have several different Metabase instances running. Reasons we might want multiple environments include maintaining separate testing and production environments, or keeping an older and a newer version of Metabase running concurrently.

Now that we understand what AWS considers an application versus an environment, let's create both. Follow these steps to fill out the form properly:

  1. In the Application Information section, you are asked to provide an application name. It can be anything you like, although I think the default of Metabase makes the most sense.
  2. Next, let's fill out the Environment Information section. As mentioned above, the environment refers to this specific instance of Metabase being deployed. In the Environment name field, enter a name. This is simply a label and won't appear outside of AWS. I would recommend keeping the default Metabase-env. Next, we will add a domain, or URL, for our environment.

    Under Domain, you will choose a unique domain name for your app, which will be the URL that will take you to your Metabase instance, so make it something memorable. I'm picking the name pickles-and-pies to go along with my made-up business from Chapter 1, Overview of Metabase, which we'll explore in more detail later. Whatever you pick will be concatenated with .us-east-1.elasticbeanstalk.com, so it will not be the most attractive and shareable URL. To remedy that, we'll learn how you can redirect this to your own domain or preferably a subdomain in Chapter 3, Setting Up Metabase. Most organizations like to use a format like www.mb.mydomain.com or www.metabase.mydomain.com.

  3. The Platform section, as of this writing, will be prefilled with the following values. These will change as new versions are released, so I recommend just using whatever is prefilled:

    a. Platform: Docker

    b. Platform branch: Docker running on 64bit Amazon Linux 2

    c. Platform version: 3.1.0

  4. The next section is called Application Code. Click the radio button next to Upload your code. Clicking this button should reveal another section called Source code origin.
  5. In the Source code origin section, the Public S3 URL radio button should be selected, with a URL prepopulated from downloads.metabase.com. There should also be a default value in the Version label field.
  6. Click Review and launch to configure your environment.

    Important note

    Although the link Metabase provides redirects you to the Northern Virginia region in the AWS console, you are allowed to pick a different region if you like. However, the public S3 URL in step 5 is specific to Northern Virginia, so you will get an error if your region differs. To fix this, visit the public S3 URL in a browser and download the zip file locally. Then, change Source code origin to Local file and upload the zip file you just downloaded.

You have now created both an application called Metabase and an environment called Metabase-env. Next, we will configure this specific environment.

Configuring your environment

The next page presents us with a 4x3 grid of configuration options. This is where we can fine-tune everything about our application. Let's learn what each card is and what we should do with each one.

We'll start with software, which is the option in the upper left-hand corner of the grid. Click Edit on the card to get started.

Software

There is not much you need to do in this section. Most of it will just be ensuring that the defaults are properly filled in:

  1. In Container Options, the proxy server should be Nginx.
  2. Make sure AWS X-Ray is not enabled (unchecked).

The next two sections have to do with storing log files. Your Metabase application has a feature where you can view log files, but if you would also like to store your logs using other AWS services, you may do so (you may incur costs by doing this):

  1. If you would like to be able to store logs in Amazon's S3 storage system, click the Enabled checkbox in the S3 log storage section.
  2. You can also stream logs to another AWS service called CloudWatch, which is a service that allows you to monitor the health of your app. To turn this on, click the Enabled checkbox under Log Streaming. If you do, you can also choose a retention policy for your logs, which will determine how many days the logs are kept. You can also decide whether you want to retain these logs after your environment is terminated or delete them.
  3. The final section, Environment Properties, allows you to pass environment variables to your application. Since our deployment does not require any custom environment variables, we will leave this blank.

At this point, you can click Save. Even though there were not any required actions to do in this section, you now know what all the options are for. Next, let's move on to the Instances card.

Instances

In this section, you can configure the storage options for your instances. However, unless you have a good reason to change it, I recommend keeping the Container Default option.

Leave the Instance Metadata service (IMDS) section blank. The EC2 security groups should also be blank. We will learn more about security groups later in the chapter. You may click Save or Cancel since we didn't take any action in this step.

Next, we will configure the Capacity card.

Capacity

Earlier in the chapter, we learned about scalability. In this section, we will learn exactly how to configure your app's scalability to meet your needs. Note that you can also skip this section, but if you want to learn what everything means, read on.

The first section is named Auto Scaling Group. This is where we specify how many EC2 instances we'd like to be able to scale up and down to:

  1. The first section allows us to select our Environment Type, which can either be Load Balanced or Single Instance. Choose Load Balanced, which should be the default. Even if you think a single instance is more than adequate for your application, note that at the time of writing, selecting Single Instance appears to simply refresh the page.
  2. Next, in the Instances section, we get to pick the minimum and maximum number of EC2 instances we'd like our environment to scale up and down to. I recommend keeping Min at 1 and Max at 4, but you can lower it if you want a low scalability option. Next, we will learn about the Fleet Composition section.
  3. The Fleet Composition section allows you to decide whether you want to use on-demand EC2 instances or a combination of on-demand and something called spot instances. Let's learn quickly what these are:

    a. On Demand means you pay the a la carte price for your instance.

    b. A Spot instance is something you bid for, and if there are resources available at prices under your max bid, you get them.

    Spot instances are an interesting idea, but I recommend keeping it simple and starting with on-demand only. As you learn about your app's needs, you can revisit this section and opt into a mixture of on-demand and spot price instances. There are many options to configure this section and they can get pretty sophisticated.

    Next, we will choose our instance type, which determines how powerful we want our compute instances to be.

    Important note

    The default value in Instance Type is t2.small; however, this is one size too large to be covered by AWS's free tier. This is the size Metabase recommends, but if you want to avoid incurring costs, use the t2.micro size instead.

  4. In Instance type, change from t2.small to t2.micro. While this is great for a tutorial, if you end up using it for more data-intensive purposes, I do recommend paying for the t2.small instance.
  5. The next section is the AMI ID section. AMI stands for Amazon Machine Image, and is what contains the information required to launch your instance. There will be a default value here, which you will keep as is.
  6. For the Availability Zones, choose Any. This will not affect the availability of your app, as we will configure that part later.
  7. Leave Placement blank, since we chose Any in the last section.

We have now picked the minimum and maximum number of EC2 instances in our scalability equation. You may be wondering how our application will decide to scale up or down. That is covered in the Scaling triggers section, which we'll learn about next. I also recommend keeping the defaults here, but read on if you'd like to understand how it works:

  1. The Metric option lets you pick a metric to base your scaling decisions on. The default is NetworkOut, which measures the amount of outbound traffic. The idea is that if outbound traffic gets high enough, another EC2 instance will be launched and the load balancer will start routing traffic there. In addition to NetworkOut, there are about 10 other metrics you can base your scaling decisions on.
  2. Statistic lets you specify what aggregation of your metric to base your scaling decisions on. The options are Minimum, Maximum, Sum, and Average. The default and recommended statistic is Average.
  3. Period is the time grain on which the statistic will be measured. The default is 5 minutes. If you've chosen NetworkOut as your metric and Average as your statistic, then the scaling decisions will be based on the average NetworkOut over 5-minute intervals.
  4. Breach Duration is the amount of time your metric needs to exceed the threshold before the scaling operation is triggered. The default and recommended value is 5 minutes.
  5. Lastly, you will specify the threshold values for your metric. The default upper threshold is 6,000,000 bytes, or 6 MB. That means that once the average network output over a 5-minute interval is above 6 MB, another EC2 instance will be added to handle the increased traffic. Similarly, the lower threshold is 2 MB. When the average network output is under 2 MB, additional EC2 instances will be removed (but won't go to zero).
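The trigger logic described in these steps can be sketched as a small simulation. This is a simplified model for intuition only, not how Elastic Beanstalk is actually implemented; the numbers are the defaults discussed above:

```python
# Simplified model of the scaling trigger: the average NetworkOut
# over a 5-minute period is compared against the upper and lower
# thresholds, and the instance count stays between Min and Max.

UPPER = 6_000_000   # bytes: scale out above this average
LOWER = 2_000_000   # bytes: scale in below this average
MIN_INSTANCES, MAX_INSTANCES = 1, 4

def next_instance_count(current, period_samples):
    """Decide the instance count after one breach-duration check.

    period_samples: NetworkOut measurements (in bytes) taken during
    one 5-minute period; the Statistic used here is Average.
    """
    average = sum(period_samples) / len(period_samples)
    if average > UPPER and current < MAX_INSTANCES:
        return current + 1          # scale out by one instance
    if average < LOWER and current > MIN_INSTANCES:
        return current - 1          # scale in, but never below Min
    return current                  # within thresholds: no change

print(next_instance_count(1, [7_000_000, 8_000_000]))  # busy period -> 2
print(next_instance_count(1, [1_000_000]))             # quiet, already at Min -> 1
```

Note how the Min value of 1 acts as a floor: even a completely idle environment keeps one EC2 instance running, which is why scaling in "won't go to zero."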

Now that we know exactly how to set the rules for scaling our environment up and down, you may be wondering what it takes to reach 6 MB of network output. As I write this, I have two users using Metabase. Later in the chapter, we will learn how to monitor various performance-related statistics, but for now, here is a graph of my environment's average network output (Figure 2.10), currently showing about 1 MB per user:

Figure 2.10 – Monitoring Metabase's NetworkOut with two active users

Next, we will configure the Load Balancer card.

Load Balancer

Let's learn all the ways we can set up our load balancer. Remember that a load balancer distributes incoming traffic across EC2 instances based on load, so if you do not believe your app will require multiple EC2 instances, you can skip these steps:

  1. At the top of the page, you'll see three different load balancer options: Application, Classic, and Network. Classic Load Balancer should be selected.

    Below the load balancer options is a section named Listeners. By default, you will have a listener configured to use Port 80, which is the standard port for HTTP traffic. You need not make any changes here for now. In Chapter 3, Setting Up Metabase, we'll learn how to configure listeners to send traffic over the more secure HTTPS protocol.

  2. The next section is named Sessions. A session in this context is a web browser connected to your app for some period of time. The app uses a cookie in the browser to keep track of the session. By checking Session stickiness enabled, your load balancer will keep a session alive on the same EC2 instance even if there is another instance available with a lower load. You can also configure how long you want that session to persist. My recommendation is to leave this disabled, but based on your specific needs, you might want to try it out.
  3. Next, in the Cross-Zone Load Balancing section, you can check whether you want your load balancer to span across multiple availability zones. If you have decided to deploy all your resources in the same availability zone, then checking this option will have no effect.
  4. In the Connection Draining section, checking the Connection draining enabled box will cause your load balancer to keep connections to instances in unhealthy states alive for a specified amount of time. I generally do not turn this on, since high availability usually isn't a requirement for a Metabase deployment.
  5. Finally, the last section is Health Check. In the Health Check Path field, enter /api/health. Elastic Beanstalk will send a GET request to this path to determine whether your instances are in a good state. You can alter the configuration as you please, knowing that it is defaulted such that:

    a. The GET request will happen every 10 seconds.

    b. It will wait 5 seconds for a response.

    c. Five failures in a row will throw an Unhealthy status.

    d. Three successes will flip an Unhealthy status back to healthy.

  6. Click Save to move on.
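The health check defaults in step 5 amount to a small state machine. Here is a sketch of that logic as a model for intuition; it is a simplification, not Elastic Beanstalk's actual implementation:

```python
class HealthCheck:
    """Model of the default health check: a GET to /api/health
    every 10 seconds with a 5-second timeout; 5 consecutive failures
    mark the instance Unhealthy, 3 consecutive successes mark it
    healthy again."""

    UNHEALTHY_AFTER = 5   # consecutive failures to flip to Unhealthy
    HEALTHY_AFTER = 3     # consecutive successes to flip back

    def __init__(self):
        self.healthy = True
        self.streak = 0   # consecutive results opposing the current state

    def record(self, success):
        # A result matching the current state resets the streak.
        if success == self.healthy:
            self.streak = 0
            return self.healthy
        self.streak += 1
        threshold = self.UNHEALTHY_AFTER if self.healthy else self.HEALTHY_AFTER
        if self.streak >= threshold:
            self.healthy = not self.healthy
            self.streak = 0
        return self.healthy

hc = HealthCheck()
for _ in range(5):            # five failed checks in a row...
    state = hc.record(False)
print(state)                  # ...flip the instance to Unhealthy -> False
```

Because the streak resets on any opposing result, four failures followed by one success leaves the instance healthy; only an unbroken run of five failures triggers the Unhealthy status.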

Now that we understand how to configure our load balancer, let's move on to the next section: Rolling Updates and Deployments.

Rolling Updates and Deployments

This section allows you to configure your environment so you can update it in a rolling fashion across instances and prevent downtime. That means that when one EC2 instance gets taken offline for a software update, the load balancer will route traffic to another. Since 100% uptime is not a requirement for us, and we will likely just have a single EC2 instance, we will leave this alone.

Security

We will leave the defaults in this section and move on to Monitoring.

Monitoring

In this section, we'll learn how to configure various health reporting options for our app:

  1. If you skipped the configuration of your load balancer, enter /api/health in the Health Check path field. This field is only available if you've enabled classic load balancing.
  2. Next, in the Health reporting section, turn on the Enhanced option. This is available on the Free tier and lets you see the health of your instances in Elastic Beanstalk.
  3. In Health monitoring rule customization, you can choose to exclude ranges of status codes from your health reporting. For our application, I recommend leaving this section unchecked.
  4. I recommend keeping Health event streaming to CloudWatch logs unchecked. I feel like with the configuration we have now, we have plenty of information to monitor our app.

These steps will allow us to easily monitor the health of our application on the Elastic Beanstalk dashboard. Now let's move on to the next card: Managed Updates.

Managed Updates

This section allows you to schedule managed updates at the platform level. This is different from updating Metabase to newer versions, which we'll learn about later in the chapter. I generally leave this off.

Notifications

Moving on to the Notifications section, enter your email address in the field to get email notifications about your app. Note that you have to confirm the subscription from your email.

We are almost done with configuration. In the next step, Network, we'll configure the environment so that our newly deployed resources live in the VPC we created.

Network

Here is where we will select which subnets in our newly created VPC we want various resources to be launched in. Let's get started:

  1. In Load Balancer Settings, set Visibility to Public.
  2. For a low-availability deployment, select only one of the subnets, such as us-east-1a. For high availability, select both, and verify that the two subnets are in different availability zones.
  3. In Instance Settings, check the Public IP Address option.
  4. Again, for a low-availability deployment, select the same subnet you chose in step 2. For high availability, select both.
  5. Finally, choose both subnets for Database Settings. It is required to choose two for the app to properly deploy, so there is no single-subnet option here.

Now that we've applied our VPC settings to the environment, the last step we need to do is configure our application database.

Configuring your database

This section is where you will configure your application database:

  1. The Engine section should default to postgres. Although other database engines will work as an application database, PostgreSQL is recommended.
  2. To stay on the Free Tier, change the instance class to db.t2.micro.
  3. The Storage section can be left at 10 GB, which should be plenty.
  4. In the Username section, I've chosen mbAppDB.
  5. In the Password section, choose a password for this database. Note that you will likely never need to connect to it. In fact, the way the application is set up, you will not be able to connect to it from the public internet.
  6. In the Retention section, it's recommended to choose Create Snapshot. This way, when your environment is terminated, a snapshot of your database will be kept in storage.
  7. In the Availability section, I recommend choosing Low (one AZ). By choosing this option, your database will only be provisioned in one Availability Zone in your data center region. This means that if that Availability Zone were to have an outage, your database would be unavailable during it. Personally, I feel like that is an acceptable risk to take since I don't believe Metabase needs to be available 100% of the time; outages are rare anyway. Alternatively, you could pick High (Multi AZ), but if you do, your database costs will be higher once your free tier period ends.

    Important Note:

    Recently a bug in AWS was introduced that may cause the Database card section to malfunction. If you are unable to select values in the Database card form, please visit https://github.com/PacktPublishing/Metabase-Up-and-Running/tree/master/chapter2 for alternative instructions that will unblock you.

Tags

Optionally, you can add a Tag, which will help you tie all the resources that will be created together. A Tag is a key-value pair. You may use something like app as the key and metabase as the value.

App creation

Finally, with the configuration done, you are ready to launch. Click Create app and your environment and app's creation process will start. You'll see messages in a terminal-like window as various components of the app are created. This generally takes around 10 to 20 minutes, so go have a coffee or tea while you are waiting.

Once the app has been created, you can visit the unique URL you created and see your newly created Metabase instance.

Overview of the Metabase app infrastructure

Now that we have Metabase properly running, it may be helpful to summarize what has actually been created and see where these resources live in AWS. This is just for our understanding:

  • We created a VPC, or Virtual Private Cloud, to run our app in. It consists of two subnets running in Availability Zones A and B, with internet gateways.
  • Our app environment consists of one compute instance, called an EC2 instance, running in one of our public subnets in an Availability Zone. According to our configuration, if the app comes under heavy usage, it can scale up to as many as four of these instances. To view the EC2 instances in the AWS console, visit the EC2 service and click Running Instances in the dashboard.
  • Our app environment also has a load balancer, currently running in the same public subnet as our EC2 instance. This is what will distribute traffic across the EC2 instances, should our app scale to multiple instances. Load balancers are managed through the EC2 service, so to see yours in the console, go to the EC2 dashboard and click Load Balancers.
  • We have a Postgres database acting as our application database. This could be running in either subnet. In my deployment, it is running in Availability Zone B. To see your database in the AWS console, visit RDS (Relational Database Service) and click DB Instances. Note that you cannot connect to this database from your computer; at this point, only the Metabase app itself can connect.
  • We also have a number of security groups. As your app environment was created, you may have noticed in the logs that several security groups were created. Security groups are an important concept in AWS and something we'll use later. For now, all you need to know about them is that they limit what types of traffic can visit the resources you've deployed. For example, one security group opens up HTTP traffic to your app's URL from any IP address. Another one only allows your Postgres database to accept traffic over port 5432 from resources with a specific security group themselves.

Figure 2.11 – A diagram of what our app environment looks like
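To make the security-group idea concrete, here is roughly what those two ingress rules look like in the `IpPermissions` structure that the EC2 API uses. The group ID is a hypothetical placeholder; yours will differ:

```python
# The HTTP rule: port 80 open to the whole internet, attached to
# the load balancer's security group.
http_rule = {
    "IpProtocol": "tcp",
    "FromPort": 80,
    "ToPort": 80,
    "IpRanges": [{"CidrIp": "0.0.0.0/0"}],   # any IPv4 address
}

# The database rule: port 5432 (Postgres) open only to traffic from
# members of a specific security group, not the public internet.
postgres_rule = {
    "IpProtocol": "tcp",
    "FromPort": 5432,
    "ToPort": 5432,
    # sg-0123456789abcdef0 is a placeholder for the app's group ID.
    "UserIdGroupPairs": [{"GroupId": "sg-0123456789abcdef0"}],
}

# These dicts are the shape that boto3's
# authorize_security_group_ingress expects in its IpPermissions
# parameter.
print(postgres_rule["FromPort"])
```

The key contrast: the HTTP rule grants access by IP range, while the Postgres rule grants access by security-group membership, which is why the database is unreachable from your laptop but reachable from the Metabase instances.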

Your Metabase environment is now completely set up! In the next section, we'll learn how to terminate the environment, so that, if required, you can start all over again.

Terminating your environment

After all that work setting up your environment, probably the last thing on your mind is trashing all your work. However, sometimes mistakes happen and bad deploys occur. If you want to stop running your Metabase app, perhaps because you want to try re-deploying with a different configuration, just do the following:

  1. In the AWS console, find the Elastic Beanstalk service.
  2. Click Applications. You should have an application called Metabase. Click it.
  3. Find your Metabase environment. If you have been following along with the tutorial, your environment should be Metabase-env. Click it.
  4. Towards the upper right-hand corner of the page, find the Environment actions drop-down menu and choose Terminate Environment.
  5. A modal will appear asking you to confirm that you would like to permanently delete your environment. It will list some of the resources that will be deleted or released, including the URL you created. To confirm termination, type the name of your environment at the bottom and click Terminate. Again, this is likely going to be Metabase-env.

Once your environment has been terminated, it will still temporarily appear in your list of environments as Metabase-env (terminated). It will remain visible for about an hour after termination.

Deleting your application

Recall that Elastic Beanstalk considers an application a "folder of environments." In the previous step, we terminated our environment, so now let's learn how to delete the application that contained our environment:

  1. In the AWS console, find the Elastic Beanstalk service.
  2. Click Applications. You should have an application called Metabase. Click the radio button next to it.
  3. In the Actions drop-down menu, select Delete application.
  4. You will see a modal explaining which environments will be permanently deleted should you delete your application. To confirm, type the name of your application in the textbox at the bottom of the modal. This name should be Metabase.

Now let's see how to upgrade Metabase to newer versions.

Upgrading Metabase on Elastic Beanstalk

As mentioned in Chapter 1, Overview of Metabase, Metabase is constantly launching new versions with new features, bug fixes, and improvements to the current product. To upgrade Metabase on Elastic Beanstalk, I recommend first visiting Metabase's guide, as it contains the link to the latest version: https://www.metabase.com/docs/latest/operations-guide/running-metabase-on-elastic-beanstalk.html#deploying-new-versions-of-metabase. Then, follow these steps:

  1. Download the latest version of Metabase. The link will look like this, but with the latest version in place of <latest_version>: https://downloads.metabase.com/<latest_version>/metabase-aws-eb.zip.
  2. Open the Elastic Beanstalk service in the AWS Management Console.
  3. Click Applications. Find your application, which should be called Metabase, and click it.
  4. On the left side of the screen, you should see your application's name with a toggle arrow next to it. Underneath that will be a link to Application versions. Click that.
  5. In Application versions, click the Upload button.
  6. Click Choose file and select the .zip you just downloaded with the latest version. Give the version label a name, preferably with the new version number in it.
  7. Click Upload.
  8. Now, click the checkbox next to the new version label you just created.
  9. Once it's checked, click the Actions drop-down menu and select Deploy.
  10. You will be given the option of which environment you would like to deploy this new version to. If you have been following along, you should only have Metabase-env. Select that and click Deploy.
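The console steps above can also be scripted. Here is a hedged sketch using boto3, the AWS SDK for Python; the version label, bucket, and key are placeholders you would replace, and the deploy function is defined but not invoked here:

```python
def bundle_url(version):
    """Build the download URL for a given Metabase release,
    following the pattern shown in step 1."""
    return f"https://downloads.metabase.com/{version}/metabase-aws-eb.zip"

def deploy_new_version(version_label, s3_bucket, s3_key,
                       app_name="Metabase", env_name="Metabase-env"):
    """Register an uploaded bundle as a new application version,
    then point the environment at it (roughly steps 5-10 above).
    Assumes AWS credentials are configured locally."""
    import boto3  # imported here so the sketch loads without boto3 installed

    eb = boto3.client("elasticbeanstalk")
    eb.create_application_version(
        ApplicationName=app_name,
        VersionLabel=version_label,
        SourceBundle={"S3Bucket": s3_bucket, "S3Key": s3_key},
    )
    eb.update_environment(
        EnvironmentName=env_name,
        VersionLabel=version_label,
    )

# v0.37.0 is a hypothetical version number; check Metabase's guide
# for the current release.
print(bundle_url("v0.37.0"))
```

As with the console flow, `update_environment` triggers a full redeploy of the environment with the existing configuration, so expect the same several-minute wait.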

This will trigger the deploy process all over again, using the same configuration you had for your original environment. Next, let's see how we can monitor costs in AWS in case we decide to use more than the free tier allows.