Python Environment Setup in AWS

Beverly Wang
Dec 9, 2021

Photo by Ryan Yao on Unsplash

As Infrastructure as a Service (IaaS) becomes more accessible to individuals, more data enthusiasts may end up training their models on cloud services. This essay aims to be a first step for those who want to use the cloud for data processing and model development. It focuses on setting up a Python environment on AWS, specifically on Elastic File System (EFS). As long as you launch an instance in the same region (and VPC) as the EFS file system, you can mount EFS on that instance and reuse the Python environment you set up before. The general steps are as follows:

  • Log into AWS
  • Initiate EC2 Instance
  • Install Anaconda on EFS
  • Create Virtual Environment and Install Packages
  • Stop EC2 Instance

Log into AWS

Sign in to AWS as an IAM user: enter the account ID, then the IAM user name and password. This gives you access to the AWS Management Console. We need to launch an EC2 instance to transfer files and install software on AWS. EC2 is a very flexible AWS service; you can think of an EC2 instance as a virtual computer.

Initiate EC2 Instance

To launch an EC2 instance, first type “EC2” in the search bar, then click “EC2” in the search results. This takes you to the EC2 console.

Click “Instances” in the left navigation pane, then click “Launch instances”. There are a few configuration steps to complete for the instance. Choose Free Tier options: this instance is only used to transfer files and install software, so computing power and storage don't matter and there is no need to pay for more. For example:

  • AMI: Amazon Linux
  • Instance type: t2.micro
  • Configure instance details: no need to change
  • Add storage: default setting
  • Add tags: no need to add
  • Configure security group: keep the default SSH rule (port 22) so you can connect, and create two custom TCP rules: one on port 8000 with source “Anywhere” (for internet downloads), the other on port 8888 with source “Anywhere” (for Jupyter Notebook)

Then click “Review and Launch”, and on the next page click “Launch”. A pop-up window asks you to select a key pair. If you don’t have a key pair, or want to give others access to this instance, create a new key pair and download its private key file. Otherwise, select an existing key pair.

In a few minutes, the instance starts and passes its status checks. On the EC2 dashboard, the state of the instance you just created changes to “running”. Next, mount EFS on the instance. I will not cover that step in this essay; for instructions, see the AWS guide: https://docs.aws.amazon.com/efs/latest/ug/wt1-test.html#wt1-mount-fs-and-test
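Before installing anything into /mnt/efs, you may want to confirm that EFS is actually mounted there. If the mount failed, /mnt/efs is just an ordinary directory on the instance's EBS root volume, and Anaconda would quietly be installed there instead. The sketch below makes two assumptions. First, it assumes a Linux instance, because it reads /proc/mounts. Second, it assumes EFS is mounted over NFS, so the file system type appears as nfs4. The name efs_is_mounted is a hypothetical helper, not part of any AWS tool.

```shell
# efs_is_mounted (hypothetical helper): succeed only if the given directory
# is an NFS mount point. EFS is mounted over NFSv4, so its type shows as nfs4.
efs_is_mounted() {
  # /proc/mounts fields: device, mount point, file system type, options, ...
  awk -v mp="$1" '$2 == mp && $3 ~ /^nfs/ { found = 1 } END { exit !found }' /proc/mounts
}
```

For example, to check the mount point used in this essay, run: efs_is_mounted /mnt/efs || echo "EFS is not mounted at /mnt/efs"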

Install Anaconda on EFS

Next, SSH into the instance you created. Right-click the instance ID, select “Connect”, go to the “SSH client” tab, and note down the public DNS. Remember that each time you start the instance, it is assigned a different public DNS unless you associate an Elastic IP address with it.

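Open a command prompt (terminal) on your computer. If this is the first time you log in with this key, first use the chmod command to make sure your private key file isn’t publicly viewable. ssh refuses to use a private key that other users can read. You only have to do this once.

chmod 400 myEC2KeyPair.pem

Then SSH into the EC2 instance. Here, username is your login user on the instance (for a fresh Amazon Linux instance, ec2-user), and EC2_public_DNS is the public DNS you noted down:

ssh -i "myEC2KeyPair.pem" username@EC2_public_DNS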
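The public DNS changes every time the instance starts (unless an Elastic IP is attached). One option is to keep an alias in your local ~/.ssh/config, so that only the HostName line needs updating. The sketch below assumes OpenSSH on your own computer. The helper write_ssh_host, the alias myec2 and the example values are hypothetical placeholders. Host, HostName, User and IdentityFile are standard ssh_config keywords.

```shell
# write_ssh_host (hypothetical helper): append a Host alias to an ssh_config file.
# Arguments: config file, alias, public DNS, user name, private key path.
write_ssh_host() {
  cfg=$1 name=$2 host=$3 user=$4 key=$5
  printf 'Host %s\n    HostName %s\n    User %s\n    IdentityFile %s\n' \
    "$name" "$host" "$user" "$key" >> "$cfg"
  # ssh rejects a config file that other users can write to, so keep it private
  chmod 600 "$cfg"
}
```

For example, run write_ssh_host ~/.ssh/config myec2 EC2_public_DNS ec2-user ~/myEC2KeyPair.pem. After that, ssh myec2 connects. When the DNS changes after a restart, edit only the HostName line.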

Once logged into the EC2 instance, switch to the root (admin) account. This lets you create a folder under EFS if necessary. Depending on how sudo is configured, you may be asked for a password.

sudo su

Download the Anaconda installer (note that it is the Linux version) and run it with bash.

wget https://repo.anaconda.com/archive/Anaconda3-2019.07-Linux-x86_64.sh
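Before running the downloaded installer, you can check it against the hash published for it on Anaconda's archive page. This is a minimal sketch that assumes GNU coreutils' sha256sum, which is present on Amazon Linux (on macOS, shasum -a 256 is the usual equivalent). verify_sha256 is a hypothetical helper. EXPECTED_HASH in the usage line is a placeholder for the value you copy from the archive page.

```shell
# verify_sha256 (hypothetical helper): compare a file's SHA-256 hash with an expected value.
verify_sha256() {
  file=$1 expected=$2
  # sha256sum prints "<hash>  <file name>"; keep only the hash field
  actual=$(sha256sum "$file" | cut -d ' ' -f 1)
  if [ "$actual" = "$expected" ]; then
    echo "OK: $file"
  else
    echo "MISMATCH: $file (got $actual)" >&2
    return 1
  fi
}
```

For example, verify_sha256 Anaconda3-2019.07-Linux-x86_64.sh EXPECTED_HASH && bash Anaconda3-2019.07-Linux-x86_64.sh runs the installer only if the hashes match.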

bash Anaconda3-2019.07-Linux-x86_64.sh

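Press ENTER a few times to page through the license, then type ‘yes’ to accept it. The installer then asks you to confirm the install location. Enter the folder under EFS where you want Anaconda, for example “/mnt/efs/username/anaconda3”, and press ENTER to proceed. The installer uses the folder you enter as-is, so include the anaconda3 subfolder in the path.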

Installation takes a couple of minutes. When it finishes, exit the root account to return to your own account.

exit # exit admin account

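Then go to your own folder under EFS, where you will find a new subfolder, anaconda3. To make it your default Python, add Anaconda's bin folder to PATH in ~/.bash_profile, which bash reads from your home directory each time you log in. Your home directory is normally on the instance's own EBS volume rather than on EFS, so repeat this step on any new instance where you mount EFS.

cd /mnt/efs/username

nano ~/.bash_profile

Add the following line to the file. Then press Ctrl + X, type Y to save, and press ENTER to confirm the file name:

export PATH=/mnt/efs/username/anaconda3/bin:$PATH

Reload the profile so the change takes effect in the current session:

source ~/.bash_profile

If Anaconda is installed properly and comes first on PATH, the command below prints /mnt/efs/username/anaconda3/bin/python: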

which python
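If you repeat this setup, for example on a new instance, editing .bash_profile by hand can leave duplicate export lines. Here is a small sketch, assuming a POSIX shell with grep, that appends the line only when it is not already there. add_to_path_once is a hypothetical helper name.

```shell
# add_to_path_once (hypothetical helper): append "export PATH=<dir>:$PATH"
# to a profile file unless exactly that line is already present.
add_to_path_once() {
  profile=$1 dir=$2
  line="export PATH=$dir:\$PATH"
  # -q quiet, -x whole-line match, -F fixed string (no regex);
  # a missing file counts as "not found", so the line is appended
  if ! grep -qxF "$line" "$profile" 2>/dev/null; then
    printf '%s\n' "$line" >> "$profile"
  fi
}
```

For example, run add_to_path_once ~/.bash_profile /mnt/efs/username/anaconda3/bin, then source ~/.bash_profile.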

Create Virtual Environment and Install Packages

Next, create a virtual environment (make sure the Anaconda you installed uses Python 3.7.3) and activate it. You can list all virtual environments with the command conda info --envs.

conda create -n myenv python=3.7.3

source activate myenv
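Packages go into whichever environment is active. A quick guard before installing can catch a shell where myenv was never activated, for example a new SSH session. This sketch relies on conda setting the CONDA_DEFAULT_ENV variable to the active environment's name, which it typically does on activation. require_conda_env is a hypothetical helper name.

```shell
# require_conda_env (hypothetical helper): fail unless the named conda env is active.
# Relies on conda exporting CONDA_DEFAULT_ENV (the active env's name) on activation.
require_conda_env() {
  if [ "${CONDA_DEFAULT_ENV:-}" != "$1" ]; then
    echo "expected conda env '$1', active: '${CONDA_DEFAULT_ENV:-none}'" >&2
    return 1
  fi
}
```

For example, require_conda_env myenv && conda install ipykernel installs only when myenv is the active environment.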

Once the virtual environment is activated, there are two ways to install packages: install them manually one by one, or install them all from a requirements.txt file.

Method 1

Make a list of the packages you need and run the install command for each of them, e.g.:

conda install nb_conda

conda install ipykernel

Method 2

Create a requirements.txt file listing the packages you want to install, then upload it to the instance's EBS volume. I was unable to upload the file directly to EFS. My workaround is to upload it to my home directory on EBS, then copy it from EBS to EFS using the root account. Open a new command prompt window on your computer and run:

scp -i "myEC2KeyPair.pem" "filepath_of_requirements.txt" username@EC2_public_DNS:/home/username

Go back to the command prompt window in which you SSHed into the instance. Switch to the root account and copy requirements.txt from EBS to EFS:

sudo su

cp "/home/username/requirements.txt" "/mnt/efs/username"

Exit the root account (exit), then go to your folder under EFS, /mnt/efs/username, where you will find requirements.txt. The virtual environment is still activated in your own shell, so you could run pip install -r requirements.txt to install every package in the file. The pain point is that some package almost always fails with an ERROR, which interrupts the installation of the remaining packages. Most of the time it is fine to ignore such an error and fix it when you actually need that package, so you don't want it to block the others. To avoid this, install the packages one at a time with the following command, which also strips comments and removes empty lines:

cat requirements.txt | cut -f1 -d"#" | sed '/^\s*$/d' | xargs -n 1 pip install

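(cut -f1 -d"#" keeps only the first field of each line, i.e. the text before any #, because -d"#" makes # the field delimiter. sed '/^\s*$/d' deletes lines that are empty or contain only whitespace. xargs -n 1 pip install runs pip install separately for each remaining item, since -n 1 passes one item per call.)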
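The one-liner keeps going after a failure, but the errors scroll past in pip's output. Below is a variant sketch, assuming a POSIX shell. It applies the same comment and blank-line filtering, installs one package at a time, and lists the packages that failed at the end. install_each, clean_requirements and the INSTALLER override are hypothetical names, not pip features. INSTALLER is handy for a dry run such as INSTALLER=echo.

```shell
# clean_requirements (hypothetical helper): print requirement lines without
# comments or blank lines (same idea as the cut/sed pipeline above).
clean_requirements() {
  cut -f1 -d'#' "$1" | sed '/^[[:space:]]*$/d'
}

# install_each (hypothetical helper): install each requirement separately,
# keep going on errors, and list the failures at the end.
install_each() {
  failed=""
  list=$(mktemp)
  clean_requirements "$1" > "$list"
  # read -r also trims whitespace left over after stripping comments
  while read -r pkg; do
    # INSTALLER defaults to "pip install"; unquoted so it splits into words.
    # stdin is /dev/null so the installer cannot consume the package list.
    if ! ${INSTALLER:-pip install} "$pkg" < /dev/null; then
      failed="$failed $pkg"
    fi
  done < "$list"
  rm -f "$list"
  if [ -n "$failed" ]; then
    echo "Failed:$failed"
    return 1
  fi
}
```

For example, install_each requirements.txt installs everything it can and then prints a line such as "Failed: somepkg". INSTALLER=echo install_each requirements.txt only prints the package names it would install.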

Stop EC2 Instance

Once everything is done, remember to “Stop” the instance!
