How to Run Python Scripts in AWS?

1. AWS Python Environment

1. AWS APIs

Everything in AWS is an API call, and each AWS service has its own set of APIs to interact with.

2. AWS Command Line Interface

The AWS Command Line Interface (AWS CLI) is an open-source tool that allows you to interact with AWS services using commands in your command line shell.

Once configured, the AWS CLI can run commands from your terminal program’s command prompt, providing functionality equivalent to the browser-based AWS Management Console.
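
Because the AWS CLI is just another program on your machine, a Python script can call it directly. The sketch below is only an illustration and assumes the CLI is installed and credentials are already configured (for example with `aws configure` or an attached IAM role):

```python
import json
import subprocess

# Run `aws sts get-caller-identity`, which reports the account and identity
# the CLI is currently using, and parse its JSON output.
result = subprocess.run(
    ["aws", "sts", "get-caller-identity", "--output", "json"],
    capture_output=True,
    text=True,
    check=True,
)

identity = json.loads(result.stdout)
print(identity["Account"], identity["Arn"])
```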

3. AWS Cloud9

AWS Cloud9 is a cloud-based IDE with numerous built-in features for developing and collaborating on projects. One feature Cloud9 offers is the use of the AWS CLI.

The AWS Cloud9 IDE can be accessed via a web browser and configured to your preferences. You can switch color themes, bind shortcut keys, enable language-specific syntax highlighting, and format code, among other things.

In Cloud9, you should avoid hardcoding AWS credentials into your code for any reason. Instead, rely on IAM roles for access whenever possible.

4. AWS SDK for Python

Boto3 is the AWS SDK for Python. It allows Python developers to create, configure, and manage AWS services such as EC2 and S3.

Some key concepts in Boto3, illustrated in the sketch after this list, include:

Client: Provides a low-level interface to AWS, with methods that map closely to service APIs. All service operations are supported by clients, which are generated from JSON service definition files.

Resource: Represents AWS’s object-oriented interface, offering a higher-level interface than service clients.

Session: Manages state about a particular configuration. By default, a session is created for you when needed.

Credentials: Can be configured in multiple ways. Regardless of the source, you need both AWS credentials and an AWS region set to make requests.
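
A minimal sketch tying these concepts together, assuming credentials and a default region are already configured (environment variables, the shared config files, or an IAM role); the region below is only an example:

```python
import boto3

# Session: holds configuration state such as credentials and region.
session = boto3.session.Session(region_name="us-east-1")

# Client: low-level interface whose methods map closely to the S3 APIs.
s3_client = session.client("s3")
response = s3_client.list_buckets()
print([bucket["Name"] for bucket in response["Buckets"]])

# Resource: higher-level, object-oriented interface to the same service.
s3_resource = session.resource("s3")
for bucket in s3_resource.buckets.all():
    print(bucket.name)
```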

5. AWS Toolkit and AWS Serverless Application Model

The AWS Toolkit for PyCharm is an open-source plugin for the PyCharm IDE that simplifies creating, debugging, and deploying Python applications on Amazon Web Services.

The AWS Toolkit uses the AWS Serverless Application Model (AWS SAM) to create and manage AWS resources such as AWS Lambda functions.

Serverless applications are combinations of Lambda functions, event sources, and other resources that work together to perform tasks.

2. API Gateway

Amazon API Gateway is an AWS service used to create, publish, maintain, monitor, and secure REST, HTTP, and WebSocket APIs at any scale. API developers can create APIs to access AWS or other web services and data stored in the AWS cloud.

1. API Gateway REST APIs

API Gateway supports various types of APIs, with a focus here on REST APIs.

A REST API in API Gateway is a collection of resources and methods integrated with backend HTTP endpoints, Lambda functions, or other AWS services. API Gateway REST APIs use a request/response model, where a client sends a request to the service, and the service responds synchronously. This model suits many applications that rely on synchronous communication.

● Method Request: The method request accepts client input and optionally validates it if configured. If validation fails, API Gateway immediately fails the request.

● Integration Request: The client input is passed to the backend through the integration request. This is where you configure the backend resource to which the API will pass client input. It is also where you can perform any mapping or data transformation using VTL (Velocity Template Language) mapping.

● Method Response: The method response returns the backend’s output to the client via the integration response.

● Integration Response: The integration response encapsulates the backend’s response into an HTTP response. You can map the backend response to specific HTTP codes.
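
To make the flow concrete, here is a minimal Python handler for a Lambda-backed REST API. It is a sketch that assumes Lambda proxy integration, where API Gateway forwards the whole request as the event and expects a statusCode/body response; with non-proxy integration, the mapping templates described below shape the payloads instead:

```python
import json

def lambda_handler(event, context):
    # Query string parameters arrive in the event; they may be absent (None).
    name = (event.get("queryStringParameters") or {}).get("name", "world")
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"Hello, {name}"}),
    }
```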

2. Features of API Gateway

a) Request Validation

API Gateway can perform basic validation, executed in the method request and method response of the API. For basic validation, API Gateway checks one or both of the following conditions:

● Required request parameters in the URI, query string, and headers of the incoming request are included and not blank.

● The applicable request payload adheres to the JSON schema request model configured for that method.

b) Models

Models in API Gateway define the format of data received in the Method Request or sent out in the Method Response.

c) Mapping

Mapping templates are written in Velocity Template Language (VTL) and can be applied to the integration request or integration response of a REST API. Mapping templates allow data transformation, including injecting hard-coded data or altering the data’s shape before passing it to the supporting service or before sending the response to the client.

3. Deployment of API Gateway

A Stage is a named reference to a deployment, used to manage and optimize specific deployments. For example, you can configure stage settings to enable caching, customize request limits, configure logging, and define stage variables for testing.

Stage Variables are name-value pairs defined as configuration attributes associated with the deployment stage of a REST API. They function similarly to environment variables and can be used in API settings and mapping templates.
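
As a sketch, with Lambda proxy integration the stage variables of the invoking stage arrive in the event passed to the function; the variable name below is hypothetical:

```python
def lambda_handler(event, context):
    # "stageVariables" is None when the stage defines no variables.
    stage_vars = event.get("stageVariables") or {}
    table_name = stage_vars.get("TABLE_NAME", "dev-table")  # hypothetical stage variable
    return {"statusCode": 200, "body": f"Using table: {table_name}"}
```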

3. AWS Lambda

AWS Lambda is a compute service that runs your code without provisioning or managing servers. AWS Lambda executes your code only when needed and automatically scales from a few requests per day to thousands per second.

1. Runtime Environment

When AWS Lambda executes a Lambda function, it provisions and manages the resources required to run it. When you create a Lambda function, you specify configuration information such as the amount of memory allocated to the function and its maximum execution time.

An important concept in AWS Lambda is execution context reuse. This means that objects declared outside of the function’s handler method remain initialized, providing additional optimization when the function is invoked again.
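
A minimal sketch of execution context reuse: the Boto3 client below is created outside the handler, so it is initialized once per execution environment and reused by subsequent invocations while the environment stays warm.

```python
import boto3

# Initialized once per execution environment and reused across warm invocations.
s3 = boto3.client("s3")

def lambda_handler(event, context):
    # Only the handler body runs on every invocation.
    response = s3.list_buckets()
    return {"bucketCount": len(response["Buckets"])}
```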

When the code executes in Lambda, the service starts a micro virtual machine that downloads your code and all the necessary dependencies, and then runs it. AWS designed this technology with Lambda and the multi-tenant containers used in AWS Fargate in mind.

2. Lambda Permissions

There are two types of permissions to consider when using AWS Lambda.

The first type is execution permissions. The execution permissions for a Lambda function are controlled by an IAM role. This role contains policies that allow or deny specific API calls. To enable the code running in the Lambda function to call AWS APIs, the execution role must include permissions for those API calls.

The second type of permission is resource-based policies. Resource-based policies are used to control access to invoke and manage the Lambda function.
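
For example, the Boto3 call below adds a statement to a function's resource-based policy so that API Gateway can invoke it; the function name and source ARN are hypothetical:

```python
import boto3

lambda_client = boto3.client("lambda")
lambda_client.add_permission(
    FunctionName="my-function",                      # hypothetical function name
    StatementId="allow-apigateway-invoke",
    Action="lambda:InvokeFunction",
    Principal="apigateway.amazonaws.com",
    SourceArn="arn:aws:execute-api:us-east-1:123456789012:abcdef1234/*",  # hypothetical API ARN
)
```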

3. Lambda Push/Pull Models

When setting up triggers to invoke a Lambda function, there are two main models.

The first is the push model. In the push model, the trigger sends an event to Lambda, which then invokes your Lambda function. When using the push model, the resource-based policy must allow the trigger to invoke the Lambda function.

The second model is the pull model. In the pull model, AWS Lambda pulls events from the event source and then invokes the Lambda function; this is known as event source mapping. For example, with Amazon Simple Queue Service (SQS), messages accumulate in the queue rather than each message invoking Lambda directly. The event source mapping defines the trigger, and AWS Lambda polls the queue and passes the messages to your Lambda function.
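
A minimal sketch of a handler behind an SQS event source mapping; Lambda delivers a batch of messages in event["Records"], and the sketch assumes the message bodies are JSON:

```python
import json

def lambda_handler(event, context):
    for record in event["Records"]:
        body = json.loads(record["body"])  # assumes JSON message bodies
        print("Processing message:", body)
    return {"processed": len(event["Records"])}
```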

4. Lambda Asynchronous vs. Synchronous Invocation

Lambda functions can experience cold starts: when no warm execution environment is available, Lambda must create one and run the initialization code outside the handler (for example, establishing database connections) before invoking the handler. Because the execution environment persists for a short time after the function finishes running, consecutive invocations of the same function can reuse that initialization and complete faster. If you need execution environments to stay initialized and ready to respond, you can configure provisioned concurrency.

When calling a Lambda function through CLI or SDK, you can choose the invocation type (synchronous or asynchronous) based on the API call used. When AWS services trigger Lambda functions, the invocation type is typically determined by the configured trigger. For example, S3 invokes Lambda asynchronously.
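
For example, with Boto3 the Invoke API's InvocationType parameter selects between the two; the function name below is hypothetical:

```python
import json
import boto3

lambda_client = boto3.client("lambda")

# Synchronous invocation: the call blocks until the function returns.
sync_response = lambda_client.invoke(
    FunctionName="my-function",              # hypothetical function name
    InvocationType="RequestResponse",
    Payload=json.dumps({"action": "ping"}),
)
print(json.loads(sync_response["Payload"].read()))

# Asynchronous invocation: Lambda queues the event and returns immediately.
async_response = lambda_client.invoke(
    FunctionName="my-function",
    InvocationType="Event",
    Payload=json.dumps({"action": "ping"}),
)
print(async_response["StatusCode"])  # 202 for asynchronous invocations
```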

Some services, such as API Gateway, let you choose the invocation type. By default, API Gateway invokes Lambda synchronously. For operations with long latency, it is better to invoke the function asynchronously and use another mechanism to notify the client when the operation has completed.

5. Lambda Versioning and Aliases

Versions: You can publish a new version of an AWS Lambda function when creating a new function or updating an existing one. Each version of a Lambda function has a unique Amazon Resource Name (ARN).

Aliases: An alias is essentially a pointer to a specific Lambda version. Each alias you create has a unique ARN. An alias can only point to one function version and cannot point to another alias. You can update an alias to point to a new version of the function.
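
A minimal Boto3 sketch (with a hypothetical function name): publish a version and point an alias at it.

```python
import boto3

lambda_client = boto3.client("lambda")

# Publish an immutable version of the function's current code and configuration.
version = lambda_client.publish_version(FunctionName="my-function")  # hypothetical name
print("Published version:", version["Version"])

# Create a "prod" alias pointing at that version.
lambda_client.create_alias(
    FunctionName="my-function",
    Name="prod",
    FunctionVersion=version["Version"],
)

# Later, shift the alias to a newer version without changing callers:
# lambda_client.update_alias(FunctionName="my-function", Name="prod", FunctionVersion="5")
```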

4. AWS Step Functions

AWS Step Functions is used to coordinate distributed components and analyze the flow of distributed workflows. Step Functions is based on the concepts of tasks and state machines, defined using the JSON-based Amazon States Language, and provides a graphical console to arrange and visualize the components of an application as a series of steps.

1. States in State Machines

State machines include several types of states:

Task: Performs some work within the state machine.

Choice: Makes choices between different branches of execution.

Fail/Succeed: Stops execution with a failure or success.

Pass: Passes its input to its output or injects some fixed data.

Wait: Provides a delay for a certain time or until a specified time/date.

Parallel: Starts and executes multiple branches in parallel.

Map: Iterates over a dynamic set of steps.

Except for the Fail type, all state types allow full control over input and output. They can be controlled using InputPath, ResultPath, and OutputPath. Paths are strings starting with `$` that identify components in JSON text.

InputPath: Determines which part of the data sent as input to the state is passed into the processing of that state.

ResultPath: Determines where the state's result (for example, the output of a Lambda function) is inserted into the state's input.

OutputPath: Applies a final filter to the combined input and result to decide which portion is passed on to the next state.
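
A minimal sketch of a Task state using the three paths, written as a Python dictionary that could be serialized into a state machine definition; the Lambda ARN is hypothetical:

```python
import json

definition = {
    "StartAt": "ProcessOrder",
    "States": {
        "ProcessOrder": {
            "Type": "Task",
            # Hypothetical Lambda function ARN.
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:process-order",
            "InputPath": "$.order",          # pass only the order node to the task
            "ResultPath": "$.order.result",  # insert the task result under the order node
            "OutputPath": "$.order",         # forward only the order node (with its result)
            "End": True,
        }
    },
}
print(json.dumps(definition, indent=2))
```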

2. Integrating Step Functions with AWS Services

AWS Step Functions integrates with various AWS services, enabling API operations to be invoked and coordinated directly from the Amazon States Language. Three service integration patterns are available for calling these services:

● Call the service and let Step Functions proceed to the next state immediately after receiving an HTTP response.

● Call the service and let Step Functions wait until the work is completed.

● Use a task token to call the service and wait for the token to return with the payload.

3. Other Features of Step Functions

Workflows are divided into two types:

● Standard workflows: Ideal for long-running, persistent, and auditable workflows.

● Express workflows: Suitable for high-volume event processing workloads such as IoT data ingestion, stream processing and transformation, and mobile application backends.

Activities: Activities can be created in AWS Step Functions. Activities serve as a way to associate code running in places like Amazon EC2 or Amazon ECS (referred to as activity workers) or any external compute with specific tasks in a state machine. When Step Functions reaches an activity task state, the workflow waits for activity workers to poll the activity task.

Activity workers poll Step Functions using the GetActivityTask API, sending the ARN of the relevant activity in the request. GetActivityTask returns a response that includes the JSON input string of the task and a taskToken.

Some applications require implementing callback patterns. Callback tasks provide a way to pause the workflow until a task token is returned. A task might need to wait for human approval, integrate with third parties, or call legacy systems. For such tasks, you can indefinitely pause Step Functions and wait for external processes or workflows to complete.
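
A minimal activity-worker sketch using Boto3 (the activity ARN is hypothetical): poll for a task with GetActivityTask, do the work, and report the result with the task token.

```python
import json
import boto3

sfn = boto3.client("stepfunctions")

# Long-poll for an available activity task.
task = sfn.get_activity_task(
    activityArn="arn:aws:states:us-east-1:123456789012:activity:approve-order",  # hypothetical
    workerName="worker-1",
)

if task.get("taskToken"):
    task_input = json.loads(task["input"])
    # ... perform the actual work here ...
    sfn.send_task_success(
        taskToken=task["taskToken"],
        output=json.dumps({"approved": True}),
    )
```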

5. AWS X-Ray

AWS X-Ray is a service that collects data about requests served by your application and provides tools to view, filter, and gain insights into that data in order to identify issues and optimize performance. For any traced request, you can view detailed information not only about the request and response but also about calls the application makes to downstream AWS resources, microservices, databases, and HTTP web APIs.

Concepts of AWS X-Ray:

Segments: Compute resources running application logic send data about their work as segments. Segments provide the name of the resource, details about the request, and details about the work done.

Subsegments: Segments can break down data about work done into subsegments. Subsegments provide finer-grained timing information and details about downstream calls made by the application to fulfill the original request.

Service graph: X-Ray uses data sent by the application to generate a service graph.

Traces: Trace IDs track the path of requests through the application. Traces collect all segments generated by a single request.

Annotations: Annotations are simple key-value pairs indexed for use with filter expressions. Annotations are used to record data that you want to use for grouping traces in the console or when calling the GetTraceSummaries API.

Metadata: Metadata consists of key-value pairs whose values can be of any type, including objects and lists, but they are not indexed. Metadata is used to record data that should be stored with traces but is not needed for searching them.

X-Ray Daemon: AWS X-Ray Daemon is a software application that listens for traffic on UDP port 2000, collects raw segment data, and relays it to the AWS X-Ray API.
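
A minimal sketch using the AWS X-Ray SDK for Python (aws_xray_sdk) inside a Lambda function with active tracing enabled; Lambda creates the segment, and the code below adds a subsegment, an annotation, and metadata:

```python
import boto3
from aws_xray_sdk.core import xray_recorder, patch_all

patch_all()  # instrument supported libraries (e.g. boto3) so their calls appear as subsegments
s3 = boto3.client("s3")

def lambda_handler(event, context):
    subsegment = xray_recorder.begin_subsegment("list-buckets")
    subsegment.put_annotation("customer_id", event.get("customer_id", "unknown"))  # indexed, filterable
    subsegment.put_metadata("raw_event", event)                                    # stored, not indexed
    buckets = s3.list_buckets()["Buckets"]
    xray_recorder.end_subsegment()
    return {"bucketCount": len(buckets)}
```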

CloudWatch Logs:

Log events: Log events are records of some activity monitored by an application or resource.

Log streams: Log streams are sequences of log events sharing the same source.

Log groups: Log groups define sets of log streams sharing the same retention, monitoring, and access control settings.

Metric filters: Metric filters extract metric observations from ingested events and convert them into data points in CloudWatch Metrics.

Retention settings: Retention settings specify how long log events are retained in CloudWatch Logs.
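
A minimal Boto3 sketch (with a hypothetical log group name) that ties these concepts together: create a log group, set its retention, and add a metric filter that turns matching log events into a CloudWatch metric.

```python
import boto3

logs = boto3.client("logs")

logs.create_log_group(logGroupName="/my-app/demo")                       # hypothetical name
logs.put_retention_policy(logGroupName="/my-app/demo", retentionInDays=14)
logs.put_metric_filter(
    logGroupName="/my-app/demo",
    filterName="error-count",
    filterPattern="ERROR",
    metricTransformations=[{
        "metricName": "ErrorCount",
        "metricNamespace": "MyApp",
        "metricValue": "1",
    }],
)
```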

6. Optimization

1. Amazon CloudFront

Amazon CloudFront uses AWS edge locations to cache copies of your content for faster delivery to users worldwide. By deploying content to edge locations, you reduce application latency for end users because they access resources physically closer to their location than the origin where the resources are hosted.

2. Response Caching

We can enable API caching in Amazon API Gateway to cache responses from endpoint executions. Caching reduces the number of calls to endpoints and improves latency for API requests.

3. Lambda@Edge

Lambda@Edge is a feature of Amazon CloudFront that allows you to run code closer to application users’ locations, improving performance and reducing latency.

4. Lambda Layers

Lambda functions can be configured to incorporate additional code and content in the form of layers. Layers are ZIP archives that include libraries, custom runtimes, or other dependencies. Using layers, you can use libraries in your functions without including them in the deployment package, keeping your deployment packages smaller and making development easier.
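
A minimal Boto3 sketch (names and the ZIP path are hypothetical): publish a ZIP archive of shared libraries as a layer version and attach it to a function.

```python
import boto3

lambda_client = boto3.client("lambda")

# Publish a layer version from a ZIP archive built beforehand (hypothetical path).
with open("python-deps-layer.zip", "rb") as f:
    layer = lambda_client.publish_layer_version(
        LayerName="shared-python-deps",              # hypothetical layer name
        Content={"ZipFile": f.read()},
        CompatibleRuntimes=["python3.12"],
    )

# Attach the new layer version to a function (hypothetical function name).
lambda_client.update_function_configuration(
    FunctionName="my-function",
    Layers=[layer["LayerVersionArn"]],
)
```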

By Jaxon Tisdale

I am Jaxon Tisdale. I share my experience in networking, AWS, and databases.
