AWS S3 Protocol - Simple Implementation
In the previous article, we explored the S3 protocol intuitively through packet capture. This article builds on that foundation: using Go and the Gin framework, we will write from scratch a simple S3-compatible object storage service backed by the local file system, to deepen our understanding of the protocol's details.
We will use the local file system as the storage backend and implement the core bucket and object operations, similar to the early storage layout of the MinIO instance we used for testing in the previous article.
Technology Selection
- Go: a statically typed, compiled, garbage-collected language developed at Google, with built-in concurrency support.
- Gin: a high-performance HTTP web framework written in Go. Its concise API is more than enough to build our S3 service routes quickly.
- Local file system: to keep the implementation simple, we use a local disk directory to model S3's bucket and object structure. A directory is a bucket, and the files and subdirectories inside it are objects.
Project Setup and Data Structure
Before we start coding, we need to define the service configuration and the XML data structures required by S3 API responses.
Configuration File and Loading
We create a config.yaml file to manage the service's port, data storage directory, and S3 access keys.
config.yaml
server:
  port: 9000
storage:
  directory: D:\S3-Storage
s3:
  accessKey: myaccesskey
  secretKey: mysecretkey
To load and use these configurations in the Go program, we define the corresponding structures and write a function to parse this YAML file.
Code: Configuration Loading
package main

// Imports used across the snippets in this article (assuming gopkg.in/yaml.v3)
import (
    "bufio"
    "crypto/hmac"
    "crypto/sha256"
    "encoding/base64"
    "encoding/xml"
    "fmt"
    "io"
    "log"
    "net/http"
    "os"
    "path/filepath"
    "strings"
    "time"

    "github.com/gin-gonic/gin"
    "gopkg.in/yaml.v3"
)

// Config maps the contents of the config.yaml file
type Config struct {
    Server  ServerConfig  `yaml:"server"`
    Storage StorageConfig `yaml:"storage"`
    S3      S3Config      `yaml:"s3"`
}

type ServerConfig struct {
    Port int `yaml:"port"`
}

type StorageConfig struct {
    Directory string `yaml:"directory"`
}

type S3Config struct {
    AccessKey string `yaml:"accessKey"`
    SecretKey string `yaml:"secretKey"`
}

// loadConfig reads and parses the YAML configuration file
func loadConfig(filename string) (*Config, error) {
    data, err := os.ReadFile(filename)
    if err != nil {
        return nil, fmt.Errorf("error reading config file: %w", err)
    }
    var config Config
    if err := yaml.Unmarshal(data, &config); err != nil {
        return nil, fmt.Errorf("error parsing config file: %w", err)
    }
    return &config, nil
}
XML Structure for S3 Responses
Most S3 API responses are in XML format. We predefine Go structs corresponding to these XML documents so that we can easily generate correct responses after processing each request.
Code: XML Structures
// ListAllMyBucketsResult corresponds to the XML response of the
// "list all buckets" operation
type ListAllMyBucketsResult struct {
    XMLName xml.Name `xml:"ListAllMyBucketsResult"`
    Owner   Owner    `xml:"Owner"`
    Buckets Buckets  `xml:"Buckets"`
}

type Owner struct {
    ID          string `xml:"ID"`
    DisplayName string `xml:"DisplayName"`
}

type Buckets struct {
    Bucket []Bucket `xml:"Bucket"`
}

type Bucket struct {
    Name         string    `xml:"Name"`
    CreationDate time.Time `xml:"CreationDate"`
}

// ListBucketResult corresponds to the XML response of the
// "list objects in a bucket" operation
type ListBucketResult struct {
    XMLName        xml.Name         `xml:"ListBucketResult"`
    Name           string           `xml:"Name"`
    Prefix         string           `xml:"Prefix"`
    Marker         string           `xml:"Marker"`
    Delimiter      string           `xml:"Delimiter"`
    Contents       []Contents       `xml:"Contents"`
    CommonPrefixes []CommonPrefixes `xml:"CommonPrefixes"`
}

type Contents struct {
    Key          string    `xml:"Key"`
    LastModified time.Time `xml:"LastModified"`
    Size         int64     `xml:"Size"`
    StorageClass string    `xml:"StorageClass"`
}

// CommonPrefixes represents "folders" in S3 listings
type CommonPrefixes struct {
    Prefix string `xml:"Prefix"`
}
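As a quick sanity check, the sketch below marshals a sample ListAllMyBucketsResult the way Gin's c.XML serializes it (pretty-printed here with xml.MarshalIndent; the bucket name and timestamp are made-up sample values).
Code: Sample XML Output
t := time.Date(2024, 1, 1, 0, 0, 0, 0, time.UTC) // arbitrary sample time
result := ListAllMyBucketsResult{
    Owner:   Owner{ID: "123", DisplayName: "s3-fs"},
    Buckets: Buckets{Bucket: []Bucket{{Name: "photos", CreationDate: t}}},
}
out, _ := xml.MarshalIndent(result, "", "  ")
fmt.Println(string(out))
// Output (time.Time renders as RFC 3339, close to S3's ISO 8601 timestamps):
// <ListAllMyBucketsResult>
//   <Owner>
//     <ID>123</ID>
//     <DisplayName>s3-fs</DisplayName>
//   </Owner>
//   <Buckets>
//     <Bucket>
//       <Name>photos</Name>
//       <CreationDate>2024-01-01T00:00:00Z</CreationDate>
//     </Bucket>
//   </Buckets>
// </ListAllMyBucketsResult>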
S3 Signature Authentication
The cornerstone of the S3 protocol's security is its signature authentication mechanism. Every request sent to an S3 service (except anonymous public access) must carry an Authorization header. The server uses the information in this header to verify who the requester is and that the request was not tampered with in transit.
The core idea is that the client uses its SecretKey to compute an HMAC over specific parts of the request (HTTP method, URI, request headers, and so on), producing a signature. On receipt, the server recomputes the signature using the same method and its own stored copy of the SecretKey, then compares the two; if they match, authentication succeeds. Real S3 implements this as Signature Version 2 or the more elaborate Version 4; to keep the focus on the protocol's shape, we use a deliberately simplified variant: HMAC-SHA256 over just the method, path, and query string.
We will encapsulate this logic in a middleware for Gin so that it applies to all API routes that need protection.
Code: Authentication Middleware authMiddleware
// authMiddleware is a Gin middleware that performs the simplified
// S3-style signature check described above
func authMiddleware(config *Config) gin.HandlerFunc {
    return func(c *gin.Context) {
        // Get the authentication information sent by the S3 client
        auth := c.GetHeader("Authorization")
        if auth == "" {
            c.XML(http.StatusUnauthorized, gin.H{"error": "Authorization required"})
            c.Abort()
            return
        }
        // Parse the Authorization header, expected in the form
        // "AWS AccessKey:Signature"
        parts := strings.Fields(auth)
        if len(parts) != 2 || !strings.HasPrefix(parts[0], "AWS") {
            c.XML(http.StatusUnauthorized, gin.H{"error": "Invalid Authorization header format"})
            c.Abort()
            return
        }
        creds := strings.Split(parts[1], ":")
        if len(creds) != 2 {
            c.XML(http.StatusUnauthorized, gin.H{"error": "Invalid credentials format"})
            c.Abort()
            return
        }
        accessKey := creds[0]
        signature := creds[1]
        // Verify the AccessKey matches the one configured on the server
        if accessKey != config.S3.AccessKey {
            c.XML(http.StatusUnauthorized, gin.H{"error": "Invalid access key"})
            c.Abort()
            return
        }
        // Build the "string to sign" the server uses to compute its signature
        stringToSign := c.Request.Method + "\n" +
            c.Request.URL.Path + "\n" +
            c.Request.URL.RawQuery
        // Compute the expected signature with HMAC-SHA256 and the SecretKey
        mac := hmac.New(sha256.New, []byte(config.S3.SecretKey))
        mac.Write([]byte(stringToSign))
        expectedSignature := base64.StdEncoding.EncodeToString(mac.Sum(nil))
        // Compare in constant time to avoid leaking timing information
        if !hmac.Equal([]byte(signature), []byte(expectedSignature)) {
            c.XML(http.StatusUnauthorized, gin.H{"error": "Invalid signature"})
            c.Abort()
            return
        }
        // Signature verified; continue processing the request
        c.Next()
    }
}
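Because this scheme is our own simplification rather than real AWS Signature V2 or V4, standard SDK signing will not match it. For testing, here is a minimal client-side helper that mirrors the middleware; signRequest is our own name for illustration, not a library function.
Code: Client-Side Signing Helper
// signRequest builds the Authorization header value that authMiddleware
// expects. It matches only our simplified scheme, not real AWS SigV2/V4.
func signRequest(method, path, rawQuery, accessKey, secretKey string) string {
    stringToSign := method + "\n" + path + "\n" + rawQuery
    mac := hmac.New(sha256.New, []byte(secretKey))
    mac.Write([]byte(stringToSign))
    signature := base64.StdEncoding.EncodeToString(mac.Sum(nil))
    return "AWS " + accessKey + ":" + signature
}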
Bucket Operation Implementation
In our implementation, buckets directly correspond to directories in the file system. We will define routes to handle these operations in the main function.
List All Buckets (GET /)
This operation is implemented by reading all subdirectories under the root storage directory specified in the configuration.
Code: Adding Route in main.go
// In the main function
r.GET("/", func(c *gin.Context) {
    // Read all subdirectories under the root storage directory
    dirs, err := os.ReadDir(config.Storage.Directory)
    if err != nil {
        // If the root directory does not exist, create it and return an empty list
        if os.IsNotExist(err) {
            _ = os.MkdirAll(config.Storage.Directory, os.ModePerm)
            dirs = []os.DirEntry{}
        } else {
            c.XML(http.StatusInternalServerError, gin.H{"error": "Failed to list buckets"})
            return
        }
    }
    // Fill in S3's standard XML response structure
    result := ListAllMyBucketsResult{
        Owner:   Owner{ID: "123", DisplayName: "s3-fs"},
        Buckets: Buckets{},
    }
    for _, dir := range dirs {
        if !dir.IsDir() {
            continue
        }
        info, err := dir.Info()
        if err != nil {
            continue // skip entries we cannot stat
        }
        result.Buckets.Bucket = append(result.Buckets.Bucket, Bucket{
            Name:         dir.Name(),
            CreationDate: info.ModTime(),
        })
    }
    c.XML(http.StatusOK, result)
})
Create Bucket (PUT /{bucket})
Creating a bucket is equivalent to creating a new subdirectory under the root storage directory.
Code: Adding Route in main.go
// In the main function
r.PUT("/:bucket", func(c *gin.Context) {
    bucket := c.Param("bucket")
    bucketPath := filepath.Join(config.Storage.Directory, bucket)
    // Create the directory that represents the bucket
    if err := os.MkdirAll(bucketPath, os.ModePerm); err != nil {
        c.XML(http.StatusInternalServerError, gin.H{"error": "Failed to create bucket"})
        return
    }
    c.Status(http.StatusOK)
})
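To exercise this route without a full S3 client, here is a minimal sketch using net/http and the signRequest helper from the authentication section (it assumes the service is running on 127.0.0.1:9000 with the keys from config.yaml).
Code: Creating a Bucket from a Test Client
req, _ := http.NewRequest(http.MethodPut, "http://127.0.0.1:9000/photos", nil)
req.Header.Set("Authorization",
    signRequest(http.MethodPut, "/photos", "", "myaccesskey", "mysecretkey"))
resp, err := http.DefaultClient.Do(req)
if err != nil {
    log.Fatal(err)
}
defer resp.Body.Close()
fmt.Println(resp.Status) // expect "200 OK"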
Delete Bucket (DELETE /{bucket})
Deleting a bucket removes the corresponding directory along with all of its contents. (Real S3 rejects deletion of a non-empty bucket with a BucketNotEmpty error; we keep things simple and delete recursively.)
Code: Adding Route in main.go
// In the main function
r.DELETE("/:bucket", func(c *gin.Context) {
    bucket := c.Param("bucket")
    bucketPath := filepath.Join(config.Storage.Directory, bucket)
    // Remove the directory and all its contents
    if err := os.RemoveAll(bucketPath); err != nil {
        c.XML(http.StatusInternalServerError, gin.H{"error": "Failed to delete bucket"})
        return
    }
    c.Status(http.StatusNoContent)
})
Object Operation Implementation
In our implementation, objects correspond to files or subdirectories under the bucket directory.
List Objects (GET /{bucket})
This API needs to handle the prefix and delimiter parameters to support hierarchical listing of objects, which is what makes "folder" browsing work in S3 clients. Because our backend is a real directory tree, the hierarchy comes for free: we list the directory named by the prefix and report subdirectories as CommonPrefixes, which approximates listing with a delimiter of "/" (the delimiter value itself is simply echoed back).
Code: Adding Route in main.go
// In the main function
r.GET("/:bucket", func(c *gin.Context) {
    bucket := c.Param("bucket")
    prefix := c.Query("prefix")
    delimiter := c.Query("delimiter")
    result := ListBucketResult{
        Name:      bucket,
        Prefix:    prefix,
        Delimiter: delimiter,
    }
    dirPath := filepath.Join(config.Storage.Directory, bucket, prefix)
    entries, err := os.ReadDir(dirPath)
    if err != nil {
        c.XML(http.StatusNotFound, gin.H{"error": "Directory not found"})
        return
    }
    for _, entry := range entries {
        name := entry.Name()
        // Build the object's full S3 key (always with forward slashes)
        fullPath := filepath.ToSlash(filepath.Join(prefix, name))
        if entry.IsDir() {
            // Report a directory as a CommonPrefix ("folder")
            result.CommonPrefixes = append(result.CommonPrefixes, CommonPrefixes{
                Prefix: fullPath + "/",
            })
        } else {
            // Report a file as a Contents entry (object)
            info, err := entry.Info()
            if err != nil {
                continue // skip entries we cannot stat
            }
            result.Contents = append(result.Contents, Contents{
                Key:          fullPath,
                LastModified: info.ModTime(),
                Size:         info.Size(),
                StorageClass: "STANDARD",
            })
        }
    }
    c.XML(http.StatusOK, result)
})
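One detail worth illustrating: the raw query string participates in our simplified string-to-sign, so the client must sign exactly the query it sends. A minimal sketch, again using the signRequest helper:
Code: Listing a "Folder" from a Test Client
req, _ := http.NewRequest(http.MethodGet,
    "http://127.0.0.1:9000/photos?prefix=2024/&delimiter=/", nil)
req.Header.Set("Authorization",
    signRequest(http.MethodGet, req.URL.Path, req.URL.RawQuery,
        "myaccesskey", "mysecretkey"))
resp, err := http.DefaultClient.Do(req)
if err != nil {
    log.Fatal(err)
}
defer resp.Body.Close()
body, _ := io.ReadAll(resp.Body)
fmt.Println(string(body)) // the ListBucketResult XML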
Upload Object (PUT /{bucket}/*key)
The upload operation writes the data from the request body to the local file system. We also need to handle S3's aws-chunked encoding, which we saw in the previous packet-capture article: the body arrives as a series of length-prefixed, individually signed chunks. Our simplified parser (shown after the route) extracts the payload and ignores the per-chunk signatures.
Code: Adding Route in main.go
// In the main function
r.PUT("/:bucket/*key", func(c *gin.Context) {
bucket := c.Param("bucket")
key := strings.TrimPrefix(c.Param("key"), "/")
filePath := filepath.Join(config.Storage.Directory, bucket, key)
// Ensure the parent directory exists
os.MkdirAll(filepath.Dir(filePath), os.ModePerm)
file, err := os.Create(filePath)
if err != nil {
c.XML(http.StatusInternalServerError, gin.H{"error": "Failed to create file"})
return
}
defer file.Close()
// The S3 client may use chunked streaming uploads
if strings.Contains(c.GetHeader("x-amz-content-sha256"), "STREAMING-AWS4-HMAC-SHA256-PAYLOAD") {
data, err := readChunkedBody(c.Request.Body)
if err != nil {
c.XML(http.StatusInternalServerError, gin.H{"error": "Failed to read chunked data"})
return
}
_, err = file.Write(data)
} else {
// Handle normal uploads
_, err = io.Copy(file, c.Request.Body)
}
if err != nil {
c.XML(http.StatusInternalServerError, gin.H{"error": "Failed to write file"})
return
}
c.Status(http.StatusOK)
})
Code: Parsing Chunked Uploads
// readChunkedBody reads and parses an S3 aws-chunked upload body,
// ignoring the per-chunk signatures
func readChunkedBody(body io.Reader) ([]byte, error) {
    var data []byte
    reader := bufio.NewReader(body)
    for {
        // Read the chunk header line, e.g. "400;chunk-signature=..."
        line, err := reader.ReadString('\n')
        if err != nil {
            if err == io.EOF {
                break
            }
            return nil, err
        }
        // Parse the hexadecimal chunk size before the first ';'
        var chunkSize int
        if _, err := fmt.Sscanf(strings.Split(line, ";")[0], "%x", &chunkSize); err != nil {
            continue
        }
        // A zero-length chunk marks the end of the stream
        if chunkSize == 0 {
            break
        }
        // Read exactly chunkSize bytes of payload
        chunk := make([]byte, chunkSize)
        if _, err := io.ReadFull(reader, chunk); err != nil {
            return nil, err
        }
        data = append(data, chunk...)
        // Read and discard the CRLF that terminates the chunk
        reader.ReadString('\n')
    }
    return data, nil
}
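To see the parser in action, here is a small sketch that feeds readChunkedBody a hand-crafted aws-chunked body (the chunk signatures are fake placeholders; our parser ignores them anyway).
Code: Exercising readChunkedBody
body := strings.NewReader(
    "5;chunk-signature=deadbeef\r\n" + // 0x5 = 5 payload bytes follow
        "hello\r\n" +
        "0;chunk-signature=deadbeef\r\n" + // zero-length chunk ends the stream
        "\r\n")
data, err := readChunkedBody(body)
fmt.Printf("%q %v\n", data, err) // "hello" <nil>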
Download and Delete Objects
Downloading an object means returning the file content as the HTTP response body, while deleting an object removes the corresponding file from the file system.
Code: Download Object (GET /{bucket}/*key)
// In the main function
r.GET("/:bucket/*key", func(c *gin.Context) {
    bucket := c.Param("bucket")
    key := strings.TrimPrefix(c.Param("key"), "/")
    filePath := filepath.Join(config.Storage.Directory, bucket, key)
    // The Go standard library provides convenient file serving
    http.ServeFile(c.Writer, c.Request, filePath)
})
Code: Delete Object (DELETE /{bucket}/*key)
// In the main function
r.DELETE("/:bucket/*key", func(c *gin.Context) {
    bucket := c.Param("bucket")
    key := strings.TrimPrefix(c.Param("key"), "/")
    filePath := filepath.Join(config.Storage.Directory, bucket, key)
    if err := os.Remove(filePath); err != nil {
        c.XML(http.StatusNotFound, gin.H{"error": "File not found"})
        return
    }
    c.Status(http.StatusNoContent)
})
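None of the snippets above showed main itself, so to round things off, here is a minimal sketch of how the pieces could be wired together; the route bodies are the closures from the sections above, abbreviated here as comments.
Code: Assembling main
func main() {
    config, err := loadConfig("config.yaml")
    if err != nil {
        log.Fatalf("failed to load config: %v", err)
    }
    r := gin.Default()
    // Every route goes through the simplified signature check
    r.Use(authMiddleware(config))
    // Bucket routes
    r.GET("/", func(c *gin.Context) { /* list all buckets */ })
    r.PUT("/:bucket", func(c *gin.Context) { /* create bucket */ })
    r.DELETE("/:bucket", func(c *gin.Context) { /* delete bucket */ })
    // Object routes
    r.GET("/:bucket", func(c *gin.Context) { /* list objects */ })
    r.PUT("/:bucket/*key", func(c *gin.Context) { /* upload object */ })
    r.GET("/:bucket/*key", func(c *gin.Context) { /* download object */ })
    r.DELETE("/:bucket/*key", func(c *gin.Context) { /* delete object */ })
    // Listen on the configured port, e.g. :9000
    if err := r.Run(fmt.Sprintf(":%d", config.Server.Port)); err != nil {
        log.Fatal(err)
    }
}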
Summary
Through the steps above, we have implemented a subset of the S3 protocol with Go and the Gin framework: a simple, locally runnable object storage service backed by the file system. Although this version simplifies the authentication mechanism and omits advanced features such as multipart upload and versioning, it clearly demonstrates the core workflow of the S3 protocol.
We can connect and test with tools like S3 Browser, introduced in the previous article: point the endpoint at 127.0.0.1:9000, fill in the AccessKey and SecretKey defined in config.yaml, and operate our own service just like real S3.