Azure Storage Blob - How to List and Download Blob from Azure Storage container in Python (No Azure library)

February 04, 2021

Introduction

In this tutorial, we will see how to list and download the blobs in an Azure Storage container without using the Azure Python libraries.

Note: No Azure library is used, only REST API calls.

Pre-requisite

This tutorial is based on Python 3.7.

PyPI Dependency

The only third-party package required is requests.

Complete Code

import requests
import re
import os

def _get_file_list_helper(container, next_marker=None):
  """
  Get the files list by using next_marker
  """
  account_name = container['account_name']
  container_name = container['container_name']
  curl_url = f'https://{account_name}.blob.core.windows.net/{container_name}?restype=container&comp=list&' 
  if next_marker:
    curl_url += f'marker={next_marker}&'
  curl_url += container['sas_token']

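  print('Executing REST call to Azure')
  r = requests.get(curl_url)
  # fail early on HTTP errors (e.g. an expired SAS token) instead of parsing an error body
  r.raise_for_status()
  text = r.text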

  # a non-empty NextMarker indicates there are more files;
  # empty values are dropped so the caller's loop can terminate
  next_marker = [m for m in re.findall('<NextMarker>([^<]*)</NextMarker>', text) if m]
  file_names = re.findall('<Name>([^<]*)</Name>', text)

  return {'files': file_names, 'next_marker': next_marker}

def get_file_list(container):
  """
  Get the files list
  """
  files = []
  next_marker = None
  while True:
    files_data = _get_file_list_helper(container, next_marker)
    files.extend(files_data['files'])
    if not files_data['next_marker']:
      break
    next_marker = files_data['next_marker'][0]
  return files

def dowload_files(container, local_dest_path):
  files = get_file_list(container)

  account_name = container['account_name']
  container_name = container['container_name']
  url_path = f'https://{account_name}.blob.core.windows.net/{container_name}/'
  url_end_path = '?'  + container['sas_token']

  for file_name in files:
    print(f'Downloading: {file_name}')
    url = f'{url_path}{file_name}{url_end_path}'
    path = f'{local_dest_path}/{file_name}'
    # create the local directory if it does not exist yet
    os.makedirs(os.path.dirname(path), exist_ok=True)

    # make the request (for every file, not only when the directory is new)
    r = requests.get(url)
    r.raise_for_status()

    # write the file
    with open(path, "wb") as download_file:
      download_file.write(r.content)

## main starts here
local_dest_path = './container_blob'

container = {
    'account_name': 'account_name',
    'container_name': 'container_name',
    'sas_token': 'xxxxxxxxxx'
}
dowload_files(container, local_dest_path)

Explanation

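The code uses two Azure Blob Storage REST API operations: List Blobs to enumerate the blobs in a container, and Get Blob to download each one. Both requests are authorized with the SAS token appended to the URL.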
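
The two request URLs can be sketched as small builder functions. This is a minimal sketch rather than part of the script above: the helper names build_list_url and build_blob_url are hypothetical, and it assumes the SAS token is passed without a leading "?". Unlike the original code, it also percent-encodes the marker and the blob name with urllib.parse.quote, which may matter for names containing spaces or other reserved characters.

```python
from urllib.parse import quote


def build_list_url(account_name, container_name, sas_token, marker=None):
    """Build the List Blobs URL for a container (hypothetical helper)."""
    url = (
        f'https://{account_name}.blob.core.windows.net/{container_name}'
        '?restype=container&comp=list&'
    )
    if marker:
        # Percent-encode the marker so reserved characters survive the query string
        url += f'marker={quote(marker, safe="")}&'
    return url + sas_token


def build_blob_url(account_name, container_name, blob_name, sas_token):
    """Build the download URL for a single blob (hypothetical helper)."""
    # Keep "/" unescaped so virtual directories such as abc/test.log stay intact
    encoded_name = quote(blob_name, safe='/')
    return (
        f'https://{account_name}.blob.core.windows.net/'
        f'{container_name}/{encoded_name}?{sas_token}'
    )


# 'sv=xxxx' is a placeholder, not a real SAS token
print(build_list_url('account_name', 'container_name', 'sv=xxxx'))
print(build_blob_url('account_name', 'container_name', 'abc/test.log', 'sv=xxxx'))
```

Keeping URL construction separate from the HTTP call makes it easy to inspect the exact URL before any request is sent.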

Understanding next_marker

When a storage container holds many blobs, a single response does not list all of them. Instead, the API returns a limited number of items (at most 5,000 per call by default) together with a NextMarker value, which indicates that more blobs remain. This marker has to be sent in the next request to fetch the following page, and the process repeats until the response no longer carries a marker.
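
The marker-following loop can be sketched on its own, without any network access. This is an illustrative sketch under stated assumptions: iter_pages and FAKE_PAGES are hypothetical names, the marker is treated as a plain string (the script above keeps it in a list), and the REST call is replaced by a dictionary lookup so the control flow is easy to follow.

```python
def iter_pages(fetch_page):
    """Yield each page of blob names, following next_marker until it is empty.

    fetch_page is any callable that takes a marker (or None) and returns
    a dict shaped like {'files': [...], 'next_marker': '...' or None}.
    """
    marker = None
    while True:
        page = fetch_page(marker)
        yield page['files']
        marker = page.get('next_marker')
        if not marker:
            # No marker in the response: this was the last page
            break


# Stand-in for the REST call: three fake pages keyed by the marker that requests them
FAKE_PAGES = {
    None: {'files': ['a.log', 'b.log'], 'next_marker': 'm1'},
    'm1': {'files': ['c.log'], 'next_marker': 'm2'},
    'm2': {'files': ['d.log'], 'next_marker': None},
}

all_files = [name for page in iter_pages(FAKE_PAGES.get) for name in page]
print(all_files)
```

In real use, fetch_page would wrap the HTTP request and the response parsing; the loop itself stays the same.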

Usage with Azure Official Python Libraries

For usage with Azure official Python libraries, see: List and Download Azure blobs by Azure Python Libraries

Response of the List Blobs REST API

<?xml version="1.0" encoding="utf-8"?>
<EnumerationResults ServiceEndpoint="https://hubbledmeprodlocb.blob.core.windows.net/" ContainerName="container_name">
  <Blobs>
    <Blob>
      <Name>abc/test.log</Name>
      <Properties>
        <Last-Modified>Mon, 02 Dec 2019 09:42:50 GMT</Last-Modified>
        <Etag>0x8D7770BFF1CC8A1</Etag>
        <Content-Length>423</Content-Length>
        <Content-Type>application/octet-stream</Content-Type>
        <Content-Encoding />
        <Content-Language />
        <Content-MD5>3ycLC3CutKkybJtlgvEdsQ==</Content-MD5>
        <Cache-Control />
        <Content-Disposition />
        <BlobType>BlockBlob</BlobType>
        <LeaseStatus>unlocked</LeaseStatus>
        <LeaseState>available</LeaseState>
        <ServerEncrypted>true</ServerEncrypted>
      </Properties>
    </Blob>
  ...
  </Blobs>
  <NextMarker>marker_id</NextMarker>

</EnumerationResults>
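
As an alternative to the regular expressions used in the script, a response like the one above could be parsed with the standard library's xml.etree.ElementTree. This is a sketch, not the original author's method: parse_list_response is a hypothetical helper, SAMPLE_RESPONSE is a shortened copy of the response with a placeholder account name, and the marker is returned as a string or None rather than as a list.

```python
import xml.etree.ElementTree as ET

# Shortened sample with a placeholder account name (assumption, for illustration only)
SAMPLE_RESPONSE = """<?xml version="1.0" encoding="utf-8"?>
<EnumerationResults ServiceEndpoint="https://account_name.blob.core.windows.net/" ContainerName="container_name">
  <Blobs>
    <Blob>
      <Name>abc/test.log</Name>
      <Properties>
        <Content-Length>423</Content-Length>
      </Properties>
    </Blob>
  </Blobs>
  <NextMarker>marker_id</NextMarker>
</EnumerationResults>"""


def parse_list_response(xml_body):
    """Extract blob names and the continuation marker from a List Blobs response."""
    root = ET.fromstring(xml_body)
    # Only <Name> elements directly under <Blob> are collected
    names = [blob.findtext('Name') for blob in root.iter('Blob')]
    # A missing or empty <NextMarker> means there are no more pages
    marker = root.findtext('NextMarker') or None
    return {'files': names, 'next_marker': marker}


# requests exposes the raw body as r.content (bytes), so bytes are passed here too
result = parse_list_response(SAMPLE_RESPONSE.encode('utf-8'))
print(result)
```

A real XML parser also decodes entities such as &amp;amp; in blob names, which a plain regular expression leaves untouched.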

Hope it helps.

