Azure Cosmos DB for MongoDB Series: Part 2: Data Migration from Azure Cosmos DB to Atlas

In Part 1 of this blog series on migrating data from Cosmos DB, we created an Azure Cosmos DB for MongoDB API account and populated it with test data. We then built a .NET console application that uses the MongoDB driver to perform basic CRUD operations.

In Part 2 of this series, I’d like to take you through migrating the data from Azure Cosmos DB to an Atlas cluster, and then performing a simple switch of the connection string so the application connects to the Atlas cluster.


MongoDB Atlas (Azure)

  1. Create an Atlas account
  2. Create an Organization with one project
  3. Create an Atlas Cluster (Peering is available only on M10 and above)
  4. Create a DB User with access to any database
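If you prefer the command line, the portal steps above can be sketched with the Atlas CLI (a hedged sketch: the cluster name, region, username, and password below are placeholders, not values from this walkthrough):

```shell
# Authenticate the Atlas CLI (opens a browser window)
atlas auth login

# Create a free-tier (M0) cluster on Azure; "MigrationDemo" and the region are placeholders
atlas clusters create MigrationDemo --provider AZURE --region US_EAST_2 --tier M0

# Create a DB user with read/write access to any database
atlas dbusers create readWriteAnyDatabase --username appUser --password '<PASSWORD>'
```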

Create Atlas Cluster


For the Cluster Tier, use the free M0 Sandbox.

Cluster creation takes about 10 minutes. Once it is done, we create a DB user.

Add a New DB User

Add a user with read and write access to the database. This will be used in the Connection String to connect to the database.


Data Export

The source database (“test”) contains two collections:

  1. example: This collection has 1,079 documents.
  2. PersonCollection: This collection has 2 documents.

Use mongodump to export the data from the Azure Cosmos DB for MongoDB API account.

mongodump --host <HOST>:10255 --authenticationDatabase test -u <USERNAME> -p <PRIMARY PASSWORD> --ssl --sslAllowInvalidCertificates

Note: Replace <HOST>, <USERNAME>, and <PRIMARY PASSWORD> with the actual values shown in Figure 1.

Running the command above produces output showing the number of documents exported from each collection.

mongodump writes its output to a dump folder, creating a separate subfolder for each database. In this example there is only one database (called “test”), so each collection is exported as one BSON file and one metadata file, e.g. example.bson and example.metadata.json.
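Based on the description above, the resulting dump directory looks roughly like this:

```
dump/
└── test/                             # one subfolder per database
    ├── example.bson                  # documents of the "example" collection
    ├── example.metadata.json         # its indexes and collection options
    ├── PersonCollection.bson
    └── PersonCollection.metadata.json
```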

Data Import

mongorestore --uri=<CONNECTION STRING> -u <USERNAME> -p <PASSWORD> dump

Once the command completes successfully, the documents are restored to Atlas, and mongorestore reports the total number of documents restored.
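As a concrete sketch, a restore against an Atlas cluster typically looks like one of the following (the cluster host name, username, and password are placeholders, not values from this walkthrough):

```shell
# Option 1: credentials embedded in the SRV connection string
mongorestore --uri="mongodb+srv://appUser:<PASSWORD>@cluster0.abcde.mongodb.net" dump

# Option 2: credentials passed separately from the URI
mongorestore --uri="mongodb+srv://cluster0.abcde.mongodb.net" -u appUser -p '<PASSWORD>' dump
```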

The Atlas portal will show the restored collection.

Application Migration

Connection String

Update the application’s connection string with the one from Atlas and run the application. That single change allows the application to connect and retrieve data from Atlas seamlessly; no other code changes are required.
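To sanity-check the new connection string independently of the application, you can query the cluster directly with mongosh; the host and credentials below are placeholders:

```shell
# Count the documents in the restored "example" collection of the "test" database
mongosh "mongodb+srv://appUser:<PASSWORD>@cluster0.abcde.mongodb.net/test" \
  --eval "db.example.countDocuments()"
```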

Note: For this blog, connectivity is established by adding the required IPs to the Atlas IP access list. Ideally, this would be done with network peering.


Scenario 1: Cosmos DB to Atlas hosted in Azure

In this instance, the volume of data migrated is very small, so we can connect over the internet and run the migration from our own PC. In production, when the volume of data is huge (in GBs), we can instead create a VM in the same region as Cosmos DB and mount Azure Blob Storage as a drive. We then run mongodump to write the dump files to blob storage, and subsequently run mongorestore to migrate the data to Atlas.
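A hedged sketch of that production approach, assuming blobfuse2 for mounting the storage container (the mount path, config file, and all connection values are placeholders):

```shell
# Mount an Azure Blob Storage container as a local directory
blobfuse2 mount /mnt/dump --config-file=blobfuse2.yaml

# Dump Cosmos DB straight into the mounted container
mongodump --host <HOST>:10255 --authenticationDatabase test \
  -u <USERNAME> -p <PRIMARY PASSWORD> --ssl --sslAllowInvalidCertificates \
  --out /mnt/dump

# Restore into Atlas from the same mount
mongorestore --uri="mongodb+srv://appUser:<PASSWORD>@cluster0.abcde.mongodb.net" /mnt/dump
```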

Scenario 2: Cosmos DB to Atlas hosted on another cloud platform (GCP)

Scenario 2 covers moving huge volumes of data across clouds. We can still follow the same steps as before: create a VM in the same region as Cosmos DB, mount blob storage as a drive, and copy the data out using mongodump.

Optionally, we can leverage Google Cloud’s data transfer services to copy the dump from Azure Blob Storage to Google Cloud Storage, and then create a VM in the same GCP region as the Atlas cluster to restore the data. This avoids the network latency of running mongorestore directly from an Azure VM. (Again, this is optional; alternatives such as direct VPN connectivity from Azure to GCP can also be used.)
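As one possible sketch of the cross-cloud copy, using azcopy and gsutil directly rather than a managed transfer service (storage account, container, SAS token, and bucket names are all placeholders):

```shell
# On a GCP VM in the same region as the Atlas cluster:
# pull the dump from Azure Blob Storage using a SAS URL
azcopy copy "https://<ACCOUNT>.blob.core.windows.net/dump?<SAS>" ./dump --recursive

# Optionally stage the dump in a GCS bucket for other consumers
gsutil -m cp -r ./dump gs://<BUCKET>/

# Then restore into Atlas as before
mongorestore --uri="mongodb+srv://appUser:<PASSWORD>@cluster0.abcde.mongodb.net" ./dump
```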

Here is a short video highlighting the data migration service in GCP:

For this blog, I covered a simple migration of data from Azure Cosmos DB to Atlas hosted in Azure or GCP.

In the next part of the series, I’d like to talk about complex migrations with sharded data.