How to sync MongoDB data to ElasticSearch using Mongo-Connector

September 06, 2019

Introduction

This post is about syncing your MongoDB data to ElasticSearch. There are several scenarios where you might want this: quickly searching your data, exposing a search API, or running Grafana to visualize your data.

Mongo-Connector

Mongo-Connector is an open-source tool that syncs MongoDB data to ElasticSearch. You can run it once or leave it running continuously, in which case it keeps applying all changes made to your MongoDB data, so ElasticSearch holds a replica of the MongoDB data.

You can configure which MongoDB collections you want to sync and under what names their indexes should be created in ElasticSearch.

Requirements

  1. MongoDB replica set - mongo-connector reads changes from the oplog, which only exists on replica sets; a standalone instance will not work.

  2. ElasticSearch Cluster

  3. mongo-connector utility

How to create MongoDB replica set with Docker

See: Run MongoDB replica set with Docker
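As a minimal sketch, assuming three containers named mongoset1-3 on the ports used later in this post (container names, ports, and the replica set name are illustrative; the linked post has the full walkthrough):

```shell
# Three mongod containers on one Docker network, all members of one replica set.
docker network create mongo-net
docker run -d --name mongoset1 --net mongo-net -p 27017:27017 mongo:4.2 --replSet your-replicaset-name --port 27017
docker run -d --name mongoset2 --net mongo-net -p 27018:27018 mongo:4.2 --replSet your-replicaset-name --port 27018
docker run -d --name mongoset3 --net mongo-net -p 27019:27019 mongo:4.2 --replSet your-replicaset-name --port 27019

# Initiate the set from the mongo shell of the first member
docker exec -it mongoset1 mongo --port 27017 --eval 'rs.initiate({
  _id: "your-replicaset-name",
  members: [
    { _id: 0, host: "mongoset1:27017" },
    { _id: 1, host: "mongoset2:27018" },
    { _id: 2, host: "mongoset3:27019" }
  ]
})'
```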

How to create ElasticSearch cluster with Docker

See: Run Elastic Search Cluster with Docker
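For local testing, a single node can stand in for a full cluster. A sketch, assuming the Docker Hub image tag below (the elastic5 doc manager used in this post targets ElasticSearch 5.x):

```shell
# Single ElasticSearch 5.x node for local testing
# (the linked post covers a proper multi-node cluster).
docker run -d --name elasticsearch -p 9200:9200 elasticsearch:5.6.16

# Verify it is up: this returns the node and version info as JSON
curl http://localhost:9200
```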

How to get mongo-connector

You need Python installed; then install mongo-connector and the ElasticSearch doc manager via pip:

pip install 'mongo-connector[elastic5]' 'elastic2-doc-manager[elastic5]'
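After installation, the mongo-connector command should be available on your PATH; its help output lists all supported flags:

```shell
mongo-connector --help
```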

Alternatively, you can build a Docker image for it. See the Dockerfile below:

FROM python:3-alpine
RUN apk add --no-cache curl sed && pip install 'mongo-connector[elastic5]' 'elastic2-doc-manager[elastic5]'
ENTRYPOINT ["mongo-connector"]

To build the Docker image:

docker build -t my_mongoconnector .
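The image can then be run with the config file (shown in the next section) mounted in. The paths and connection string here are illustrative:

```shell
docker run --rm \
  -v "$(pwd)/mongoconnector.json:/mongoconnector.json" \
  my_mongoconnector \
  -m "mongodb://mongoset1:27017,mongoset2:27018,mongoset3:27019/mydb?replicaSet=your-replicaset-name" \
  -c /mongoconnector.json
```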

Run MongoConnector

MongoConnector Config

You should prepare a config file (e.g. mongoconnector.json):

{
   "oplogFile": "<your desired path>/oplog.timestamp",
   "noDump": false,
   "batchSize": 50,
   "verbosity": 2,
   "continueOnError": true,
   "logging": {
       "type": "stream"
   },
   "namespaces": {
        "mydb.coll1": {
            "rename": "mydb_coll1._doc"
        },
        "mydb.trainings": {
            "rename": "mydb_trainings._doc"
        }
    },
   "docManagers": [
       {
           "docManager": "elastic2_doc_manager",
           "targetURL": "<elastic search hostname>:9200",
           "bulkSize": 10,
           "uniqueKey": "_id",
           "args": {
              "clientOptions": {"timeout": 5000}
           }
       }
   ]
}

In the above config file:

  • oplogFile - the file where mongo-connector records the timestamp up to which it has synced, so that even if it is stopped, it can resume syncing from where it left off.
  • namespaces - which collections you want to sync, and under what index names they will appear in ElasticSearch.
  • docManagers - configuration for your ElasticSearch cluster.

Run

mongo-connector -m "mongodb://<mongoset1>:27017,<mongoset2>:27018,<mongoset3>:27019/<your db>?replicaSet=your-replicaset-name" -c ./mongoconnector.json

If everything is fine, it will start syncing your MongoDB data to the ElasticSearch cluster you specified.
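Once syncing starts, you can check on the ElasticSearch side that the renamed indexes from the namespaces section exist and are receiving documents:

```shell
# List indexes; with the config above you should see mydb_coll1 and mydb_trainings
curl 'http://<elastic search hostname>:9200/_cat/indices?v'

# Count synced documents in one index
curl 'http://<elastic search hostname>:9200/mydb_coll1/_count?pretty'
```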

Sample output

2019-09-06 08:17:05,189 [ALWAYS] mongo_connector.connector:50 - Starting mongo-connector version: 3.1.1
2019-09-06 08:17:05,189 [ALWAYS] mongo_connector.connector:50 - Python version: 3.6.8 (default, Apr 25 2019, 21:02:35) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-36)]
2019-09-06 08:17:05,190 [ALWAYS] mongo_connector.connector:50 - Platform: Linux-3.10.0-957.21.3.el7.x86_64-x86_64-with-centos-7.6.1810-Core
2019-09-06 08:17:05,191 [ALWAYS] mongo_connector.connector:50 - pymongo version: 3.9.0
2019-09-06 08:17:05,204 [ALWAYS] mongo_connector.connector:50 - Source MongoDB version: 4.2.0
2019-09-06 08:17:05,204 [ALWAYS] mongo_connector.connector:50 - Target DocManager: mongo_connector.doc_managers.elastic2_doc_manager version: 1.0.0
2019-09-06 08:17:05,225 [INFO] mongo_connector.oplog_manager:137 - OplogThread: Initializing oplog thread
2019-09-06 08:17:05,227 [INFO] mongo_connector.connector:402 - MongoConnector: Starting connection thread MongoClient(host=['mongoset1:27018', 'mongoset1:27017', 'mongoset1:27019'], document_class=dict, tz_aware=False, connect=True, replicaset='your-replica-set')
2019-09-06 08:17:05,241 [INFO] elasticsearch:83 - GET http://<es-hostname>:9200/_mget?realtime=true [status:200 request:0.007s]
2019-09-06 08:17:05,356 [INFO] elasticsearch:83 - POST http://<es-hostname>:9200/_bulk [status:200 request:0.110s]
2019-09-06 08:17:05,477 [INFO] elasticsearch:83 - POST http://<es-hostname>:9200/_refresh [status:200 request:0.121s]
2019-09-06 08:17:05,484 [INFO] elasticsearch:83 - GET http://<es-hostname>:9200/_mget?realtime=true [status:200 request:0.006s]
2019-09-06 08:17:05,616 [INFO] elasticsearch:83 - POST http://<es-hostname>:9200/_bulk [status:200 request:0.129s]
2019-09-06 08:17:05,744 [INFO] elasticsearch:83 - POST http://<es-hostname>:9200/_refresh [status:200 request:0.128s]
.
.
.
.
.
2019-09-06 08:18:35,294 [INFO] mongo_connector.oplog_manager:78 - OplogThread for replica set 'your replica set' is up to date with the oplog.
2019-09-06 08:19:05,324 [INFO] mongo_connector.oplog_manager:78 - OplogThread for replica set 'your replica set' is up to date with the oplog.

And it will keep updating the timestamp in that oplog file.
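To confirm that continuous syncing works end to end, insert a document into one of the synced collections and query ElasticSearch for it shortly after (hostnames and the test document are illustrative):

```shell
# On the MongoDB side (mongo shell)
mongo "mongodb://mongoset1:27017/mydb?replicaSet=your-replicaset-name" \
  --eval 'db.coll1.insertOne({name: "sync-test"})'

# A few seconds later, on the ElasticSearch side
curl 'http://<elastic search hostname>:9200/mydb_coll1/_search?q=name:sync-test&pretty'
```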
