
Case study: load testing an in-house monitoring system to get performance results

March 18, 2015

Case study: Creating a load system for a high-load monitoring application

Problem:

We wanted to know how many connections and how much data our monitoring system can handle. If the load on the system is too high, it draws graphs incorrectly, with intervals that have no data. The scheme below describes the monitoring system:

monitoring system diagram

Monitoring system diagram. It collects data from nodes (via daemons preinstalled on the nodes) and then aggregates it into human-readable reports and graphs.
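
The failure mode from the problem statement, graphs with intervals without data, can be checked automatically once the collected timestamps are available. The snippet below is only a sketch: find_gaps, its tolerance parameter and the sample values are hypothetical and are not part of the monitoring system or the linked sources.

```python
def find_gaps(timestamps, expected_interval, tolerance=0.5):
    """Return (start, end) pairs where consecutive samples are too far apart.

    A gap is reported when the distance between two neighbouring samples
    exceeds expected_interval * (1 + tolerance). All names are illustrative.
    """
    limit = expected_interval * (1 + tolerance)
    ordered = sorted(timestamps)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > limit]


# Samples are expected every 10 seconds; the jump from 20 to 50 is a gap.
samples = [0, 10, 20, 50, 60]
print(find_gaps(samples, expected_interval=10))  # [(20, 50)]
```

Running such a check after each load run gives a simple pass/fail signal instead of inspecting graphs by eye.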

To make the test realistic, we crawled data from real nodes, choosing the most complicated responses, and stored it in memcached. Serving the same data from memory eliminates HDD I/O problems under high load and concurrency. For crawling we wrote agent_data_crawler.py (see sources).

data crawling scheme

Getting test data from real nodes under monitoring and storing it in memcached. Later we can serve this data with fast reads from RAM to simulate high load.

 

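import memcache  # python-memcached client

def load_data_to_memcache(host, port, path_to_data):
    mc = memcache.Client(['%s:%d' % (host, port)], debug=0)

    # read_files_to_array (see sources) returns items of the form (key, value)
    for file_data in read_files_to_array(path_to_data):
        mc.set(file_data[0], file_data[1])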
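
The loader relies on read_files_to_array, which is defined in the sources and not shown here. As a rough idea of what such a helper could look like, here is a minimal sketch that assumes one crawled response per file and uses the file name as the cache key. Both assumptions are ours; the real helper may differ.

```python
import os


def read_files_to_array(path_to_data):
    """Yield (key, value) pairs, one per regular file in path_to_data.

    Assumption: the file name is used as the cache key and the file
    content as the value. This is a sketch, not the original helper.
    """
    for name in sorted(os.listdir(path_to_data)):
        full_path = os.path.join(path_to_data, name)
        if os.path.isfile(full_path):
            with open(full_path) as handle:
                yield name, handle.read()
```

Sorting the file names keeps the load order stable between runs, which matters if the mock endpoints have to replay responses in a fixed order.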

Mock endpoints that respond with the loaded data (using the Flask library):

@app.route("/json/")
def json_handler():
    # Data must be returned in the same order, so each instance keeps a global iterator.
    return Response(response=get_value_from_cache())
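
The handler depends on get_value_from_cache and a per-instance global iterator, neither of which is shown above. One way such an iterator could work is sketched below. CacheReplayer and FakeClient are hypothetical names; FakeClient only stands in for a memcached client so the example runs without a server. The sketch assumes the list of keys is known in advance and is not empty.

```python
import itertools


class CacheReplayer:
    """Return cached values in a fixed key order, wrapping around at the end."""

    def __init__(self, client, keys):
        self._client = client                # any object with a get(key) method
        self._keys = itertools.cycle(keys)   # endless iterator over the keys

    def next_value(self):
        return self._client.get(next(self._keys))


class FakeClient:
    """In-memory stand-in for a memcached client (illustration only)."""

    def __init__(self, data):
        self._data = data

    def get(self, key):
        return self._data.get(key)


replayer = CacheReplayer(FakeClient({"a": "first", "b": "second"}), ["a", "b"])
print([replayer.next_value() for _ in range(3)])  # ['first', 'second', 'first']
```

In a mock endpoint, one replayer created at module level could back get_value_from_cache, so every request to that instance receives the next stored response.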

Run multiple simultaneous endpoints to simulate load on the target system (see run_multiple.py):

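    processes = []
    # Start one mock endpoint per port in [srange, srange + number)
    for port_number in range(args.srange, args.srange + args.number):
        # Pass the flag and its value as separate argv items
        process = subprocess.Popen([sys.executable, args.path, '-p', str(port_number)])
        processes.append(process)

    # Block until every endpoint process has exited
    for proc in processes:
        proc.wait()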
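
The launcher reads args.srange, args.number and args.path, but the argument parsing itself is not shown. A possible shape for it, using argparse, is sketched below. The option names, defaults and the build_commands helper are assumptions made for illustration; see run_multiple.py in the sources for the actual interface.

```python
import argparse
import sys


def parse_args(argv=None):
    """Parse launcher options; names mirror the attributes used by the launcher."""
    parser = argparse.ArgumentParser(description="Start several mock endpoints")
    parser.add_argument("path", help="path to the mock endpoint script")
    parser.add_argument("--srange", type=int, default=8000,
                        help="first port of the range (assumed default)")
    parser.add_argument("--number", type=int, default=1,
                        help="how many endpoints to start")
    return parser.parse_args(argv)


def build_commands(args):
    """Return one argv list per endpoint, ready to hand to subprocess.Popen."""
    return [[sys.executable, args.path, "-p", str(port)]
            for port in range(args.srange, args.srange + args.number)]


args = parse_args(["mock_endpoint.py", "--srange", "9000", "--number", "2"])
for command in build_commands(args):
    print(command[1:])  # script path, flag and port for each endpoint
```

Building the command lists separately from starting the processes makes the port range easy to check before anything is launched.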
 

To manage all the processes and code, we wrapped everything in Fabric tasks and Jenkins jobs. In the end we got a system that can be described by the following diagram:

load_diag

Running multiple mocked nodes. They take data from memcached storage to eliminate HDD I/O under high load. The time between new data requests was also decreased to see how the system handles large, fast data.

 

Sources: https://github.com/tjlee/monitoringcrasher

 

 
