I am a data scientist / machine learning developer. Sometimes, I have to expose my models by providing an endpoint. I usually do this via Flask and gunicorn:
exampleproject.py:

import random

from flask import Flask

app = Flask(__name__)
random.seed(0)


@app.route("/")
def hello():
    x = random.randint(1, 100)
    y = random.randint(1, 100)
    return str(x * y)


if __name__ == "__main__":
    app.run(host='0.0.0.0')
wsgi.py:

from exampleproject import app

if __name__ == "__main__":
    app.run()
Run it with:

$ gunicorn --bind 0.0.0.0:5000 wsgi:app
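Besides command-line flags, gunicorn also accepts a Python config file. A minimal sketch of one (the file name gunicorn_config.py and the specific values are illustrative, not taken from the question):

```python
# gunicorn_config.py -- illustrative gunicorn settings (values are examples).
import multiprocessing

bind = "0.0.0.0:5000"

# A common starting point: (2 * CPU cores) + 1 synchronous workers.
workers = multiprocessing.cpu_count() * 2 + 1

# Each worker may additionally run threads, useful for I/O-bound handlers.
threads = 2

# Seconds to hold an idle keep-alive connection before closing it.
keepalive = 5
```

It would be loaded with `gunicorn -c gunicorn_config.py wsgi:app`.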
When I benchmark this simple script, I get:
$ ab -s 30 -c 200 -n 25000 -v 1 http://localhost:5000/
This is ApacheBench, Version 2.3 <$Revision: 1807734 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)
Completed 2500 requests
Completed 5000 requests
Completed 7500 requests
Completed 10000 requests
Completed 12500 requests
Completed 15000 requests
Completed 17500 requests
Completed 20000 requests
Completed 22500 requests
apr_pollset_poll: The timeout specified has expired (70007)
Total of 24941 requests completed
With fewer total requests, it looks fine:
$ ab -l -s 30 -c 200 -n 200 -v 1 http://localhost:5000/
This is ApacheBench, Version 2.3 <$Revision: 1807734 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)
Completed 100 requests
Completed 200 requests
Finished 200 requests

Server Software:        gunicorn/19.9.0
Server Hostname:        localhost
Server Port:            5000

Document Path:          /
Document Length:        Variable

Concurrency Level:      200
Time taken for tests:   0.084 seconds
Complete requests:      200
Failed requests:        0
Total transferred:      32513 bytes
HTML transferred:       713 bytes
Requests per second:    2380.19 [#/sec] (mean)
Time per request:       84.027 [ms] (mean)
Time per request:       0.420 [ms] (mean, across all concurrent requests)
Transfer rate:          377.87 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    2   1.2      2       3
Processing:     1   36  16.8     41      52
Waiting:        1   36  16.8     41      52
Total:          4   37  15.8     43      54

Percentage of the requests served within a certain time (ms)
  50%     43
  66%     51
  75%     51
  80%     52
  90%     52
  95%     52
  98%     53
  99%     53
 100%     54 (longest request)
Is there something I can change to improve the configuration for my kind of workload?
When I execute a single call of my real model, I see an answer in 0.5 s; an execution time of up to 1.0 s would be reasonable. Every call is stateless, meaning each call is independent of the others.
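With synchronous workers, these numbers can be sanity-checked with Little's law: a worker that spends latency_s per request serves at most 1/latency_s requests per second, so sustaining a given throughput needs roughly target_rps * latency_s workers. A small sketch (the function name and the figures are illustrative):

```python
import math


def sync_workers_needed(target_rps, latency_s):
    """Rough worker count for synchronous workers, via Little's law:
    in-flight requests = arrival rate * time spent per request."""
    return math.ceil(target_rps * latency_s)


# A 0.5 s model call means each worker serves at most 2 requests/s,
# so e.g. 100 requests/s needs on the order of 50 workers.
print(sync_workers_needed(100, 0.5))  # → 50
```

This is only a back-of-the-envelope bound; real capacity also depends on queueing, memory per worker, and whether the model releases the GIL.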
When I tried to analyze this problem, I saw a lot of connections in the TIME_WAIT state:

$ netstat -nat | awk '{print $6}' | sort | uniq -c | sort -n
      1 established)
      1 Foreign
      2 CLOSE_WAIT
      4 LISTEN
     10 SYN_SENT
     60 SYN_RECV
    359 ESTABLISHED
  13916 TIME_WAIT
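The awk pipeline above can also be reproduced in Python, which makes it easy to assert on the counts programmatically. A sketch that parses the plain `netstat -nat` line format (the sample data below is made up for illustration):

```python
from collections import Counter


def tcp_state_counts(netstat_output):
    """Count TCP connection states from `netstat -nat` output.
    On `tcp` lines the sixth column holds the state (TIME_WAIT, ...)."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            counts[fields[5]] += 1
    return counts


sample = """\
tcp  0  0 127.0.0.1:5000  127.0.0.1:51000  TIME_WAIT
tcp  0  0 127.0.0.1:5000  127.0.0.1:51001  TIME_WAIT
tcp  0  0 0.0.0.0:5000    0.0.0.0:*        LISTEN
"""
print(tcp_state_counts(sample))  # TIME_WAIT: 2, LISTEN: 1
```

Skipping non-`tcp` lines also drops the header rows that show up as the stray `established)` and `Foreign` entries in the awk output above.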
How can I confirm / falsify that this is the problem? Is this in any way related to Flask / gunicorn? How does nginx relate to gunicorn?
Answer
Attribution
Source: Link, Question Author: Martin Thoma, Answer Author: Community