issue_comments: 991805516
This data as json
html_url | issue_url | id | node_id | user | created_at | updated_at | author_association | body | reactions | issue | performed_via_github_app |
---|---|---|---|---|---|---|---|---|---|---|---|
https://github.com/simonw/datasette/issues/1550#issuecomment-991805516 | https://api.github.com/repos/simonw/datasette/issues/1550 | 991805516 | IC_kwDOBm6k_c47HcBM | 9599 | 2021-12-11T23:43:24Z | 2021-12-11T23:43:24Z | OWNER | I built a tiny Starlette app to experiment with this a bit: ```python import asyncio import janus from starlette.applications import Starlette from starlette.responses import JSONResponse, HTMLResponse, StreamingResponse from starlette.routing import Route import sqlite3 from concurrent import futures executor = futures.ThreadPoolExecutor(max_workers=10) async def homepage(request): return HTMLResponse( """ <html> <head><title>SQL CSV Server</title> <style>body { width: 40rem; font-family: helvetica; margin: 2em auto; }</style> <body> <h1>SQL CSV Server</h1> <form action="/csv"> <label style="display: block">SQL query: <textarea style="width: 90%; height: 20em" name="sql"></textarea> </label> <input type="submit" value="Run query"> </form> </head> """ ) def run_query_in_thread(sql, sync_q): db = sqlite3.connect("../datasette/covid.db") cursor = db.cursor() cursor.arraysize = 100 # Default is 1 apparently? cursor.execute(sql) columns = [d[0] for d in cursor.description] sync_q.put([columns]) # Now start putting batches of rows while True: rows = cursor.fetchmany() if rows: sync_q.put(rows) else: break # Let queue know we are finished\ sync_q.put(None) async def csv_query(request): sql = request.query_params["sql"] queue = janus.Queue() loop = asyncio.get_running_loop() async def csv_generator(): loop.run_in_executor(None, run_query_in_thread, sql, queue.sync_q) while True: rows = await queue.async_q.get() if rows is not None: for row in rows: yield ",".join(map(str, row)) + "\n " queue.async_q.task_done() else: # Cleanup queue.close() await queue.wait_closed() break return StreamingResponse(csv_generator(), media_type='text/plain') app = Starlette( debug=True, routes=[ Route("/", homepage), Route("/csv", csv_query), ], ) ``` But.. if I run this in a terminal window: ``` /tmp % wget 'http://127.0.0.1:8000/csv?sql=select+*+from+ny_times_us_counties' ``` it takes about 20 seconds to run and returns a 50MB file - but while it is running no other requests can be served by that server - not even the homepage! So something is blocking the event loop. Maybe I should be using `fut = loop.run_in_executor(None, run_query_in_thread, sql, queue.sync_q)` and then awaiting `fut` somewhere, like in the Janus documentation? Don't think that's needed though. Needs more work to figure out why this is blocking. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | 1077628073 |