issues: 1174708375
This data as json
id | node_id | number | title | user | state | locked | assignee | milestone | comments | created_at | updated_at | closed_at | author_association | pull_request | body | repo | type | active_lock_reason | performed_via_github_app | reactions | draft | state_reason |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1174708375 | I_kwDOBm6k_c5GBKCX | 1673 | Streaming CSV spends a lot of time in `table_column_details` | 9599 | open | 0 | 1 | 2022-03-20T22:25:28Z | 2022-03-20T22:34:06Z | OWNER | At least I think it does. I tried running `py-spy top -p $PID` against a Datasette process that was trying to do: datasette covid.db --get '/covid/ny_times_us_counties.csv?_size=10&_stream=on' While investigating: - #1355 And spotted this: ``` datasette covid.db --get /covid/ny_times_us_counties.csv?_size=10&_stream=on' (python v3.10.2) Total Samples 5800 GIL: 71.00%, Active: 98.00%, Threads: 4 %Own %Total OwnTime TotalTime Function (filename:line) 8.00% 8.00% 4.32s 4.38s sql_operation_in_thread (datasette/database.py:212) 5.00% 5.00% 3.77s 3.93s table_column_details (datasette/utils/__init__.py:614) 6.00% 6.00% 3.72s 3.72s _worker (concurrent/futures/thread.py:81) 7.00% 7.00% 2.98s 2.98s _read_from_self (asyncio/selector_events.py:120) 5.00% 6.00% 2.35s 2.49s detect_fts (datasette/utils/__init__.py:571) 4.00% 4.00% 1.34s 1.34s _write_to_self (asyncio/selector_events.py:140) ``` Relevant code: https://github.com/simonw/datasette/blob/798f075ef9b98819fdb564f9f79c78975a0f71e8/datasette/utils/__init__.py#L609-L625 | 107914493 | issue | {"url": "https://api.github.com/repos/simonw/datasette/issues/1673/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} |