issue_comments: 1112878955
html_url | issue_url | id | node_id | user | created_at | updated_at | author_association | body | reactions | issue | performed_via_github_app |
---|---|---|---|---|---|---|---|---|---|---|---|
https://github.com/simonw/datasette/issues/1727#issuecomment-1112878955 | https://api.github.com/repos/simonw/datasette/issues/1727 | 1112878955 | IC_kwDOBm6k_c5CVS9r | 9599 | 2022-04-29T05:02:40Z | 2022-04-29T05:02:40Z | OWNER | Here's a very useful (recent) article about how the GIL works and how to think about it: https://pythonspeed.com/articles/python-gil/ - via https://lobste.rs/s/9hj80j/when_python_can_t_thread_deep_dive_into_gil From that article: > For example, let's consider an extension module written in C or Rust that lets you talk to a PostgreSQL database server. > > Conceptually, handling a SQL query with this library will go through three steps: > > 1. Deserialize from Python to the internal library representation. Since this will be reading Python objects, it needs to hold the GIL. > 2. Send the query to the database server, and wait for a response. This doesn't need the GIL. > 3. Convert the response into Python objects. This needs the GIL again. > > As you can see, how much parallelism you can get depends on how much time is spent in each step. If the bulk of time is spent in step 2, you'll get parallelism there. But if, for example, you run a `SELECT` and get a large number of rows back, the library will need to create many Python objects, and step 3 will have to hold GIL for a while. That explains what I'm seeing here. I'm pretty convinced now that the reason I'm not getting a performance boost from parallel queries is that there's more time spent in Python code assembling the results than in SQLite C code executing the query. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | 1217759117 |
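
A minimal sketch of one way to check the hypothesis in the comment above: run the same large `SELECT` several times serially, then again from a thread pool, and compare wall-clock time. This is not Datasette's code. The table `t`, the row counts, and the helper names `build_db`, `fetch_all_rows` and `compare` are all hypothetical, made up for illustration. If most of the time goes into building Python row objects (step 3 in the quoted article), the threaded run should show little or no speedup, though actual timings will vary by machine and Python version.

```python
import os
import sqlite3
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor


def build_db(path, n_rows=20_000):
    # Hypothetical demo table; nothing here comes from Datasette itself.
    conn = sqlite3.connect(path)
    conn.execute("create table t (id integer primary key, value text)")
    conn.executemany(
        "insert into t (value) values (?)",
        ((f"row {i}",) for i in range(n_rows)),
    )
    conn.commit()
    conn.close()


def fetch_all_rows(path):
    # Each call opens its own connection, so no connection is shared
    # between threads. fetchall() builds one Python tuple per row,
    # which is work that has to happen while holding the GIL.
    conn = sqlite3.connect(path)
    try:
        return len(conn.execute("select id, value from t").fetchall())
    finally:
        conn.close()


def compare(path, n_queries=4):
    # Run the same query n_queries times, one after another.
    start = time.perf_counter()
    serial_counts = [fetch_all_rows(path) for _ in range(n_queries)]
    serial_seconds = time.perf_counter() - start

    # Run the same n_queries queries from a pool of threads.
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_queries) as pool:
        threaded_counts = list(pool.map(fetch_all_rows, [path] * n_queries))
    threaded_seconds = time.perf_counter() - start

    return serial_counts, threaded_counts, serial_seconds, threaded_seconds


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "demo.db")
        build_db(db_path)
        _, _, serial, threaded = compare(db_path)
        print(f"serial:   {serial:.3f}s")
        print(f"threaded: {threaded:.3f}s")
```

Making the query expensive inside SQLite while returning few rows (an aggregate over a large table, say) shifts the balance toward step 2, which is where threads would be expected to help.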