{"id": 512996469, "node_id": "MDU6SXNzdWU1MTI5OTY0Njk=", "number": 607, "title": "Ways to improve fuzzy search speed on larger data sets?", "user": {"value": 8431341, "label": "zeluspudding"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 6, "created_at": "2019-10-27T17:31:37Z", "updated_at": "2019-11-07T03:38:10Z", "closed_at": "2019-11-07T03:38:10Z", "author_association": "NONE", "pull_request": null, "body": "I have an SQLite table with 16 million rows in it. Having read @simonw's article \"[Fast Autocomplete Search for Your Website](https://24ways.org/2018/fast-autocomplete-search-for-your-website/)\", I was curious to try Datasette to see what kind of query performance I could get out of it. In truth, I don't need full-text search, since all I want is to give my users a way to search for the names of investors such as \"Warren Buffet\" or \"Tim Cook\" (whose names are in a single column).\r\n\r\nOn the first search, Datasette takes over 20 seconds to return all records associated with `elon musk`:\r\n\r\n> ![image](https://user-images.githubusercontent.com/8431341/67638889-a86e1100-f8b7-11e9-9f7e-a9d13a42e988.png)\r\n\r\n> ![image](https://user-images.githubusercontent.com/8431341/67638825-ed457800-f8b6-11e9-94d1-b44f1a40ee8c.png)\r\n\r\nIf I rerun the same search, it then takes almost 9 seconds:\r\n> ![image](https://user-images.githubusercontent.com/8431341/67638908-e4a17180-f8b7-11e9-9d00-748c80ef1f21.png)\r\n\r\nThat's far too slow to implement an autocomplete feature. I could reduce the latency by making a special table of only unique investor names, thereby reducing the search space to less than a million rows (then I'd need to implement a way to add only new investor names to the table as I receive new data, about 4,000 rows a day). If I did that, I'm still concerned the new table wouldn't be lean enough to look up investor names quickly. 
Plus, even if I can implement the autocomplete feature, I would still have to look up the records for that investor, which would take between 8 and 20 seconds. \r\n\r\nAre there any tricks for speeding this up?\r\n\r\nHere's my hardware:\r\n> ![image](https://user-images.githubusercontent.com/8431341/67638861-55945980-f8b7-11e9-96a8-ca76c7c68c5d.png)\r\n", "repo": {"value": 107914493, "label": "datasette"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/datasette/issues/607/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"}