id,node_id,number,title,user,user_label,state,locked,assignee,assignee_label,milestone,milestone_label,comments,created_at,updated_at,closed_at,author_association,pull_request,body,repo,repo_label,type,active_lock_reason,performed_via_github_app,reactions,draft,state_reason 512996469,MDU6SXNzdWU1MTI5OTY0Njk=,607,Ways to improve fuzzy search speed on larger data sets?,8431341,zeluspudding,closed,0,,,,,6,2019-10-27T17:31:37Z,2019-11-07T03:38:10Z,2019-11-07T03:38:10Z,NONE,,"I have an sqlite table with 16 million rows in it. Having read @simonw article ""[Fast Autocomplete Search for Your Website](https://24ways.org/2018/fast-autocomplete-search-for-your-website/)"" I was curious to try datasette to see what kind of query performance I could get out of it. In truth I don't need to do full text search since all I would like to do is give my users a way to search for the names of investors such as ""Warren Buffet"", or ""Tim Cook"" (who's names are in a single column). On the first search, Datasette takes over 20 seconds to return all records associated with `elon musk`: > ![image](https://user-images.githubusercontent.com/8431341/67638889-a86e1100-f8b7-11e9-9f7e-a9d13a42e988.png) > ![image](https://user-images.githubusercontent.com/8431341/67638825-ed457800-f8b6-11e9-94d1-b44f1a40ee8c.png) If I rerun the same search, it then takes almost 9 seconds: > ![image](https://user-images.githubusercontent.com/8431341/67638908-e4a17180-f8b7-11e9-9d00-748c80ef1f21.png) That's far to slow to implement an autocomplete feature. I could reduce the latency by making a special table of only unique investor names, thereby reducing the search space to less than a million rows (then I'd need to implement a way to add only new investor names to the table as I received new data.. about 4,000 rows a day). If I did that, I'm still concerned the new table wouldn't be lean enough to lookup investor names quickly. Plus, even if I can implement the autocomplete feature, I would still finally have to lookup records for that investors which would take between 8 - 20 seconds. Are there any tricks for speeding this up? Here's my hardware: > ![image](https://user-images.githubusercontent.com/8431341/67638861-55945980-f8b7-11e9-96a8-ca76c7c68c5d.png) ",107914493,datasette,issue,,,"{""url"": ""https://api.github.com/repos/simonw/datasette/issues/607/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed