
issue_comments_fts

9,947 rows


Link  rowid  body  issue_comments_fts  rank
338523957 338523957 I also need to solve for weird primary keys. If it’s a single integer or a single char field that’s easy. But what if it is a compound key with more than one char field? What delimiter can I use that will definitely be safe? Let’s say I use hyphen. Now I need to find a durable encoding for any hyphens that might exist in the key fields themselves. How about I use URL encoding for every non-alphanumeric character? That will turn hyphens into (I think) %2D. It should also solve for unicode characters, and it means the vast majority of keys (integers) will display neatly, including a compound key of e.g. 5678-345 1
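A minimal sketch of that encoding idea (function names are mine; note that `urllib.parse.quote` alone won't do it, since it never escapes hyphens):

```python
from urllib.parse import unquote

def encode_part(value):
    # Percent-encode every byte that is not an ASCII letter or digit,
    # so "-" becomes "%2D" and unicode survives as UTF-8 bytes.
    return "".join(
        chr(b) if chr(b).isascii() and chr(b).isalnum() else "%{:02X}".format(b)
        for b in str(value).encode("utf-8")
    )

def encode_key(*parts):
    return "-".join(encode_part(p) for p in parts)

def decode_key(key):
    return [unquote(part) for part in key.split("-")]

print(encode_key(5678, 345))   # 5678-345
print(encode_key("a-b", "c"))  # a%2Db-c
```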
338524454 338524454 Table rendering logic needs to detect the primary key field and turn it into a hyperlink. If there is a compound primary key it should add an extra column at the start of the table which displays the compound key as a link 1  
338524857 338524857 https://stackoverflow.com/a/14468878/6083 Looks like I should order by compound primary key and implement cursor-based pagination. 1  
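A sketch of that keyset approach (table and column names are placeholders; SQLite has supported row-value comparisons like this since 3.15):

```python
def fetch_page(conn, last_seen=None, size=100):
    # Resume strictly after the last (a, b) pair from the previous
    # page instead of using OFFSET, which degrades on large tables.
    if last_seen is None:
        return conn.execute(
            "select a, b, other from mytable order by a, b limit ?",
            [size],
        ).fetchall()
    return conn.execute(
        "select a, b, other from mytable "
        "where (a, b) > (?, ?) order by a, b limit ?",
        [last_seen[0], last_seen[1], size],
    ).fetchall()
```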
338526148 338526148 https://github.com/ahupp/python-magic/blob/master/README.md 1  
338530389 338530389 This means I need a good solution for these compile time options while running in development mode 1  
338530480 338530480 How about when the service starts up it checks for a compile.json file and, if it is missing, creates it using the same code we run at compile time normally 1  
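Roughly this shape, with `build_metadata()` as a hypothetical stand-in for whatever the compile step normally runs:

```python
import json
import os

def ensure_compile_json(path="compile.json"):
    # Reuse the compile-time code path at startup if the file is
    # missing; otherwise just load what a previous run produced.
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    metadata = build_metadata()  # hypothetical: the compile-time step
    with open(path, "w") as f:
        json.dump(metadata, f, indent=2)
    return metadata
```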
338530704 338530704 Needed by https://github.com/simonw/stateless-datasets/issues/4#issuecomment-338530389 1  
338531827 338531827 Many of the applications I want to implement with this would benefit from having permanent real URLs. So let’s have both. The sha1 urls will serve far future cache headers (and an etag derived from their path). The non sha1 URLs will serve 302 uncached redirects to the sha1 locations. We will have a setting that lets people opt out of this behavior. 1  
338697223 338697223 Now returning this: { "error": "attempt to write a readonly database", "ok": false } 1  
338768860 338768860 I could use the table-reflow mechanism demonstrated here: http://demos.jquerymobile.com/1.4.3/table-reflow/ 1  
338769538 338769538 Maybe this should be handled by views instead? https://stateless-datasets-wreplxalgu.now.sh/ lists some views https://stateless-datasets-wreplxalgu.now.sh/?sql=select%20*%20from%20%22Order%20Subtotals%22 is an example showing the content of a view. What would the URL to views be? I don't think a view can share a name with a table, so the same URL scheme could work for both. 1  
338789734 338789734 URL design: /database/table.json - redirects to /database-6753f4a/table.json So we always redirect to the version with the truncated hash in the URL. 1  
338797522 338797522 https://stackoverflow.com/a/18134919/6083 is a good answer about how many characters of the hash are needed to be unique. I say we default to 7 characters, like git does - but allow extras to be configured. 1  
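A sketch, assuming any stable digest of the file contents will do for this:

```python
import hashlib

HASH_LENGTH = 7  # git-style default, imagined here as configurable

def db_hash(path, length=HASH_LENGTH):
    # Hash the database file in chunks so large files stay cheap,
    # then truncate to the configured number of hex characters.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()[:length]
```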
338799438 338799438 Can I take advantage of HTTP/2 so even if you get redirected I start serving you the correct resource straight away? 1  
338804173 338804173 Looks like the easiest way to implement HTTP/2 server push today is to run behind Cloudflare and use this: Link: </asset/to/push.js>; rel=preload; as=script https://blog.cloudflare.com/announcing-support-for-http-2-server-push-2/ Here's the W3C draft: https://w3c.github.io/preload/ From https://w3c.github.io/preload/#as-attribute it looks like I should use `as=fetch` if the content is intended for consumption by fetch() or XMLHTTPRequest. Unclear if I should throw `as=fetch crossorigin` in there. Need to experiment on that. 1  
338806718 338806718 Here's what the homepage of cloudflare.com does (with newlines added within the link header for clarity): $ curl -i 'https://www.cloudflare.com/' HTTP/1.1 200 OK Date: Mon, 23 Oct 2017 21:45:58 GMT Content-Type: text/html; charset=utf-8 Transfer-Encoding: chunked Connection: keep-alive link: </vendor/bitdashplayer.min.css>; rel=preload; as=style, </vendor/bitdash-controls.min.css>; rel=preload; as=style, </video/marketing-video/cloudflare-marketing-video.mpd>; rel=preload, </video/marketing-video/cloudflare-marketing-video.m3u8>; rel=preload, </video/marketing-video/video_0_800000/dash/init.mp4>; rel=preload; as=video, </video/marketing-video/audio_0_128000/dash/init.mp4>; rel=preload; as=video, </video/marketing-video/video_0_800000/dash/segment_0.m4s>; rel=preload; as=video, </video/marketing-video/audio_0_128000/dash/segment_0.m4s>; rel=preload; as=video, </video/new-short-optimized-22.mp4>; rel=preload; as=video, </video/marketing-video/poster.jpg>; rel=preload; as=image The original header looked like this: link: </vendor/bitdashplayer.min.css>; rel=preload; as=style, </vendor/bitdash-controls.min.css>; rel=preload; as=style, </video/marketing-video/cloudflare-marketing-video.mpd>; rel=preload, </video/marketing-video/cloudflare-marketing-video.m3u8>; rel=preload, </video/marketing-video/video_0_800000/dash/init.mp4>; rel=preload; as=video, </video/marketing-video/audio_0_128000/dash/init.mp4>; rel=preload; as=video, </video/marketing-video/video_0_800000/dash/segment_0.m4s>; rel=preload; as=video, </video/marketing-video/audio_0_128000/dash/segment_0.m4s>; rel=preload; as=video, </video/new-short-optimized-22.mp4>; rel=preload; as=video, </video/marketing-video/poster.jpg>; rel=preload; as=image 1  
338834213 338834213 If I can’t detect a primary key, I won’t provide a URL for those records 1
338852971 338852971 I'm not going to bother with this. 1  
338853083 338853083 Fixed in 9d219140694551453bfa528e0624919eb065f9d6 1  
338854988 338854988 /database-name/table-name?name__contains=simon&sort=id+desc Note that if there's a column called "sort" you can still do sort__exact=blah 1  
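One way that parsing could look, as a sketch (the lookup names mirror the Django-style ones mentioned in the next comment):

```python
def parse_querystring(args):
    # Split "column__lookup=value" pairs out of the querystring.
    # "sort" is reserved, but a real column named "sort" stays
    # reachable as "sort__exact", per the note above.
    filters, sort = [], None
    for key, value in args.items():
        if key == "sort":
            sort = value
        elif "__" in key:
            column, lookup = key.rsplit("__", 1)
            filters.append((column, lookup, value))
        else:
            filters.append((key, "exact", value))
    return filters, sort

print(parse_querystring({"name__contains": "simon", "sort": "id desc"}))
# ([('name', 'contains', 'simon')], 'id desc')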
338857568 338857568 I can find the primary keys using: PRAGMA table_info(myTable) 1  
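In Python that looks something like this:

```python
def primary_keys(conn, table):
    # conn is a sqlite3.Connection. PRAGMA arguments can't be bound
    # as ? parameters, so the table name must come from a trusted
    # source such as sqlite_master.
    rows = conn.execute("PRAGMA table_info({})".format(table)).fetchall()
    # Each row is (cid, name, type, notnull, dflt_value, pk); pk is the
    # column's 1-based position in the primary key, or 0 if not in it.
    return [row[1] for row in sorted(rows, key=lambda r: r[5]) if row[5]]
```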
338859620 338859620 I’m going to implement everything in https://docs.djangoproject.com/en/1.11/ref/models/querysets/#field-lookups with the exception of range and the various date ones. 1  
338859709 338859709 I’m going to need to write unit tests for this, as this depends on #9 1
338861511 338861511 Some tables won't have primary keys, in which case I won't generate pages for individual records. 1  
338863155 338863155 I’m going to use py.test and start with all tests in a single tests.py module 1  
338872286 338872286 I'm going to use `,` as the separator between elements of a compound primary key. If those elements themselves include a comma I will use `%2C` in its place. 1  
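As a sketch (with the caveat that a literal "%2C" already present in a value would decode ambiguously):

```python
def encode_compound(values):
    # "," separates the elements; literal commas become "%2C".
    return ",".join(str(v).replace(",", "%2C") for v in values)

def decode_compound(encoded):
    return [part.replace("%2C", ",") for part in encoded.split(",")]

print(encode_compound(["a,b", "c"]))  # a%2Cb,c
print(decode_compound("a%2Cb,c"))     # ['a,b', 'c']
```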
338882110 338882110 Well, I've started it at least. 1  
338882207 338882207 Next step: generate links to these. 1  
339003850 339003850 As of b46e370ee6126aa2fa85cf789a31da38aed98496 this is done. 1  
339019873 339019873 Here's what I've got now: <img width="693" alt="localhost_8006_northwind-40d049b_categories_json" src="https://user-images.githubusercontent.com/9599/31950748-1bc85f3e-b891-11e7-8205-d89da6129f4b.png"> 1  
339027711 339027711 I have code to detect primary keys on tables... but what should I do for tables that lack primary keys? How should I even sort them? 1  
339028979 339028979 Looks like I can use the SQLite specific “rowid” in that case. It isn’t guaranteed to stay consistent across a VACUUM but that’s ok because we are immutable anyway. https://www.sqlite.org/lang_createtable.html#rowid 1  
339138809 339138809 May as well support most of https://sqlite.org/lang_expr.html 1  
339186887 339186887 Still to do: - [x] `gt`, `gte`, `lt`, `lte` - [x] `like` - [x] `glob` 1  
339210353 339210353 I'm going to call this one done for the moment. The date filters can go in a stretch goal. 1  
339366612 339366612 I had to manually set the content disposition header: return await response.file_stream( filepath, headers={ 'Content-Disposition': 'attachment; filename="{}"'.format(filepath) } ) In the next release of Sanic I can just use the filename= argument instead: https://github.com/channelcat/sanic/commit/07e95dba4f5983afc1e673df14bdd278817288aa 1
339382054 339382054 Could this be as simple as using the iterative JSON encoder and adding a yield statement in between each chunk? 1  
339388215 339388215 First experiment: hook up an iterative CSV dump (just because that’s a tiny bit easier to get started with than iterative JSON). Have it execute a big select statement and then iterate through the result set 100 rows at a time using sqlite fetchmany() - also have it async sleep for a second in between each batch of 100. Can this work without needing python threads? 1
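A rough shape for that experiment (`write_chunk` stands in for whatever awaitable the framework provides for pushing bytes to the response; the fetchmany() calls themselves are still blocking SQLite reads, which is exactly the open question):

```python
import asyncio
import csv
import io

async def stream_csv(conn, sql, write_chunk):
    cursor = conn.execute(sql)
    while True:
        rows = cursor.fetchmany(100)
        if not rows:
            break
        buf = io.StringIO()
        csv.writer(buf).writerows(rows)
        await write_chunk(buf.getvalue())
        await asyncio.sleep(1)  # the deliberate pause in the experiment
```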
339388771 339388771 If this does work, I need to figure out what to do about the HTML view. Assuming I can iteratively produce JSON and CSV, what to do about HTML? One option: render the first 500 rows as HTML, then hand off to an infinite scroll experience that iteratively loads more rows as JSON. 1
339389105 339389105 The gold standard here is to be able to serve up increasingly large datasets without blocking the event loop and while using a sustainable amount of RAM 1  
339389328 339389328 Ideally we can get some serious gains from the fact that our database file is opened with the immutable option. 1  
339395551 339395551 Simplest implementation will be to create a temporary directory somewhere, copy in a Dockerfile and the databases and run “now” in it. Ideally I can use symlinks rather than copying potentially large database files around. 1  
339406634 339406634 It certainly looks like some of the stuff in https://sqlite.org/pragma.html could be used to screw around with things. Example: `PRAGMA case_sensitive_like = 1` - would that affect future queries? 1  
339413825 339413825 Could I use https://sqlparse.readthedocs.io/en/latest/ to parse incoming statements and ensure they are pure SELECTs? Would that prevent people from using a compound SELECT statement to trigger an evil PRAGMA of some sort? 1  
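If sqlparse works out, the check might be as small as this sketch (whether get_type() catches every compound-statement trick is the open question above):

```python
import sqlparse

def is_single_select(sql):
    statements = sqlparse.parse(sql)
    return len(statements) == 1 and statements[0].get_type() == "SELECT"
```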
339420462 339420462 https://sitesforprofit.com/responsive-table-plugins-and-patterns has some useful links. I really like the pattern from https://css-tricks.com/responsive-data-tables/ /* Max width before this PARTICULAR table gets nasty This query will take effect for any screen smaller than 760px and also iPads specifically. */ @media only screen and (max-width: 760px), (min-device-width: 768px) and (max-device-width: 1024px) { /* Force table to not be like tables anymore */ table, thead, tbody, th, td, tr { display: block; } /* Hide table headers (but not display: none;, for accessibility) */ thead tr { position: absolute; top: -9999px; left: -9999px; } tr { border: 1px solid #ccc; } td { /* Behave like a "row" */ border: none; border-bottom: 1px solid #eee; position: relative; padding-left: 50%; } td:before { /* Now like a table header */ position: absolute; /* Top/left values mimic padding */ top: 6px; left: 6px; width: 45%; padding-right: 10px; white-space: nowrap; } /* Label the data */ td:nth-of-type(1):before { content: "First Name"; } td:nth-of-type(2):before { content: "Last Name"; } td:nth-of-type(3):before { content: "Job Title"; } td:nth-of-type(4):before { content: "Favorite Color"; } td:nth-of-type(5):before { content: "Wars of Trek?"; } td:nth-of-type(6):before { content: "Porn Name"; } td:nth-of-type(7):before { content: "Date of Birth"; } td:nth-of-type(8):before { content: "Dream Vacation City"; } td:nth-of-type(9):before { content: "GPA"; } td:nth-of-type(10):before { conten… 1  
339510770 339510770 It looks like I should double quote my columns and ensure they are correctly escaped https://blog.christosoft.de/2012/10/sqlite-escaping-table-acolumn-names/ - hopefully using ? placeholders for column names will work. I should use ? for tables too. 1  
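The standard identifier quoting from that article, as a sketch (worth noting: ? placeholders only bind values, never column or table names, so the quoting function has to carry the load):

```python
def escape_sqlite_identifier(name):
    # Wrap in double quotes and double any embedded quote characters.
    return '"{}"'.format(name.replace('"', '""'))

print(escape_sqlite_identifier('weird "col" name'))  # "weird ""col"" name"
```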
339514819 339514819 I’m going to have a single command-line app that does everything. Name to be decided - options include dataset, stateless, datasite (I quite like that - it reflects SQLite and the fact that you create a website) 1  
339515822 339515822 datasite . - starts web app in current directory, serving all DB files datasite . -p 8001 - serves on custom port datasite blah.db blah2.db - serves specified files You can’t specify more than one directory. You can specify as many files as you like. If you specify two files with different paths but the same name then they must be accessed by hash. datasite publish . - publishes current directory to the internet! Uses now by default, if it detects it on your path. Other publishers will eventually be added as plugins. datasite publish http://path-to-db.db - publishes a DB available at a URL. Works by constructing the Dockerfile with wget calls in it. datasite blah.db -m metadata.json If you specify a directory it looks for metadata.json in that directory. Otherwise you can pass an explicit metadata file path with -m or --metadata 1
339516032 339516032 Another potential name: datapi 1  
339517846 339517846 I’m going to use Click for this http://nvie.com/posts/writing-a-cli-in-python-in-under-60-seconds/ https://kushaldas.in/posts/building-command-line-tools-in-python-with-click.html 1  
339724700 339724700 Here’s how to make the “serve” subcommand the default if it is called with no arguments: @click.group(invoke_without_command=True) def serve(): # ... 1  
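The fuller documented Click pattern needs a context check, something like:

```python
import click

@click.group(invoke_without_command=True)
@click.pass_context
def cli(ctx):
    # With invoke_without_command=True the group body runs even when
    # no subcommand is given; fall through to serve() in that case.
    if ctx.invoked_subcommand is None:
        ctx.invoke(serve)

@cli.command()
def serve():
    click.echo("serving...")
```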
339866724 339866724 <img width="624" alt="databases" src="https://user-images.githubusercontent.com/9599/32087550-4e10b8a4-ba91-11e7-992b-e628d77114d0.png"> 1  
339891755 339891755 Deploys to Now aren't working at the moment - they aren't showing the uploaded databases, because I've broken the path handling somehow. I need to do a bit more work here. 1  
340561577 340561577 http://the-hitchhikers-guide-to-packaging.readthedocs.io/en/latest/quickstart.html describes how to package this for PyPI 1  
340787868 340787868 Here’s how I can (I think) provide safe execution of arbitrary SQL while blocking PRAGMA calls: let people use named parameters in their SQL and apply strict filtering to the SQL query but not to the parameter values. cur.execute( "select * from people where name_last=:who and age=:age", { "who": who, "age": age }) In URL form: ?sql=select...&who=Terry&age=34 Now we can apply strict, dumb validation rules to the SQL part while allowing anything in the named parameters - so people can execute a search for PRAGMA without being able to execute a PRAGMA statement. 1
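The "strict, dumb" validation half might start out as simple as this sketch, since parameter values never touch the SQL text:

```python
import re

def validate_sql(sql):
    # Reject anything that isn't a single SELECT; values arrive
    # separately as named parameters, so searching *for* the word
    # PRAGMA stays possible while executing one does not.
    if not re.match(r"\s*select\b", sql, re.IGNORECASE):
        raise ValueError("Only SELECT queries are allowed")
    if ";" in sql:
        raise ValueError("Multiple statements are not allowed")
    return sql
```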
341938424 341938424 Done: https://github.com/simonw/stateless-datasets/commit/edaa10587e60946e0c1935333f6b79553db33798 1  
341945420 341945420 To simplify things a bit, I'm going to require that every database is explicitly listed in the command line. I won't support "serve everything in this directory" for the moment. 1  
342030075 342030075 ... I tried that, I don't like it. I'm going to bring back "directory serving" by allowing you to pass a directory as an argument to `datasite` (including `datasite .`). I may even make `.` the default if you don't provide anything at all. 1  
342032943 342032943 Default look with Bootstrap 4 looks like this: <img width="871" alt="flights" src="https://user-images.githubusercontent.com/9599/32423242-1b4458ce-c25a-11e7-910f-2dc1de909b8f.png"> 1  
342484889 342484889 I’m going to call this feature “count values” 1  
342521344 342521344 GDS Registries could be fun too: https://registers.cloudapps.digital/ 1  
343164111 343164111 Implemented in 31b21f5c5e15fc3acab7fabb170c1da71dc3c98c 1  
343168796 343168796 Won't fix: ujson is not compatible with the custom JSON encoder I'm using here: https://github.com/simonw/immutabase/blob/b2dee11fcd989d9e2a7bf4de1e23dbc320c05013/immutabase/app.py#L401-L416 1  
343237982 343237982 More terms: * publish * share * docker * host * stateless I want to capture the idea of publishing an immutable database in a stateless container. 1  
343238262 343238262 The name should ideally be available on PyPI and should make sense as both a command line application and a library. 1  
343239062 343239062 This looks promising: https://github.com/esnme/ultrajson/issues/124#issuecomment-323882878 1  
343266326 343266326 http://sanic.readthedocs.io/en/latest/sanic/testing.html 1  
343281876 343281876 How about datasette? 1  
343551356 343551356 I'm going with datasette. 1  
343557070 343557070 https://file.io/ looks like it could be good for this. It's been around since 2015, and lets you upload a temporary file which can be downloaded once. $ curl -s -F "file=@database.db" "https://file.io/?expires=1d" {"success":true,"key":"ySrl1j","link":"https://file.io/ySrl1j","expiry":"1 day"} Downloading from that URL serves up the data with a `Content-disposition` header containing the filename: simonw$ curl -vv https://file.io/ySrl1j | more % Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0* Trying 34.232.1.167... * Connected to file.io (34.232.1.167) port 443 (#0) * TLS 1.2 connection using TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 * Server certificate: file.io * Server certificate: Amazon * Server certificate: Amazon Root CA 1 * Server certificate: Starfield Services Root Certificate Authority - G2 > GET /ySrl1j HTTP/1.1 > Host: file.io > User-Agent: curl/7.43.0 > Accept: */* > < HTTP/1.1 200 OK < Date: Fri, 10 Nov 2017 18:14:38 GMT < Content-Type: undefined < Transfer-Encoding: chunked < Connection: keep-alive < X-Powered-By: Express < X-RateLimit-Limit: 5 < X-RateLimit-Remaining: 4 < Access-Control-Allow-Origin: * < Access-Control-Allow-Headers: Cache-Control,X-reqed-With,x-requested-with < Content-disposition: attachment; filename=database.db ... 1  
343581130 343581130 I'm going to handle this a different way. I'm going to support a local history of your own queries stored in localStorage, but if you want to share a query you have to do it with a URL. If people really want canned query support, they can do that using custom templates - see #12 - or by adding views to their database before they publish it. 1  
343581332 343581332 I'm not going to use Sanic's mechanism for this. I'll use arguments passed to my cli instead. 1  
343643332 343643332 Here's what a table looks like now at a smaller screen size: <img width="496" alt="parlgov-development__info_data_source" src="https://user-images.githubusercontent.com/9599/32686818-7ffa691e-c662-11e7-964d-201fce0694e4.png"> 1  
343644891 343644891 I can detect something is a view like this: SELECT name from sqlite_master WHERE type ='view'; 1  
343644976 343644976 Simplest version of this: 1. Create a temporary directory 2. Write a Dockerfile into it that pulls an image and pip installs datasette 3. Add symlinks to the DBs they listed (so we don't have to copy them) 4. Shell out to "now" 5. Done! 1  
343645249 343645249 Doing this works: import os os.link('/tmp/databases/northwind.db', '/tmp/tmp-blah/northwind.db') That creates a link in tmp-blah - and then when I delete that entire directory like so: import shutil shutil.rmtree('/tmp/tmp-blah') The original database is not deleted, just the link. 1  
343645327 343645327 I can create the temporary directory like so: import tempfile t = tempfile.TemporaryDirectory() t <TemporaryDirectory '/var/folders/w9/0xm39tk94ng9h52g06z4b54c0000gp/T/tmpkym70wlp'> t.name '/var/folders/w9/0xm39tk94ng9h52g06z4b54c0000gp/T/tmpkym70wlp' And then to delete it all: t.cleanup() 1  
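Putting those pieces together, the whole publish flow might be as small as this (the Dockerfile contents are purely illustrative, and os.link() requires source and destination on the same filesystem):

```python
import os
import subprocess
import tempfile

def publish(db_paths):
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "Dockerfile"), "w") as f:
            f.write("FROM python:3\nRUN pip install datasette\n")
        for path in db_paths:
            # Hard links avoid copying potentially huge database files
            os.link(path, os.path.join(tmp, os.path.basename(path)))
        subprocess.check_call(["now"], cwd=tmp)
    # Cleanup removes the links and Dockerfile, not the originals
```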
343646740 343646740 I'm happy with this now that I've implemented the publish command in #26 1  
343647102 343647102 http://2016.padjo.org/tutorials/data-primer-census-acs1-demographics/ has a sqlite database: http://2016.padjo.org/files/data/starterpack/census-acs-1year/acs-1-year-2015.sqlite I tested this by deploying it here: https://datasette-fewuggrvwr.now.sh/ 1  
343647300 343647300 Still needed: - [ ] A link to the homepage from some kind of navigation bar in the header - [ ] link to github.com/simonw/datasette in the footer - [ ] Slightly better titles (maybe ditch the visited link colours for titles only? should keep those for primary key links) - [ ] Links to the .json and .jsono versions of every view 1  
343675165 343675165 The plugin system can also allow alternative providers for the `publish` command - e.g. maybe hook up hyper.sh as an option for publishing containers. 1  
343676574 343676574 See also #14 1  
343683566 343683566 I’m going to solve this by making it an optional argument you can pass to the serve command. Then the Dockerfile can still build and use it but it won’t interfere with tests or dev. If argument is not passed, we will calculate hashes on startup and calculate table row counts on demand. 1  
343690060 343690060 "parlgov-development.db": { "url": "http://www.parlgov.org/" }, "nhsadmin.sqlite": { "url": "https://github.com/psychemedia/openHealthDataDoodles" } 1  
343691342 343691342 Closing this, opening a fresh ticket for the navigation stuff. 1  
343697291 343697291 I'm going to bundle sql and sql_params together into a query nested object like this: { "query": { "sql": "select ...", "params": { "p0": "blah" } } } 1  
343698214 343698214 I'm closing #50 - more tests will be added in the future, but the framework is neatly in place for them now. 1  
343699115 343699115 This needs to incorporate a sensible way of presenting custom SQL query results too. And let's get a textarea in there for executing SQL while we're at it. 1  
343705966 343705966 https://github.com/fivethirtyeight/data has a ton of CSVs 1  
343707624 343707624 Split the SQL thing out into #65 1  
343707676 343707676 Here's the new design: <img width="691" alt="parlgov-development" src="https://user-images.githubusercontent.com/9599/32695161-82821226-c708-11e7-835c-b3d91850b2e0.png"> Also lists views at the bottom (refs #54): <img width="345" alt="parlgov-development" src="https://user-images.githubusercontent.com/9599/32695164-99efa7de-c708-11e7-8272-bc5f5b870b84.png"> 1  
343708447 343708447 I ditched the metadata file concept. 1  
343709217 343709217 <img width="982" alt="nhsadmin" src="https://user-images.githubusercontent.com/9599/32695392-3ea12612-c70f-11e7-873b-9e6ad2c869e7.png"> 1  
343715915 343715915 con = sqlite3.connect('existing_db.db') with open('dump.sql', 'w') as f: for line in con.iterdump(): f.write('%s\n' % line) 1  
343752404 343752404 Re-opening this - I've decided to bring back this concept, see #68 1  
343752579 343752579 By default I'll allow LIMIT and OFFSET up to a maximum of X (where X is let's say 50,000 to start with, but can be custom configured to a larger number or set to None for no limit). 1  
343752683 343752683 Maybe SQL views should have their own Sanic view class (`ViewView` is kinda funny), subclassed from `TableView`? 1  
343753999 343753999 For initial launch, I could just support this as some optional command line arguments you pass to the publish command: datasette publish data.db --title="Title" --source="url" 1  
343754058 343754058 I’m going to store this stuff in a file called metadata.json and move the existing automatically generated metadata to a file called build.json 1  
343769692 343769692 I have created a Docker Hub public repository for this: https://hub.docker.com/r/simonwillison/datasette/ 1  


CREATE VIRTUAL TABLE [issue_comments_fts] USING FTS5 (
                [body],
                content=[issue_comments]
            );