Is there something better than SQL?

If you’re modeling relational data, it doesn’t seem like you can get around using a DB that uses SQL, which to me is the worst: most programmers aren’t DB experts and the SQL they output is quite often terrible.

Not to dunk on the lemmy devs, they do a good job, but they themselves know that their SQL is bad. Luckily there are community members who stepped up and are doing a great job at fixing the numerous performance issues and tuning the DB settings, but not everybody has that kind of support, nor time.

Also, the translation step from binary (program) -> text (SQL) -> binary (server), just feels quite wrong. For HTML and CSS, it’s fine, but for SQL, where injection is still in the top 10 security risks, is there something better?

Yes, there are ORMs, but some languages don’t have them (rust has diesel for example, which still requires you to write SQL) and it would be great to “just” have a DB with a binary protocol that makes it unnecessary to write an ORM.

Does such a thing exist? Is there something better than SQL out there?

Spore,

Datalog. Basically relational, but have much simpler syntax and semantics,and is able to convey more complicated queries in a composable manner. Mainly used in Clojure-based databases where actual programs instead of strings are used.

onlinepersona,

🤔 interesting. First order logic for querying. I found CozoDB which might be useful. Still not sure about how to map objects to the DB. The tutorial is not very relatable. Lots of numbers and single letters.

Thanks for the hint though. I’ll check it out.

r1veRRR,

I’d recommend everyone check out prql-lang.org. It’s SQL, but readable and writable in a sane way.

And no, SQL is NOT readable or writable for anything involving more than a single join.

luckystarr,

You can write selects with many joins, as long they are regular and either add a column or reduce the result set. You have to write the joins explicitly though. Just shoving all of the restrictions into the where clause will definitely confuse everybody.

BitSound,

I’m pretty excited about PRQL. If anything has a shot at replacing SQL, it’s something like this (taken from their FAQ):

PRQL is open. It’s not designed for a specific database. PRQL will always be fully open-source. There will never be a commercial product.

There’s a long road ahead of it to get serious about replacing SQL. Many places won’t touch it until there’s an ANSI standard and all that. But something built with those goals in mind actually might just do it.

BehindTheBarrier,

I’m not too much of a fan of the SQL equivalent of SELECT not being at the top. Granted I’m fairly sure there are some arguments for it. Since select is optional there’s a higher mental load trying to figure out where and what is actually being returned imo. At least from looking at this for the first time.

On the other hand, i’d kill for f-strings, the top N in group (which is nigh unreadable in SQL), and null handling that doesn’t require me to write either COALSECE or NVL too often. The joins were a little less pretty though, I’m quite fond of normal SQL joins since they are very reasonable unless chained beyond the line count your screen can show.

Sigmatics,

The main advantage being more concise syntax?

glue_snorter,

the translation step from binary (program) -> text (SQL) -> binary (server)

Your concern about this is misguided. Inter-process communication always has to cross a barrier, by definition.

I take it http also feels wrong to you?

Binary protocols do exist, e.g. gRPC, but they incur costs of their own.

onlinepersona,

the translation step from binary (program) -> text (SQL) -> binary (server)

Also, the translation step from binary (program) -> text (SQL) -> binary (server), just feels quite wrong. For HTML and CSS, it’s fine, but for SQL, where injection is still in the top 10 security risks, is there something better?

Does this not answer your question?

Kissaki,
@Kissaki@feddit.de avatar

most programmers aren’t DB experts and the SQL they output is quite often terrible.

Isn’t that looking at it the wrong way / missing the point?

If you’re fine with simple queries, use an ORM of your tech stack. Once you have to understand querying performance and weigh usage patterns, constraints, and cache performance and size, then it’s not about SQL anymore. It’s about understanding the DBMS you use.

You may ask “why can’t I use a different language for the querying part of it”. But I haven’t seen anything better yet.

Having a common, well-understood, established, documented, inter-product compatible language has a lot of value. Using a different language to the well-established industry standard is in itself an increased hurdle to adoption.

Getting back to the original quote: I don’t think anything else would serve bad developers / non-experts any better.

onlinepersona,

If you’re fine with simple queries, use an ORM of your tech stack. Once you have to understand querying performance and weigh usage patterns, constraints, and cache performance and size, then it’s not about SQL anymore. It’s about understanding the DBMS you use.

Simple queries don’t result in simple SQL. How many joins and subqueries do you think an SQL query would require in order fulfill “Give me the top 10 artists of the 90s whose albums were nominated for the MTV awards but didn’t win”?

In Django looks something like


<span style="color:#323232;">nineties = (date(1,1,1990), date(31, 12, 1999)
</span><span style="color:#323232;">album_range=Q(albums__release_date__range=nineties)
</span><span style="color:#323232;">artists = Artists.objects.annotate(
</span><span style="color:#323232;">  albums_sold=Sum("albums__sales", filter=album_range)),
</span><span style="color:#323232;">).filter(
</span><span style="color:#323232;">  album_range,
</span><span style="color:#323232;">  nominations__date__range=nineties,
</span><span style="color:#323232;">  nominations__won=False
</span><span style="color:#323232;">).order_by("-albums_sold")
</span><span style="color:#323232;">top_artists = artists[:10]
</span>

What if one method wants the result of that but only wants the artists’ names, but another one wanted additional or other fields? In django you could simply use artists.only(*field_names) and each method would provide a different set of field names. What would that look like without a capable ORM? Do you think somebody would refactor the method to add a field_names argument? In my experience the result is a bunch of copy pasted queries that modify the query itself to add the fieldnames.

Another common thing is querying related objects. Say you simply wanted to have information about the record label of the aforementioned artists while handling the artists. A many-to-one relationship (artist has one record label, record label has many artists). You could either artist.record_label while in your for-loop, but that would trigger an query for every artist (1+n problem). Or in django that’s artists.select_related(“record_label”) and it will get all the record_labels in the same query.
If it’s a many-to-many relationship for example “festivals”, then .prefetch_related() will first select the artists, then make a second query of festivals of those artists, and artist.festivals would be available.

An ORM like django makes that simple. SQL, does not.

So, before we even get to the DB optimisation part (which indices to create, whether a view is better or now, which storage engine to use, WAL size, yadayadayada), there’s an entire interface / language that makes writing bad code very easy.

BitSound,

I’m too lazy to convert that by hand, but here’s what chatgpt converted that to for SQL, for the sake of discussion:


<span style="color:#323232;">SELECT 
</span><span style="color:#323232;">    a.id,
</span><span style="color:#323232;">    a.artist_name -- or whatever the name column is in the 'artists' table
</span><span style="color:#323232;">FROM artists a
</span><span style="color:#323232;">JOIN albums al ON a.id = al.artist_id
</span><span style="color:#323232;">JOIN nominations n ON al.id = n.album_id -- assuming nominations are for albums
</span><span style="color:#323232;">WHERE al.release_date BETWEEN '1990-01-01' AND '1999-12-31'
</span><span style="color:#323232;">AND n.award = 'MTV' -- assuming there's a column that specifies the award name
</span><span style="color:#323232;">AND n.won = FALSE
</span><span style="color:#323232;">GROUP BY a.id, a.artist_name -- or whatever the name column is in the 'artists' table
</span><span style="color:#323232;">ORDER BY COUNT(DISTINCT n.id) DESC, a.artist_name -- ordering by the number of nominations, then by artist name
</span><span style="color:#323232;">LIMIT 10;
</span>

I like Django’s ORM just fine, but that SQL isn’t too bad (it’s also slightly different than your version though, but works fine as an example). I also like PyPika sometimes for building queries when I’m not using Django or SQLAlchemy, and here’s that version:


<span style="color:#323232;">q = (
</span><span style="color:#323232;">    Query
</span><span style="color:#323232;">    .from_(artists)
</span><span style="color:#323232;">    .join(albums).on(artists.id == albums.artist_id)
</span><span style="color:#323232;">    .join(nominations).on(albums.id == nominations.album_id)
</span><span style="color:#323232;">    .select(artists.id, artists.artist_name)  # assuming the column is named artist_name
</span><span style="color:#323232;">    .where(albums.release_date.between('1990-01-01', '1999-12-31'))
</span><span style="color:#323232;">    .where(nominations.award == 'MTV')
</span><span style="color:#323232;">    .where(nominations.won == False)
</span><span style="color:#323232;">    .groupby(artists.id, artists.artist_name)
</span><span style="color:#323232;">    .orderby(fn.Count(nominations.id).desc(), artists.artist_name)
</span><span style="color:#323232;">    .limit(10)
</span><span style="color:#323232;">)
</span>

I think PyPika answers your concerns about

What if one method wants the result of that but only wants the artists’ names, but another one wanted additional or other fields?

It’s just regular Python code, same as the Django ORM.

usernamesaredifficul,

The difficulty is in database design being tricky not the language. There’s no getting around that by using a different language

onlinepersona,

I disagree. Database design is tricky, yes, but with SQL, you now have two problems.

normalexit,

SQL is a great tool to have in your toolbox. From creating adhoc reports to building any kind of backend, it always seems to come in handy.

There are obviously non relational databases available and other approaches to persisting information, but there are some desirable characteristics of traditional relational databases.

Oftentimes I find if I’m struggling it is because of a missing abstraction. Views, functions, etc. are available to extract higher level ideas and remove duplication.

sip,

I think you missed the point. OP is asking for an alternative to communicate with a relational DB.

onlinepersona,

Yes, exactly. You got it.

atheken, (edited )

I used to be full on the ORM train. Now I’m a little less enthusiastic. What I actually think people need most of the time is something closer to ActiveRecord. Something that can easily map a result set into a collection of typed objects. You still generally write parameterized SQL, but the work of translating a db decimal into the correct target type on a record object in your language is handled for you (for example). In .net, Dapper is a good example.

I also think most people overemphasize or talk about how other programmers “suck at SQL” waaayy too much.

IMO, for most situations, these are the few high-level things that devs should be vigilant about:

  • parameterize all sql.
  • consider the big-o of the app-side lookup/write methods (sometimes an app join or pulling a larger set and filtering in memory is better than crafting very complex projections in sql). This is a little harder to analyze with an ORM, but not by much if you keep the mappings simple and understand the loading semantics of the ORM.
  • understand the index coverage of queries and model table keys properly to maintain insert performance (monotonically increasing keys).
  • stop fixating on optimizing queries that run in a few seconds, a few times a day. Optimize the stuff that you run on every transaction - if you need to.

On most of those points, if you don’t have aggregate query counts/metrics on query performance on your clusters, starting to get cute with complex queries is flying blind, and there’s no way to prioritize what to optimize.

For the vast majority of cases, simple, obvious selects that don’t involve special db features are going to do the job for most applications. When the database becomes a bottleneck, there are usually much more effective ways to handle them than to try to hand optimize all the queries.

Lastly, I have a little bit of a theory that part of the reason people do/do not like looking at SQL in code is because it’s a hard context switch from one language to another, often requiring the programmer to switch to “stringly-typed” mode, something we all learn causes huge numbers of headaches in our first few months of programming. Some developers accept that there’s going to be different languages/contexts and not all of them are going to be as fluent or familiar, but accept that this is par for the job. Others recoil from the unfamiliar and want to burn it down. IMO, the former attitude is a lot more productive.

r1veRRR,

Without a DSL for writing SQL, any sufficiently complex program will end up with string concatinating all over the place. Basically, writing a language with ZERO checks or highlighting or anything. That’s asking for trouble.

But coming from Java, I agree that some ORMs go way too far.

atheken,

It’s necessarily complexity that is easily encapsulated in methods.

If those methods are under test to verify their behavior, trivial typos can be detected instantly, without adding another dialect and more conceptual overhead to a project.

If those methods are not under test, then there’s a tiny bit of help by using a DSL if it can be compile-time checked.

makr_alland,

I like the ideas behind PRQL, although I’ve never used it in an actual project. It compiles down to SQL but has a clearer model based on pipeline and a much better suntax.

Malloy is a similar project but I haven’t look into it yet.

glue_snorter,

Shame I had to scroll this far to find PRQL

onlinepersona,

Thanks, checking those out now.

cwagner,

Maybe a hot take, but if SQL injection is still an issue, you have no business developing anything. This is a solved issue and had been for years.

boo,
@boo@lemmy.one avatar

I have seen that this is still a problem, even in established enterprise companies.

cwagner,

How? I never worked in a big company, but do they just have absolute beginners without any guidance writing code that’s then never checked?

DirigibleProtein,

I worked in several big companies, and the answer is “often”.

The database at the backend of the web page that allows customers to order widgets online is written and maintained by DBAs, with functional specifications and agile and program managers and Gantt charts and all that stuff.

The database used by the system administrators to keep track of servers and parts; or by managers to keep track of hours worked; or by the network engineers to keep track of wifi repeaters; those databases are written by someone who did an online course once, or by whoever on the team possessed insufficient reluctance when the idea came up in a meeting.

And when we see on the evening news that personal records of 7.5 million people have been stolen by hackers, it doesn’t matter which side of the line the programmers are on, the shit is evenly distributed all over IT, whether they were involved or not.

Semi-Hemi-Demigod,
@Semi-Hemi-Demigod@kbin.social avatar

I've worked in various support roles for sysadmins and the fact that things work at all is amazing to me. I've encountered so many people whose skills make me wonder how they got their job working at huge important corporations and organizations with extremely sensitive data.

For example, how can you consider yourself a senior DBA if you don't know how to read a log file to figure out an error? Most of my workday is filled with supporting basic computer literacy, and I'm working with the people who are supposed to know this shit.

magnus,

I’m horrfied every day at work that copy/paste still is an issue. All my coworkers and customers are still struggling with copying some data, switching to another program, pasting it, switching back, copying some other data, and so on, especially when needing two or three data frequently.

In Windows, a (bad) solution is using win+tab, which literally no one knows about, much less uses.

In Linux (and should be in Windows too), it is trivial to implement buffers (say 0-9) to store and retrieve clipboard data for subseconds access.

onlinepersona,

“If you don’t know what I know, you shouldn’t be here”

“I don’t make mistakes that end up #3 on the vulnerability list, and if you do, I don’t want you around me”

I find that a gate keeperish attitude.

cwagner,

Alas, that’s not what I said

Miaou,

Gates exist for a reason

Anders429,

And yet, it still happens all the time.

KrokanteBamischijf,

Sounds like it’s not really SQL as a query language but rather the whole database paradigm that’s the problem here.

Look into noSQL databases and their respective drivers. They often use JSON-like syntax and are more likely to be seamlessly integrated with whatever programming language you’re using.

If a search engine won’t point you in the right direction I’d suggest having a look at MongoDB, which is well documented and fairly accessible to mess around with.

Olap,

ElasticSearch tried. It’s without a doubt one of the worst query languages to use. Nobody can remember how to write the queries, or even reading the aggregations. You literally need the docs open on another monitor.

A solid No from me. But I also never got the ORM hate, ActiveRecord on Ruby made it devine imo, sequelize on JS is fine, Diesel in rust is good

Zikeji,
@Zikeji@programming.dev avatar

ORM with an easy and straightforward way to execute custom queries is the sweet spot to me. Let the ORM handle the basics (and migrations), override / add optimized queries where relevant.

601error,
@601error@lemmy.ca avatar

I am both a (T-)SQL expert and a language design enthusiast. IMO, SQL the language is mediocre in its grammar and extremely resistant to cleanliness. Once you get past that, the things you can actually do with it are extremely useful.

I’d love for a better syntax to exist, but it’s a Herculean task to make one. Modern SQL dialects have gargantuan, labyrinthine grammars, and they grow with each new product version. It’s a lot easier to keep adding to that than to build a feature-complete replacement. This is also the reason why most ORMs are so frustratingly limiting: it’s too much work to support the advanced features of one SQL dialect, let alone multiple.

hperrin,

Yes, but they’re not ubiquitous, so you gotta research them.

StarkillerX42, (edited )

There are other query languages. InfluxDB develops one called Flux. It’s a master class on why corporate led open source can really suck. Daily use features get deprecated all the time and each new release is breaking. The documentation is horribly insufficient and the language syntax is usually not clear and it wraps many behaviors into one vague function name. Yes, SQL is pretty impressive.

jochem,

With Influx 3 the preferred language is gonna be their SQL dialect. Flux is on its way out and I suspect it will get deprecated in the near future. Flux saw very slow adoption and SQL can do everything as well.

StarkillerX42,

That’s the most Influx-like decision ever.

koreth,

SQL, where injection is still in the top 10 security risks

This is absolutely true, but it’s not what it looks like on the surface, and if you dig into the OWASP entry for this, you’ll see they talk about mitigation.

You can completely eliminate the possibility of injection attacks using well-understood technologies such as bind variables, which an ORM will usually use under the covers but which you can also use with your own queries. There are many, many database applications that have never once had a SQL injection vulnerability and never will.

The reason SQL injection is a widespread security risk, to be blunt, is that there are astonishingly large numbers of inexperienced and/or low-skill developers out there who haven’t learned how to use the tools at their disposal. The techniques for avoiding injection vulnerability are simple and have been well-documented for literally decades but they can’t help if a lousy dev decides to ignore them.

Now, a case could be made that it’d be better if instead, we were using a query language (maybe even a variant of SQL) that made injection attacks impossible. I agree in principle, but (a) I think this ends up being a lot harder than it looks if you want to maintain the same expressive power and flexibility SQL has, (b) given that SQL exists, “get bad devs to stop using SQL” doesn’t seem any more likely to succeed than “get bad devs to use bind variables,” and © I have too much faith in the ability of devs to introduce security vulnerabilities against all odds.

frezik, (edited )

It’s also worth noting that the current entry on OWASP is injection generically. It includes SQL, but it also covers things like HTTP links where you concatenate in unchecked user input. SQL injection by itself may no longer be prominent enough make the list.

I’m also going to put the blame squarely on PHP for SQL injection attacks hanging on for so long, particularly when combined with MySQL. That DB didn’t support bind variables for a long time (maybe still doesn’t?). Other languages may have used a library that simulated bind variables for you. Barring that, they tended to always always always show how to use a sanitation function even if it was the first mention in the first tutorial for the top Google result for “sql [language]”. That creates a culture in the language of writing safe SQL code. Not PHP, though; the sanitation functions were there, but they never gave them the prominence that they so badly needed.

  • All
  • Subscribed
  • Moderated
  • Favorites
  • random
  • uselessserver093
  • Food
  • aaaaaaacccccccce
  • [email protected]
  • test
  • CafeMeta
  • testmag
  • MUD
  • RhythmGameZone
  • RSS
  • dabs
  • Socialism
  • KbinCafe
  • TheResearchGuardian
  • oklahoma
  • feritale
  • SuperSentai
  • KamenRider
  • All magazines