As soon as there is a _complete_ regex reference in the readme, it may be worth a try. The main problem with _any_ regex tool or programming language or ... is the subtle and not so subtle differences between the various regex implementations - like the "normal" and "extended" mode of sed.
This phrase:
> sd uses regex syntax that you already know from JavaScript and Python.
There’s also sad that lets you review find and replace changes to files before making them:
8 days ago
I had to re-read your post a few times before I realized that “sad” is the name of the program
7 days ago
It uses a different syntax though. Hardly worth anyone's time
8 days ago
Not sure if I agree. Sed is widely known and much of the value comes from that, just being around for a long while, but I wouldn't really say that the syntax is all that straightforward. As a thought experiment, try explaining how to use sed to a fresh graduate who's never seen it. Not saying sd is better or anything, but rather that just because the syntax is different doesn't make it bad.
8 days ago
> try explaining how to use sed to a fresh graduate who's never seen it
Well, for starters, you just `s/<regex>/<replacement>/` and try to use that in your everyday work. Just forget about the syntax. It's a search-and-replace tool.
That's the only way I used sed for years. I've learned more since then, but it's still the command I use the most. And that's also what `sd` focuses on.
Also, if you want to replace newlines, just use `tr`, to hook onto the examples of sd. It may seem annoying to use a different tool, but there are two major advantages:
1. you're learning about the existence, capabilities and limitations of more tools
2. both `sed` and `tr` are probably available in your next shitty embedded busybox-driven device, while `sd` probably is not
As you said, the value comes from being around for a long time and, probably more importantly, still being present on nearly any Unix-like system.
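A minimal sketch of the two tools mentioned above, using made-up sample input: `sed` for search-and-replace within lines, `tr` for newlines (sed works line by line, which is why newlines are awkward there).

```shell
# Replace only the first match on each line.
printf 'foo bar foo\n' | sed 's/foo/baz/'

# The trailing g flag replaces every match on the line.
printf 'foo bar foo\n' | sed 's/foo/baz/g'

# tr translates single characters, newlines included.
printf 'a\nb\nc\n' | tr '\n' ','
```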
8 days ago
99% of the time I use sed to mangle the output of a text file into something else.
Now some twat will come along and say my process should have been
cat as1
grep " 65" as1
grep " 65" as1 | sed -e (various different tries until the data looks useful)
grep " 65" as1 | sed -e (options) | sort|uniq
Because otherwise it's a "useless use of cat" and reformatting my line is well worth the time and cognitive load to save those extra forks.
8 days ago
I think the concept of useless use of cat is one of the few things I strongly disagree with in software development. Most things have their trade-offs, pros and cons, but using cat to start a pipe makes everything composable and easy to work with, it's pretty much universally good. The moment you drop it because of the small redundancy, you have to make sure you don't mess up the params for whatever comes next, and that overhead is in my opinion never worth what you gain by dropping cat.
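For readers who like the cat-first style because the pipeline reads left to right, the shell also accepts an input redirection at the front of a command. A small sketch comparing the two; the function names are made up, and the `' 65'` pattern is borrowed from the comment above.

```shell
# Start the pipeline with cat: one extra process, reads left to right.
with_cat() {
    cat "$1" | grep ' 65' | sort | uniq
}

# Same pipeline with the redirection written first: no cat,
# and the file name still comes before the commands.
with_redirect() {
    < "$1" grep ' 65' | sort | uniq
}
```

Both forms produce the same output; which one is easier to edit while iterating is the judgment call this sub-thread is arguing about.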
8 days ago
8 days ago
“Useless use of cat” is really “useless interjection and waste of a comment”.
7 days ago
sed is widely known because it's available everywhere and is used in every shell script. I just don't see the point in learning a new utility that does the same thing as sed but with different syntax. In this case the new utility doesn't even honor my language settings and just errors out if I enter a non-English letter. It's ridiculous
8 days ago
How? Shouldn't it just all be UTF-8? Or do you use a different encoding on your system?
8 days ago
sd has very much proven to be worth my time. It's both faster and way easier to use.
8 days ago
> Why sed??
> Sed is the perfect programming language, especially for graph problems. It's plain and simple and doesn't clutter your screen with useless identifiers like if, for, while, or int. Furthermore since it doesn't have things like numbers, it's very simple to use.
"useless identifiers like if, for, while, or int"? Useless identifiers?
8 days ago
That's about as serious as
> Some of the notable features include:
> Preview variable values, both of them!
> Its name is a palindrome
8 days ago
For me, the answer to “why” is that it’s already installed in the environment and available on every UNIX system I have used. This is a case of conforming myself to the tool, rather than the other way around. If you are of a certain vintage like I am, you got used to doing these things early on because we could not just apt install foo on our platforms anytime we needed something.
I do not mean to sound like “kids these days…” I really like these modern systems that allow you to install a wide range of packages. It is a huge step forward. I just want to explain my perspective; perhaps others share it. It probably also explains why such tools continue to exist.
8 days ago
This is built into perl:
perl -MO=Deparse -w -naF: -le 'print $F[2]'
8 days ago
Once in HN comments I saw `sed` referred to as a one-way hashing function, and that's always stuck with me - not just for sed, but for any type of operation that ends up being sort of a "black box". Input becomes output reliably, but it's hell to understand how. My big take away was: These types of operations are OK, when necessary, but it's a good idea to take the time to write some comments/documentation so the next person who looks at it (including self) has somewhere to start.
That said, debugging is definitely a thing, and tools like this are awesome!
8 days ago
`sed` in Latin is often used to contrast two things, "not this, but that", e.g.
Amīcitia nōn semper intellegitur sed sentītur.
(Friendship is not always understood, but it is felt.)
which I'm always reminded of when using sed(1) in a script to provide, not this pattern, but that replacement.
8 days ago
I was happy to learn that
> GNU sed actually provides pretty useful debugging interface, try it yourself with `--debug` flag.
7 days ago
No Debian (Ubuntu, Mint and friends) version?
8 days ago
I've gotten into it recently, but actually not because of LLMs. Actually I find them unhelpful here. The reason I've gotten into it is that I wanted to make a bunch of install scripts for programs I want on fresh boxes. Mostly it's been fun, seeing what I can do with curl, sed, awk, regex, and bash scripting. I'm often finding that I can do a ton of things in a single line where I would have written a lot more in Python or something else. Idk, there's just something very fun about this.
Though what's been a little frustrating is that there are anti-scraping measures and they break things. But they're always trivial to get around, so it's just annoying.
A big reason LLMs end up failing is that I need my scripts to work on both osx and nix machines, so they keep suggesting things that work on one but not the other. They don't seem to want to listen to my constraints, and grep is problematic for them in particular. Luckily man pages are great. I think they're often overlooked.
8 days ago
If you are able to install specific implementations of the tools, go with GNU tools on all the machines. That way, you'd get more features and work the same everywhere.
If that is not an option, go with Perl. It'd be a little slower, but you'll get consistent results. Plus, Perl has powerful regex, lots of standard libraries, etc.
8 days ago
Well, the fun is, as I was trying to convey, building the tools automatically from fresh boxes. Sure, I can bootstrap my way by first installing GNU coreutils, but if this was about doing things the easy way I'd just use the relevant package manager and ansible like everyone else.
8 days ago
I needed some scripts to run a little “factory” for flashing an operating system onto some IoT devices. Lots of the work was running various shell commands but it is nonetheless something I would have traditionally written in PHP or Python but I thought “what the hell” and did the whole thing in bash with ChatGPT and it was a totally mind blowing experience.
Now I use bash for all sorts of stuff. I’ve been working with *nix for 20 years but bash is so arcane and my needs always so immediate that I never did anything other than use it to run commands in sequence with maybe a $1 or a $2 in there
8 days ago
100% agree. I'm currently preparing several 10s of GBs of HTML in nested directories for static hosting via S3 and was floundering until Gippity recommended find + exec sed to me. I'm now batch fixing issues (think 'not enough "../" in 60000 relative hrefs in nested directories') with a single command rather than writing scripts and feel like a wizard.
These tools are things I've used before but always found painful and confusing. Being able to ask Gippity for detailed explanations of what is happening, in particular being able to paste a failing command and have it explain what the problem is, has been a game changer.
In general, for those of us who never had a command line wizard colleague or mentor to show what is possible, LLMs are an absolute game changer both in terms of recommending tools and showing how to use them.
8 days ago
If you have a lot of files, consider find piped to xargs with -P for parallelism and -n to limit the number of files per parallel invocation.
Only a tiny bit more complex but often an order of magnitude faster with today's CPUs.
Use -print0 on find with -0 on xargs to handle spaces in filenames correctly.
GNU parallel is another step up, but xargs is generally always to hand.
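A sketch of that find-plus-xargs pattern, assuming an `xargs` that supports `-P` (GNU and BSD versions do, as far as I know). The function name, the batch size of 50 and the 4 parallel jobs are arbitrary choices for illustration; the sed expression is the href fix discussed in this sub-thread.

```shell
# fix_hrefs DIR: rewrite ../../../ prefixes in every *.html file under DIR.
fix_hrefs() {
    dir=$1
    # -print0 / -0 : NUL-separated names, safe for spaces in file names
    # -n 50        : at most 50 files per sed invocation
    # -P 4         : run up to 4 sed processes at the same time
    # -i.bak       : edit in place, keeping a .bak copy; the attached
    #                suffix form is accepted by both GNU and BSD sed
    find "$dir" -type f -name '*.html' -print0 |
        xargs -0 -n 50 -P 4 sed -i.bak -e 's|\.\./\.\./\.\./|../../../source/|g'
}
```

Called as `fix_hrefs ./site` (a hypothetical directory), this leaves a `.bak` file next to each edited page, which is easy to delete once the result has been checked.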
8 days ago
Thanks! Gippity did suggest the xargs approach as an alternative, but I found that
find [...] -exec [...] {} +
as opposed to
find [...] -exec [...] {} \;
worked fine and was performant enough for my use-case. An example command was
find . -type f -name "*.html" -exec sed -i '' -e 's/\.\.\/\.\.\/\.\.\//\.\.\/\.\.\/\.\.\/source\//g' {} +
which took about 20s to run
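The difference between the two terminators can be made visible by counting how often find starts the command. A small sketch with made-up function names: each invocation prints one line, and `wc -l` counts them.

```shell
# {} + : find appends as many file names as fit to one command line,
# so a handful of files means a single invocation.
runs_batched() {
    find "$1" -type f -name '*.html' -exec sh -c 'echo run' sh {} + | wc -l
}

# {} \; : find starts the command once per file.
runs_per_file() {
    find "$1" -type f -name '*.html' -exec sh -c 'echo run' sh {} \; | wc -l
}
```

With thousands of files the batched form saves one process start per file, which is presumably where most of the speed-up comes from.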
8 days ago
One can express your sed in less Leaning Toothpick Syndrome[1] via:
find . -type f -name "*.html" -exec sed -i '' -e 's|\.\./\.\./\.\./|../../../source/|g' {} +
Using "/" as the delineation character for "s" patterns that include "/" drives me batshit - almost as much as scripts that use the doublequote for strings that contain no variables but also contain doublequotes (looking at you, json literals in awscli examples)
If your sed is GNU, or otherwise sane, one can also `sed -Ee` and then use `s|\Q../../../|` getting rid of almost every escape character. I got you half way there because one need not escape the "." in the replacement pattern because "." isn't a meta character in the replacement space - what would that even mean?
Parallel is nice when doing music conversion with ffmpeg.
8 days ago
Primeagen detected
I find him hard to listen to when he does things like this
8 days ago
Primeagen is some kind of Youtuber? I am not familiar and don't understand what you are trying to convey here.
8 days ago
Guessing 'gippity' has been used by primeagen recently, so now you're gonna be tarred with the 18-23 React bootcamp graduate brush (at least that's who I imagine finds him watchable).
8 days ago
It's a case of convergent evolution - I don't know where I heard it first, but I asked GPT if it minded and it said "Of course, you can call me Gippity!", so I do, because it's more fun.
8 days ago
yes, and a cringy one
8 days ago
I resent this combination.
- We never figured out how to package programs properly (Nix needs to become easier to use)
- For all kinds of smaller tasks we practically need to use those Unix tools
- Those everywhere tools are for hysterical raisins hard to use in a larger context (The Unix Philosophy in practice: use these five different tools but keep in mind that they are each different from each other across six dimensions and also they have defaults from the 70’s or 80’s)
- For a lot of “simple” things you need to remember the simple thing plus eight comments (on the StackOverflow answer which has 166 votes but that’s just because it was the first to answer the question) with nuance like “this won’t work for your coworker on Mac”
- So you don’t: you go to SO (see previous) and use snippets (see first point: we don’t know how to package programs, this is the best we got)
- This works fine until Google Search decides that you are too reliant on it for it to have to work well
- Now you don’t use “random stuff from StackOverflow” which can at least have an audit trail: now you use random weights from your LLM in order to make “simple” solutions (six Unix tools in a small Bash script which you can’t read because Bash is hard)
This is pretty much the opposite of what inspired me when studying computer science and programming.
8 days ago
> We never figured out how to package programs properly
What's the issue with apt, pacman, and the others? I think they’re doing their job fine.
> For all kinds of smaller tasks we practically need to use those Unix tools
I mean, they’re good for what they do.
> Those everywhere tools are for hysterical raisins hard to use in a larger context
Because each does a universal task you may want to do in the Unix world of files and streams of text.
> For a lot of “simple” things you need to remember the simple thing plus eight comments
No, you just need the manuals. And there are books too. And yes the difference between BSD and GNU is not obvious at first glance. But they’re different software worked on by different people.
8 days ago
Both of the points I made are what really are tragic:
1. (the things you disagree with)
2. Using AI to compensate for (1)
So if you only disagree with (1) then I don’t know if I should get into it.
7 days ago
8 days ago
IMPOSSIBLE!!! God made sed as a test for humans to prove their humility. It is intrinsically mysterious.
8 days ago
8 days ago
sed, awk, grep and friends are just so effective at trawling through text.
I dump about 150GB of Postgres logs a day (I know, it's over the top but I only keep a few days worth and there have been several occasions where I was saved by being able to pick through them).
At that size you even need to give up on grepping, really. I've written a tiny bash script that uses the fact that log lines start with a timestamp and `dd` for immediate extraction. This allows me to quickly binary search for the location I'm interested in.
Then I can `dd` to dump the region of the file I want. After that I have a little awk script that lets me collapse the sql lines (since they break across multiple lines) to make grepping really easy.
All in all it's a handful of old-school scripts that make an almost impossible task easy.
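The commenter's awk script is not shown in the thread, so the following is only a guess at the idea: treat any line that starts with a date as the beginning of a log entry and glue every other line onto the entry before it. The function name and the `YYYY-MM-DD ` prefix pattern are assumptions.

```shell
# collapse_log_lines [FILE...]: join continuation lines onto the
# timestamped line that precedes them, one log entry per output line.
collapse_log_lines() {
    awk '
        # A new entry starts with a date such as "2024-05-01 ".
        /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] / {
            if (buf != "") print buf
            buf = $0
            next
        }
        # Anything else continues the current entry.
        {
            sub(/^[ \t]+/, "")
            buf = (buf == "" ? $0 : buf " " $0)
        }
        END { if (buf != "") print buf }
    ' "$@"
}
```

After this step a multi-line SQL statement sits on one line, so a plain `grep` for a table name returns the whole statement along with its timestamp.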
8 days ago
Can you explain how you used dd here? I've never seen it used this way, curious
8 days ago
dd lets you specify an offset to start reading the file at, with `skip`. This lets you perform a binary search by picking an offset in the file, reading a small chunk (say, a kilobyte), and scanning for the date/time string within it. Each read should be O(1) in terms of the size of the file, so the binary search needs O(log(n)) reads, whereas a grep-based approach is O(n).
(The datetime in the log message is presumably sorted, or nearly so).
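The gist itself is not reproduced here, so this is a minimal sketch of the binary search as described, not the commenter's script. Assumptions: every line starts with a fixed-width, lexically sortable timestamp such as `2024-05-01 12:00:00`, lines are much shorter than the chunk size, and the file is sorted by time. The function name, `CHUNK` and `TSLEN` are made up.

```shell
CHUNK=1024   # bytes read per probe
TSLEN=19     # width of "YYYY-MM-DD HH:MM:SS"

# first_offset_at_or_after FILE TARGET
# Prints a byte offset that lies at or shortly before (within about
# 2*CHUNK bytes of) the first line whose timestamp is >= TARGET.
first_offset_at_or_after() {
    file=$1 target=$2
    lo=0
    hi=$(( $(wc -c < "$file") ))
    while [ $((hi - lo)) -gt "$CHUNK" ]; do
        mid=$(( (lo + hi) / 2 ))
        # Read CHUNK bytes starting at byte offset mid. The first line of
        # the chunk is usually cut off, so take the timestamp of the second.
        ts=$(dd if="$file" bs=1 skip="$mid" count="$CHUNK" 2>/dev/null |
             sed -n '2{p;q;}' | cut -c1-"$TSLEN")
        # String comparison via expr; the X prefix keeps expr from
        # treating the operands as numbers or operators.
        if [ -n "$ts" ] && LC_ALL=C expr "X$ts" \< "X$target" >/dev/null; then
            lo=$mid    # probe is still before the target: search later
        else
            hi=$mid    # probe is at or past the target: search earlier
        fi
    done
    echo "$lo"
}
```

From the returned offset a short forward scan finds the exact line, for example `tail -c +"$((off + 1))" FILE | grep -m1 ...`. `bs=1` keeps the sketch portable but is slow for dumping large regions; GNU dd accepts `iflag=skip_bytes,count_bytes` together with a larger block size for that.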
Related, `sd` is a great utility worth the install which makes simple sed-type operations more obvious / easier (for some value of easy).
8 days ago
As soon as there is a _complete_ regex reference in the readme, it may be worth a try. The main problem with _any_ regex tool or programming language or ... is the subtle and not so subtle differences between the various regex implementations - like the "normal" and "extended" mode of sed.
This phrase:
> sd uses regex syntax that you already know from JavaScript and Python.
says it all. I still haven't found a better short overview of various regex engines than that:
8 days ago
Indeed. It's different from Python, maybe JavaScript as well.
8 days ago
Middle-brow dismissal. Hardly worth anyone’s consideration.
Just go straight to the point that this isn’t available on a proprietary Unix that had its EOL fifteen years ago and that five people still use.
8 days ago
>this isn’t available on a proprietary Unix
Skill issue. It's not necessary in the first place anyway
8 days ago
To be fair, IBM actually had a commercial product that was simpler (cheaper) because it didn't have things like numbers:
8 days ago
I am done with regular expression languages and engines. Each time I wanted to do something not so trivial with one, I had to re-learn the language(s) and debug it, not to mention the editing operations on top of them (sed...).
This has been quite annoying. So now I code it in C or assembly, using common-case code templates and ready-made build scripts to have a comfortable dev loop.
In the end, I get roughly the same results and I don't need those regular expression languages and engines.
It is a clear win in that case.
8 days ago
Oh, I definitely need to run this one on
8 days ago
I wish there was a similar tool for relational algebraic expressions, to make relational database research papers more accessible.
8 days ago
Amusingly, in French, "desed" sounds like "décède", which means dies / is deceased. That's quite a fitting name for a tool one would use in "I need to debug a sed script" situations!
8 days ago
I feel we're witnessing a resurgence of interest in 'nix default programs such as `sed` and `awk` in part because LLMs make it so much easier to get started in them, and because they really do exist everywhere you might look. (The fact they were designed to be performant in bygone decades and are super-performant now as a result is also nice!)
There is just something incredibly freeing about knowing you can sit down at a freshly-reinstalled box and do productive work without having to install a single thing on the box itself first.
EDIT: might be of interest if you want to know what you can work with right out of the box on Debian 12. Other distros might differ.
8 days ago
Sure! I've created a gist so you can see for yourself but the basic idea is as described. Read a chunk, find the first date in it and then decide if you want to read further forward or back in the file.
For anyone else, here's the awk for combining lines in the log files for making them greppable too:
8 days ago