Conversation

potentially hazardous object

does anyone have a good way to do the following as a CLI command?

1. take two plaintext files, A and B

2. in A, look for a beginning and ending tag (like "<head>" and "</head>" or something but i'm not necessarily working only with web pages)

3. delete whatever is (if anything) between those two tags and insert all of B in that place instead
Hmmmm, I feel like this should be possible with ed, and thus with sed, but idk how painful it would be.

@apophis i wouldn't know but to me it sounds like this could be done with awk

@eris i was trying to look up how to do this in sed a couple weeks ago but got hopelessly stuck at the "look for" part

the fact that i don't know how to use regexes (and they seem to be a different format depending on what exact software you're using so i can't trust anything i find to work???) might be the sticking point
fwiw i can do this in ZScript...
Ugh them being line oriented makes stuff much less fun

CC: @apophis@kill-corporations.enterprises

💙🩷💜Ⓑⓡⓔⓣⓣ

@apophis@kill-corporations.enterprises

Below is a script that I use to take 'raw' website content and add a header and footer.

Under that are awk and sed commands for lines 'before' or 'after' a specific string.

You can play around with these to get what u want 🙂


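-----
#!/bin/ksh

# Wrap every 'raw' file in the site header and footer,
# writing the result to a copy with 'raw' replaced by 'html' in its name.
processor() {
    for i in *raw*; do
        [ -f "$i" ] || continue
        j=$(echo "$i" | sed 's/raw/html/')
        echo "$j"
        cat /home/website/header "$i" /home/website/footer > "$j"
    done
}

cd /home/website/ && processor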


----only print the lines after the line containing x string:

(eg string is 'Forecast'):

cat xxx | awk '/Forecast/{p++;if(p==1){next}}p'


-----Only print the lines up to the specified string "Forecast"

sed -n '/Forecast/q;p' xxx
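
For comparison, here is a rough Python take on the same two filters, assuming the input is already a list of lines. The names `lines_after` and `lines_before` are made up for this sketch, and it matches a plain substring rather than a regex like awk and sed do.

```python
from itertools import dropwhile, takewhile


def lines_after(lines, marker):
    """Return the lines that follow the first line containing marker."""
    rest = dropwhile(lambda line: marker not in line, lines)
    next(rest, None)  # skip the marker line itself, if there is one
    return list(rest)


def lines_before(lines, marker):
    """Return the lines that come before the first line containing marker."""
    return list(takewhile(lambda line: marker not in line, lines))


sample = ["Today", "sunny", "Forecast", "rain", "more rain"]
print(lines_after(sample, "Forecast"))
print(lines_before(sample, "Forecast"))
```

If the marker never shows up, `lines_after` returns an empty list and `lines_before` returns everything, which is the same behaviour as the awk and sed versions.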


----
I think sed (or ed) is not the right approach here, just implement the thing in your favorite scripting language, but if you want to see the disgusting thing I wrote:

─ cat fileA
hello
world
how are you doing on this fine day
I am doing quite START fine
what about you?
oh yeah whatever
blabla END
honestly i dont care i am a meanie beanie muhaahahaha
END
damn, another end huh
─ cat fileB
this is the
new content
in between the tags :)
─ ed -s fileA
# This is all typed in, you can save it in a file and do ed -s < edscript
# Add newlines after STARTs and before ENDs
g/START/s/START/START\
/
g/END/s/END/\
END/
# Delete everything between the first START and the next END
1;/START/+1;/END/-1d
# Read in the other file in between
-1r fileB
# Get rid of newlines after STARTs and before ENDs
g/START/.;+1j
g/END/-1;+1j
w
q
─ cat fileA
hello
world
how are you doing on this fine day
I am doing quite STARTthis is the
new content
in between the tags :)END
honestly i dont care i am a meanie beanie muhaahahaha
END
damn, another end huh
Anyways if you don't want to write your own script and would rather have a shell script with sed and such, you should probably do something similar to what I did:
1. add extra newlines similarly to what I did, but use some sed s/START/START\n/ (that \n in the replacement works in GNU sed; other seds may want a backslash followed by a literal newline instead)
2. use awk to filter out the unwanted content or perhaps split the file in pre-head and post-head parts, I'm not sure sed is able to store enough context to do it
3. cat pre-head new-head post-head to join them together
4. optionally, if it's important, run a sed to remove the extra newlines added in step 1 (for HTML head tags it doesn't matter)
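
If a scripting language is on the table after all, the pre-head / new-head / post-head idea from the steps above could look roughly like this in Python. It is only a sketch: `replace_between` is a made-up name, it reads everything into one string instead of going line by line, and it handles just the first start tag and the next end tag after it.

```python
def replace_between(text, start, end, new):
    """Replace whatever sits between the first start tag and the next end tag."""
    pre, found_start, rest = text.partition(start)
    if not found_start:
        return text  # no start tag: leave the text alone
    _old, found_end, post = rest.partition(end)
    if not found_end:
        return text  # start tag without an end tag: leave the text alone
    # Join the pieces back together, like cat pre-head new-head post-head.
    return pre + start + new + end + post


file_a = "hello\nI am doing quite START fine\nblabla END\nbye\nEND\n"
file_b = "this is the\nnew content\n"
print(replace_between(file_a, "START", "END", file_b), end="")
```

Because it works on the whole string, tags that share a line with other text need no extra newline juggling. Unlike the ed run, this keeps the trailing newline of file B, so strip it first if that matters.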

CC: @apophis@kill-corporations.enterprises
(If it's fast enough for you and easy to run from CLI and you'll have access to it on all your machines, just do this even if it might seem absurd.)
This is a way to do it, but care should be taken if cutting out strings from generated HTML: there might be stuff around <head> or </head> on the same line which you might want to leave in or cut out, so you should also do some stuff like s/^.*\(Forecast\)/\1/ (and the analog in awk).

BTW to me your post feels like a MIME formatted email opened in a mail client that doesn't support it xdd (because of all the dashes)

CC: @apophis@kill-corporations.enterprises
@eris problem is i can't write any output to a file

i did find the find() and seek() functions in python though which work similarly
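
A minimal sketch of what the find() route might look like as a small command-line script, using string slicing rather than seek(). The script name `splice.py`, the function names and the argument order are all made up for illustration, and it overwrites file A in place, so try it on a copy first.

```python
import sys
from pathlib import Path


def splice(text, start_tag, end_tag, new):
    """Return text with the span between the tags replaced by new.

    Raises ValueError if either tag is missing.
    """
    start = text.find(start_tag)
    if start == -1:
        raise ValueError(f"start tag {start_tag!r} not found")
    start += len(start_tag)          # keep the start tag itself
    end = text.find(end_tag, start)  # only look after the start tag
    if end == -1:
        raise ValueError(f"end tag {end_tag!r} not found after start tag")
    return text[:start] + new + text[end:]


def main(argv):
    # Hypothetical usage: python splice.py A B '<head>' '</head>'
    if len(argv) != 4:
        print("usage: splice.py FILE_A FILE_B START_TAG END_TAG", file=sys.stderr)
        return
    path_a, path_b, start_tag, end_tag = argv
    text_a = Path(path_a).read_text()
    text_b = Path(path_b).read_text()
    Path(path_a).write_text(splice(text_a, start_tag, end_tag, text_b))


if __name__ == "__main__":
    main(sys.argv[1:])
```

Passing `start` as the second argument to `find()` means an end tag that appears before the start tag is ignored.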
Regarding regex formats, sed uses the ancient POSIX Basic RE by default; you should always add -E to it on the command line to get a more modern-looking regex (POSIX ERE / Extended Regular Expressions). Unless you use fancy stuff like backreferences, lookaheads, or such, basic regex knowledge from Perl/Python/Java/JavaScript/anything-from-this-century should be transferable to it.
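
For what it's worth, here is a small sketch of the same tag replacement done with Python's `re` module, where the syntax is the "modern" flavour. `replace_between_re` is a made-up name, and `re.escape` is there so the tags themselves are matched literally, which sidesteps most of the which-dialect-is-this worry.

```python
import re


def replace_between_re(text, start, end, new):
    """Regex variant: replace the span between the first start/end pair."""
    pattern = re.compile(
        "(" + re.escape(start) + ").*?(" + re.escape(end) + ")",
        re.DOTALL,  # let "." match newlines so the span can cross lines
    )
    # A function replacement avoids backslashes in `new` being read as escapes.
    return pattern.sub(lambda m: m.group(1) + new + m.group(2), text, count=1)


page = "<html><head>\n<title>old</title>\n</head><body>hi</body></html>"
print(replace_between_re(page, "<head>", "</head>", "<title>new</title>"))
```

The `.*?` is the non-greedy form, so the match stops at the first end tag instead of running on to the last one in the file.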
Do you need help with implementing the thing in Python in a more normal way than what I presented or would you prefer to do it yourself?
@eris i'll see if i can give it a try myself for now, but thanks

💙🩷💜Ⓑⓡⓔⓣⓣ

@eris@p.enes.lv @apophis@kill-corporations.enterprises
oh yeah I did not consider stuff on the same line!

I use dashes a lot to divide up my notes, they came from my fingers not any mail formatting 🙂
Oh yeah yeah the MIME format (and multipart/form-data, just remembered) is different, that's just what it reminded me of. Was meant in a positive sense like "silly similarity I saw".

CC: @apophis@kill-corporations.enterprises
problem solved, call me hercules because im baby wrangling python