ansaurus

Question

Regular Expression in gVim to Remove Duplicate Domains from a List

Answer 1

A:

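Try this on Vim's command line (it filters the whole buffer through the shell commands sort and uniq, so it removes only lines that are exact duplicates, and it reorders the list):

:%! sort | uniq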

Paul Betts 2010-10-23 02:26:08

To be completely honest, I'm not even sure HOW to try that, haha. I appreciate the help though!

Robert 2010-10-23 02:36:51

Answer 2

+2 A:

If you want to do it with a regular expression, you can try adapting the following:

%s!\v%(^http://%(www\.)?(%([^./]+\.)+[^./]+)%(/.*)?$\_.{-})@<=^http://%(www\.)?\1%(/.*)?\n!!g

However, it will be very slow on 6 billion URLs, and it does not work, for a reason I have not tracked down. Here is a better approach:

:let g:gotDomains={}
:%g/^/let curDomain=matchstr(getline('.'), '\v^http://%(www\.)?\zs[^/]+') | if !has_key(g:gotDomains, curDomain) | let g:gotDomains[curDomain]=1 | else | delete _ | endif

It does the following:

let g:gotDomains={}: creates an empty dictionary that will hold every domain seen so far.
%g/^/{command}: executes {command} on every line.
let curDomain=matchstr(...): gets the domain name:
1. getline('.'): from the current line.
2. \v: turns on "very magic" mode, which lets me omit a lot of backslashes in the regex.
3. ^: from the start of the string.
4. \zs: starts the match here (everything before \zs must still match, but is left out of the result).
if !has_key(g:gotDomains, curDomain): if the domain has not occurred before,
let g:gotDomains[curDomain]=1: then add it to the known domains (the value 1 is not needed; I use a dictionary only for fast lookup).
else | delete _: otherwise delete the line into the black hole register (which means its contents are not saved in any register).
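
The same "keep the first line for each domain" logic can also be sketched outside Vim, which may help when checking the result on a small sample. This is only an illustrative Python sketch, not part of the Vim solution: the function name dedupe_by_domain and the sample URLs are made up, and it assumes every URL starts with http:// as the Vim pattern does. Lines that do not match the pattern are treated as sharing one empty domain here.

```python
import re

# Rough Python counterpart of the Vim pattern '\v^http://%(www\.)?\zs[^/]+'.
# The capture group plays the role of \zs: only the domain part is returned.
DOMAIN_RE = re.compile(r"^http://(?:www\.)?([^/]+)")


def dedupe_by_domain(lines):
    """Return the lines, keeping only the first one seen for each domain."""
    seen = set()   # domains already encountered (the dictionary in the Vim version)
    kept = []      # surviving lines, in their original order
    for line in lines:
        match = DOMAIN_RE.match(line)
        # Assumption of this sketch: non-matching lines get the empty domain.
        domain = match.group(1) if match else ""
        if domain not in seen:
            seen.add(domain)
            kept.append(line)
    return kept


# Hypothetical sample data for illustration only.
urls = [
    "http://www.example.com/a",
    "http://example.com/b",
    "http://other.org/",
    "http://www.other.org/x/y",
]
print(dedupe_by_domain(urls))
```

As in the Vim command, the optional www. prefix is ignored when comparing domains, but other subdomains are kept as separate entries.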

ZyX 2010-10-23 07:07:31

WOW! That second solution you provided (and then clearly explained WHY it was a solution) works perfectly! Thank you so much ZyX! I've spent A LOT of time looking for a solution, and that is spot on, exactly. I truly appreciate you taking the time to not only provide the solution, but to then explain it...well, that is truly helpful. Thanks again!

Robert 2010-10-23 21:11:49
