More: Array Subset, loop and xargs

Last week I had to fetch details for a set of records. For simplicity, let's assume the records are ids of some entity, and I need to get the details for those ids from an API. Let's also assume that the API accepts at most 100 ids in a single request. I had almost 1 million ids to fetch details for.

Let's also assume that the API takes a constant 10 ms to return the details of 100 records, and that we can't reduce this processing time: it's already optimized and constant. So the question is how to speed up the calling side so that I get the details of all records in the minimum possible time.
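A quick back-of-the-envelope calculation under these assumptions. This is only a sketch: it counts just the assumed 10 ms of API time per request, ignores overheads such as starting a process per request, and the 10 parallel callers (`WORKERS`) is an assumed number chosen for illustration.

```bash
#!/bin/bash
# Back-of-the-envelope numbers for the assumptions above (API time only, no overhead)
TOTAL_IDS=1000000
BATCH_SIZE=100   # max ids per API request
API_MS=10        # assumed constant API time per request, in ms
WORKERS=10       # assumed number of parallel callers

# Ceiling division: number of requests needed to cover all ids
REQUESTS=$(( (TOTAL_IDS + BATCH_SIZE - 1) / BATCH_SIZE ))
# One request after another
SEQ_MS=$(( REQUESTS * API_MS ))
# Ideal case: requests spread evenly over WORKERS callers
PAR_MS=$(( (REQUESTS + WORKERS - 1) / WORKERS * API_MS ))

echo "requests: $REQUESTS"
echo "sequential API time: $(( SEQ_MS / 1000 )) s"
echo "ideal parallel API time with $WORKERS workers: $(( PAR_MS / 1000 )) s"
```

This prints 10000 requests, 100 s when called sequentially and 10 s in the ideal parallel case, which is why parallelizing the calls is where the gain lies.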

I used xargs to call the API in parallel with 100 ids in each request, which is much faster than calling the API sequentially.

In an earlier post I briefly described how xargs is useful. Here is a more practical example based on the scenario above.

The sequential process:

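#!/bin/bash

COUNTER=0   # index of the next batch in ARRSET
LIMIT=0     # number of ids collected in the current batch
ARR=''      # current batch as a space-separated string of ids
ARRSET=()   # array of batches, at most 100 ids each

#array subset to build parameter for each request
for i in $(seq 1000000); do
    ARR+=" $i"
    LIMIT=$((LIMIT+1))
    if [ $LIMIT -eq 100 ]
    then
        ARRSET[$COUNTER]=$ARR
        LIMIT=0
        ARR=''
        COUNTER=$((COUNTER+1))
    fi
done

#if there is anything remaining in set, add it as a last (smaller) batch;
#skipping an empty remainder avoids a call to doSomething.php with no ids
if [ -n "$ARR" ]; then
    ARRSET[$COUNTER]=$ARR
fi

#call api, one request per batch; $i is left unquoted on purpose so that
#each id is passed to doSomething.php as a separate argument
for i in "${ARRSET[@]}"; do
    php doSomething.php $i
done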
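As an alternative to the counter loop (this is not what the script above uses), xargs itself can do the batching: when no command is given, `xargs -n 100` runs echo with at most 100 arguments at a time, so each output line is one batch and a shorter final batch comes out on its own. A minimal sketch, assuming bash 4 or later for `mapfile`; `build_batches` is a hypothetical helper name, and the demo uses 1,050 ids so the result stays small.

```bash
#!/bin/bash
# Hypothetical helper: print the ids 1..TOTAL in batches of at most SIZE per line.
# With no command, xargs runs echo, and -n limits each echo to SIZE arguments.
build_batches() {
    local total=$1 size=$2
    seq "$total" | xargs -n "$size"
}

# Read one batch per array element (mapfile requires bash 4 or later).
# For the post's case this would be: build_batches 1000000 100
mapfile -t ARRSET < <(build_batches 1050 100)

# 10 full batches plus 1 batch of 50 ids
echo "batches: ${#ARRSET[@]}"
# First id of the last batch (strip everything from the first space on)
echo "last batch starts with: ${ARRSET[${#ARRSET[@]}-1]%% *}"
```

This prints `batches: 11` and `last batch starts with: 1001`. The calling loop stays the same: `for i in "${ARRSET[@]}"; do php doSomething.php $i; done`.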

Running ps shows a single php process at a time, e.g.:

$ ps -ef | grep '[d]oSomething.php'
kuntal 23765 23705 0 11:50 pts/1 00:00:00 php doSomething.php 2201 2202 2203 2204 2205 2206 2207 2208 2209 2210 2211 2212 2213 2214 2215 2216 2217 2218 2219 2220 2221 2222 2223 2224 2225 2226 2227 2228 2229 2230 2231 2232 2233 2234 2235 2236 2237 2238 2239 2240 2241 2242 2243 2244 2245 2246 2247 2248 2249 2250 2251 2252 2253 2254 2255 2256 2257 2258 2259 2260 2261 2262 2263 2264 2265 2266 2267 2268 2269 2270 2271 2272 2273 2274 2275 2276 2277 2278 2279 2280 2281 2282 2283 2284 2285 2286 2287 2288 2289 2290 2291 2292 2293 2294 2295 2296 2297 2298 2299 2300
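The bracket in `grep '[d]oSomething.php'` is a small trick: the regex still matches doSomething.php in the php command line, but grep's own entry in the ps listing contains the literal text `[d]oSomething.php`, which the regex does not match, so grep does not list itself. The sketch below checks this on two sample lines; the strings are illustrative, not real ps output.

```bash
#!/bin/bash
# [d] is a bracket expression matching the single character "d",
# so this regex matches the text "doSomething.php"
pattern='[d]oSomething.php'

# Illustrative command lines as they would appear in ps
worker_line='php doSomething.php 2201 2202'
grep_line="grep $pattern"   # grep's own command line: grep [d]oSomething.php

# grep -c prints the number of matching lines (0 when nothing matches)
printf '%s\n' "$worker_line" | grep -c "$pattern"
printf '%s\n' "$grep_line" | grep -c "$pattern"
```

The first count is 1 and the second is 0: in `[d]oSomething` the "d" is followed by "]", not by "oSomething", so grep's own line never matches.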

Here is the timing:
real 5m56.151s
user 2m58.911s
sys 2m32.362s

Now the same script, calling the API in parallel:

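#call api in parallel; ARRSET is built exactly as in the script above
#echo prints one batch per line, and xargs runs up to 10 php processes at once
for i in "${ARRSET[@]}"; do
    echo $i
done | xargs -I{} --max-procs 10 bash -c "php doSomething.php {}"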
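A variation on this pipeline, sketched under the assumption of GNU xargs (as used above with `--max-procs`): printf writes one batch per line, and `xargs -L 1` starts one command per input line, splitting the line on blanks into separate arguments, so the per-batch `bash -c` wrapper is not needed. `WORKER`, `MAX_PROCS` and `run_batches` are hypothetical names, and `echo` stands in for `php doSomething.php` so the sketch runs anywhere. Because batches run concurrently, the output order may vary.

```bash
#!/bin/bash
# Command run once per batch. In the post this would be: WORKER=(php doSomething.php)
# echo is used here only so the sketch runs without doSomething.php.
WORKER=(echo)
MAX_PROCS=10

# Feed one batch per line into xargs:
#   -L 1  one command per input line; the line is split on blanks into arguments
#   -P N  run up to N commands at the same time (same as --max-procs N)
run_batches() {
    printf '%s\n' "$@" | xargs -L 1 -P "$MAX_PROCS" "${WORKER[@]}"
}

# Small hypothetical batches; in the post this would be: run_batches "${ARRSET[@]}"
run_batches "1 2 3" "4 5 6" "7 8 9"
```

Batching and parallelism can also be collapsed into a single command: `seq 1000000 | xargs -n 100 -P 10 php doSomething.php`.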

This time ps lists xargs and several php processes running at once, e.g.:

$ ps -ef | grep '[d]oSomething.php'
kuntal 12035 12015 1 11:57 pts/1 00:00:00 xargs -I{} --max-procs 10 bash -c php doSomething.php {}
kuntal 12235 12035 0 11:57 pts/1 00:00:00 php doSomething.php
kuntal 12236 12035 0 11:57 pts/1 00:00:00 php doSomething.php 10201 10202 10203 10204 10205 10206 10207 10208 10209 10210 10211 10212 10213 10214 10215 10216 10217 10218 10219 10220 10221 10222 10223 10224 10225 10226 10227 10228 10229 10230 10231 10232 10233 10234 10235 10236 10237 10238 10239 10240 10241 10242 10243 10244 10245 10246 10247 10248 10249 10250 10251 10252 10253 10254 10255 10256 10257 10258 10259 10260 10261 10262 10263 10264 10265 10266 10267 10268 10269 10270 10271 10272 10273 10274 10275 10276 10277 10278 10279 10280 10281 10282 10283 10284 10285 10286 10287 10288 10289 10290 10291 10292 10293 10294 10295 10296 10297 10298 10299 10300

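Here is the timing:
real 2m10.627s
user 4m34.937s
sys 1m56.751s

Wall-clock (real) time drops from about 5m56s to about 2m11s, roughly 2.7 times faster, even though the total user CPU time goes up.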
