Max Size of Content File

flukey92
Posts: 89
Joined: Thu Aug 11, 2011 9:39 am
Location: Bedford, UK

Max Size of Content File

Post by flukey92 » Fri Feb 17, 2012 2:43 pm

Hi All,

Just wondering, is there a content file size at which you find CMSimple starts to struggle?
Currently my content file is around 3-4 MB, and when saving it will sometimes crash, after which pages become unhidden or page attributes shift from one page to another. I haven't had any missing content as of yet though! :D

Also, when I start to reach the "limit", are there any suggestions for what I can do to get around it? I really don't want to leave CMSimple to go to one of the other CMSs that are available!

Notes:
I have already thought about deleting any content that is not needed and also shortening all image and download URLs so that there isn't any wasted code.
I haven't finished that yet, but I know that is one step.
I'm also looking at using divs with rounded borders as suggested by cmb (I think it was him anyway); this will save me using a lot of nested tables. I am going to be doing a lot of work this weekend, so I will start to go through my content file tomorrow.

Does anyone know of a hosting provider that is a central point for global hosting? (My target market is the world.) I am currently using 1and1 hosting, which is based in Germany. Good service, bad customer service, but it's good value.

Is there a way to have multiple content files without too much hassle?
I was thinking of using the 2site way. If I'm right, I can use one of the "languages" to hold one of my h1 headings and all of its nested h2/h3/h4 headings (I am looking at using a 4 level website).
Then, to work around the menu, the menu can still look the same if I create the same headings but make them empty, and make it so that clicking them reroutes to the other "language".

Hope I'm not too confusing!

Best regards
flukey
RJS Electronics Ltd - LED Push Switches, LCD Programmable Keys, Anti Vandal Push Buttons, Relays, Custom Cable Assemblies
Visit my Web Hosting / Design website here - www.flukedesigns.co.uk

cmb
Posts: 14225
Joined: Tue Jun 21, 2011 11:04 am
Location: Bingen, RLP, DE

Re: Max Size of Content File

Post by cmb » Fri Feb 17, 2012 3:31 pm

Hi flukey,
flukey92 wrote:when it is trying to save sometimes it will crash and pages will become unhidden or page attributes will shift from one page to another.
Well, if the server crashes while saving content, it's quite probable that content and pagedata get out of sync if you have changed the page structure (insert, delete, move). We probably should address this issue for the next major release (1.6).
flukey92 wrote:Just wondering is there a size of content file where you find cmsimple starts to struggle?
Of course there's no fixed limit; it depends on the hosting server, the PHP version (5 is much faster than 4), various PHP ini settings etc. But indeed: the larger the content, the slower the site. Shortening the content by using simpler constructs (e.g. by avoiding tables that are not strictly necessary) might be a first step.
flukey92 wrote:is there a way to have multiple content files without too much hassle?
i was thinking of using the 2site way.. if im right in thinking i can use 1 of the "languages" to hold one of my h1 headings and all of its nested h2/h3 /h4 tags (i am looking at using a 4 level website)
then to work around the menu, the menu can still look the same if i create the same headings but make them empty, and make it so that if i click them it will reroute to the other "language"
And that's exactly the typical next step. But it has its drawbacks: many functions will work only for the current subsite (e.g. Pagemanager, the link lists for pages in the editors, link check etc.). Most importantly, the search will work only for the current subsite! This might be a problem for you. And even if the search were adapted to look in the other subsites' content too, it might get quite slow.

Another option might be to transfer part of the content to other plugins. Images might be moved to a gallery plugin; news could go to Realblog_XH etc. And currently I'm working on a totally different approach to allow the handling of huge amounts of content; but I'm not sure if that'll work, so I won't go into the details now.

Christoph
Christoph M. Becker – Plugins for CMSimple_XH


Re: Max Size of Content File

Post by cmb » Wed May 09, 2012 11:11 pm

Hi Flukey,
cmb wrote:currently I'm working on a totally different approach to allow the handling of huge amounts of content
Boilerplate_XH wasn't what I had in mind (this will probably take some more months to be realized), but it might be an interesting solution to keep your content.htm smaller and increase the performance of your website.

Christoph

simpleSolutions.dk
Posts: 155
Joined: Thu Oct 06, 2011 7:00 am

Re: Max Size of Content File

Post by simpleSolutions.dk » Mon Jul 16, 2012 12:59 pm

3-4 MB of content plus the corresponding pagedata does not sound like a CMSimple solution. Even with all the suggestions from Christoph, your site will have many drawbacks and will not work properly. It may be time to change CMS.


Re: Max Size of Content File

Post by cmb » Mon Jul 16, 2012 4:12 pm

Hi Jerry,
simpleSolutions.dk wrote:3-4 M of content + corresponding pagedata does not sound like a CMSimple solution.
Indeed, that's quite a lot of content for a typical CMSimple powered website. OTOH, it seems that it's not really a problem nowadays. When I request a page from http://www.rjselectronics.com, the response starts to be delivered after only about 0.2 seconds, so this is not the real bottleneck in this case. And even searching takes less than 0.5 seconds for the initial request to be processed on the server.

The problems with content and pagedata sometimes getting out of sync when saving are IMHO not really caused by the large content, but by the fact that there are currently no safety measures to keep them in sync. That is something that could, and of course should, be done ASAP.
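
To illustrate what such a safety measure could look like, here is a minimal sketch. It assumes both files live in the same directory and that the whole save can go through one helper. The function name save_in_sync() and the demo file names are hypothetical and not part of CMSimple_XH; both files are written to temporary files first and only renamed into place once both writes have succeeded.

```php
<?php
// Hypothetical helper (not part of CMSimple_XH): writes content and pagedata
// to temporary files first, and swaps them in only if both writes succeeded.
function save_in_sync($contentFile, $content, $pagedataFile, $pagedata)
{
    $files = array($contentFile => $content, $pagedataFile => $pagedata);
    $temps = array();
    foreach ($files as $target => $data) {
        $tmp = $target . '.tmp';
        // LOCK_EX keeps two simultaneous saves from interleaving their writes
        if (file_put_contents($tmp, $data, LOCK_EX) !== strlen($data)) {
            // a write failed: discard the temporaries, leave the originals untouched
            foreach ($temps as $t) {
                @unlink($t);
            }
            @unlink($tmp);
            return false;
        }
        $temps[$target] = $tmp;
    }
    // On POSIX systems rename() within one directory replaces the target in
    // a single step, so the risky window shrinks to the moment between the
    // two renames instead of the time needed to rewrite two large files.
    foreach ($temps as $target => $tmp) {
        if (!rename($tmp, $target)) {
            return false;
        }
    }
    return true;
}

// Usage with throw-away files in the system temp directory:
$dir = sys_get_temp_dir() . '/sync_demo_' . uniqid();
mkdir($dir);
$ok = save_in_sync(
    $dir . '/content.htm', '<h1>Home</h1><p>Hello</p>',
    $dir . '/pagedata.txt', 'pagedata for one page'
);
var_dump($ok);
```

This does not make the two renames one atomic operation, so a crash between them is still possible; it only makes a half-written content or pagedata file much less likely.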
simpleSolutions.dk wrote:your site will have many drawbacks and will not work properly.
Do you have any concrete examples of what might go wrong? Perhaps it's possible to improve this.

Christoph


Re: Max Size of Content File

Post by simpleSolutions.dk » Tue Jul 17, 2012 1:22 pm

As long as the site is running, it is not a problem, but the day saving of content begins to crash, it's time to change the CMS. It would help a lot to integrate pageparam.php into content.htm, but that will make content.htm even bigger and lead to further crashes. But my reaction is mostly caused by the suggestions to use plugins to split content into text files etc. It doesn't sound like the right solution.

By the way, I don't understand how this site can be 3-4 MB big. sak.dk seems to have a similar number of pages, uses a lot of text, pictures and plugins, and its content and pagedata are less than 200 KB each.


Re: Max Size of Content File

Post by cmb » Tue Jul 17, 2012 1:58 pm

simpleSolutions.dk wrote:But my reaction is mostly caused by suggestions to use plugins to split content in text files etc. I doesn't sound like the right solution.
IMO it could be a last resort if a site was started with CMSimple and has grown to an unexpected size. It might be simpler and cheaper to move some content to external plugin files instead of changing the CMS.

Of course I'm not arguing that CMSimple is the right solution for every website. If it's clear that the site will have a lot of content, another CMS should be chosen in the first place.
simpleSolutions.dk wrote:By the way I don't understand how this site can be 3-4MB big.
It's due to the heavy use of tables in the content: see e.g. http://www.rjselectronics.com/?Home&print. It seems about 80-90% of the content is markup.
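
A rough way to check such a percentage yourself is to compare the size of the raw HTML with the size of the text that remains after stripping the tags. This is only a sketch: the helper name markup_ratio() is made up for this example, and the two sample strings are invented, not taken from the site.

```php
<?php
// Hypothetical helper: fraction of a document's bytes that are markup
// rather than visible text.
function markup_ratio($html)
{
    $total = strlen($html);
    if ($total === 0) {
        return 0.0;
    }
    $text = strlen(strip_tags($html));
    return ($total - $text) / $total;
}

// The same three characters of text, once wrapped in a table, once in a paragraph.
$table = '<table><tr><td><p>LED</p></td></tr></table>';
$plain = '<p>LED</p>';
printf("table:     %.0f%% markup\n", 100 * markup_ratio($table));
printf("paragraph: %.0f%% markup\n", 100 * markup_ratio($plain));
```

For a real site one would pass in the content file instead, e.g. markup_ratio(file_get_contents('content/content.htm')), assuming that is where the content file lives in your installation.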


Re: Max Size of Content File

Post by cmb » Fri Sep 14, 2012 7:46 pm

Hello Community,

this topic is one of my favorites, and although Jerry's concerns are very justified, I'll present another solution to the large content problem (this is still not what I had in mind in the first place).

A big problem with large content is the amount of work that has to be done by the regex engine to split the content into single pages. So saving the content in an already split and prepared form should improve the performance. But having the content in a single HTML file is one of the basics of CMSimple. To have both advantages, one can make good use of caching. This can be done by just replacing function rfc() in cmsimple/cms.php with the following:

Code:

function rfc() {
    global $c, $cl, $h, $u, $l, $su, $s, $pth, $tx, $edit, $adm, $cf;

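    // Rebuild the cache if it does not exist yet or content.htm is newer than it.
    $cacheFile = $pth['folder']['content'] . 'cache';
    if (!file_exists($cacheFile) || filemtime($pth['file']['content']) > filemtime($cacheFile)) {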
        $c = array();
        $h = array();
        $u = array();
        $l = array();
        $empty = 0;
        $duplicate = 0;
    
        $content = file_get_contents($pth['file']['content']);
        $stop = $cf['menu']['levels'];
        $split_token = '#@CMSIMPLE_SPLIT@#';
    
    
        $content = preg_split('~</body>~i', $content);
        $content = preg_replace('~<h[1-' . $stop . ']~i', $split_token . '$0', $content[0]);
        $content = explode($split_token, $content);
        array_shift($content);
    
        foreach ($content as $page) {
            $c[] = $page;
            preg_match('~<h([1-' . $stop . ']).*>(.*)</h~isU', $page, $temp);
            $l[] = $temp[1];
            $temp_h[] = preg_replace('/[ \f\n\r\t\xa0]+/isu', ' ', trim(strip_tags($temp[2])));
        }
    
        $cl = count($c);
        $s = -1;
    
        if ($cl == 0) {
            $c[] = '<h1>' . $tx['toc']['newpage'] . '</h1>';
            $h[] = trim(strip_tags($tx['toc']['newpage']));
            $u[] = uenc($h[0]);
            $l[] = 1;
            $s = 0;
            return;
        }
    
        $ancestors = array();  /* just a helper for the "url" construction:
         * will be filled like this [0] => "Page"
         *                          [1] => "Subpage"
         *                          [2] => "Sub_Subpage" etc.
         */
    
        foreach ($temp_h as $i => $heading) {
            $temp = trim(strip_tags($heading));
            if ($temp == '') {
                $empty++;
                $temp = $tx['toc']['empty'] . ' ' . $empty;
            }
            $h[] = $temp;
            $ancestors[$l[$i] - 1] = uenc($temp);
            $ancestors = array_slice($ancestors, 0, $l[$i]);
            $url = implode($cf['uri']['seperator'], $ancestors);
            $u[] = substr($url, 0, $cf['uri']['length']);
        }
    
        foreach ($u as $i => $url) {
            if ($su == $u[$i] || $su == urlencode($u[$i])) {
                $s = $i;
            } // get index of selected page
    
            for ($j = $i + 1; $j < $cl; $j++) {   //check for duplicate "urls"
                if ($u[$j] == $u[$i]) {
                    $duplicate++;
                    $h[$j] = $tx['toc']['dupl'] . ' ' . $duplicate;
                    $u[$j] = uenc($h[$j]);
                }
            }
        }
        
        $cache = array($c, $h, $u, $l);
        $fp = fopen($pth['folder']['content'] . 'cache', 'w');
        fwrite($fp, serialize($cache));
        fclose($fp);
    } else {
        $cache = unserialize(file_get_contents($pth['folder']['content'] . 'cache'));
        list($c, $h, $u, $l) = $cache;
        $cl = count($c);
        $s = 0;
        foreach ($u as $i => $url) {
            if ($su == $u[$i] || $su == urlencode($u[$i])) {
                $s = $i;
            } // get index of selected page
        }
    }
    
    if (!($edit && $adm)) {
        foreach ($c as $i => $j) {
            if (cmscript('remove', $j)) {
                $c[$i] = '#CMSimple hide#';
            }
        }
    }
} 
I've made some tests with a site with more than 1000 pages and 7 MB of content. And indeed the caching cuts memory usage as well as processing time to about 50% :)
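
For anybody who wants to run a comparison like this on their own site, here is a minimal sketch of how time and memory can be measured. The helper measure() and the synthetic document are assumptions made for this example (it needs PHP 5.3 or later for the closure); it is not the test setup used above.

```php
<?php
// Hypothetical helper: runs $fn once and reports wall-clock time plus the
// script's peak memory afterwards. Peak memory never goes down within a
// request, so measure each variant of rfc() in a separate request.
function measure($fn)
{
    $start = microtime(true);
    call_user_func($fn);
    return array(
        'seconds' => microtime(true) - $start,
        'peak_bytes' => memory_get_peak_usage(),
    );
}

// Stand-in workload: split a synthetic 1000-page document at its headings,
// which is the kind of work rfc() has to do when there is no cache.
$doc = str_repeat("<h1>Page</h1>\n<p>" . str_repeat('text ', 200) . "</p>\n", 1000);
$pages = array();
$result = measure(function () use ($doc, &$pages) {
    $pages = preg_split('~(?=<h[1-3])~i', $doc, -1, PREG_SPLIT_NO_EMPTY);
});
printf(
    "%d pages, %.4f s, peak %.1f MB\n",
    count($pages),
    $result['seconds'],
    $result['peak_bytes'] / 1048576
);
```

In a real test one would wrap the call to rfc() instead of the stand-in closure, once with the original function and once with the cached one.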

N.B.: this is not a suggestion to change the CMSimple_XH core. It might just be a solution for anybody who has a slow site due to large content, without the need to restructure anything.

Christoph
Last edited by cmb on Sun Mar 17, 2013 8:59 pm, edited 1 time in total.
Reason: fixed faulty default for $s (should be 0 instead of 1), when reading from cache file

eeeno
Posts: 12
Joined: Sat Mar 16, 2013 1:38 pm

Re: Max Size of Content File

Post by eeeno » Sat Mar 16, 2013 2:02 pm

I'm planning to use CMSimple_XH for some of my client websites, and scalability seems to be the issue that needs to be improved in the future.
While having all the content in one single HTML file seems to be just fine for sites that don't have much content, it becomes a bottleneck for sites that grow, such as blogs.

I haven't checked in the code how the main file is handled, but I sure hope it's read one line at a time and not completely into memory.

Ways to improve this could be the following:
1. Abandon the use of one single flatfile and make it split every X MB by default. In my opinion it's not such a big change and ultimately there's going to be more demand for it.
2. Some kind of caching could be appropriate for the popular content or for a page that has longer text content. It could be saved as a single file alone.
3. There could be some index table of the pages and it could point to the content file and line in it.
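
Idea 3 could be sketched roughly like this. It is only an illustration, not how CMSimple_XH works: the functions build_index() and read_page() are hypothetical, and the index uses byte offsets rather than line numbers. The index is built once (e.g. after saving), and a single page is then read with fseek() without loading the whole file.

```php
<?php
// Hypothetical: index every page by the byte offset of its heading.
function build_index($file, $levels = 3)
{
    $html = file_get_contents($file); // done once, e.g. after saving
    $bodyEnd = stripos($html, '</body>');
    if ($bodyEnd === false) {
        $bodyEnd = strlen($html);
    }
    preg_match_all('~<h[1-' . $levels . '][^>]*>(.*?)</h~is', $html, $m, PREG_OFFSET_CAPTURE);
    $index = array();
    foreach ($m[0] as $i => $match) {
        $start = $match[1];
        // a page ends where the next heading starts, or at </body>
        $end = isset($m[0][$i + 1]) ? $m[0][$i + 1][1] : $bodyEnd;
        $index[] = array(
            'heading' => trim(strip_tags($m[1][$i][0])),
            'offset' => $start,
            'length' => $end - $start,
        );
    }
    return $index;
}

// Hypothetical: read one page using its index entry only.
function read_page($file, $entry)
{
    $fp = fopen($file, 'rb');
    fseek($fp, $entry['offset']);
    $page = fread($fp, $entry['length']);
    fclose($fp);
    return $page;
}

// Demo with a tiny throw-away content file:
$file = tempnam(sys_get_temp_dir(), 'cms');
file_put_contents(
    $file,
    "<html><body>\n<h1>Home</h1><p>Welcome</p>\n<h2>News</h2><p>Nothing yet</p>\n</body></html>"
);
$index = build_index($file);
$news = read_page($file, $index[1]);
echo $news;
unlink($file);
```

The index would have to be rebuilt whenever the content file changes, and anything that expects the complete content to be loaded would not benefit from it.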

Of course there are external caching modules for PHP and Apache, but those aren't available with every host.


Re: Max Size of Content File

Post by cmb » Sat Mar 16, 2013 2:46 pm

Hi eeeno,
eeeno wrote:I haven't checked the code how the main file is handled but i sure hope it's read each line at a time and not completely into the memory.
The content file is read completely into memory on every page call. This has historical reasons, and quite a few plugins rely on the complete content being accessible through global variables. One might argue that this is bad design, but it's nothing that could be changed without breaking those plugins.
eeeno wrote:While having all the content in one single html file seems to be just alright for sites that don't have much content, it becomes a bottleneck in sites that grow, such as blogs for example.
Indeed, CMSimple is made for small websites administered by a single person. It probably happens very seldom that a single person creates too much content for a single content file. Let's calculate a bit. Since you mentioned a blog, let's take http://en.blog.wordpress.com/2013/03/15 ... y-faves-4/ as an example. The content of this moderately sized blog post is 7 KB. Say the user makes one new blog post every week, which results in 364 KB a year. Say he continues posting for 10 years: about 3.5 MB. This can probably just about be handled by standard CMSimple on a modern server, and with the modifications I've given above, the performance decrease should be nearly unnoticeable (compared to other factors).
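
The estimate above, written out so that the numbers can be adjusted. The figures are the assumed ones from the paragraph (7 KB per post, one post per week, 10 years); the variable names are made up for this example.

```php
<?php
// Assumed figures from the estimate above: 7 KB per post, one post per week.
$kbPerPost = 7;
$postsPerYear = 52;
$years = 10;

$kbPerYear = $kbPerPost * $postsPerYear; // 364 KB
$kbTotal = $kbPerYear * $years;          // 3640 KB
printf(
    "%d KB per year, %.2f MB after %d years\n",
    $kbPerYear,
    $kbTotal / 1024,
    $years
);
// prints "364 KB per year, 3.55 MB after 10 years"
```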

Still not enough? Well, particularly for blogging one probably won't modify the blog post after some time. So he can "archive" it with Boilerplate_XH (just copy the content to a new boilerplate text block, and call this from the main content). The result for the visitor is identical, but this reduces the content size for this post to a few hundred bytes. So 365 posts a year would result in a content size of less than 100KB. :)

Of course the workflow is a bit unwieldy (besides, the search functionality would get very slow), but blogging with pure CMSimple is unwieldy anyway. There is a blog plugin called RealBlog_XH, which makes blogging somewhat more comfortable, and it doesn't store anything in the main content as it uses its own file. But RealBlog_XH isn't meant for heavy blogging. If you have this in mind for a project, you are probably better off using a DB based system such as WordPress.

Christoph
Last edited by cmb on Sun Mar 17, 2013 9:01 pm, edited 1 time in total.
Reason: fixed typo
