Max Size of Content File

eeeno
Posts: 12
Joined: Sat Mar 16, 2013 1:38 pm

Re: Max Size of Content File

Post by eeeno » Sat Mar 16, 2013 3:29 pm

The content file is read completely into memory on every page call. This is historically established, and quite some plugins rely on the complete content being accessible through global variables. One might argue that this is a bad design, but it's nothing that could be changed without breaking those plugins.
It *is* bad design. It's simply not a good idea to read the whole site content into memory on each request. In my opinion the core could be changed quite easily to read just one line at a time, or to start reading at a specific line taken from an index. With small sites the cost is hardly noticeable, but it's still not good practice to do it the way it's done currently. I think many hosts limit resources per user, and any site could choke if it became more popular.

:roll: Just rolling my eyes. I didn't expect it to be like that. I'd just rewrite it... and destroy the plugins. :lol: How many plugins would be affected? All of them?? :o

For a start your solution seems like a good improvement and doesn't break things... but... it needs to be rewritten... :? even if the content was under 1 megabyte, still it's a waste of limited resource...

I think this needs to be addressed in the next major release.

Edit: WordPress seems bloated and heavy to me, while CMSimple_XH seems easier to adopt. Joomla is quite horrible too. Well... I'm not yet familiar with the CMSimple_XH code, but I'm skimming through it to see how it could be changed...

Edit2#:
I have checked how it works. If I'm not completely wrong, the function at adm.php:728 shows how the content is handled on each request.

file: adm.php: line: 728
function read_content_file($path)

My suggestions:
- Generate a "content map" with start and end byte information for each page; it could be saved within content/pagedata.php???
- The content map would be regenerated each time the content is edited.
- On each page load, only the necessary page content would be loaded:
$offset = start reading at this byte
$maxlen = number of bytes to read (end byte minus start byte)
file_get_contents($path . '/content/content.htm', false, null, $offset, $maxlen)


This would eliminate the need to read all the content within an array and find each pagebreak on every single page load. Huge difference!
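
A rough sketch of the content map idea, not a drop-in patch. The function names `build_content_map()` and `read_page()` are hypothetical, and the assumption that a new page starts at every `<h1>`, `<h2>` or `<h3>` tag is only an illustration of "find each pagebreak". The one thing taken from PHP itself is that the fifth argument of `file_get_contents()` is a length in bytes, not an end position.

```php
<?php
// Hypothetical helpers for illustration; not part of CMSimple_XH.

/**
 * Build a map with one ['offset' => ..., 'length' => ...] entry per page.
 * Assumption: every <h1>, <h2> or <h3> tag starts a new page.
 * This would run only when the content is saved, not on every request.
 */
function build_content_map(string $contentFile): array
{
    $content = file_get_contents($contentFile);
    // PREG_OFFSET_CAPTURE returns [matched text, byte offset] pairs.
    preg_match_all('/<h[1-3][\s>]/i', $content, $matches, PREG_OFFSET_CAPTURE);
    $starts = array_map(function ($match) {
        return $match[1];
    }, $matches[0]);

    $map = [];
    foreach ($starts as $i => $start) {
        // A page ends where the next one starts, or at the end of the file.
        $end = $starts[$i + 1] ?? strlen($content);
        $map[] = ['offset' => $start, 'length' => $end - $start];
    }
    return $map;
}

/**
 * Read a single page using the map, without loading the rest of the file.
 */
function read_page(string $contentFile, array $map, int $index): string
{
    $entry = $map[$index];
    return file_get_contents(
        $contentFile,
        false,
        null,
        $entry['offset'],
        $entry['length']
    );
}
```

On a normal page request only `read_page()` would be called; the map itself is small and could be stored in whatever format the page data file already uses.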

- content.htm should only be read in chunks, so that bigger file sizes can be processed. The data could be handled in chunks of 500 KB, with one chunk read at a time when generating "the map" or searching. No separate files.
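
A minimal sketch of chunked scanning, assuming again (purely for illustration) that a page starts at every `<h1>` to `<h3>` tag. The function name `find_page_offsets()` is hypothetical. The only subtle part is that a tag can be split across two chunks, so the last few bytes of each buffer are carried over into the next one.

```php
<?php
// Hypothetical helper for illustration; not part of CMSimple_XH.

/**
 * Return the byte offsets of all page starts, reading the file
 * one chunk at a time instead of loading it completely.
 */
function find_page_offsets(string $contentFile, int $chunkSize = 512000): array
{
    $pattern = '/<h[1-3][\s>]/i'; // assumption: this marks a page start
    $overlap = 3; // a match is exactly 4 bytes long, so carry over 3 bytes

    $handle = fopen($contentFile, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $contentFile");
    }

    $offsets = [];
    $carry = '';      // tail of the previous buffer
    $carryStart = 0;  // file offset of the first byte of $carry

    while (!feof($handle)) {
        $chunk = fread($handle, $chunkSize);
        if ($chunk === false || $chunk === '') {
            break;
        }
        $buffer = $carry . $chunk;
        preg_match_all($pattern, $buffer, $matches, PREG_OFFSET_CAPTURE);
        foreach ($matches[0] as $match) {
            // Convert the offset inside the buffer to an offset in the file.
            $offsets[] = $carryStart + $match[1];
        }
        // Keep only the tail; it is shorter than a full match, so nothing
        // is reported twice.
        $keep = min($overlap, strlen($buffer));
        $carryStart += strlen($buffer) - $keep;
        $carry = substr($buffer, -$keep);
    }
    fclose($handle);
    return $offsets;
}
```

Memory use stays at roughly one chunk regardless of the file size, at the price of slightly more bookkeeping than a single `file_get_contents()` call.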

But how much that would actually affect the plugins is a mystery to me. :roll:

Any ideas?
Last edited by eeeno on Sat Mar 16, 2013 5:25 pm, edited 1 time in total.

cmb
Posts: 14225
Joined: Tue Jun 21, 2011 11:04 am
Location: Bingen, RLP, DE

Re: Max Size of Content File

Post by cmb » Sat Mar 16, 2013 5:22 pm

eeeno wrote:It *is* bad design.
I won't argue that it's good design. But I've seen worse designs. E.g. there's a CMS out there which stores all information about each page in a separate file. The consequence is that all of these files have to be read on each page call to build the menu. (I'm aware that there is now an extension which caches the meta information of all pages in a single file.) And well, there are download counter scripts requiring MySQL...

However, the reasoning of the original developer of CMSimple for using this kind of storage was (a) that it's still faster than connecting to an RDBMS for typical content sizes of small websites (a few hundred KB) and (b) that it's possible to edit the content offline (I believe that was the only option in the precursor of CMSimple, which was a Perl script developed around the millennium).
eeeno wrote:How many plugins would be affected? All of them??
Definitely not all of the plugins, but probably many, depending on the kind of change. The problem is the use of global variables in the first place. If access to the page content were encapsulated in functions (or if PHP allowed intercepting variable accesses), it would easily be possible to change the storage as needed without breaking existing extensions.
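
To illustrate the encapsulation point: PHP cannot intercept access to a plain array variable, but an object implementing the built-in `ArrayAccess` interface can stand in for an array as far as `$c[$i]` style reads go. The sketch below is hypothetical (the class name `LazyContent` and the offset/length map format are assumptions, as is the idea that plugins mainly index into the content global). Code that passes the variable to array functions such as `array_map()` or `implode()` would still break.

```php
<?php
// Hypothetical class for illustration; not part of CMSimple_XH.

/**
 * Array-like view on the content file that loads a page only when
 * it is first accessed.
 */
class LazyContent implements ArrayAccess, Countable
{
    private $file;
    private $map;        // list of ['offset' => int, 'length' => int]
    private $cache = []; // pages that were already read or overwritten

    public function __construct(string $file, array $map)
    {
        $this->file = $file;
        $this->map = $map;
    }

    public function offsetExists($index): bool
    {
        return isset($this->map[$index]);
    }

    public function offsetGet($index): string
    {
        if (!isset($this->map[$index])) {
            throw new OutOfBoundsException("No page with index $index");
        }
        if (!isset($this->cache[$index])) {
            $entry = $this->map[$index];
            $this->cache[$index] = file_get_contents(
                $this->file, false, null, $entry['offset'], $entry['length']
            );
        }
        return $this->cache[$index];
    }

    public function offsetSet($index, $value): void
    {
        if ($index === null || !isset($this->map[$index])) {
            throw new LogicException('Adding pages is not supported in this sketch');
        }
        // Only changes the in-memory copy; nothing is written to disk.
        $this->cache[$index] = (string) $value;
    }

    public function offsetUnset($index): void
    {
        unset($this->cache[$index]);
    }

    public function count(): int
    {
        return count($this->map);
    }
}
```

With something like this in place, an expression such as `$c[$s]` would read just one page from disk, while `count($c)` would still report the total number of pages.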
eeeno wrote:I didn't expect it to be like that. I'd just rewrite it... and destroy the plugins.
IMO it's not so much more effort to start completely from scratch. ;)
eeeno wrote:even if the content was under 1 megabyte, still it's a waste of limited resource...
It is a waste, but one that goes nearly unnoticed. On my nettop (Intel ATOM 1.6 with hyperthreading) a site with 125 pages (content size about 1 MB) takes around 400ms to be completely processed. Function rfc(), which reads and splits the content, needs about 100ms. On a typical server this will be less than 10ms. That is far too much for a high traffic website, but otherwise quite acceptable. Or to give a practical example: have a look at http://3-magi.net/demo/large. The content size is 1.7 MB, and the server is a shared host (BTW a cheap one). IMO the performance is quite acceptable (try the search function, which would slow down if several files had to be read).
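
For anyone who wants to get comparable numbers on their own host, here is a small timing sketch. It does not call rfc() itself; it only measures a stand-in "read and split" step, under the assumption that pages are split at `<h1>` to `<h3>` tags. The function name `time_read_and_split()` is hypothetical.

```php
<?php
// Hypothetical helper for illustration; not part of CMSimple_XH.

/**
 * Return the average time in milliseconds needed to read the whole
 * content file and split it into pages.
 */
function time_read_and_split(string $contentFile, int $runs = 5): float
{
    $total = 0.0;
    for ($i = 0; $i < $runs; $i++) {
        $start = microtime(true);
        $content = file_get_contents($contentFile);
        // Assumption: split in front of every <h1>, <h2> or <h3> tag.
        $pages = preg_split('/(?=<h[1-3][\s>])/i', $content);
        $total += microtime(true) - $start;
    }
    return $total / $runs * 1000.0;
}
```

The first run is usually slower because the file is not yet in the operating system's cache, so averaging over several runs gives a fairer picture of a busy site.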

Or just compare the performance of http://forum.cmsimple-xh.dk with this board. The latter runs on phpBB with MySQL; the former reads the complete 20 MB content from a flat file on each request. That's wasteful, but this board isn't that much faster.
eeeno wrote:I think this needs to be addressed in the next major release.
Indeed something to consider for CMSimple_XH 2.0.

Christoph
Christoph M. Becker – Plugins for CMSimple_XH


Re: Max Size of Content File

Post by eeeno » Sat Mar 16, 2013 5:42 pm

Yes. I consider CMSimple_XH quite mature already, and starting completely from scratch actually seems like a waste of time.

Thank you cmb for your reply. I checked your site and you seem to know all about this CMS. It may not seem like a major issue right now, as "it just works tm".

It looks fairly simple to me but breaks things. :lol:

So everything's accessible from global variables... that's what I thought. But do the plugins really need the whole site content to be accessible at all times... hmmm?

I think this issue should be discussed in the open development section, so I opened a thread there...
http://www.cmsimpleforum.com/viewtopic.php?f=29&t=5925

Let's continue it there if necessary.
